Overview
Voice Activity Detection (VAD) determines when a user is speaking and when they are silent, which is essential for:
- Knowing when to start and stop recording
- Triggering transcription at the right time
- Building push-to-talk or hands-free interfaces
Automatic VAD with Voice Session
The easiest way to use VAD is through the Voice Agent pipeline.
VoiceSessionConfig for VAD
| Parameter | Type | Default | Description |
|---|---|---|---|
| speechThreshold | double | 0.03 | Audio level that triggers speech detection |
| silenceDuration | double | 1.5 | Seconds of silence to wait before the captured speech is processed |
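To make the two values concrete for a frame-based audio pipeline, the sketch below mirrors them in a hypothetical Python dataclass, `VadSettings`. The class, its helper methods, the `>=` comparison, and the 20 ms frame size are illustrative assumptions, not part of the SDK's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VadSettings:
    """Hypothetical mirror of the VAD fields in VoiceSessionConfig."""

    speech_threshold: float = 0.03  # audio level that triggers speech detection
    silence_duration: float = 1.5   # seconds of silence before processing

    def is_speech(self, level: float) -> bool:
        # Assumed comparison: a frame counts as speech when it reaches the threshold.
        return level >= self.speech_threshold

    def silence_frames(self, frame_ms: int = 20) -> int:
        # Consecutive silent frames of `frame_ms` that add up to silence_duration.
        return math.ceil(self.silence_duration * 1000 / frame_ms)


settings = VadSettings()
print(settings.silence_frames())  # 75 frames of 20 ms each
print(settings.is_speech(0.05))   # True
print(settings.is_speech(0.01))   # False
```

With the defaults, 1.5 s of silence corresponds to 75 consecutive 20 ms frames below the threshold.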
Building a Voice Level Indicator
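A voice level indicator typically reduces each chunk of microphone samples to a single level and maps that level onto a visual meter. The sketch below assumes float samples in the range -1.0 to 1.0 and uses root-mean-square (RMS) amplitude as the level. The function names, the 20-cell text bar, and the `full_scale` value are hypothetical choices, and the SDK may compute its audio level differently.

```python
import math
from typing import Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio chunk (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_bar(level: float, full_scale: float = 0.3, width: int = 20) -> str:
    """Render a level as a fixed-width text meter, clamped to the bar's width."""
    fraction = min(max(level / full_scale, 0.0), 1.0)
    filled = round(fraction * width)
    return "#" * filled + "-" * (width - filled)


chunk = [0.1, -0.1, 0.1, -0.1]  # made-up samples for illustration
level = rms_level(chunk)
print(f"{level:.2f} {level_bar(level)}")
```

In a UI, the same mapping would drive a bar or waveform widget that is refreshed for every incoming chunk.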
Voice Session with VAD Events
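The speech events a voice session reports can be modeled as a small state machine driven by the two parameters above: a level at or above `speechThreshold` starts an utterance, and `silenceDuration` seconds of continuous below-threshold audio end it. The sketch below is a hypothetical, simplified model of that logic. The class name, the event strings, and the per-frame `(timestamp, level)` input are assumptions, not the SDK's actual event API.

```python
from typing import Optional


class VadStateMachine:
    """Simplified speech/silence tracker emitting 'speech_started' and 'speech_ended'."""

    def __init__(self, speech_threshold: float = 0.03, silence_duration: float = 1.5):
        self.speech_threshold = speech_threshold
        self.silence_duration = silence_duration
        self.speaking = False
        self.silence_started_at: Optional[float] = None

    def process(self, timestamp: float, level: float) -> Optional[str]:
        """Feed one frame's level at `timestamp` seconds; return an event or None."""
        if level >= self.speech_threshold:
            self.silence_started_at = None  # any speech resets the silence timer
            if not self.speaking:
                self.speaking = True
                return "speech_started"
            return None
        if not self.speaking:
            return None  # silence while idle is ignored
        if self.silence_started_at is None:
            self.silence_started_at = timestamp
        if timestamp - self.silence_started_at >= self.silence_duration:
            self.speaking = False
            self.silence_started_at = None
            return "speech_ended"  # a session would hand the audio off here
        return None


vad = VadStateMachine()
frames = [(0.0, 0.01), (0.5, 0.08), (1.0, 0.06), (1.5, 0.01), (2.5, 0.01), (3.0, 0.01)]
for t, level in frames:
    event = vad.process(t, level)
    if event:
        print(f"{t:.1f}s: {event}")
```

With the default values, this reports `speech_started` at 0.5 s and `speech_ended` at 3.0 s, once 1.5 s of silence has accumulated after the last speech frame.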
VAD Tuning Tips
Threshold Adjustment
Lower thresholds (e.g. 0.01) detect quieter speech but are more likely to trigger on background noise.
Higher thresholds (e.g. 0.1) require louder speech but are more robust to noise.
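To see this trade-off concretely, the snippet below classifies the same sequence of frame levels with the low, default, and high thresholds. The level values are invented for illustration: background noise around 0.02, quiet speech around 0.05, and normal speech around 0.15. `speech_frames` is a hypothetical helper.

```python
def speech_frames(levels: list[float], threshold: float) -> list[bool]:
    """Mark each frame as speech (True) when its level reaches the threshold."""
    return [level >= threshold for level in levels]


# Illustrative levels: background noise, quiet speech, normal speech.
levels = [0.02, 0.02, 0.05, 0.05, 0.15, 0.15]

print(speech_frames(levels, 0.01))  # noise frames are also flagged as speech
print(speech_frames(levels, 0.03))  # noise ignored, quiet speech kept
print(speech_frames(levels, 0.1))   # quiet speech is missed
```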
Silence Duration
Shorter durations (e.g. 0.5 s) feel more responsive but may cut the user off during a mid-sentence pause.
Longer durations (e.g. 2.0 s) tolerate natural pauses but make the interface feel slower.
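One way to reason about this trade-off: any pause at least as long as the silence duration ends the utterance, so a single spoken request is split once for every such pause. The sketch below counts the resulting segments for made-up pause lengths. `count_segments` is a hypothetical helper, not an SDK function.

```python
def count_segments(pauses: list[float], silence_duration: float) -> int:
    """Number of utterances produced when every pause >= silence_duration ends one."""
    return 1 + sum(1 for pause in pauses if pause >= silence_duration)


# Made-up pauses (in seconds) inside one spoken request.
pauses = [0.3, 0.8, 1.2]

print(count_segments(pauses, 0.5))  # 3: the 0.8 s and 1.2 s pauses both split the request
print(count_segments(pauses, 2.0))  # 1: the whole request stays together
```

The cost of the longer setting is latency: after the user actually finishes, the session still waits the full silence duration before processing.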
Environment Calibration
Consider measuring the ambient noise level before the user starts speaking and adjusting `speechThreshold` dynamically so that it stays above that noise floor.
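One possible approach, sketched here under stated assumptions: collect per-frame levels for a second or two before the user speaks, take a high percentile of those levels as the noise floor, and set the threshold a fixed margin above it, never going below the 0.03 default. The 90th percentile, the 2x margin, and the function name are illustrative choices rather than SDK behavior. The result would then be supplied as `speechThreshold`.

```python
import statistics


def calibrated_threshold(
    ambient_levels: list[float],
    default: float = 0.03,
    margin: float = 2.0,
) -> float:
    """Estimate a speech threshold from levels measured while nobody is speaking."""
    if len(ambient_levels) < 2:
        return default  # not enough data to estimate a noise floor
    # 90th percentile of ambient levels: less sensitive to a single click or bump.
    noise_floor = statistics.quantiles(ambient_levels, n=10)[-1]
    return max(default, noise_floor * margin)


# Made-up ambient measurements for two environments.
quiet_room = [0.005, 0.006, 0.007, 0.006, 0.005, 0.008, 0.006, 0.007, 0.005, 0.006]
noisy_cafe = [0.03, 0.04, 0.035, 0.05, 0.045, 0.04, 0.038, 0.042, 0.036, 0.048]

print(calibrated_threshold(quiet_room))  # 0.03: the default wins in a quiet room
print(calibrated_threshold(noisy_cafe))  # about 0.0996, twice the estimated noise floor
```

Using a percentile instead of the maximum keeps one loud transient during calibration from pushing the threshold so high that normal speech is missed.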