Overview
Voice Activity Detection (VAD) answers the question: “Is someone speaking right now?” This enables:
- Wake word detection – Start listening when speech begins
- Audio trimming – Only process speech segments
- Turn-taking – Know when to respond in conversations
- Battery efficiency – Don’t process silence
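As a rough illustration of the audio trimming use case, the sketch below groups per-frame speech decisions into contiguous segments so that only those ranges are passed on for processing. The function name `speechSegments(in:)` and the `[Bool]` input are hypothetical and not part of the SDK; they stand in for whatever per-frame result your VAD produces.

```swift
/// Groups per-frame speech decisions into half-open ranges of frame indices.
/// `flags[i]` is true when frame i was classified as speech (hypothetical input).
func speechSegments(in flags: [Bool]) -> [Range<Int>] {
    var segments: [Range<Int>] = []
    var start: Int? = nil
    for (index, isSpeech) in flags.enumerated() {
        if isSpeech, start == nil {
            start = index                       // a speech run begins here
        } else if !isSpeech, let s = start {
            segments.append(s..<index)          // the run ended on the previous frame
            start = nil
        }
    }
    if let s = start {
        segments.append(s..<flags.count)        // still speaking when input ended
    }
    return segments
}

let flags = [false, true, true, false, false, true]
let segments = speechSegments(in: flags)        // [1..<3, 5..<6]
```

Frames outside the returned ranges can be dropped before transcription, which is also where the battery savings come from.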
Basic Usage
Setup
VADConfiguration
Threshold Tuning
| Threshold | Behavior | Use Case |
|---|---|---|
| 0.2-0.4 | Very sensitive, more false positives | Quiet environments |
| 0.4-0.6 | Balanced (recommended) | Normal rooms |
| 0.6-0.8 | Less sensitive, fewer false positives | Noisy environments |
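To make the trade-off in the table concrete, here is a minimal sketch that applies three thresholds to the same sequence of frame scores. It assumes the detector yields a score between 0 and 1 per frame; the scores and the helper `speechFrames(scores:threshold:)` are illustrative only and are not part of the SDK.

```swift
/// Returns the indices of frames whose score meets or exceeds the threshold.
func speechFrames(scores: [Float], threshold: Float) -> [Int] {
    scores.indices.filter { scores[$0] >= threshold }
}

// Hypothetical scores: background noise (0.25, 0.35), clear speech (0.9, 0.85),
// and a softly spoken word (0.55).
let scores: [Float] = [0.05, 0.25, 0.35, 0.9, 0.85, 0.55, 0.1]

let sensitive = speechFrames(scores: scores, threshold: 0.3)  // [2, 3, 4, 5]
let balanced  = speechFrames(scores: scores, threshold: 0.5)  // [3, 4, 5]
let strict    = speechFrames(scores: scores, threshold: 0.7)  // [3, 4]
```

In this example the lowest threshold picks up a noise frame as speech, while the highest one drops the soft word. That is the false positive versus missed speech trade-off the table describes.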
Detection Methods
From AVAudioPCMBuffer
From Float Array
Continuous VAD with Callbacks
For real-time applications, use the callback-based API:
Complete Voice Recording Example
SwiftUI Voice Activity UI
VAD State Management
Best Practices
Adjust threshold for environment
Start with 0.5 and adjust based on testing. Noisy environments need higher thresholds.
Add debouncing
Add a small delay before acting on speech end events to handle brief pauses.
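One way to sketch this debouncing is a small state machine that reports the end of speech only after silence has lasted for a configurable hangover period. The type `SpeechEndDebouncer` and its `update(isSpeech:at:)` method are hypothetical names, not SDK API; the raw decisions would come from your VAD callback.

```swift
import Foundation

/// Delays "speech ended" until silence has lasted at least `hangover` seconds.
struct SpeechEndDebouncer {
    enum Event: Equatable { case speechStarted, speechEnded }

    let hangover: TimeInterval
    private var isSpeaking = false
    private var silenceStart: TimeInterval? = nil

    init(hangover: TimeInterval) { self.hangover = hangover }

    /// Feed one raw VAD decision with its timestamp (in seconds).
    /// Returns an event only when the debounced state changes.
    mutating func update(isSpeech: Bool, at time: TimeInterval) -> Event? {
        if isSpeech {
            silenceStart = nil                  // a brief pause ended, so forget it
            if !isSpeaking {
                isSpeaking = true
                return .speechStarted
            }
            return nil
        }
        guard isSpeaking else { return nil }    // already idle, nothing to report
        let start = silenceStart ?? time        // remember when this silence began
        silenceStart = start
        if time - start >= hangover {
            isSpeaking = false
            silenceStart = nil
            return .speechEnded
        }
        return nil
    }
}
```

With a hangover of 0.5 seconds, a 200 ms pause in the middle of a sentence produces no event, while sustained silence produces a single `.speechEnded`. The right hangover depends on the application; values in the range of a few hundred milliseconds are a reasonable starting point to test.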
Handle background audio
Configure AVAudioSession for recording and observe its interruption notifications (for example, an incoming phone call) so that VAD can pause and resume cleanly.
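A minimal sketch of this setup is shown below, using standard AVAudioSession calls. The `VADAction` enum, the `action(interruptionBegan:shouldResume:)` helper, and `configureSessionForVAD(onAction:)` are hypothetical names introduced for illustration; how you actually pause and resume detection depends on the SDK calls you use. The session code is wrapped in `#if os(iOS)`, and the category options are an assumption that suits a typical voice app.

```swift
import Foundation
#if os(iOS)
import AVFoundation
#endif

/// What the VAD pipeline should do in response to an interruption (hypothetical).
enum VADAction: Equatable { case pause, resume, stayPaused }

/// Pure decision logic, kept separate so it can be tested off-device.
func action(interruptionBegan: Bool, shouldResume: Bool) -> VADAction {
    if interruptionBegan { return .pause }
    return shouldResume ? .resume : .stayPaused
}

#if os(iOS)
/// Activates a record-capable session and forwards interruption decisions.
/// Keep the returned token and remove the observer when VAD is torn down.
func configureSessionForVAD(onAction: @escaping (VADAction) -> Void) throws -> NSObjectProtocol {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker])
    try session.setActive(true)

    return NotificationCenter.default.addObserver(
        forName: AVAudioSession.interruptionNotification,
        object: session,
        queue: .main
    ) { note in
        guard let rawType = note.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
              let type = AVAudioSession.InterruptionType(rawValue: rawType) else { return }
        let rawOptions = note.userInfo?[AVAudioSessionInterruptionOptionKey] as? UInt ?? 0
        let options = AVAudioSession.InterruptionOptions(rawValue: rawOptions)
        onAction(action(interruptionBegan: type == .began,
                        shouldResume: options.contains(.shouldResume)))
    }
}
#endif
```

Separating the decision from the notification handling keeps the part that matters for VAD (pause, resume, or stay paused) easy to test without an audio device.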
Clean up resources
Always call cleanupVAD() when done to free resources.
Error Handling
Voice Agent
Build complete voice experiences with VAD + STT + LLM + TTS