Voice Activity Detection (VAD) identifies speech segments in audio streams, enabling responsive voice interfaces.
Basic Usage
// Detect speech in audio data
val audioData: ByteArray = // ... from recording
val result = RunAnywhere.detectVoiceActivity(audioData)
if (result.hasSpeech) {
    println("Speech detected!")
    println("Confidence: ${result.confidence}")
}
VADResult
| Property | Type | Description |
|---|---|---|
| hasSpeech | Boolean | Whether speech was detected |
| confidence | Float | Detection confidence (0.0-1.0) |
| speechStartMs | Long? | Start time of speech segment |
| speechEndMs | Long? | End time of speech segment |
| frameIndex | Int | Audio frame index |
| timestamp | Long | Detection timestamp |
Customize detection sensitivity:
RunAnywhere.configureVAD(VADConfiguration(
    threshold = 0.5f,           // Detection threshold (0.0-1.0)
    minSpeechDurationMs = 250,  // Minimum speech duration (ms)
    minSilenceDurationMs = 300, // Silence required before speech is considered ended (ms)
    sampleRate = 16000,         // Audio sample rate (Hz)
    frameSizeMs = 30            // Frame size for processing (ms)
))
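The sampleRate and frameSizeMs settings together determine how many samples make up one processing frame. The following is a minimal sketch of that arithmetic; both helper functions are hypothetical and not part of the RunAnywhere SDK, and the 16-bit mono PCM encoding is an assumption.

```kotlin
// Illustrative helpers, not part of the RunAnywhere SDK.

/** Number of audio samples in one frame of the given duration. */
fun samplesPerFrame(sampleRate: Int, frameSizeMs: Int): Int {
    require(sampleRate > 0) { "sampleRate must be positive" }
    require(frameSizeMs > 0) { "frameSizeMs must be positive" }
    return sampleRate * frameSizeMs / 1000
}

/** Bytes in one frame, assuming 16-bit (2-byte) mono PCM samples. */
fun bytesPerFrame16BitMono(sampleRate: Int, frameSizeMs: Int): Int =
    samplesPerFrame(sampleRate, frameSizeMs) * 2
```

With the defaults shown above (16000 Hz, 30 ms), a frame holds 480 samples, or 960 bytes under the 16-bit mono assumption.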
VADConfiguration
data class VADConfiguration(
    val threshold: Float = 0.5f,
    val minSpeechDurationMs: Int = 250,
    val minSilenceDurationMs: Int = 300,
    val sampleRate: Int = 16000,
    val frameSizeMs: Int = 30
)
| Parameter | Default | Description |
|---|---|---|
| threshold | 0.5 | Higher = less sensitive (fewer false positives) |
| minSpeechDurationMs | 250 | Ignore speech shorter than this (ms) |
| minSilenceDurationMs | 300 | Silence needed to end speech (ms) |
| sampleRate | 16000 | Audio sample rate (Hz) |
| frameSizeMs | 30 | Frame duration for processing (ms) |
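To make the interaction between threshold, minSpeechDurationMs, and minSilenceDurationMs concrete, here is a rough sketch of one common way such parameters are applied: a per-frame probability is compared against the threshold, and the speech state only flips after the required duration has accumulated. The SpeechSegmenter class is hypothetical, and the SDK's internal logic may well differ.

```kotlin
// Hypothetical smoothing logic for illustration; not the RunAnywhere implementation.
class SpeechSegmenter(
    private val threshold: Float = 0.5f,
    private val minSpeechDurationMs: Int = 250,
    private val minSilenceDurationMs: Int = 300,
    private val frameSizeMs: Int = 30
) {
    var inSpeech = false
        private set
    private var speechRunMs = 0   // consecutive above-threshold time
    private var silenceRunMs = 0  // consecutive below-threshold time

    /** Feeds one frame's speech probability; returns whether a speech segment is active. */
    fun update(probability: Float): Boolean {
        if (probability >= threshold) {
            speechRunMs += frameSizeMs
            silenceRunMs = 0
            // Enter speech only after enough consecutive speech frames
            if (!inSpeech && speechRunMs >= minSpeechDurationMs) inSpeech = true
        } else {
            silenceRunMs += frameSizeMs
            speechRunMs = 0
            // Leave speech only after enough consecutive silent frames
            if (inSpeech && silenceRunMs >= minSilenceDurationMs) inSpeech = false
        }
        return inSpeech
    }
}
```

Under these assumptions and the default values, speech is reported on the ninth consecutive above-threshold frame (270 ms) and ends on the tenth consecutive silent frame (300 ms).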
Streaming VAD
Process continuous audio with Kotlin Flows:
// Create a flow of audio samples from microphone
val audioSamplesFlow: Flow<FloatArray> = microphoneManager.audioFlow
// Stream VAD results
RunAnywhere.streamVAD(audioSamplesFlow)
    .collect { result ->
        if (result.hasSpeech) {
            updateUI(state = "Speaking...")
        } else {
            updateUI(state = "Listening...")
        }
    }
Calibrate with Ambient Noise
Improve accuracy by calibrating with ambient noise:
// Record a few seconds of ambient noise
val ambientAudio = recordAmbientNoise(durationMs = 2000)
// Calibrate VAD
RunAnywhere.calibrateVAD(ambientAudio)
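Before calibrating, it can help to check that the ambient recording really is quiet and does not contain speech. The sketch below computes the RMS energy of a buffer for that purpose. It is illustrative only: rmsEnergy is a hypothetical helper, it assumes FloatArray samples normalized to [-1.0, 1.0], and it makes no claim about what calibrateVAD does internally.

```kotlin
import kotlin.math.sqrt

// Illustrative helper, not part of the RunAnywhere SDK.
// Assumes samples are normalized to the range [-1.0, 1.0].
fun rmsEnergy(samples: FloatArray): Float {
    if (samples.isEmpty()) return 0f
    var sumSquares = 0.0
    for (s in samples) sumSquares += s.toDouble() * s
    return sqrt(sumSquares / samples.size).toFloat()
}
```

A value close to 0 suggests a quiet recording; what counts as "too loud" depends on the microphone and environment and would need to be chosen empirically.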
Example: Voice Recording with VAD
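import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch
// RunAnywhere SDK imports omitted

class VoiceRecorderViewModel : ViewModel() {
    private val _isListening = MutableStateFlow(false)
    val isListening: StateFlow<Boolean> = _isListening
    private val _isSpeaking = MutableStateFlow(false)
    val isSpeaking: StateFlow<Boolean> = _isSpeaking
    // Assumption: transcribe() returns a String; adjust the type if it differs
    private val _transcription = MutableStateFlow("")
    val transcription: StateFlow<String> = _transcription
    private val audioBuffer = mutableListOf<FloatArray>()
    // Handle to the collecting coroutine so it can be cancelled on stop
    private var listenJob: Job? = null

    fun startListening(audioFlow: Flow<FloatArray>) {
        listenJob?.cancel()
        listenJob = viewModelScope.launch {
            _isListening.value = true
            // Configure VAD
            RunAnywhere.configureVAD(VADConfiguration(
                threshold = 0.5f,
                minSpeechDurationMs = 300,
                minSilenceDurationMs = 500
            ))
            // Stream VAD
            RunAnywhere.streamVAD(audioFlow)
                .collect { result ->
                    _isSpeaking.value = result.hasSpeech
                    if (result.hasSpeech) {
                        // Buffer audio during speech
                        // (actual audio capture would be handled separately)
                    } else if (audioBuffer.isNotEmpty()) {
                        // Speech ended - process buffered audio
                        processRecordedAudio()
                    }
                }
        }
    }

    fun stopListening() {
        // Stop collecting VAD results before resetting state
        listenJob?.cancel()
        listenJob = null
        viewModelScope.launch {
            _isListening.value = false
            _isSpeaking.value = false
            RunAnywhere.resetVAD()
        }
    }

    private suspend fun processRecordedAudio() {
        // Convert buffer to ByteArray and transcribe.
        // Note: toByteArray() on a List<FloatArray> is not a standard-library call;
        // it stands in for your own PCM conversion.
        val audioData = audioBuffer.toByteArray()
        val transcription = RunAnywhere.transcribe(audioData)
        _transcription.value = transcription
        audioBuffer.clear()
    }
}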
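The example calls toByteArray() on a list of FloatArray chunks, which the Kotlin standard library does not provide. One possible conversion is sketched below. The extension name toPcm16ByteArray is hypothetical, and the output format (16-bit little-endian PCM from samples normalized to [-1.0, 1.0]) is an assumption; check which encoding RunAnywhere.transcribe actually expects.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical extension, not part of the RunAnywhere SDK.
// Assumes float samples in [-1.0, 1.0] and 16-bit little-endian PCM output.
fun List<FloatArray>.toPcm16ByteArray(): ByteArray {
    val totalSamples = sumOf { it.size }
    val buffer = ByteBuffer.allocate(totalSamples * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (chunk in this) {
        for (sample in chunk) {
            // Clamp to the valid range, then scale to the signed 16-bit range
            val clamped = sample.coerceIn(-1f, 1f)
            buffer.putShort((clamped * Short.MAX_VALUE).toInt().toShort())
        }
    }
    return buffer.array()
}
```

In the view model, this would stand in for the toByteArray() call, for example `val audioData = audioBuffer.toPcm16ByteArray()`.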
VAD Statistics
Get current VAD performance metrics:
val stats = RunAnywhere.getVADStatistics()
println("Total frames processed: ${stats.framesProcessed}")
println("Speech frames: ${stats.speechFrames}")
println("Current state: ${stats.currentState}")
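A derived metric such as the fraction of frames classified as speech can be easier to read than raw counts. The helper below is a small sketch; speechRatio is hypothetical, and it assumes the two counters are integer values that can be converted with toLong().

```kotlin
// Illustrative helper, not part of the RunAnywhere SDK.
// Returns the fraction of processed frames classified as speech (0.0 when nothing was processed).
fun speechRatio(framesProcessed: Long, speechFrames: Long): Double =
    if (framesProcessed <= 0L) 0.0 else speechFrames.toDouble() / framesProcessed
```

Assuming integer counters, it could be called as `speechRatio(stats.framesProcessed.toLong(), stats.speechFrames.toLong())`.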
Reset VAD
Clear VAD state for a fresh start:
RunAnywhere.resetVAD()
Best Practices
Calibration: Always calibrate VAD with ambient noise for best results in noisy environments.
Threshold tuning:
- Quiet environment: threshold = 0.3-0.5
- Noisy environment: threshold = 0.6-0.8
Buffer management: Use VAD to determine when to start/stop recording, but buffer audio slightly before speech detection for complete captures.
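The pre-speech buffering mentioned above can be sketched as a small fixed-size ring buffer: recent frames are retained while the VAD reports silence, then prepended to the recording once speech is detected. PreRollBuffer is a hypothetical helper, not an SDK class, and the frame type (FloatArray) is an assumption carried over from the streaming examples.

```kotlin
// Hypothetical helper, not part of the RunAnywhere SDK.
class PreRollBuffer(private val maxFrames: Int) {
    private val frames = ArrayDeque<FloatArray>()

    init {
        require(maxFrames > 0) { "maxFrames must be positive" }
    }

    /** Stores a copy of the newest frame, evicting the oldest once the buffer is full. */
    fun push(frame: FloatArray) {
        if (frames.size == maxFrames) frames.removeFirst()
        frames.addLast(frame.copyOf())
    }

    /** Returns the buffered frames in order (oldest first) and clears the buffer. */
    fun drain(): List<FloatArray> {
        val snapshot = frames.toList()
        frames.clear()
        return snapshot
    }
}
```

With 30 ms frames, a capacity of 10 keeps roughly 300 ms of audio from before the detection point; the right amount depends on how quickly the VAD reacts and is best tuned by listening to the results.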