Voice Activity Detection (VAD) identifies speech segments in audio streams, enabling responsive voice interfaces.

Basic Usage

// Detect speech in audio data
val audioData: ByteArray = TODO("Supply audio bytes from your recording code")
val result = RunAnywhere.detectVoiceActivity(audioData)

if (result.hasSpeech) {
    println("Speech detected!")
    println("Confidence: ${result.confidence}")
}
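`detectVoiceActivity` takes raw bytes, while the streaming API consumes `FloatArray` chunks. If your recorder produces mono 16-bit little-endian PCM (an assumption; check what your capture pipeline and the SDK actually expect), a helper like the hypothetical `pcm16ToFloats` below converts between the two representations by scaling each sample into the range -1.0 to 1.0.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: converts 16-bit little-endian PCM bytes into floats in [-1.0, 1.0).
// Assumes mono 16-bit PCM input; adjust if your recording format differs.
fun pcm16ToFloats(bytes: ByteArray): FloatArray {
    require(bytes.size % 2 == 0) { "16-bit PCM needs an even number of bytes" }
    val shorts = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer()
    // Dividing by 32768 maps the most negative sample exactly to -1.0.
    return FloatArray(shorts.remaining()) { i -> shorts.get(i) / 32768f }
}
```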

VADResult

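| Property | Type | Description |
|---|---|---|
| hasSpeech | Boolean | Whether speech was detected |
| confidence | Float | Detection confidence (0.0-1.0) |
| speechStartMs | Long? | Start time of the speech segment, in milliseconds (nullable) |
| speechEndMs | Long? | End time of the speech segment, in milliseconds (nullable) |
| frameIndex | Int | Audio frame index |
| timestamp | Long | Detection timestamp |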
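Because `speechStartMs` and `speechEndMs` are nullable, guard against missing values before computing a segment length. The sketch below uses a hypothetical `SpeechSegmentTiming` stand-in holding only those two fields (it is not the SDK's `VADResult`) and assumes both values are milliseconds on the same clock, as the property names suggest.

```kotlin
// Hypothetical stand-in for the two timing fields of VADResult.
data class SpeechSegmentTiming(val speechStartMs: Long?, val speechEndMs: Long?)

// Returns the segment length in milliseconds, or null if either bound is missing
// or the bounds are inconsistent (end before start).
fun SpeechSegmentTiming.durationMs(): Long? {
    val start = speechStartMs ?: return null
    val end = speechEndMs ?: return null
    return if (end >= start) end - start else null
}
```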

Configure VAD

Customize detection sensitivity and timing:
RunAnywhere.configureVAD(VADConfiguration(
    threshold = 0.5f,              // Detection threshold (0.0-1.0)
    minSpeechDurationMs = 250,     // Minimum speech duration (ms)
    minSilenceDurationMs = 300,    // Silence (ms) required before speech is considered ended
    sampleRate = 16000,            // Audio sample rate (Hz)
    frameSizeMs = 30               // Frame size for processing (ms)
))
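Together, `sampleRate` and `frameSizeMs` determine how many samples make up one analysis frame, and the duration settings translate into whole frames. A sketch of that arithmetic, assuming fixed-size, non-overlapping frames (an assumption; the SDK's internal framing is not documented here). The function names are hypothetical.

```kotlin
// Samples per analysis frame, e.g. 16000 Hz * 30 ms / 1000 = 480 samples.
fun samplesPerFrame(sampleRate: Int, frameSizeMs: Int): Int = sampleRate * frameSizeMs / 1000

// Minimum number of whole frames needed to cover a duration (integer ceiling division).
fun framesForDuration(durationMs: Int, frameSizeMs: Int): Int =
    (durationMs + frameSizeMs - 1) / frameSizeMs
```

With the defaults, each frame holds 480 samples, 250 ms of speech spans at least 9 frames, and 300 ms of silence spans 10 frames.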

VADConfiguration

data class VADConfiguration(
    val threshold: Float = 0.5f,
    val minSpeechDurationMs: Int = 250,
    val minSilenceDurationMs: Int = 300,
    val sampleRate: Int = 16000,
    val frameSizeMs: Int = 30
)

| Parameter | Default | Description |
|---|---|---|
| threshold | 0.5 | Higher = less sensitive (fewer false positives) |
| minSpeechDurationMs | 250 | Ignore speech shorter than this (ms) |
| minSilenceDurationMs | 300 | Silence (ms) needed to end speech |
| sampleRate | 16000 | Audio sample rate (Hz) |
| frameSizeMs | 30 | Frame duration for processing (ms) |
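To illustrate how `threshold`, `minSpeechDurationMs`, and `minSilenceDurationMs` interact, here is a generic frame-based segmenter over per-frame speech probabilities. This is a common VAD post-processing technique, not a description of how RunAnywhere implements VAD internally; `detectSegments` and `Segment` are hypothetical names.

```kotlin
// A detected speech segment, in milliseconds from the start of the stream.
data class Segment(val startMs: Long, val endMs: Long)

// Generic VAD post-processing (hypothetical helper, not the SDK's implementation):
// frames at or above `threshold` count as speech; a segment ends only after
// `minSilenceDurationMs` of continuous silence, and segments shorter than
// `minSpeechDurationMs` are discarded.
fun detectSegments(
    probabilities: List<Float>,
    frameSizeMs: Int = 30,
    threshold: Float = 0.5f,
    minSpeechDurationMs: Int = 250,
    minSilenceDurationMs: Int = 300,
): List<Segment> {
    val segments = mutableListOf<Segment>()
    var startFrame = -1   // first frame of the current speech run, -1 if none
    var silentFrames = 0  // consecutive silent frames since the last speech frame

    // Emits the current run (if long enough) ending just before `endFrame`, then resets.
    fun close(endFrame: Int) {
        val durationMs = (endFrame - startFrame) * frameSizeMs
        if (durationMs >= minSpeechDurationMs) {
            segments += Segment(startFrame.toLong() * frameSizeMs, endFrame.toLong() * frameSizeMs)
        }
        startFrame = -1
        silentFrames = 0
    }

    for ((i, p) in probabilities.withIndex()) {
        if (p >= threshold) {
            if (startFrame < 0) startFrame = i
            silentFrames = 0
        } else if (startFrame >= 0) {
            silentFrames++
            // Once the silence is long enough, end the segment at the first silent frame.
            if (silentFrames * frameSizeMs >= minSilenceDurationMs) close(i - silentFrames + 1)
        }
    }
    // Close a segment that is still open when the input runs out.
    if (startFrame >= 0) close(probabilities.size - silentFrames)
    return segments
}
```

Note how a pause shorter than `minSilenceDurationMs` does not split an utterance, and a short burst below `minSpeechDurationMs` produces no segment at all.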

Streaming VAD

Process continuous audio with Kotlin Flows:
// Create a flow of audio samples from microphone
val audioSamplesFlow: Flow<FloatArray> = microphoneManager.audioFlow

// Stream VAD results (collect is a suspending call, so run this inside a coroutine)
RunAnywhere.streamVAD(audioSamplesFlow)
    .collect { result ->
        if (result.hasSpeech) {
            updateUI(state = "Speaking...")
        } else {
            updateUI(state = "Listening...")
        }
    }
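The loop above calls `updateUI` on every result, even when the state has not changed. One way to update only on transitions is to remember the previous state; the hypothetical `SpeechStateTracker` below does that in plain Kotlin and could be called from inside `collect`.

```kotlin
// Hypothetical helper: reports a new UI label only when hasSpeech flips.
class SpeechStateTracker {
    private var lastHasSpeech: Boolean? = null

    // Returns the new label on a transition, or null if the state is unchanged.
    fun onResult(hasSpeech: Boolean): String? {
        if (hasSpeech == lastHasSpeech) return null
        lastHasSpeech = hasSpeech
        return if (hasSpeech) "Speaking..." else "Listening..."
    }
}
```

If you prefer to stay within Flow operators, mapping each result to `hasSpeech` and applying kotlinx.coroutines' `distinctUntilChanged()` achieves the same effect.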

Calibrate with Ambient Noise

Improve accuracy in noisy environments by calibrating against a short recording of ambient noise that contains no speech:
// Record about two seconds of ambient noise (no speech)
val ambientAudio = recordAmbientNoise(durationMs = 2000)

// Calibrate VAD
RunAnywhere.calibrateVAD(ambientAudio)
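What `calibrateVAD` does internally is not described on this page. As a rough illustration of the general idea behind ambient-noise calibration, an energy-based detector can measure the noise floor as the RMS level of the ambient recording and place its speech threshold a fixed margin above it. The helpers and the 6 dB default margin below are hypothetical choices, not SDK behavior.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Root-mean-square level of a block of samples in [-1.0, 1.0]; 0.0 for an empty block.
fun rms(samples: FloatArray): Double {
    if (samples.isEmpty()) return 0.0
    var sumSquares = 0.0
    for (s in samples) sumSquares += s.toDouble() * s
    return sqrt(sumSquares / samples.size)
}

// Energy threshold placed `marginDb` decibels above the measured ambient noise floor.
// A +6 dB margin roughly doubles the amplitude threshold (10^(6/20) ≈ 2.0).
fun calibrateEnergyThreshold(ambient: FloatArray, marginDb: Double = 6.0): Double =
    rms(ambient) * 10.0.pow(marginDb / 20.0)
```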

Example: Voice Recording with VAD

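import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import java.nio.ByteBuffer
import java.nio.ByteOrder
import kotlinx.coroutines.Job
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.flow.onEach
import kotlinx.coroutines.launch
// Also import RunAnywhere and VADConfiguration from the SDK package.

class VoiceRecorderViewModel : ViewModel() {
    private val _isListening = MutableStateFlow(false)
    val isListening: StateFlow<Boolean> = _isListening.asStateFlow()

    private val _isSpeaking = MutableStateFlow(false)
    val isSpeaking: StateFlow<Boolean> = _isSpeaking.asStateFlow()

    // Assumes RunAnywhere.transcribe() returns a String.
    private val _transcription = MutableStateFlow("")
    val transcription: StateFlow<String> = _transcription.asStateFlow()

    private val audioBuffer = mutableListOf<FloatArray>()
    private var vadJob: Job? = null

    fun startListening(audioFlow: Flow<FloatArray>) {
        if (vadJob?.isActive == true) return // already listening
        vadJob = viewModelScope.launch {
            _isListening.value = true

            // Configure VAD
            RunAnywhere.configureVAD(VADConfiguration(
                threshold = 0.5f,
                minSpeechDurationMs = 300,
                minSilenceDurationMs = 500
            ))

            // Remember the most recent chunk so it can be buffered when VAD reports speech.
            // Assumes streamVAD emits one result per input chunk, in order.
            var latestChunk: FloatArray? = null

            // Stream VAD
            RunAnywhere.streamVAD(audioFlow.onEach { latestChunk = it })
                .collect { result ->
                    _isSpeaking.value = result.hasSpeech

                    if (result.hasSpeech) {
                        // Buffer audio during speech
                        latestChunk?.let { audioBuffer.add(it) }
                    } else if (audioBuffer.isNotEmpty()) {
                        // Speech ended - process buffered audio
                        processRecordedAudio()
                    }
                }
        }
    }

    fun stopListening() {
        // Cancel collection first so no further VAD results arrive after the reset.
        vadJob?.cancel()
        vadJob = null
        viewModelScope.launch {
            _isListening.value = false
            _isSpeaking.value = false
            RunAnywhere.resetVAD()
        }
    }

    private suspend fun processRecordedAudio() {
        // Convert buffered float samples to bytes and transcribe
        val audioData = audioBuffer.toPcm16Bytes()
        audioBuffer.clear()
        _transcription.value = RunAnywhere.transcribe(audioData)
    }

    // Flattens float chunks in [-1.0, 1.0] into 16-bit little-endian PCM.
    // Assumes transcribe() expects this format; check the SDK's requirements.
    private fun List<FloatArray>.toPcm16Bytes(): ByteArray {
        val out = ByteBuffer.allocate(sumOf { it.size } * 2).order(ByteOrder.LITTLE_ENDIAN)
        for (chunk in this) {
            for (sample in chunk) {
                out.putShort((sample.coerceIn(-1f, 1f) * Short.MAX_VALUE).toInt().toShort())
            }
        }
        return out.array()
    }
}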

VAD Statistics

Get the current VAD frame counts and state:
val stats = RunAnywhere.getVADStatistics()
println("Total frames processed: ${stats.framesProcessed}")
println("Speech frames: ${stats.speechFrames}")
println("Current state: ${stats.currentState}")
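`framesProcessed` and `speechFrames` can be combined into more readable figures, such as the share of frames classified as speech or the approximate amount of speech audio. The hypothetical `summarize` function below takes plain numbers rather than the SDK's statistics object (whose field types are not shown here) and assumes every frame lasts `frameSizeMs`.

```kotlin
// Hypothetical summary derived from VAD frame counters.
data class SpeechSummary(val speechRatio: Double, val speechDurationMs: Long)

fun summarize(framesProcessed: Long, speechFrames: Long, frameSizeMs: Int = 30): SpeechSummary {
    // Guard against division by zero before any audio has been processed.
    val ratio = if (framesProcessed == 0L) 0.0 else speechFrames.toDouble() / framesProcessed
    // Assumes fixed-size frames, so speech time is simply frames * frame length.
    return SpeechSummary(ratio, speechFrames * frameSizeMs)
}
```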

Reset VAD

Clear the VAD's internal state, for example before starting a new recording session:
RunAnywhere.resetVAD()

Best Practices

Calibration: Always calibrate VAD with ambient noise to get the best results in noisy environments.
Threshold tuning:
  • Quiet environment: threshold = 0.3-0.5
  • Noisy environment: threshold = 0.6-0.8
Buffer management: Use VAD to decide when to start and stop recording, but keep a short buffer of audio from just before speech is detected so the beginning of each utterance is not clipped.
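The buffer-management tip above can be implemented with a small ring buffer that always holds the most recent few audio chunks; when VAD first reports speech, those chunks are prepended to the recording. `PreRollBuffer` is a hypothetical helper, and its capacity (in chunks) is yours to tune: with 30 ms chunks, for example, 10 chunks keep roughly 300 ms of lead-in audio.

```kotlin
// Hypothetical helper: keeps only the most recent `capacity` audio chunks.
class PreRollBuffer(private val capacity: Int) {
    private val chunks = ArrayDeque<FloatArray>(capacity)

    init {
        require(capacity > 0) { "capacity must be positive" }
    }

    // Call for every incoming chunk while waiting for speech; drops the oldest when full.
    fun add(chunk: FloatArray) {
        if (chunks.size == capacity) chunks.removeFirst()
        chunks.addLast(chunk)
    }

    // Returns the buffered chunks (oldest first) and empties the buffer,
    // e.g. at the moment VAD first reports speech.
    fun drain(): List<FloatArray> {
        val out = chunks.toList()
        chunks.clear()
        return out
    }
}
```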