Voice Activity Detection

Voice Activity Detection (VAD) identifies speech segments in audio streams, enabling responsive voice interfaces.

Basic Usage

// Detect speech in audio data
val audioData: ByteArray = // ... from recording
val result = RunAnywhere.detectVoiceActivity(audioData)

if (result.hasSpeech) {
    println("Speech detected!")
    println("Confidence: ${result.confidence}")
}

VADResult

Property	Type	Description
`hasSpeech`	`Boolean`	Whether speech was detected
`confidence`	`Float`	Detection confidence (0.0-1.0)
`speechStartMs`	`Long?`	Start time of speech segment (ms)
`speechEndMs`	`Long?`	End time of speech segment (ms)
`frameIndex`	`Int`	Audio frame index
`timestamp`	`Long`	Detection timestamp

Configure VAD

Customize detection sensitivity:

RunAnywhere.configureVAD(VADConfiguration(
    threshold = 0.5f,              // Detection threshold (0.0-1.0)
    minSpeechDurationMs = 250,     // Minimum speech duration (ms)
    minSilenceDurationMs = 300,    // Silence required to end a speech segment (ms)
    sampleRate = 16000,            // Audio sample rate (Hz)
    frameSizeMs = 30               // Frame size for processing (ms)
))

VADConfiguration

data class VADConfiguration(
    val threshold: Float = 0.5f,
    val minSpeechDurationMs: Int = 250,
    val minSilenceDurationMs: Int = 300,
    val sampleRate: Int = 16000,
    val frameSizeMs: Int = 30
)

Parameter	Default	Description
`threshold`	0.5	Higher = less sensitive (fewer false positives, more missed speech)
`minSpeechDurationMs`	250	Ignore speech shorter than this (ms)
`minSilenceDurationMs`	300	Silence needed to end a speech segment (ms)
`sampleRate`	16000	Audio sample rate (Hz)
`frameSizeMs`	30	Frame duration for processing (ms)
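
To make the interaction between these parameters concrete, the following is a minimal sketch of how a threshold and the two minimum durations could turn per-frame speech probabilities into a stable speaking state. It illustrates the general technique only and is not the SDK's internal algorithm; `samplesPerFrame` and `DurationGate` are hypothetical names that do not exist in RunAnywhere.

```kotlin
// Hypothetical helpers for illustration; not part of the RunAnywhere SDK.

/** Number of samples in one frame, e.g. 16000 Hz * 30 ms / 1000 = 480 samples. */
fun samplesPerFrame(sampleRate: Int, frameSizeMs: Int): Int =
    sampleRate * frameSizeMs / 1000

/** Debounces per-frame speech probabilities using minimum speech and silence durations. */
class DurationGate(
    private val threshold: Float = 0.5f,
    minSpeechDurationMs: Int = 250,
    minSilenceDurationMs: Int = 300,
    frameSizeMs: Int = 30
) {
    // Round up so the state never changes earlier than the configured duration.
    private val minSpeechFrames = (minSpeechDurationMs + frameSizeMs - 1) / frameSizeMs
    private val minSilenceFrames = (minSilenceDurationMs + frameSizeMs - 1) / frameSizeMs

    private var speechRun = 0   // consecutive frames at or above the threshold
    private var silenceRun = 0  // consecutive frames below the threshold

    var isSpeaking = false
        private set

    /** Feeds one frame's speech probability and returns the debounced state. */
    fun update(probability: Float): Boolean {
        if (probability >= threshold) {
            speechRun++
            silenceRun = 0
            if (!isSpeaking && speechRun >= minSpeechFrames) isSpeaking = true
        } else {
            silenceRun++
            speechRun = 0
            if (isSpeaking && silenceRun >= minSilenceFrames) isSpeaking = false
        }
        return isSpeaking
    }
}
```

With the defaults, 250 ms of speech rounds up to 9 frames of 30 ms and 300 ms of silence to 10 frames, so short clicks do not start a segment and brief pauses between words do not end one.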

Streaming VAD

Process continuous audio with Kotlin Flows:

// Create a flow of audio samples from microphone
val audioSamplesFlow: Flow<FloatArray> = microphoneManager.audioFlow

// Stream VAD results
RunAnywhere.streamVAD(audioSamplesFlow)
    .collect { result ->
        if (result.hasSpeech) {
            updateUI(state = "Speaking...")
        } else {
            updateUI(state = "Listening...")
        }
    }

Calibrate with Ambient Noise

Improve accuracy by calibrating with ambient noise:

// Record a few seconds of ambient noise
val ambientAudio = recordAmbientNoise(durationMs = 2000)

// Calibrate VAD
RunAnywhere.calibrateVAD(ambientAudio)
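
Calibration is only useful if the ambient clip really contains background noise rather than speech. The source does not show what `calibrateVAD` does with the clip or what type `recordAmbientNoise` returns, so the sketch below is a separate sanity check, not part of the SDK: it assumes `FloatArray` samples normalized to [-1.0, 1.0] and measures their RMS level. `rms` and `toDbfs` are hypothetical helper names.

```kotlin
import kotlin.math.log10
import kotlin.math.sqrt

// Hypothetical helpers for illustration; not part of the RunAnywhere SDK.

/** Root-mean-square level of samples assumed to be normalized to [-1.0, 1.0]. */
fun rms(samples: FloatArray): Double {
    if (samples.isEmpty()) return 0.0
    var sumSquares = 0.0
    for (s in samples) sumSquares += s.toDouble() * s
    return sqrt(sumSquares / samples.size)
}

/** Converts an RMS level to decibels relative to full scale (1.0 maps to 0 dBFS). */
fun toDbfs(rms: Double): Double =
    if (rms <= 0.0) Double.NEGATIVE_INFINITY else 20.0 * log10(rms)
```

An unexpectedly high level suggests that someone was talking during the recording, in which case recording the ambient clip again before calibrating is the safer choice.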

Example: Voice Recording with VAD

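import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch
// Plus the RunAnywhere SDK imports for RunAnywhere and VADConfiguration.

class VoiceRecorderViewModel : ViewModel() {
    private val _isListening = MutableStateFlow(false)
    val isListening: StateFlow<Boolean> = _isListening

    private val _isSpeaking = MutableStateFlow(false)
    val isSpeaking: StateFlow<Boolean> = _isSpeaking

    // Assumption: transcribe() returns text that can be stored as a String.
    private val _transcription = MutableStateFlow("")
    val transcription: StateFlow<String> = _transcription

    private val audioBuffer = mutableListOf<FloatArray>()
    private var vadJob: Job? = null

    fun startListening(audioFlow: Flow<FloatArray>) {
        vadJob?.cancel()
        vadJob = viewModelScope.launch {
            _isListening.value = true

            // Configure VAD
            RunAnywhere.configureVAD(VADConfiguration(
                threshold = 0.5f,
                minSpeechDurationMs = 300,
                minSilenceDurationMs = 500
            ))

            // Stream VAD
            RunAnywhere.streamVAD(audioFlow)
                .collect { result ->
                    _isSpeaking.value = result.hasSpeech

                    // Audio is buffered in onAudioChunk() while speech is active.
                    if (!result.hasSpeech && audioBuffer.isNotEmpty()) {
                        // Speech ended - process buffered audio
                        processRecordedAudio()
                    }
                }
        }
    }

    // Called by the audio capture layer (on the main thread) for each recorded chunk.
    fun onAudioChunk(chunk: FloatArray) {
        if (_isSpeaking.value) audioBuffer.add(chunk)
    }

    fun stopListening() {
        // Stop collecting VAD results before resetting the detector.
        vadJob?.cancel()
        vadJob = null
        _isListening.value = false
        viewModelScope.launch { RunAnywhere.resetVAD() }
    }

    private suspend fun processRecordedAudio() {
        // Convert buffer to ByteArray and transcribe
        val audioData = audioBuffer.toPcm16Bytes()
        // Clear first so chunks that arrive during transcription are kept.
        audioBuffer.clear()
        _transcription.value = RunAnywhere.transcribe(audioData)
    }

    // Assumption: transcribe() accepts 16-bit little-endian PCM; adjust if the SDK expects another encoding.
    private fun List<FloatArray>.toPcm16Bytes(): ByteArray {
        val bytes = ByteArray(sumOf { it.size } * 2)
        var i = 0
        for (chunk in this) for (sample in chunk) {
            val v = (sample.coerceIn(-1f, 1f) * Short.MAX_VALUE).toInt()
            bytes[i++] = v.toByte()          // low byte
            bytes[i++] = (v shr 8).toByte()  // high byte
        }
        return bytes
    }
}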

VAD Statistics

Get current VAD performance metrics:

val stats = RunAnywhere.getVADStatistics()
println("Total frames processed: ${stats.framesProcessed}")
println("Speech frames: ${stats.speechFrames}")
println("Current state: ${stats.currentState}")
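
The two counters can be combined into a single figure that is easier to track over time. The sketch below computes the fraction of frames classified as speech; `speechRatio` is a hypothetical helper, and it assumes that `framesProcessed` and `speechFrames` are integer counts that can be converted to `Long`.

```kotlin
// Hypothetical helper for illustration; not part of the RunAnywhere SDK.

/** Fraction of processed frames classified as speech, or 0.0 if nothing was processed. */
fun speechRatio(framesProcessed: Long, speechFrames: Long): Double =
    if (framesProcessed <= 0L) 0.0 else speechFrames.toDouble() / framesProcessed
```

A ratio that stays close to 1.0 in a room where nobody is talking may indicate that the threshold is too low for the environment.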

Reset VAD

Clear VAD state for a fresh start:

RunAnywhere.resetVAD()

Best Practices

Calibration: Always calibrate VAD with ambient noise for best results in noisy environments.

Threshold tuning:

Quiet environment: threshold = 0.3-0.5
Noisy environment: threshold = 0.6-0.8

Buffer management: Use VAD to determine when to start/stop recording, but buffer audio slightly before speech detection for complete captures.
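
One common way to buffer audio from just before detection is a small ring buffer of recent chunks that is prepended to the recording when speech starts. The following is a minimal sketch of that idea; `PreRollBuffer` is a hypothetical class, not part of the RunAnywhere SDK, and the chunk size is whatever the capture layer delivers.

```kotlin
// Hypothetical helper for illustration; not part of the RunAnywhere SDK.

/** Keeps the most recent audio chunks so they can be prepended when speech starts. */
class PreRollBuffer(private val maxChunks: Int) {
    init {
        require(maxChunks > 0) { "maxChunks must be positive" }
    }

    private val chunks = ArrayDeque<FloatArray>()

    /** Stores a chunk, discarding the oldest one once the buffer is full. */
    fun add(chunk: FloatArray) {
        if (chunks.size == maxChunks) chunks.removeFirst()
        chunks.addLast(chunk)
    }

    /** Returns the buffered chunks in arrival order and empties the buffer. */
    fun drain(): List<FloatArray> {
        val snapshot = chunks.toList()
        chunks.clear()
        return snapshot
    }
}
```

While no speech is detected, every incoming chunk goes into the buffer. When `hasSpeech` first becomes true, the result of `drain()` is placed at the front of the recording. With 30 ms chunks, `maxChunks = 10` keeps roughly 300 ms of audio from before the detection.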
