Customize transcription behavior with STTOptions.

STTOptions

data class STTOptions(
    val language: String = "en",              // BCP-47 language code
    val detectLanguage: Boolean = false,      // Auto-detect spoken language
    val enablePunctuation: Boolean = true,    // Add punctuation
    val enableDiarization: Boolean = false,   // Identify speakers
    val maxSpeakers: Int? = null,             // Max speakers to identify
    val enableTimestamps: Boolean = true,     // Word-level timestamps
    val vocabularyFilter: List<String> = emptyList(),  // Custom vocabulary
    val audioFormat: AudioFormat = AudioFormat.PCM,
    val sampleRate: Int = 16000               // Audio sample rate
)
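Every field has a default, so you only set what you need. As a sketch combining several of the fields shown above (the values are illustrative, not recommendations):

```kotlin
val options = STTOptions(
    detectLanguage = true,       // auto-detect the spoken language
    enableDiarization = true,    // label speakers in the output
    maxSpeakers = 4,             // cap the number of speakers identified
    enableTimestamps = true,     // word-level timing information
    sampleRate = 16000           // 16 kHz input audio
)
```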

Configuration Examples

Basic English Transcription

val options = STTOptions(
    language = "en",
    enablePunctuation = true
)

Multi-Language Detection

val options = STTOptions(
    detectLanguage = true,
    enablePunctuation = true
)

val result = RunAnywhere.transcribeWithOptions(audioData, options)
println("Detected language: ${result.detectedLanguage}")

Transcription with Timestamps

val options = STTOptions(
    language = "en",
    enableTimestamps = true
)

val result = RunAnywhere.transcribeWithOptions(audioData, options)
result.wordTimestamps?.forEach { word ->
    println("[${word.startTime}s - ${word.endTime}s]: ${word.word}")
}

Speaker Diarization

val options = STTOptions(
    language = "en",
    enableDiarization = true,
    maxSpeakers = 2
)
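With diarization enabled, portions of the transcript are attributed to individual speakers. The exact shape of the diarization output is SDK-specific; assuming a hypothetical `segments` collection whose entries carry a `speakerId` and `text` (these field names are illustrative, check the actual result type), iterating the output might look like:

```kotlin
val result = RunAnywhere.transcribeWithOptions(audioData, options)

// `segments`, `speakerId`, and `text` are assumed names for illustration.
result.segments?.forEach { segment ->
    println("Speaker ${segment.speakerId}: ${segment.text}")
}
```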

Custom Vocabulary

Improve recognition for domain-specific terms:

val options = STTOptions(
    language = "en",
    vocabularyFilter = listOf(
        "RunAnywhere",
        "Kotlin",
        "ONNX",
        "llama.cpp"
    )
)

Supported Languages

Code    Language
en      English
es      Spanish
fr      French
de      German
it      Italian
pt      Portuguese
zh      Chinese
ja      Japanese
ko      Korean

Language support depends on the STT model. Whisper models support 99+ languages.

AudioFormat

enum class AudioFormat {
    PCM,    // Raw PCM samples
    WAV,    // WAV container
    MP3,    // MP3 compressed
    AAC,    // AAC compressed
    OGG,    // Ogg Vorbis
    OPUS,   // Opus codec
    FLAC    // FLAC lossless
}
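The format and sample rate describe the audio you pass in, so they must match the actual recording. As a sketch, transcribing a WAV file read from disk might look like this (the file name is hypothetical, and whether the SDK expects raw bytes or decoded samples for container formats may differ):

```kotlin
import java.io.File

// Hypothetical input file; the bytes include the WAV container header.
val audioData = File("meeting.wav").readBytes()

val options = STTOptions(
    language = "en",
    audioFormat = AudioFormat.WAV,  // matches the container of the file above
    sampleRate = 16000              // must match the recording's sample rate
)

val result = RunAnywhere.transcribeWithOptions(audioData, options)
```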

TranscriptionMetadata

Access processing metrics:

val result = RunAnywhere.transcribeWithOptions(audioData, options)

val metadata = result.metadata
println("Model: ${metadata.modelId}")
println("Processing time: ${metadata.processingTime}s")
println("Audio length: ${metadata.audioLength}s")
println("Real-time factor: ${metadata.realTimeFactor}x")

Property          Description
modelId           STT model used
processingTime    Time to process (seconds)
audioLength       Input audio duration (seconds)
realTimeFactor    Processing time / audio length

A real-time factor < 1.0 means the transcription is faster than real-time. For example, 0.2x means 10 seconds of audio processed in 2 seconds.
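The calculation behind that figure is simply the ratio of the two metadata fields, which can be sketched as:

```kotlin
// Real-time factor: processing time divided by audio duration.
// Values below 1.0 mean faster-than-real-time transcription.
fun realTimeFactor(processingTimeSec: Double, audioLengthSec: Double): Double {
    require(audioLengthSec > 0) { "audio length must be positive" }
    return processingTimeSec / audioLengthSec
}

fun main() {
    // 10 seconds of audio processed in 2 seconds -> 0.2x
    println(realTimeFactor(processingTimeSec = 2.0, audioLengthSec = 10.0))
}
```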