Customize transcription behavior with STTOptions.
STTOptions
```kotlin
data class STTOptions(
    val language: String = "en",                        // BCP-47 language code
    val detectLanguage: Boolean = false,                // Auto-detect spoken language
    val enablePunctuation: Boolean = true,              // Add punctuation
    val enableDiarization: Boolean = false,             // Identify speakers
    val maxSpeakers: Int? = null,                       // Max speakers to identify (with enableDiarization)
    val enableTimestamps: Boolean = true,               // Word-level timestamps
    val vocabularyFilter: List<String> = emptyList(),   // Domain-specific terms to improve recognition
    val audioFormat: AudioFormat = AudioFormat.PCM,     // Encoding of the input audio
    val sampleRate: Int = 16000                         // Audio sample rate in Hz
)
```
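`STTOptions` is a Kotlin data class, so the standard `copy()` function can derive variants from one shared base configuration without repeating every argument. The sketch below uses a trimmed local copy of the class (only some of its fields) so that it compiles without the SDK. In an app you would use the SDK's own `STTOptions`.

```kotlin
// Trimmed local copy of STTOptions so this sketch compiles on its own;
// in an app, use the SDK's STTOptions instead.
data class STTOptions(
    val language: String = "en",
    val enablePunctuation: Boolean = true,
    val enableDiarization: Boolean = false,
    val maxSpeakers: Int? = null,
    val enableTimestamps: Boolean = true,
    val sampleRate: Int = 16000
)

// One shared base configuration.
val baseOptions = STTOptions(language = "en")

// copy() keeps every field of the base and overrides only the named ones.
val meetingOptions = baseOptions.copy(enableDiarization = true, maxSpeakers = 4)
val spanishOptions = baseOptions.copy(language = "es")

fun main() {
    println(meetingOptions)
    println(spanishOptions)
}
```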
Configuration Examples
Basic English Transcription
```kotlin
val options = STTOptions(
    language = "en",
    enablePunctuation = true
)
```
Multi-Language Detection
```kotlin
val options = STTOptions(
    detectLanguage = true,
    enablePunctuation = true
)
val result = RunAnywhere.transcribeWithOptions(audioData, options)
println("Detected language: ${result.detectedLanguage}")
```
Transcription with Timestamps
```kotlin
val options = STTOptions(
    language = "en",
    enableTimestamps = true
)
val result = RunAnywhere.transcribeWithOptions(audioData, options)
result.wordTimestamps?.forEach { word ->
    println("[${word.startTime}s - ${word.endTime}s]: ${word.word}")
}
```
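For captions or logs, the raw second values are often easier to read as `mm:ss.mmm`. The sketch below assumes `startTime` and `endTime` are `Double` values in seconds, as the `s` suffix in the example suggests; `WordTimestamp` is a hypothetical local stand-in that reuses the field names from the example, not the SDK's type.

```kotlin
import kotlin.math.roundToLong

// Hypothetical stand-in for the SDK's word timestamp type; the field names match
// the example above, but the Double-in-seconds type is an assumption.
data class WordTimestamp(val word: String, val startTime: Double, val endTime: Double)

// Formats a time in seconds as mm:ss.mmm, rounded to the nearest millisecond.
fun formatTimestamp(seconds: Double): String {
    val totalMillis = (seconds * 1000).roundToLong()
    val minutes = totalMillis / 60_000
    val secs = (totalMillis % 60_000) / 1000
    val millis = totalMillis % 1000
    return "${minutes.toString().padStart(2, '0')}:" +
        "${secs.toString().padStart(2, '0')}." +
        millis.toString().padStart(3, '0')
}

fun main() {
    val words = listOf(
        WordTimestamp("Hello", 0.0, 0.42),
        WordTimestamp("world", 0.48, 0.95)
    )
    words.forEach { w ->
        println("[${formatTimestamp(w.startTime)} - ${formatTimestamp(w.endTime)}] ${w.word}")
    }
}
```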
Speaker Diarization
```kotlin
val options = STTOptions(
    language = "en",
    enableDiarization = true,
    maxSpeakers = 2
)
```
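This section does not show what a diarized result looks like. A common post-processing step is to merge consecutive segments from the same speaker into a single turn. In the sketch below, `SpeakerSegment` and its `speaker`/`text` fields are hypothetical and serve only to show the merging logic. Map them onto whatever the SDK actually returns.

```kotlin
// Hypothetical segment type; the SDK's diarization output shape is an assumption here.
data class SpeakerSegment(val speaker: String, val text: String)

// Merges consecutive segments from the same speaker into one turn, preserving order.
fun mergeTurns(segments: List<SpeakerSegment>): List<SpeakerSegment> {
    val turns = mutableListOf<SpeakerSegment>()
    for (segment in segments) {
        val last = turns.lastOrNull()
        if (last != null && last.speaker == segment.speaker) {
            // Same speaker as the previous turn: append the text to it.
            turns[turns.lastIndex] = last.copy(text = last.text + " " + segment.text)
        } else {
            turns.add(segment)
        }
    }
    return turns
}

fun main() {
    val segments = listOf(
        SpeakerSegment("Speaker 1", "Hi there."),
        SpeakerSegment("Speaker 1", "How are you?"),
        SpeakerSegment("Speaker 2", "Doing well, thanks.")
    )
    mergeTurns(segments).forEach { println("${it.speaker}: ${it.text}") }
}
```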
Custom Vocabulary
Improve recognition for domain-specific terms:
```kotlin
val options = STTOptions(
    language = "en",
    vocabularyFilter = listOf(
        "RunAnywhere",
        "Kotlin",
        "ONNX",
        "llama.cpp"
    )
)
```
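When the term list comes from user input or a config file, it can help to clean it before passing it as `vocabularyFilter`. This section does not say whether the SDK trims or deduplicates terms itself, so the hypothetical `normalizeVocabulary` helper below is a defensive sketch. It trims whitespace, drops blank entries, and removes case-insensitive duplicates while keeping the first spelling seen.

```kotlin
// Hypothetical helper: cleans a term list before it is used as vocabularyFilter.
fun normalizeVocabulary(terms: List<String>): List<String> =
    terms.map { it.trim() }                 // strip surrounding whitespace
        .filter { it.isNotEmpty() }         // drop blank entries
        .distinctBy { it.lowercase() }      // case-insensitive dedup, first spelling wins

fun main() {
    val raw = listOf(" RunAnywhere", "Kotlin", "kotlin", "", "ONNX ", "llama.cpp")
    println(normalizeVocabulary(raw))
}
```

Case-insensitive deduplication is a design choice. Switch to `distinct()` if two spellings that differ only in case should both be kept.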
Supported Languages
| Code | Language |
|---|---|
| en | English |
| es | Spanish |
| fr | French |
| de | German |
| it | Italian |
| pt | Portuguese |
| zh | Chinese |
| ja | Japanese |
| ko | Korean |
Language support depends on the STT model. Multilingual Whisper models support 99+ languages, while English-only variants (for example `base.en`) transcribe English only.
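If the language comes from an external source, such as the device locale, you can fall back to auto-detection when the code is not one you expect. The sketch below reduces a BCP-47 tag such as `pt-BR` to its primary subtag and checks it against the codes in the table above. Both that reduction and the `optionsFor` helper are assumptions for illustration. The sketch uses a trimmed local copy of `STTOptions`.

```kotlin
// Trimmed local copy of STTOptions so the sketch compiles without the SDK.
data class STTOptions(
    val language: String = "en",
    val detectLanguage: Boolean = false
)

// Codes from the table above; a given model may support more or fewer.
val documentedLanguages = setOf("en", "es", "fr", "de", "it", "pt", "zh", "ja", "ko")

// Hypothetical helper: uses the primary subtag when it is a documented code,
// otherwise enables auto-detection instead of guessing a language.
fun optionsFor(languageTag: String?): STTOptions {
    val primary = languageTag?.substringBefore('-')?.lowercase()
    return if (primary != null && primary in documentedLanguages) {
        STTOptions(language = primary)
    } else {
        STTOptions(detectLanguage = true)
    }
}

fun main() {
    println(optionsFor("pt-BR"))
    println(optionsFor("sv"))
}
```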
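AudioFormat
Set `audioFormat` in `STTOptions` to match the encoding of the input audio (the default is `PCM`):
```kotlin
enum class AudioFormat {
    PCM,   // Raw PCM samples
    WAV,   // WAV container
    MP3,   // MP3 compressed
    AAC,   // AAC compressed
    OGG,   // Ogg Vorbis
    OPUS,  // Opus codec
    FLAC   // FLAC lossless
}
```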
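When audio arrives as files, the format can often be inferred from the extension. The hypothetical `audioFormatFor` helper below is a sketch that uses a local copy of the enum. The extension only hints at the content: for example, an `.ogg` file may carry Opus rather than Vorbis. For untrusted input, inspecting the file header is more reliable.

```kotlin
// Local copy of the enum above so this sketch compiles on its own.
enum class AudioFormat { PCM, WAV, MP3, AAC, OGG, OPUS, FLAC }

// Hypothetical helper: maps a file extension to an AudioFormat, or null if unknown.
fun audioFormatFor(fileName: String): AudioFormat? =
    when (fileName.substringAfterLast('.', "").lowercase()) {
        "pcm", "raw" -> AudioFormat.PCM
        "wav" -> AudioFormat.WAV
        "mp3" -> AudioFormat.MP3
        "aac" -> AudioFormat.AAC
        "ogg" -> AudioFormat.OGG
        "opus" -> AudioFormat.OPUS
        "flac" -> AudioFormat.FLAC
        else -> null
    }

fun main() {
    listOf("meeting.WAV", "voice.opus", "notes.txt").forEach {
        println("$it -> ${audioFormatFor(it)}")
    }
}
```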
Transcription Metadata
Each transcription result exposes processing metrics through `result.metadata`:
```kotlin
val result = RunAnywhere.transcribeWithOptions(audioData, options)
val metadata = result.metadata
println("Model: ${metadata.modelId}")
println("Processing time: ${metadata.processingTime}s")
println("Audio length: ${metadata.audioLength}s")
println("Real-time factor: ${metadata.realTimeFactor}x")
```
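For raw PCM input, you can sanity-check `audioLength` against the buffer size: duration = bytes / (sampleRate × bytesPerSample × channels). The sketch below defaults to 16-bit mono samples at the 16000 Hz `STTOptions` default. The sample width and channel count are assumptions, because this section does not specify the SDK's PCM layout.

```kotlin
// Hypothetical helper: estimates raw PCM duration in seconds from its byte count.
fun pcmDurationSeconds(
    byteCount: Int,
    sampleRate: Int = 16000,   // matches the STTOptions default
    bytesPerSample: Int = 2,   // 16-bit samples (assumption)
    channels: Int = 1          // mono (assumption)
): Double {
    require(sampleRate > 0 && bytesPerSample > 0 && channels > 0) { "invalid PCM layout" }
    return byteCount.toDouble() / (sampleRate * bytesPerSample * channels)
}

fun main() {
    val audioData = ByteArray(320_000) // placeholder buffer of silence
    println("Estimated audio length: ${pcmDurationSeconds(audioData.size)}s")
}
```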
| Property | Description |
|---|---|
| modelId | STT model used |
| processingTime | Time spent processing (seconds) |
| audioLength | Input audio duration (seconds) |
| realTimeFactor | Processing time divided by audio length |
A real-time factor below 1.0 means transcription runs faster than real time, and a value above 1.0 means it runs slower. For example, 0.2x means 10 seconds of audio were processed in 2 seconds.
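The definition in the table translates directly into code. The sketch below computes the factor from the two metadata values and classifies it. `realTimeFactor` and `describe` are illustrative local functions, not SDK calls.

```kotlin
// Real-time factor = processing time / audio length, both in seconds.
fun realTimeFactor(processingTimeSec: Double, audioLengthSec: Double): Double {
    require(audioLengthSec > 0) { "audio length must be positive" }
    return processingTimeSec / audioLengthSec
}

// Classifies a real-time factor relative to 1.0.
fun describe(rtf: Double): String = when {
    rtf < 1.0 -> "faster than real time"
    rtf == 1.0 -> "real time"
    else -> "slower than real time"
}

fun main() {
    val rtf = realTimeFactor(processingTimeSec = 2.0, audioLengthSec = 10.0)
    println("RTF ${rtf}x: ${describe(rtf)}")
}
```

The reciprocal (1 / RTF) gives the speed-up over real time: an RTF of 0.2 corresponds to 5x.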