Transcribe audio data to text using on-device speech recognition models.
Basic Transcription
// Load an STT model first
RunAnywhere.loadSTTModel("whisper-tiny")
// Transcribe audio bytes
val audioData: ByteArray = // ... from file or recording
val text = RunAnywhere.transcribe(audioData)
println(text) // "Hello, how are you today?"
Transcription with Options
Get detailed output including confidence scores and timestamps:
val output = RunAnywhere.transcribeWithOptions(
    audioData = audioData,
    options = STTOptions(
        language = "en",
        enableTimestamps = true,
        enablePunctuation = true
    )
)
println("Text: ${output.text}")
println("Confidence: ${output.confidence}")
// Access word-level timestamps (null when timestamps are unavailable)
output.wordTimestamps?.forEach { word ->
    println("[${word.startTime}s - ${word.endTime}s]: ${word.word}")
}
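Word-level timestamps are often turned into caption lines. The following is a minimal sketch of that idea, not part of the SDK: `TimedWord` is a hypothetical stand-in for the SDK's word timestamp type (its field names follow the snippet above, and treating the times as `Double` seconds is an assumption), and `formatTimestamp` and `toCaptionLines` are illustrative helper names.

```kotlin
import java.util.Locale
import kotlin.math.roundToLong

// Hypothetical stand-in for the SDK's word timestamp type. Field names follow
// the snippet above; Double seconds is an assumption.
data class TimedWord(val word: String, val startTime: Double, val endTime: Double)

// Formats a time in seconds as HH:MM:SS,mmm (the timestamp style used by SRT captions).
fun formatTimestamp(seconds: Double): String {
    val totalMillis = (seconds * 1000).roundToLong()
    val h = totalMillis / 3_600_000
    val m = (totalMillis / 60_000) % 60
    val s = (totalMillis / 1000) % 60
    val ms = totalMillis % 1000
    return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", h, m, s, ms)
}

// Groups words into caption lines of at most maxWords words each.
// Each line spans from the first word's start to the last word's end.
fun toCaptionLines(words: List<TimedWord>, maxWords: Int = 7): List<String> =
    words.chunked(maxWords).map { group ->
        val start = formatTimestamp(group.first().startTime)
        val end = formatTimestamp(group.last().endTime)
        "$start --> $end ${group.joinToString(" ") { it.word }}"
    }
```

To use it with a real result, map each entry of `output.wordTimestamps` to a `TimedWord` first, converting the time fields if the SDK exposes them in another type.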
STTOutput
The detailed output object:
| Property | Type | Description |
|---|---|---|
| `text` | `String` | Transcribed text |
| `confidence` | `Float` | Confidence score (0.0-1.0) |
| `wordTimestamps` | `List<WordTimestamp>?` | Word-level timing |
| `detectedLanguage` | `String?` | Auto-detected language code |
| `metadata` | `TranscriptionMetadata` | Processing metrics |
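Because `detectedLanguage` is nullable and `confidence` is a 0.0-1.0 score, callers usually need a small amount of handling before showing a result. The sketch below illustrates one way to do that. `TranscriptInfo` is a hypothetical local type mirroring three of the properties in the table, and the 0.6 review threshold is an arbitrary illustrative value, not an SDK recommendation.

```kotlin
import kotlin.math.roundToInt

// Hypothetical stand-in mirroring three STTOutput properties from the table above.
data class TranscriptInfo(
    val text: String,
    val confidence: Float,
    val detectedLanguage: String?
)

// Returns true when the transcript is empty or its confidence is below the threshold.
// The default threshold is illustrative only; tune it for your own data.
fun needsReview(result: TranscriptInfo, minConfidence: Float = 0.6f): Boolean =
    result.text.isBlank() || result.confidence < minConfidence

// Builds a one-line summary, falling back to "unknown" when no language was detected.
fun summarize(result: TranscriptInfo): String {
    val language = result.detectedLanguage ?: "unknown"
    val percent = (result.confidence * 100).roundToInt()
    return "[$language, $percent%] ${result.text}"
}
```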
Example: Transcribe Audio File
suspend fun transcribeAudioFile(uri: Uri): String {
    // use {} closes the input stream once the bytes have been read
    val audioData = contentResolver.openInputStream(uri)?.use { it.readBytes() }
        ?: throw IllegalArgumentException("Cannot read audio file")
    // Ensure STT model is loaded
    if (!RunAnywhere.isSTTModelLoaded()) {
        RunAnywhere.loadSTTModel("whisper-tiny")
    }
    return RunAnywhere.transcribe(audioData)
}
Example: Record and Transcribe
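class VoiceRecorderViewModel : ViewModel() {
    private var audioRecorder: AudioRecord? = null
    private val audioBuffer = mutableListOf<Byte>()

    // Observable UI state, updated after each transcription
    private val _transcription = MutableStateFlow("")
    val transcription: StateFlow<String> = _transcription
    private val _confidence = MutableStateFlow(0f)
    val confidence: StateFlow<Float> = _confidence

    fun startRecording() {
        // Start recording audio...
    }

    fun stopAndTranscribe() {
        viewModelScope.launch {
            val audioData = audioBuffer.toByteArray()
            val result = RunAnywhere.transcribeWithOptions(
                audioData = audioData,
                options = STTOptions(
                    language = "en",
                    enablePunctuation = true
                )
            )
            _transcription.value = result.text
            _confidence.value = result.confidence
        }
    }
}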
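The view model above collects audio in a `MutableList<Byte>`, which boxes every byte. For longer recordings, a stream-backed buffer is usually cheaper. The following is a minimal sketch of that alternative. `PcmBuffer` is a hypothetical helper, not an SDK class, and it assumes the recording loop hands over raw PCM chunks, for example from `AudioRecord.read`.

```kotlin
import java.io.ByteArrayOutputStream

// Accumulates raw PCM chunks in a growable byte array instead of a List<Byte>.
class PcmBuffer {
    private val stream = ByteArrayOutputStream()

    // Appends the first `length` bytes of `chunk`. Synchronized because a
    // recording thread typically writes while another thread drains.
    @Synchronized
    fun append(chunk: ByteArray, length: Int = chunk.size) {
        stream.write(chunk, 0, length)
    }

    // Number of bytes currently buffered.
    @Synchronized
    fun size(): Int = stream.size()

    // Returns a copy of everything buffered so far and clears the buffer.
    @Synchronized
    fun drain(): ByteArray {
        val bytes = stream.toByteArray()
        stream.reset()
        return bytes
    }
}
```

In `stopAndTranscribe()`, the result of `drain()` could then be passed as the audio data in place of `audioBuffer.toByteArray()`.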
Model Management
// Load a specific STT model
RunAnywhere.loadSTTModel("whisper-tiny")
// Check if loaded
val isLoaded = RunAnywhere.isSTTModelLoaded()
// Get current model ID
val modelId = RunAnywhere.currentSTTModelId
// Unload when done
RunAnywhere.unloadSTTModel()
Supported Audio Formats
| Format | Sample Rate | Notes |
|---|---|---|
| PCM | 16000 Hz | Recommended for best quality |
| WAV | 16000 Hz | Standard audio file format |
| MP3 | Any | Converted internally |
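If you have raw PCM samples and need a WAV file, for example to save a recording for later transcription, you can prepend a standard 44-byte RIFF/WAVE header. The sketch below shows this for linear PCM. `pcmToWav` is an illustrative helper name, and the defaults (16000 Hz, mono, 16-bit) are assumptions based on the recommended sample rate in the table, since the table does not state a bit depth.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Wraps raw linear PCM bytes in a canonical 44-byte RIFF/WAVE header.
// Defaults assume 16 kHz, mono, 16-bit samples (bit depth is an assumption).
fun pcmToWav(
    pcm: ByteArray,
    sampleRate: Int = 16_000,
    channels: Int = 1,
    bitsPerSample: Int = 16
): ByteArray {
    val blockAlign = channels * bitsPerSample / 8   // bytes per sample frame
    val byteRate = sampleRate * blockAlign          // bytes per second
    val buffer = ByteBuffer.allocate(44 + pcm.size).order(ByteOrder.LITTLE_ENDIAN)
    buffer.put("RIFF".toByteArray(Charsets.US_ASCII))
    buffer.putInt(36 + pcm.size)                    // size of the file after this field
    buffer.put("WAVE".toByteArray(Charsets.US_ASCII))
    buffer.put("fmt ".toByteArray(Charsets.US_ASCII))
    buffer.putInt(16)                               // fmt chunk size for PCM
    buffer.putShort(1.toShort())                    // audio format 1 = linear PCM
    buffer.putShort(channels.toShort())
    buffer.putInt(sampleRate)
    buffer.putInt(byteRate)
    buffer.putShort(blockAlign.toShort())
    buffer.putShort(bitsPerSample.toShort())
    buffer.put("data".toByteArray(Charsets.US_ASCII))
    buffer.putInt(pcm.size)                         // size of the sample data
    buffer.put(pcm)
    return buffer.array()
}
```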
For best transcription accuracy:
- Use 16 kHz mono PCM audio
- Keep audio clips under 30 seconds for optimal performance
- Use a smaller model (whisper-tiny) for faster results, or a larger model (whisper-base) for better accuracy
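One way to follow the 30-second guideline with longer recordings is to split the PCM data into fixed-size chunks and transcribe each chunk in turn. The sketch below is an illustration under stated assumptions, not an SDK feature: `splitPcm` is a hypothetical helper, and it assumes 16 kHz mono audio with 2 bytes per sample, so 30 seconds is 16000 × 2 × 30 = 960000 bytes.

```kotlin
// Splits raw PCM into chunks of at most maxSeconds each.
// Assumes mono audio; bytesPerSample = 2 corresponds to 16-bit samples (an assumption).
fun splitPcm(
    pcm: ByteArray,
    maxSeconds: Int = 30,
    sampleRate: Int = 16_000,
    bytesPerSample: Int = 2
): List<ByteArray> {
    require(maxSeconds > 0 && sampleRate > 0 && bytesPerSample > 0) {
        "maxSeconds, sampleRate and bytesPerSample must be positive"
    }
    // Chunk size is a multiple of bytesPerSample, so no sample is cut in half.
    val chunkBytes = maxSeconds * sampleRate * bytesPerSample
    val chunks = mutableListOf<ByteArray>()
    var offset = 0
    while (offset < pcm.size) {
        val end = minOf(offset + chunkBytes, pcm.size)
        chunks += pcm.copyOfRange(offset, end)
        offset = end
    }
    return chunks
}
```

Fixed boundaries can cut a word in two, so the joined text may contain errors near chunk edges. Splitting at pauses in the audio avoids this but needs silence detection, which this sketch does not attempt.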