Transcribe audio data to text using on-device speech recognition models.

Basic Transcription

// Load an STT model first
RunAnywhere.loadSTTModel("whisper-tiny")

// Transcribe audio bytes
val audioData: ByteArray = // ... from file or recording
val text = RunAnywhere.transcribe(audioData)
println(text)  // "Hello, how are you today?"

Transcription with Options

Get detailed output including confidence scores and timestamps:
val output = RunAnywhere.transcribeWithOptions(
    audioData = audioData,  // the ByteArray from the basic example
    options = STTOptions(
        language = "en",
        enableTimestamps = true,
        enablePunctuation = true
    )
)

println("Text: ${output.text}")
println("Confidence: ${output.confidence}")

// Access word-level timestamps
output.wordTimestamps?.forEach { word ->
    println("[${word.startTime}s - ${word.endTime}s]: ${word.word}")
}

STTOutput

The detailed output object:
| Property | Type | Description |
| --- | --- | --- |
| text | String | Transcribed text |
| confidence | Float | Confidence score (0.0-1.0) |
| wordTimestamps | List<WordTimestamp>? | Word-level timing |
| detectedLanguage | String? | Auto-detected language code |
| metadata | TranscriptionMetadata | Processing metrics |
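
As one way to consume the word-level timing, the sketch below groups words into short caption lines. `Word`, `captionLines`, and `render` are hypothetical names defined only for this example. `Word` is a local stand-in with the same field names as `WordTimestamp` in the snippet above, and the `Double` seconds are an assumption, since the SDK's actual numeric types are not shown here.

```kotlin
// Local stand-in for the SDK's WordTimestamp (assumed shape: word, startTime, endTime in seconds).
data class Word(val word: String, val startTime: Double, val endTime: Double)

// Renders one caption line as "[start - end] words...".
fun render(words: List<Word>): String =
    "[${words.first().startTime}s - ${words.last().endTime}s] " +
        words.joinToString(" ") { it.word }

// Groups words into caption lines, starting a new line whenever adding
// the next word would make the current line span more than maxSeconds.
fun captionLines(words: List<Word>, maxSeconds: Double = 3.0): List<String> {
    val lines = mutableListOf<String>()
    var current = mutableListOf<Word>()
    for (w in words) {
        if (current.isNotEmpty() && w.endTime - current.first().startTime > maxSeconds) {
            lines.add(render(current))
            current = mutableListOf()
        }
        current.add(w)
    }
    if (current.isNotEmpty()) lines.add(render(current))
    return lines
}
```

With real output, the list could be built by mapping each entry of `output.wordTimestamps` to a `Word`, falling back to an empty list when the property is null.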

Example: Transcribe Audio File

suspend fun transcribeAudioFile(uri: Uri): String {
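    // Assumes `contentResolver` is in scope (e.g. inside an Activity).
    // `use` closes the input stream once the bytes have been read.
    val audioData = contentResolver.openInputStream(uri)?.use { it.readBytes() }
        ?: throw IllegalArgumentException("Cannot read audio file")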

    // Ensure STT model is loaded
    if (!RunAnywhere.isSTTModelLoaded()) {
        RunAnywhere.loadSTTModel("whisper-tiny")
    }

    return RunAnywhere.transcribe(audioData)
}

Example: Record and Transcribe

class VoiceRecorderViewModel : ViewModel() {
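    private var audioRecorder: AudioRecord? = null
    private val audioBuffer = mutableListOf<Byte>()

    // Backing state for the UI (assumed here to be kotlinx.coroutines MutableStateFlow)
    private val _transcription = MutableStateFlow("")
    private val _confidence = MutableStateFlow(0f)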

    fun startRecording() {
        // Start recording audio...
    }

    fun stopAndTranscribe() {
        viewModelScope.launch {
            val audioData = audioBuffer.toByteArray()

            val result = RunAnywhere.transcribeWithOptions(
                audioData,
                options = STTOptions(
                    language = "en",
                    enablePunctuation = true
                )
            )

            _transcription.value = result.text
            _confidence.value = result.confidence
        }
    }
}

Model Management

// Load a specific STT model
RunAnywhere.loadSTTModel("whisper-tiny")

// Check if loaded
val isLoaded = RunAnywhere.isSTTModelLoaded()

// Get current model ID
val modelId = RunAnywhere.currentSTTModelId

// Unload when done
RunAnywhere.unloadSTTModel()

Supported Audio Formats

| Format | Sample Rate | Notes |
| --- | --- | --- |
| PCM | 16000 Hz | Recommended for best quality |
| WAV | 16000 Hz | Standard audio file format |
| MP3 | Any | Converted internally |

For best transcription accuracy:
- Use 16 kHz mono PCM audio
- Keep audio clips under 30 seconds for optimal performance
- Use a smaller model (whisper-tiny) for faster results, or a larger model (whisper-base) for better accuracy
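
To keep clips within the 30-second guideline, a long recording can be split before transcription. The following is a minimal sketch, not part of the SDK: `pcmDurationSeconds` and `splitPcm` are hypothetical helpers, and the 16-bit sample width is an assumption, since the bit depth is not stated here.

```kotlin
const val SAMPLE_RATE_HZ = 16_000   // recommended sample rate for PCM input
const val BYTES_PER_SAMPLE = 2      // assumption: 16-bit mono PCM
const val MAX_CLIP_SECONDS = 30     // clip length guideline

// Duration in seconds of a raw mono PCM buffer.
fun pcmDurationSeconds(pcm: ByteArray): Double =
    pcm.size.toDouble() / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)

// Splits raw mono PCM into chunks of at most maxSeconds each.
// The chunk size is a multiple of the sample width, so no sample is cut in half.
fun splitPcm(pcm: ByteArray, maxSeconds: Int = MAX_CLIP_SECONDS): List<ByteArray> {
    require(maxSeconds > 0) { "maxSeconds must be positive" }
    val chunkBytes = maxSeconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
    return (0 until pcm.size step chunkBytes).map { start ->
        pcm.copyOfRange(start, minOf(start + chunkBytes, pcm.size))
    }
}
```

Each chunk could then be passed to `RunAnywhere.transcribe` and the results joined. Cutting at fixed offsets can split a word across two chunks, so splitting at pauses in the audio may give cleaner text.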