Overview
The Speech-to-Text (STT) API allows you to transcribe audio files or raw audio data to text using on-device Whisper models. All transcription happens locally on the device for privacy and offline capability.
Basic Usage
import { RunAnywhere } from '@runanywhere/core'

// Transcribe an audio file
const result = await RunAnywhere.transcribeFile('/path/to/audio.wav', {
  language: 'en',
})

console.log('Transcription:', result.text)
console.log('Confidence:', result.confidence)
console.log('Duration:', result.duration, 'seconds')
Setup
Before transcribing, you need to download and load an STT model:
import { RunAnywhere, ModelCategory, SDKEnvironment } from '@runanywhere/core'
import { ONNX, ModelArtifactType } from '@runanywhere/onnx'

// 1. Initialize SDK and register ONNX backend
await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
ONNX.register()

// 2. Add Whisper model
await ONNX.addModel({
  id: 'whisper-tiny-en',
  name: 'Whisper Tiny English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 75_000_000,
})

// 3. Download model
await RunAnywhere.downloadModel('whisper-tiny-en', (progress) => {
  console.log(`Download: ${(progress.progress * 100).toFixed(1)}%`)
})

// 4. Load model
const modelInfo = await RunAnywhere.getModelInfo('whisper-tiny-en')
await RunAnywhere.loadSTTModel(modelInfo.localPath, 'whisper')
API Reference
transcribeFile
Transcribe audio from a file path.
await RunAnywhere.transcribeFile(
  audioPath: string,
  options?: STTOptions
): Promise<STTResult>
Parameters:
audioPath (string): Absolute path to the audio file
options (STTOptions, optional): Transcription settings
transcribe
Transcribe base64-encoded audio data.
await RunAnywhere.transcribe(
  audioData: string,
  options?: STTOptions
): Promise<STTResult>
Parameters:
audioData (string): Base64-encoded float32 PCM audio data
options (STTOptions, optional): Transcription settings
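The SDK does not ship an encoder for this format, so the caller has to produce the base64 string. A minimal sketch, assuming a Node-like environment where the global `Buffer` is available (`encodeFloat32Base64`/`decodeFloat32Base64` are illustrative names, not SDK APIs):

```typescript
// Encode float32 PCM samples as the base64 string transcribe() expects.
function encodeFloat32Base64(samples: Float32Array): string {
  return Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength).toString('base64')
}

// Inverse, useful for testing the encoding round-trips correctly.
// Copy into a fresh buffer so the Float32Array view starts at a 4-byte-aligned offset.
function decodeFloat32Base64(b64: string): Float32Array {
  const bytes = Uint8Array.from(Buffer.from(b64, 'base64'))
  return new Float32Array(bytes.buffer)
}

// const result = await RunAnywhere.transcribe(encodeFloat32Base64(samples), { language: 'en' })
```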
transcribeBuffer
Transcribe raw float32 audio samples.
await RunAnywhere.transcribeBuffer(
  samples: number[],
  sampleRate: number,
  options?: STTOptions
): Promise<STTResult>
Parameters:
samples (number[]): Float32 audio samples (-1.0 to 1.0)
sampleRate (number): Sample rate in Hz (e.g., 16000)
options (STTOptions, optional): Transcription settings
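Many recorders deliver 16-bit signed PCM rather than the float32 range `transcribeBuffer` expects. A conversion sketch (the helper name is illustrative, not part of the SDK):

```typescript
// Convert 16-bit signed PCM to float32 samples in [-1.0, 1.0].
function int16ToFloat32(pcm: Int16Array): number[] {
  const out = new Array<number>(pcm.length)
  for (let i = 0; i < pcm.length; i++) {
    // Divide by 32768 so the most negative sample maps exactly to -1.0
    out[i] = pcm[i] / 32768
  }
  return out
}

// const result = await RunAnywhere.transcribeBuffer(int16ToFloat32(pcm), 16000, { language: 'en' })
```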
STT Options
interface STTOptions {
  /** Language code (e.g., 'en', 'es', 'fr') */
  language?: string
  /** Enable punctuation in output */
  punctuation?: boolean
  /** Enable speaker diarization */
  diarization?: boolean
  /** Enable word-level timestamps */
  wordTimestamps?: boolean
  /** Audio sample rate (default: 16000) */
  sampleRate?: number
}
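All fields are optional, so an app that calls transcription from several places may want to apply the documented default in one spot. A small sketch (the `withDefaults` helper and its trimmed `Options` shape are illustrative, not SDK code):

```typescript
// Subset of STTOptions used by this sketch.
interface Options {
  language?: string
  punctuation?: boolean
  sampleRate?: number
}

// Merge caller options over the documented default sample rate (16000 Hz).
function withDefaults(options: Options = {}): Options & { sampleRate: number } {
  return { sampleRate: 16000, ...options }
}
```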
STT Result
interface STTResult {
  /** Main transcription text */
  text: string
  /** Segments with timing information */
  segments: STTSegment[]
  /** Detected language code */
  language?: string
  /** Overall confidence (0.0 - 1.0) */
  confidence: number
  /** Audio duration in seconds */
  duration: number
  /** Alternative transcriptions */
  alternatives: STTAlternative[]
}

interface STTSegment {
  /** Segment text */
  text: string
  /** Start time in seconds */
  startTime: number
  /** End time in seconds */
  endTime: number
  /** Speaker ID (if diarization enabled) */
  speakerId?: string
  /** Segment confidence */
  confidence: number
}
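The segment timing fields map directly onto subtitle formats. A sketch that renders segments as SRT-style cues, using only the `text`, `startTime`, and `endTime` fields above (the helpers are illustrative, not SDK APIs):

```typescript
// Minimal segment shape for this sketch (subset of STTSegment).
interface Segment {
  text: string
  startTime: number
  endTime: number
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000)
  const h = Math.floor(ms / 3_600_000)
  const m = Math.floor((ms % 3_600_000) / 60_000)
  const s = Math.floor((ms % 60_000) / 1000)
  const pad = (n: number, w = 2) => String(n).padStart(w, '0')
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`
}

// Render segments as numbered SRT cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${toTimestamp(seg.startTime)} --> ${toTimestamp(seg.endTime)}\n${seg.text}`)
    .join('\n\n')
}
```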
Examples
Transcribe Audio File
// Simple transcription
const result = await RunAnywhere.transcribeFile('/path/to/recording.wav')
console.log(result.text)

// With language hint
const spanish = await RunAnywhere.transcribeFile('/path/to/spanish.wav', {
  language: 'es',
})
console.log(spanish.text)
With Word Timestamps
const result = await RunAnywhere.transcribeFile(audioPath, {
  language: 'en',
  wordTimestamps: true,
})

// Display words with timing
for (const segment of result.segments) {
  console.log(
    `[${segment.startTime.toFixed(2)}s - ${segment.endTime.toFixed(2)}s]: ${segment.text}`
  )
}
React Native Recording + Transcription
import React, { useState, useCallback } from 'react'
import { View, Button, Text } from 'react-native'
import { RunAnywhere } from '@runanywhere/core'
import AudioRecord from 'react-native-audio-record' // Example audio library

export function VoiceRecorder() {
  const [isRecording, setIsRecording] = useState(false)
  const [transcription, setTranscription] = useState('')

  const startRecording = useCallback(async () => {
    setIsRecording(true)
    setTranscription('')

    // Configure audio recording
    AudioRecord.init({
      sampleRate: 16000,
      channels: 1,
      bitsPerSample: 16,
      audioSource: 6,
      wavFile: 'recording.wav',
    })
    AudioRecord.start()
  }, [])

  const stopRecording = useCallback(async () => {
    setIsRecording(false)
    const audioPath = await AudioRecord.stop()

    // Transcribe the recording
    try {
      const result = await RunAnywhere.transcribeFile(audioPath, {
        language: 'en',
      })
      setTranscription(result.text)
    } catch (error) {
      setTranscription('Error: ' + (error as Error).message)
    }
  }, [])

  return (
    <View style={{ padding: 16 }}>
      <Button
        title={isRecording ? 'Stop Recording' : 'Start Recording'}
        onPress={isRecording ? stopRecording : startRecording}
      />
      {transcription ? <Text style={{ marginTop: 16 }}>{transcription}</Text> : null}
    </View>
  )
}
Transcribe Buffer from Microphone
// If you have raw audio samples (e.g., from a real-time stream)
const audioSamples: number[] = [] // Float32 samples from microphone

const result = await RunAnywhere.transcribeBuffer(
  audioSamples,
  16000, // Sample rate
  { language: 'en' }
)
console.log(result.text)
Supported Formats

Format     Extension
WAV        .wav
MP3        .mp3
M4A        .m4a
FLAC       .flac
Raw PCM    - (via buffer methods)

For best results, use 16kHz mono WAV files. The SDK converts other formats automatically, but audio that is already in the native 16kHz mono format skips that conversion and transcribes faster.
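Stereo recordings are a common case where a small amount of preprocessing helps: mixing the two channels down to mono before calling transcribeBuffer. A sketch, assuming the usual interleaved [L, R, L, R, ...] layout (the helper is illustrative, not an SDK API):

```typescript
// Average interleaved stereo float samples down to mono.
function stereoToMono(interleaved: number[]): number[] {
  const mono = new Array<number>(interleaved.length / 2)
  for (let i = 0; i < mono.length; i++) {
    mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2
  }
  return mono
}

// const result = await RunAnywhere.transcribeBuffer(stereoToMono(samples), 16000)
```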
Available Models
Model            Size     Languages  Quality  Speed
whisper-tiny     ~75MB    Multi      Good     Very Fast
whisper-tiny.en  ~75MB    English    Better   Very Fast
whisper-base     ~150MB   Multi      Better   Fast
whisper-small    ~500MB   Multi      Great    Medium
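On memory-constrained devices, the table above suggests a simple policy: load the largest model that fits the budget. An illustrative sketch (the list, sizes, and helper are approximations for this example, not values exported by the SDK):

```typescript
interface ModelChoice {
  id: string
  sizeBytes: number
}

// Approximate sizes from the table above, smallest first.
const WHISPER_MODELS: ModelChoice[] = [
  { id: 'whisper-tiny.en', sizeBytes: 75_000_000 },
  { id: 'whisper-base', sizeBytes: 150_000_000 },
  { id: 'whisper-small', sizeBytes: 500_000_000 },
]

// Pick the largest model that fits the memory budget, if any.
function pickModel(budgetBytes: number): string | undefined {
  const fitting = WHISPER_MODELS.filter((m) => m.sizeBytes <= budgetBytes)
  return fitting.length ? fitting[fitting.length - 1].id : undefined
}
```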
Error Handling
import { isSDKError, SDKErrorCode } from '@runanywhere/core'

try {
  const result = await RunAnywhere.transcribeFile(audioPath)
} catch (error) {
  if (isSDKError(error)) {
    switch (error.code) {
      case SDKErrorCode.notInitialized:
        console.error('SDK not initialized')
        break
      case SDKErrorCode.modelNotLoaded:
        console.error('Load an STT model first')
        break
      case SDKErrorCode.sttFailed:
        console.error('Transcription failed:', error.message)
        break
    }
  }
}