## Overview
The Speech-to-Text (STT) API allows you to transcribe audio files or raw audio data to text using on-device Whisper models. All transcription happens locally on the device for privacy and offline capability.
## Basic Usage
```typescript
import { RunAnywhere } from '@runanywhere/core'

// Transcribe raw audio samples (most common approach)
const result = await RunAnywhere.transcribeBuffer(audioSamples, {
  language: 'en',
})

console.log('Transcription:', result.text)
```
Or transcribe base64-encoded audio:
```typescript
// Transcribe base64-encoded float32 PCM audio
const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
})

console.log('Transcription:', result.text)
```
## Setup
Before transcribing, you need to download and load an STT model:
```typescript
import { RunAnywhere, ModelCategory, SDKEnvironment } from '@runanywhere/core'
import { ONNX, ModelArtifactType } from '@runanywhere/onnx'

// 1. Initialize SDK and register ONNX backend
await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
ONNX.register()

// 2. Add Whisper model
await ONNX.addModel({
  id: 'whisper-tiny-en',
  name: 'Whisper Tiny English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 75_000_000,
})

// 3. Download model
await RunAnywhere.downloadModel('whisper-tiny-en', (progress) => {
  console.log(`Download: ${(progress.progress * 100).toFixed(1)}%`)
})

// 4. Load model
const modelInfo = await RunAnywhere.getModelInfo('whisper-tiny-en')
await RunAnywhere.loadSTTModel(modelInfo.localPath, 'whisper')
```
## API Reference
### transcribeFile
`transcribeFile()` is declared in the SDK but not yet implemented. Calling it will throw `"transcribeFile not yet implemented with rac_* API"`. Use `transcribe()` or `transcribeBuffer()` instead.

```typescript
await RunAnywhere.transcribeFile(
  audioPath: string,
  options?: STTOptions
): Promise<STTResult>
```
### transcribe
Transcribe base64-encoded audio data.
```typescript
await RunAnywhere.transcribe(
  audioData: string,
  options?: STTOptions
): Promise<STTResult>
```
Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `audioData` | `string` | Base64-encoded float32 PCM audio data |
| `options` | `STTOptions` | Optional transcription settings |
### transcribeBuffer
Transcribe raw float32 audio samples directly.
```typescript
await RunAnywhere.transcribeBuffer(
  samples: Float32Array,
  options?: STTOptions
): Promise<STTResult>
```
Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `samples` | `Float32Array` | Float32 PCM audio samples |
| `options` | `STTOptions` | Optional transcription settings |
`transcribeBuffer` converts the `Float32Array` to base64 internally and calls the native `transcribe()` method. The sample rate defaults to 16000 Hz.
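As context for that conversion, a helper along these lines (illustrative only — not the SDK's actual internals) turns a `Float32Array` into the base64 string that `transcribe()` accepts:

```typescript
// Sketch of a Float32Array -> base64 conversion, assuming the same
// float32 little-endian PCM layout the transcribe() docs describe.
function float32ToBase64(samples: Float32Array): string {
  // View the raw sample bytes without copying
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength)
  // Build a binary string, then base64-encode it
  let binary = ''
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i])
  }
  return btoa(binary)
}
```

For large buffers you would chunk the `String.fromCharCode` loop or use a dedicated base64 library, but the byte layout is the same.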
### STT Options
```typescript
interface STTOptions {
  /** Language code (e.g., 'en', 'es', 'fr') */
  language?: string
  /** Enable punctuation in output */
  punctuation?: boolean
  /** Enable speaker diarization */
  diarization?: boolean
  /** Enable word-level timestamps */
  wordTimestamps?: boolean
  /** Audio sample rate (default: 16000) */
  sampleRate?: number
}
```
### STT Result
```typescript
interface STTResult {
  /** Main transcription text */
  text: string
  /** Segments with timing information */
  segments: STTSegment[]
  /** Detected language code */
  language?: string
  /** Overall confidence (0.0 - 1.0) */
  confidence: number
  /** Audio duration in seconds */
  duration: number
  /** Alternative transcriptions */
  alternatives: STTAlternative[]
}

interface STTSegment {
  /** Segment text */
  text: string
  /** Start time in seconds */
  startTime: number
  /** End time in seconds */
  endTime: number
  /** Speaker ID (if diarization enabled) */
  speakerId?: string
  /** Segment confidence */
  confidence: number
}
```
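As an illustration of the segment fields, the timing information can drive a simple subtitle-style printout. A minimal sketch (the helper name is ours, not part of the SDK):

```typescript
// Format each segment as a "start - end [speaker] text" line using the
// startTime/endTime/speakerId fields from STTSegment.
function formatSegments(result: {
  segments: { text: string; startTime: number; endTime: number; speakerId?: string }[]
}): string[] {
  return result.segments.map((s) => {
    const speaker = s.speakerId ? `[${s.speakerId}] ` : ''
    return `${s.startTime.toFixed(2)}s - ${s.endTime.toFixed(2)}s ${speaker}${s.text}`
  })
}
```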
## Examples
### Transcribe Base64 Audio
```typescript
// Simple transcription from base64 float32 PCM
const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
})
console.log(result.text)

// With language hint
const spanish = await RunAnywhere.transcribe(spanishAudioBase64, {
  language: 'es',
})
console.log(spanish.text)
```
### Transcribe Float32 Buffer
```typescript
// Transcribe from Float32Array samples
const samples = new Float32Array(audioBuffer)
const result = await RunAnywhere.transcribeBuffer(samples, {
  language: 'en',
})
console.log(result.text)
```
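If your audio arrives as a WAV file rather than raw samples, you can extract the PCM payload yourself before calling `transcribeBuffer()`. A minimal sketch, assuming a standard 44-byte header and 16-bit little-endian mono PCM (the helper is illustrative, not part of the SDK):

```typescript
// Convert 16-bit PCM WAV bytes to the Float32Array that transcribeBuffer
// expects. Assumes a plain 44-byte RIFF header; real WAV files can carry
// extra chunks, so a robust parser would walk the chunk list instead.
function wavToFloat32(wavBytes: Uint8Array): Float32Array {
  const HEADER_BYTES = 44
  const view = new DataView(wavBytes.buffer, wavBytes.byteOffset + HEADER_BYTES)
  const sampleCount = Math.floor((wavBytes.length - HEADER_BYTES) / 2)
  const samples = new Float32Array(sampleCount)
  for (let i = 0; i < sampleCount; i++) {
    // Normalize signed 16-bit integers to [-1, 1)
    samples[i] = view.getInt16(i * 2, true) / 32768
  }
  return samples
}
```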
### React Native Recording + Transcription
Using the SDK's built-in `RunAnywhere.Audio` API for recording:
```tsx
import React, { useState, useCallback } from 'react'
import { View, Button, Text } from 'react-native'
import { RunAnywhere } from '@runanywhere/core'
import RNFS from 'react-native-fs'

export function VoiceRecorder() {
  const [isRecording, setIsRecording] = useState(false)
  const [transcription, setTranscription] = useState('')

  const startRecording = useCallback(async () => {
    setIsRecording(true)
    setTranscription('')
    await RunAnywhere.Audio.requestPermission()
    await RunAnywhere.Audio.startRecording()
  }, [])

  const stopRecording = useCallback(async () => {
    setIsRecording(false)
    const recording = await RunAnywhere.Audio.stopRecording()
    // recording.uri - path to the recorded WAV file
    // recording.durationMs - duration in milliseconds

    // Read the file and transcribe
    try {
      const audioBase64 = await RNFS.readFile(recording.uri, 'base64')
      const result = await RunAnywhere.transcribe(audioBase64, {
        language: 'en',
      })
      setTranscription(result.text)
    } catch (error) {
      setTranscription('Error: ' + (error as Error).message)
    }
  }, [])

  return (
    <View style={{ padding: 16 }}>
      <Button
        title={isRecording ? 'Stop Recording' : 'Start Recording'}
        onPress={isRecording ? stopRecording : startRecording}
      />
      {transcription ? <Text style={{ marginTop: 16 }}>{transcription}</Text> : null}
    </View>
  )
}
```
### Transcribe Buffer from Microphone
```typescript
// If you have raw audio samples (e.g., from a real-time stream)
const audioSamples = new Float32Array(rawSamples)
const result = await RunAnywhere.transcribeBuffer(audioSamples, {
  language: 'en',
})
console.log(result.text)
```
## Supported Audio Formats

| Format | Extension |
| --- | --- |
| WAV | `.wav` |
| MP3 | `.mp3` |
| M4A | `.m4a` |
| FLAC | `.flac` |
| Raw PCM | - (via buffer methods) |

For best results, use 16 kHz mono WAV files. The SDK handles audio conversion automatically, but formats that already match avoid the conversion cost and are faster.
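If your capture pipeline produces audio at another rate, one rough way to bring mono samples down to 16 kHz before calling `transcribeBuffer()` is linear interpolation. This is a sketch under that assumption, not SDK functionality, and it skips the low-pass filtering a production resampler would apply:

```typescript
// Naive linear-interpolation resampler, e.g. 44100 Hz -> 16000 Hz mono.
function resampleTo16k(samples: Float32Array, inputRate: number): Float32Array {
  const targetRate = 16000
  if (inputRate === targetRate) return samples
  const outLength = Math.floor((samples.length * targetRate) / inputRate)
  const out = new Float32Array(outLength)
  for (let i = 0; i < outLength; i++) {
    // Fractional position of this output sample in the input signal
    const pos = (i * inputRate) / targetRate
    const i0 = Math.floor(pos)
    const i1 = Math.min(i0 + 1, samples.length - 1)
    const frac = pos - i0
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac
  }
  return out
}
```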
## Available Models
| Model | Size | Languages | Quality | Speed |
| --- | --- | --- | --- | --- |
| whisper-tiny | ~75MB | Multi | Good | Very Fast |
| whisper-tiny.en | ~75MB | English | Better | Very Fast |
| whisper-base | ~150MB | Multi | Better | Fast |
| whisper-small | ~500MB | Multi | Great | Medium |
## Error Handling
```typescript
import { isSDKError, SDKErrorCode } from '@runanywhere/core'

try {
  const result = await RunAnywhere.transcribe(audioBase64)
} catch (error) {
  if (isSDKError(error)) {
    switch (error.code) {
      case SDKErrorCode.notInitialized:
        console.error('SDK not initialized')
        break
      case SDKErrorCode.modelNotLoaded:
        console.error('Load an STT model first')
        break
      case SDKErrorCode.sttFailed:
        console.error('Transcription failed:', error.message)
        break
    }
  }
}
```