Early Beta — The Web SDK is in early beta. APIs may change between releases.
Overview
The Speech-to-Text (STT) API transcribes audio to text with on-device models compiled to WebAssembly. All transcription runs locally in the browser, so audio is not sent to a server, and transcription can work offline once the model files are available.
Basic Usage
import { STT, STTModelType } from '@runanywhere/web'
// Load a Whisper model
await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000,
})
// Transcribe audio (audioFloat32Array holds 16 kHz mono PCM samples)
const result = await STT.transcribe(audioFloat32Array)
console.log('Transcription:', result.text)
console.log('Confidence:', result.confidence)
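`audioFloat32Array` is not defined above. `STT.transcribe` takes a `Float32Array` of 16 kHz mono PCM samples; if your audio arrives as signed 16-bit PCM (for example, from a decoded WAV file), a conversion like the one below may help. `int16ToFloat32` is a hypothetical helper, not part of `@runanywhere/web`, and it assumes the input is already 16 kHz mono.

```typescript
// Hypothetical helper (not an SDK API): converts signed 16-bit PCM samples
// to 32-bit floats. Assumes the input is already 16 kHz mono.
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    // Dividing by 32768 maps -32768 to exactly -1.0 and 32767 to just under 1.0.
    out[i] = pcm[i] / 32768;
  }
  return out;
}
```

The result can be passed straight to `STT.transcribe`.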
Setup
Before transcribing, load an STT model:
import { STT, STTModelType } from '@runanywhere/web'
await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000,
  language: 'en',
})
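When several Whisper models share the naming pattern used in this example (`<modelId>-encoder.onnx`, `<modelId>-decoder.onnx`, `<modelId>-tokens.txt`), the file paths can be derived from the model ID. The helper below is a hypothetical sketch that assumes that naming convention; the SDK itself does not require it.

```typescript
// Local mirror of the documented STTWhisperFiles shape.
interface STTWhisperFiles {
  encoder: string;
  decoder: string;
  tokens: string;
}

// Hypothetical helper (not an SDK API): builds Whisper file paths from a base
// directory and model ID, assuming the `<modelId>-encoder.onnx` naming pattern.
function whisperFiles(baseUrl: string, modelId: string): STTWhisperFiles {
  // Strip trailing slashes so '/models' and '/models/' behave the same.
  const base = baseUrl.replace(/\/+$/, "");
  return {
    encoder: `${base}/${modelId}-encoder.onnx`,
    decoder: `${base}/${modelId}-decoder.onnx`,
    tokens: `${base}/${modelId}-tokens.txt`,
  };
}
```

For example, `modelFiles: whisperFiles('/models', 'whisper-tiny')` reproduces the paths in the Setup example.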
API Reference
STT.loadModel
Load an STT model for transcription.
STT.loadModel(config: STTModelConfig): Promise<void>
STTModelConfig
interface STTModelConfig {
  /** Unique model identifier */
  modelId: string
  /** Model architecture type */
  type: STTModelType
  /** Model file paths */
  modelFiles: STTWhisperFiles | STTZipformerFiles | STTParaformerFiles
  /** Audio sample rate (default: 16000) */
  sampleRate?: number
  /** Language code (e.g., 'en', 'es') */
  language?: string
}
enum STTModelType {
  Whisper,
  Zipformer,
  Paraformer,
}
Model File Interfaces
// Whisper models
interface STTWhisperFiles {
  encoder: string
  decoder: string
  tokens: string
}
// Zipformer models
interface STTZipformerFiles {
  encoder: string
  decoder: string
  joiner: string
  tokens: string
}
// Paraformer models
interface STTParaformerFiles {
  model: string
  tokens: string
}
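Each architecture expects a different set of files. A pre-flight check like the one below can report missing entries before `STT.loadModel` is called. It is a sketch: the enum and the required-key table are local mirrors of the documented types, and `missingModelFiles` is a hypothetical helper, not an SDK function.

```typescript
// Local mirror of the documented enum (in an app, the real one comes from '@runanywhere/web').
enum STTModelType {
  Whisper,
  Zipformer,
  Paraformer,
}

// File keys each architecture needs, taken from the documented file interfaces.
const REQUIRED_FILES: Record<STTModelType, readonly string[]> = {
  [STTModelType.Whisper]: ["encoder", "decoder", "tokens"],
  [STTModelType.Zipformer]: ["encoder", "decoder", "joiner", "tokens"],
  [STTModelType.Paraformer]: ["model", "tokens"],
};

// Hypothetical pre-flight check: returns the required keys that are missing
// or empty for the given model type. An empty array means nothing is missing.
function missingModelFiles(type: STTModelType, modelFiles: object): string[] {
  const files = modelFiles as Record<string, unknown>;
  return REQUIRED_FILES[type].filter(
    (key) => typeof files[key] !== "string" || files[key] === "",
  );
}
```

For instance, a Zipformer config that omits `joiner` yields `['joiner']`, which can be reported before any file is fetched.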
STT.transcribe
Transcribe audio data to text.
STT.transcribe(
  audioSamples: Float32Array,
  options?: STTTranscribeOptions
): Promise<STTTranscriptionResult>
Parameters:
| Parameter | Type | Description |
|---|---|---|
| audioSamples | Float32Array | PCM audio samples (16 kHz mono) |
| options | STTTranscribeOptions | Optional transcription settings |
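Browser audio is often captured at 44.1 or 48 kHz and may be stereo, so it may need converting before it reaches `STT.transcribe`. The sketch below downmixes by averaging channels and resamples with linear interpolation. Both functions are hypothetical helpers, not SDK APIs, and linear interpolation is a simple stand-in: it applies no anti-aliasing filter, so rendering through an `OfflineAudioContext` created at 16000 Hz is likely to give cleaner results.

```typescript
// Hypothetical helper (not an SDK API): averages per-channel sample arrays
// into one mono channel. Assumes all channels have the same length.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const mono = new Float32Array(channels[0].length);
  for (const channel of channels) {
    for (let i = 0; i < mono.length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Hypothetical helper (not an SDK API): linear-interpolation resampler.
// No low-pass filter is applied, so downsampling can alias.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}
```

For an `AudioBuffer`, something like `resampleLinear(downmixToMono(channels), buffer.sampleRate, 16000)` would produce input in the documented format, where `channels` holds `buffer.getChannelData(i)` for each channel.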
STTTranscriptionResult
interface STTTranscriptionResult {
  /** Main transcription text */
  text: string
  /** Overall confidence (0.0-1.0) */
  confidence: number
  /** Detected language code */
  detectedLanguage?: string
  /** Processing time in milliseconds */
  processingTimeMs: number
  /** Word-level timestamps (if available) */
  words?: STTWord[]
}
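`processingTimeMs` is easier to interpret relative to the length of the audio. The real-time factor (RTF) divides processing time by audio duration; values below 1 mean transcription ran faster than real time. The helper below is a hypothetical sketch that assumes 16 kHz input unless another rate is passed.

```typescript
// Subset of the documented STTTranscriptionResult used here.
interface TimedResult {
  processingTimeMs: number;
}

// Hypothetical helper (not an SDK API): real-time factor of one transcription.
function realTimeFactor(result: TimedResult, sampleCount: number, sampleRate = 16000): number {
  if (sampleCount <= 0) throw new RangeError("sampleCount must be positive");
  // Audio duration in milliseconds, derived from the sample count.
  const audioMs = (sampleCount / sampleRate) * 1000;
  return result.processingTimeMs / audioMs;
}
```

For example, 5 seconds of 16 kHz audio (80000 samples) processed in 1250 ms gives an RTF of 0.25.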
Examples
Transcribe from Microphone
import { STT, AudioCapture } from '@runanywhere/web'
const capture = new AudioCapture()
const chunks: Float32Array[] = []
// Collect audio chunks as they arrive
capture.onAudioChunk((chunk) => {
  chunks.push(chunk)
})
// Start recording at 16 kHz, the rate STT.transcribe expects
await capture.start({ sampleRate: 16000 })
// Stop after 5 seconds and transcribe what was captured
setTimeout(async () => {
  capture.stop()
  // Concatenate all chunks into one contiguous buffer
  const totalLength = chunks.reduce((sum, c) => sum + c.length, 0)
  const allSamples = new Float32Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    allSamples.set(chunk, offset)
    offset += chunk.length
  }
  // Errors thrown inside a timer callback do not reach the caller, so handle them here
  try {
    const result = await STT.transcribe(allSamples)
    console.log('You said:', result.text)
  } catch (err) {
    console.error('Transcription failed:', err)
  }
}, 5000)
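A five-second recording may contain nothing but silence. A crude energy check can skip the transcription call in that case. This is a hypothetical sketch based on root-mean-square (RMS) level, not the SDK's voice activity detection (see the VAD page), and the 0.01 threshold is an assumption to tune per microphone.

```typescript
// Root-mean-square level of a block of float samples.
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical gate (not an SDK API): true when the recording is near-silent.
// The default threshold of 0.01 corresponds to about -40 dBFS.
function isLikelySilent(samples: Float32Array, threshold = 0.01): boolean {
  return rms(samples) < threshold;
}
```

Inside the timer callback, `if (isLikelySilent(allSamples)) return` placed before `STT.transcribe` would skip silent recordings.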
With Language Setting
import { STT, STTModelType } from '@runanywhere/web'
// English transcription with the model loaded in Setup (language: 'en')
const english = await STT.transcribe(audioSamples)
console.log(english.text)
// Load a multilingual model and transcribe Spanish
await STT.loadModel({
  modelId: 'whisper-base',
  type: STTModelType.Whisper,
  modelFiles: { encoder: '...', decoder: '...', tokens: '...' },
  language: 'es',
})
const spanish = await STT.transcribe(spanishAudioSamples)
console.log(spanish.text)
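In this example the language is fixed when the model is loaded, so switching languages means another `STT.loadModel` call. If an app switches back and forth, a small cache can avoid reloading when the requested model and language are already active. The class below is a hypothetical sketch: it assumes each `loadModel` call replaces the previously loaded model (the example implies this but does not state it), and it takes the loader as a parameter so it does not depend on the SDK.

```typescript
// Subset of STTModelConfig that identifies which model is active.
interface ModelKey {
  modelId: string;
  language?: string;
}

// Hypothetical helper (not part of '@runanywhere/web'): calls `load` only when
// the requested model ID or language differs from the current one.
class ModelSwitcher<C extends ModelKey> {
  private readonly load: (config: C) => Promise<void>;
  private current: { key: string; ready: Promise<void> } | undefined = undefined;

  constructor(load: (config: C) => Promise<void>) {
    this.load = load;
  }

  ensure(config: C): Promise<void> {
    const key = `${config.modelId}:${config.language ?? ""}`;
    if (this.current && this.current.key === key) {
      // Same model and language: reuse the pending or completed load.
      return this.current.ready;
    }
    const ready = this.load(config);
    const entry = { key, ready };
    this.current = entry;
    // Forget a failed load so the next ensure() call retries it.
    ready.catch(() => {
      if (this.current === entry) this.current = undefined;
    });
    return ready;
  }
}
```

In an app, the loader could be `(config: STTModelConfig) => STT.loadModel(config)`, followed by `await switcher.ensure({ ...config, language: 'es' })` before each transcription.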
Available Model Architectures
| Architecture | Use Case | Speed | Quality |
|---|---|---|---|
| Whisper | General-purpose | Medium | Best |
| Zipformer | Streaming | Fast | Good |
| Paraformer | Low-latency | Fastest | Good |
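The architecture table can be read as a simple decision rule. The sketch below encodes one such reading; the `Requirements` fields and their precedence (streaming first, then latency) are assumptions, not SDK options.

```typescript
type Architecture = "Whisper" | "Zipformer" | "Paraformer";

// Hypothetical requirement flags (not SDK options).
interface Requirements {
  streaming?: boolean;     // results are needed while audio is still arriving
  lowestLatency?: boolean; // latency matters more than peak accuracy
}

// Hypothetical helper: Zipformer for streaming, Paraformer for lowest latency,
// Whisper (best quality) otherwise.
function chooseArchitecture(req: Requirements): Architecture {
  if (req.streaming) return "Zipformer";
  if (req.lowestLatency) return "Paraformer";
  return "Whisper";
}
```

Checking streaming first reflects the table's use-case column; an app that values latency over streaming could swap the two checks.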
Error Handling
import { STT, SDKError, SDKErrorCode } from '@runanywhere/web'
try {
  const result = await STT.transcribe(audioSamples)
  console.log(result.text)
} catch (err) {
  if (err instanceof SDKError) {
    switch (err.code) {
      case SDKErrorCode.NotInitialized:
        console.error('Initialize the SDK first')
        break
      case SDKErrorCode.ModelNotLoaded:
        console.error('Load an STT model first')
        break
      default:
        console.error('STT error:', err.message)
    }
  } else {
    // Not an SDK error: rethrow so it is not silently swallowed
    throw err
  }
}
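Some failures can be caught before the SDK is called at all. The hypothetical check below, which is not an SDK function, rejects input that is not a non-empty `Float32Array` of finite samples. It also assumes samples are normalized to [-1, 1], the usual convention for float PCM; this page does not state the expected range.

```typescript
// Hypothetical pre-flight check (not an SDK API). Returns a description of
// the first problem found, or null if the input looks usable.
function checkSamples(samples: unknown): string | null {
  if (!(samples instanceof Float32Array)) return "expected a Float32Array";
  if (samples.length === 0) return "no audio samples";
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i];
    if (!Number.isFinite(s)) return `non-finite sample at index ${i}`;
    // Assumed range for normalized float PCM.
    if (s < -1 || s > 1) return `sample out of [-1, 1] at index ${i}`;
  }
  return null;
}
```

Calling it before `STT.transcribe` and logging any returned message gives a readable error before the SDK is involved.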
Related
STT Streaming: Real-time streaming transcription
STT Options: Advanced configuration
VAD: Voice Activity Detection
Voice Agent: Complete voice pipeline