Early Beta — The Web SDK is in early beta. APIs may change between releases.
Overview
The Speech-to-Text (STT) API allows you to transcribe audio data to text using on-device models compiled to WebAssembly. All transcription happens locally in the browser for privacy and offline capability.
Basic Usage
```typescript
import { STT, STTModelType } from '@runanywhere/web'

// Load a Whisper model
await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000,
})

// Transcribe audio
const result = await STT.transcribe(audioFloat32Array)
console.log('Transcription:', result.text)
console.log('Confidence:', result.confidence)
```
Setup
Before transcribing, load an STT model:
```typescript
import { STT, STTModelType } from '@runanywhere/web'

await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000,
  language: 'en',
})
```
API Reference
STT.loadModel
Loads an STT model for transcription.

```typescript
await STT.loadModel(config: STTModelConfig): Promise<void>
```
STTModelConfig
```typescript
interface STTModelConfig {
  /** Unique model identifier */
  modelId: string
  /** Model architecture type */
  type: STTModelType
  /** Model file paths */
  modelFiles: STTWhisperFiles | STTZipformerFiles | STTParaformerFiles
  /** Audio sample rate (default: 16000) */
  sampleRate?: number
  /** Language code (e.g., 'en', 'es') */
  language?: string
}

enum STTModelType {
  Whisper,
  Zipformer,
  Paraformer,
}
```
Model File Interfaces
```typescript
// Whisper models
interface STTWhisperFiles {
  encoder: string
  decoder: string
  tokens: string
}

// Zipformer models
interface STTZipformerFiles {
  encoder: string
  decoder: string
  joiner: string
  tokens: string
}

// Paraformer models
interface STTParaformerFiles {
  model: string
  tokens: string
}
```
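The three layouts differ mainly in which networks they bundle: Whisper ships encoder/decoder/tokens, while a Zipformer transducer additionally requires a joiner network. As a sketch (the file paths below are hypothetical placeholders for wherever you host your model assets), a Zipformer model's files would be described like this:

```typescript
// Hypothetical asset paths — substitute the locations of your own hosted model files.
// Zipformer needs a joiner network in addition to the encoder/decoder/tokens
// that Whisper uses.
const zipformerFiles = {
  encoder: '/models/zipformer-encoder.onnx',
  decoder: '/models/zipformer-decoder.onnx',
  joiner: '/models/zipformer-joiner.onnx',
  tokens: '/models/zipformer-tokens.txt',
}
```

This object would then be passed as `modelFiles` to `STT.loadModel` together with `type: STTModelType.Zipformer`.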
STT.transcribe
Transcribes audio data to text.

```typescript
await STT.transcribe(
  audioSamples: Float32Array,
  options?: STTTranscribeOptions
): Promise<STTTranscriptionResult>
```
Parameters:
| Parameter | Type | Description |
|---|---|---|
| audioSamples | Float32Array | PCM audio samples (16 kHz mono) |
| options | STTTranscribeOptions | Optional transcription settings |
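Since `transcribe` expects 16 kHz mono samples but microphone and file sources often arrive at 44.1 kHz or 48 kHz, the input may need resampling first. A minimal linear-interpolation sketch (the helper name is ours, not part of the SDK; production code would typically low-pass filter first or let an `OfflineAudioContext` resample):

```typescript
// Downsample arbitrary-rate mono PCM to 16 kHz by linear interpolation.
// A sketch only: no anti-aliasing filter is applied.
function downsampleTo16k(samples: Float32Array, inputRate: number): Float32Array {
  const targetRate = 16000
  if (inputRate === targetRate) return samples
  const ratio = inputRate / targetRate
  const outLength = Math.floor(samples.length / ratio)
  const out = new Float32Array(outLength)
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio
    const i0 = Math.floor(pos)
    const i1 = Math.min(i0 + 1, samples.length - 1)
    const frac = pos - i0
    // Interpolate linearly between the two nearest input samples
    out[i] = samples[i0] * (1 - frac) + samples[i1] * frac
  }
  return out
}
```

The result can be passed straight to `STT.transcribe`.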
STTTranscriptionResult
```typescript
interface STTTranscriptionResult {
  /** Main transcription text */
  text: string
  /** Overall confidence (0.0-1.0) */
  confidence: number
  /** Detected language code */
  detectedLanguage?: string
  /** Processing time in milliseconds */
  processingTimeMs: number
  /** Word-level timestamps (if available) */
  words?: STTWord[]
}
```
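One way to consume these fields is a small formatting helper. A sketch (the helper and the local interface mirroring the fields above are ours, not SDK APIs):

```typescript
// Local shape mirroring the STTTranscriptionResult fields used here.
interface TranscriptionSummary {
  text: string
  confidence: number
  processingTimeMs: number
  detectedLanguage?: string
}

// Render a one-line, human-readable summary of a transcription result.
function summarizeResult(r: TranscriptionSummary): string {
  const pct = (r.confidence * 100).toFixed(1)
  const lang = r.detectedLanguage ?? 'unknown'
  return `"${r.text}" (${pct}% confidence, ${lang}, ${r.processingTimeMs} ms)`
}
```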
Examples
Transcribe from Microphone
```typescript
import { STT, AudioCapture } from '@runanywhere/web'

const capture = new AudioCapture()
const chunks: Float32Array[] = []

// Collect audio chunks
capture.onAudioChunk((chunk) => {
  chunks.push(chunk)
})

// Start recording
await capture.start({ sampleRate: 16000 })

// Stop after 5 seconds
setTimeout(async () => {
  capture.stop()

  // Concatenate all chunks into one buffer
  const totalLength = chunks.reduce((sum, c) => sum + c.length, 0)
  const allSamples = new Float32Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    allSamples.set(chunk, offset)
    offset += chunk.length
  }

  // Transcribe
  const result = await STT.transcribe(allSamples)
  console.log('You said:', result.text)
}, 5000)
```
With Language Setting
```typescript
// English transcription (with the model loaded above)
const english = await STT.transcribe(audioSamples)
console.log(english.text)

// Load a multilingual model and transcribe Spanish
await STT.loadModel({
  modelId: 'whisper-base',
  type: STTModelType.Whisper,
  modelFiles: { encoder: '...', decoder: '...', tokens: '...' },
  language: 'es',
})
const spanish = await STT.transcribe(spanishAudioSamples)
console.log(spanish.text)
```
Available Model Architectures
| Architecture | Use Case | Speed | Quality |
|---|---|---|---|
| Whisper | General-purpose | Medium | Best |
| Zipformer | Streaming | Fast | Good |
| Paraformer | Low-latency | Fastest | Good |
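The tradeoffs in the table can be encoded as a small chooser. This is purely illustrative (the function and the latency threshold are ours, not SDK guidance): prefer Zipformer when you need streaming, Paraformer when latency is the priority, and Whisper otherwise for best quality.

```typescript
type Architecture = 'Whisper' | 'Zipformer' | 'Paraformer'

// Illustrative helper mapping requirements to an architecture,
// following the tradeoffs in the table above.
function pickArchitecture(opts: { streaming: boolean; maxLatencyMs: number }): Architecture {
  if (opts.streaming) return 'Zipformer'            // designed for streaming input
  if (opts.maxLatencyMs < 200) return 'Paraformer'  // lowest latency (threshold is illustrative)
  return 'Whisper'                                  // best quality when latency allows
}
```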
Error Handling
```typescript
import { STT, SDKError, SDKErrorCode } from '@runanywhere/web'

try {
  const result = await STT.transcribe(audioSamples)
} catch (err) {
  if (err instanceof SDKError) {
    switch (err.code) {
      case SDKErrorCode.NotInitialized:
        console.error('Initialize the SDK first')
        break
      case SDKErrorCode.ModelNotLoaded:
        console.error('Load an STT model first')
        break
      default:
        console.error('STT error:', err.message)
    }
  }
}
```