Early Beta — The Web SDK is in early beta. APIs may change between releases.
Overview
This page covers advanced configuration options for Speech-to-Text, including model selection, audio settings, and performance tuning.
Model Types
The Web SDK supports three STT model architectures:
| Architecture | Enum Value | Best For | Speed | Quality |
|---|---|---|---|---|
| Whisper | STTModelType.Whisper | General transcription | Medium | Best |
| Zipformer | STTModelType.Zipformer | Streaming / real-time | Fast | Good |
| Paraformer | STTModelType.Paraformer | Low-latency needs | Fastest | Good |
Whisper Models
import { STT, STTModelType } from '@runanywhere/web'

// Whisper takes separate encoder and decoder files plus a token table.
await STT.loadModel({
  modelId: 'whisper-tiny-en',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000, // 16 kHz, as listed under Audio Requirements
  language: 'en',
})
Zipformer Models
// Zipformer takes a joiner file in addition to the encoder, decoder, and tokens.
await STT.loadModel({
  modelId: 'zipformer-en',
  type: STTModelType.Zipformer,
  modelFiles: {
    encoder: '/models/zipformer-encoder.onnx',
    decoder: '/models/zipformer-decoder.onnx',
    joiner: '/models/zipformer-joiner.onnx',
    tokens: '/models/zipformer-tokens.txt',
  },
})
Paraformer Models
// Paraformer takes a single model file plus a token table.
await STT.loadModel({
  modelId: 'paraformer-zh',
  type: STTModelType.Paraformer,
  modelFiles: {
    model: '/models/paraformer.onnx',
    tokens: '/models/paraformer-tokens.txt',
  },
})
Audio Requirements
| Setting | Value | Description |
|---|---|---|
| Format | Float32Array | PCM audio samples |
| Sample Rate | 16000 Hz | Required for all models |
| Channels | Mono | Single channel |
| Range | -1.0 to 1.0 | Normalized float values |
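Audio captured in a browser often arrives at 44.1 or 48 kHz, sometimes in stereo or as 16-bit integers, so it usually needs converting before it matches the table above. The following is a minimal sketch of that conversion, not SDK code: every function name here (`int16ToFloat32`, `downmixToMono`, `resampleLinear`, `clampSamples`, `prepareAudio`) is hypothetical, and the resampler uses plain linear interpolation with no anti-aliasing filter, an assumption made to keep the example short.

```typescript
// Hypothetical helpers: none of these names come from the Web SDK.
const TARGET_SAMPLE_RATE = 16000

/** Convert 16-bit signed PCM samples to floats in [-1.0, 1.0). */
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length)
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768
  }
  return out
}

/** Average interleaved channels (L R L R ...) into a single mono channel. */
function downmixToMono(interleaved: Float32Array, channels: number): Float32Array {
  if (!Number.isInteger(channels) || channels < 1) {
    throw new RangeError('channels must be a positive integer')
  }
  const frames = Math.floor(interleaved.length / channels)
  const out = new Float32Array(frames)
  for (let f = 0; f < frames; f++) {
    let sum = 0
    for (let c = 0; c < channels; c++) {
      sum += interleaved[f * channels + c]
    }
    out[f] = sum / channels
  }
  return out
}

/** Resample mono audio by linear interpolation (no anti-aliasing filter). */
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = TARGET_SAMPLE_RATE,
): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError('sample rates must be positive')
  }
  if (fromRate === toRate || input.length === 0) {
    return input.slice()
  }
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate))
  const out = new Float32Array(outLength)
  const step = fromRate / toRate
  for (let i = 0; i < outLength; i++) {
    const pos = i * step
    const left = Math.min(Math.floor(pos), input.length - 1)
    const right = Math.min(left + 1, input.length - 1)
    out[i] = input[left] + (input[right] - input[left]) * (pos - left)
  }
  return out
}

/** Clamp every sample into the normalized range [-1.0, 1.0]. */
function clampSamples(samples: Float32Array): Float32Array {
  return samples.map((s) => Math.max(-1, Math.min(1, s)))
}

/** Downmix, resample to 16 kHz, and clamp, in that order. */
function prepareAudio(
  interleaved: Float32Array,
  channels: number,
  sampleRate: number,
): Float32Array {
  const mono = downmixToMono(interleaved, channels)
  return clampSamples(resampleLinear(mono, sampleRate))
}
```

For example, `prepareAudio(samples, 2, 48000)` would turn interleaved 48 kHz stereo floats into a mono 16 kHz `Float32Array`. For production audio, a filtered resampler is likely a better choice than linear interpolation when downsampling.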
Model Properties
After loading a model, check its properties:
console.log('Model loaded:', STT.isModelLoaded)
console.log('Model ID:', STT.modelId)
console.log('Model type:', STT.currentModelType)
Switching Models
Unload the current model before loading a new one:
// Unload current model
await STT.unloadModel()

// Load a different model
await STT.loadModel({
  modelId: 'whisper-base',
  type: STTModelType.Whisper,
  modelFiles: { encoder: '...', decoder: '...', tokens: '...' },
})
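The unload-then-load sequence can be wrapped in one helper so callers cannot forget the unload step. This is a rough sketch under stated assumptions: `SttLike` and `switchModel` are hypothetical names, and the interface only mirrors the three members this page shows (`isModelLoaded`, `unloadModel`, `loadModel`). Their exact types in the SDK are not confirmed here.

```typescript
// Hypothetical minimal shape of the STT object, limited to members shown on this page.
// The promise-returning signatures are an assumption based on the `await` calls above.
interface SttLike<Config> {
  readonly isModelLoaded: boolean
  unloadModel(): Promise<unknown>
  loadModel(config: Config): Promise<unknown>
}

/** Unload the current model if one is loaded, then load the requested one. */
async function switchModel<Config>(
  stt: SttLike<Config>,
  config: Config,
): Promise<void> {
  if (stt.isModelLoaded) {
    await stt.unloadModel()
  }
  await stt.loadModel(config)
}
```

If the SDK's `STT` object matches this shape, a call might look like `await switchModel(STT, { modelId: 'whisper-base', ... })`. Taking the object as a parameter also lets the helper be exercised with a fake in unit tests.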
Recommended Models by Use Case
| Use Case | Recommended Model | Size | Notes |
|---|---|---|---|
| Quick English | Whisper Tiny EN | ~75MB | Fastest, English only |
| General English | Whisper Base EN | ~150MB | Better quality |
| Multilingual | Whisper Small | ~500MB | Supports many languages |
| Real-time | Zipformer | ~30MB | Best for streaming |
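One way to use this table in application code is as a lookup keyed by use case, with an optional download budget. The sketch below is illustrative only: the `UseCase` keys, the `Recommendation` shape, and `recommendModel` are hypothetical, and the sizes are the approximate figures from the table rather than measured values.

```typescript
// Hypothetical lookup built from the table above; not part of the Web SDK.
type UseCase = 'quick-english' | 'general-english' | 'multilingual' | 'real-time'

interface Recommendation {
  model: string
  approxSizeMB: number // approximate size, copied from the table
  notes: string
}

const RECOMMENDED: Record<UseCase, Recommendation> = {
  'quick-english': { model: 'Whisper Tiny EN', approxSizeMB: 75, notes: 'Fastest, English only' },
  'general-english': { model: 'Whisper Base EN', approxSizeMB: 150, notes: 'Better quality' },
  'multilingual': { model: 'Whisper Small', approxSizeMB: 500, notes: 'Supports many languages' },
  'real-time': { model: 'Zipformer', approxSizeMB: 30, notes: 'Best for streaming' },
}

/**
 * Return the table's recommendation for a use case, or undefined when the
 * model's approximate size exceeds the caller's download budget.
 */
function recommendModel(
  useCase: UseCase,
  maxDownloadMB: number = Infinity,
): Recommendation | undefined {
  const rec = RECOMMENDED[useCase]
  return rec.approxSizeMB <= maxDownloadMB ? rec : undefined
}
```

A caller on a constrained connection could pass a budget, such as `recommendModel('multilingual', 200)`, and fall back to a smaller model when the result is `undefined`.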
Clean Up
Release STT resources when no longer needed: