Early Beta — The Web SDK is in early beta. APIs may change between releases.
Overview
This page covers advanced configuration options for Speech-to-Text, including model selection, audio settings, and performance tuning.
Model Types
The Web SDK supports three STT model architectures:

| Architecture | Enum Value | Best For | Speed | Quality |
|---|---|---|---|---|
| Whisper | STTModelType.Whisper | General transcription | Medium | Best |
| Zipformer | STTModelType.Zipformer | Streaming / real-time | Fast | Good |
| Paraformer | STTModelType.Paraformer | Low-latency needs | Fastest | Good |
Whisper Models
Zipformer Models
Paraformer Models
Audio Requirements
| Setting | Value | Description |
|---|---|---|
| Format | Float32Array | PCM audio samples |
| Sample Rate | 16000 Hz | Required for all models |
| Channels | Mono | Single channel |
| Range | -1.0 to 1.0 | Normalized float values |
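
Captured audio often arrives as 16-bit PCM, in stereo, or at 44.1/48 kHz, so it usually needs conversion before it meets the requirements in the table above. The following is a minimal sketch of that conversion, not part of the Web SDK: the helper names (`int16ToFloat32`, `downmixToMono`, `resampleLinear`, `prepareAudio`) are hypothetical, and the resampler uses plain linear interpolation without an anti-aliasing filter, which is a simplification.

```typescript
// Hypothetical helpers (not Web SDK APIs) for producing mono, 16 kHz,
// Float32Array audio normalized to the -1.0..1.0 range.
const TARGET_SAMPLE_RATE = 16000;

/** Convert 16-bit signed PCM samples to floats in [-1.0, 1.0). */
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768;
  }
  return out;
}

/** Average interleaved multi-channel samples down to one channel. */
function downmixToMono(interleaved: Float32Array, channels: number): Float32Array {
  if (channels === 1) return interleaved;
  const frames = Math.floor(interleaved.length / channels);
  const mono = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) {
      sum += interleaved[i * channels + c];
    }
    mono[i] = sum / channels;
  }
  return mono;
}

/** Resample by linear interpolation (simple; no anti-aliasing filter). */
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = TARGET_SAMPLE_RATE
): Float32Array {
  if (fromRate === toRate) return input;
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}

/** Downmix, resample to 16 kHz, and clamp every sample to [-1.0, 1.0]. */
function prepareAudio(
  samples: Float32Array,
  sampleRate: number,
  channels: number
): Float32Array {
  const mono = downmixToMono(samples, channels);
  const resampled = resampleLinear(mono, sampleRate);
  const out = new Float32Array(resampled.length);
  for (let i = 0; i < resampled.length; i++) {
    out[i] = Math.max(-1, Math.min(1, resampled[i]));
  }
  return out;
}
```

For example, `prepareAudio(int16ToFloat32(pcm), 48000, 2)` would turn interleaved stereo 48 kHz 16-bit PCM into a buffer in the required format. For production use, a filtered resampler is likely preferable to linear interpolation when downsampling.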
Model Properties
After loading a model, check its properties.
Switching Models
Unload the current model before loading a new one.
Recommended Models by Use Case
| Use Case | Recommended Model | Size | Notes |
|---|---|---|---|
| Quick English | Whisper Tiny EN | ~75 MB | Fastest, English only |
| General English | Whisper Base EN | ~150 MB | Better quality |
| Multilingual | Whisper Small | ~500 MB | Supports many languages |
| Real-time | Zipformer | ~30 MB | Best for streaming |
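
An application that picks a model at runtime could encode the table above as a lookup. The sketch below is illustrative only: the `UseCase` keys, the `Recommendation` shape, and both functions are hypothetical names rather than Web SDK APIs, and the sizes are the approximate figures from the table.

```typescript
// Hypothetical lookup mirroring the recommendations table (not an SDK API).
type UseCase = "quick-english" | "general-english" | "multilingual" | "real-time";

interface Recommendation {
  model: string;
  approxSizeMB: number; // approximate download size from the table
  notes: string;
}

const RECOMMENDED: Record<UseCase, Recommendation> = {
  "quick-english": { model: "Whisper Tiny EN", approxSizeMB: 75, notes: "Fastest, English only" },
  "general-english": { model: "Whisper Base EN", approxSizeMB: 150, notes: "Better quality" },
  "multilingual": { model: "Whisper Small", approxSizeMB: 500, notes: "Supports many languages" },
  "real-time": { model: "Zipformer", approxSizeMB: 30, notes: "Best for streaming" },
};

/** Return the recommended model for a given use case. */
function recommendModel(useCase: UseCase): Recommendation {
  return RECOMMENDED[useCase];
}

/** List the recommended models whose approximate size fits a download budget. */
function modelsWithinBudget(maxMB: number): string[] {
  return Object.values(RECOMMENDED)
    .filter((rec) => rec.approxSizeMB <= maxMB)
    .map((rec) => rec.model);
}
```

A size filter like `modelsWithinBudget(100)` may be useful on constrained connections, where only the smaller models are practical to download.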
Clean Up
Release STT resources when no longer needed.
Related
Transcribe: Batch audio transcription
Streaming STT: Real-time transcription