Early Beta — The Web SDK is in early beta. APIs may change between releases.

Overview

This page covers advanced configuration options for Speech-to-Text, including model selection, audio settings, and performance tuning.

Model Types

The Web SDK supports three STT model architectures:

| Architecture | Enum Value | Best For | Speed | Quality |
| --- | --- | --- | --- | --- |
| Whisper | STTModelType.Whisper | General transcription | Medium | Best |
| Zipformer | STTModelType.Zipformer | Streaming / real-time | Fast | Good |
| Paraformer | STTModelType.Paraformer | Low-latency needs | Fastest | Good |

Whisper Models

import { STT, STTModelType } from '@runanywhere/web'

await STT.loadModel({
  modelId: 'whisper-tiny-en',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000,
  language: 'en',
})

Zipformer Models

await STT.loadModel({
  modelId: 'zipformer-en',
  type: STTModelType.Zipformer,
  modelFiles: {
    encoder: '/models/zipformer-encoder.onnx',
    decoder: '/models/zipformer-decoder.onnx',
    joiner: '/models/zipformer-joiner.onnx',
    tokens: '/models/zipformer-tokens.txt',
  },
})

Paraformer Models

await STT.loadModel({
  modelId: 'paraformer-zh',
  type: STTModelType.Paraformer,
  modelFiles: {
    model: '/models/paraformer.onnx',
    tokens: '/models/paraformer-tokens.txt',
  },
})
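Each of the three examples above passes a different set of keys in `modelFiles`. The following is a minimal sketch of a pre-flight check that reports missing keys before `STT.loadModel` is called. The `Architecture` type, the `REQUIRED_FILES` map, and the `missingModelFiles` helper are hypothetical names, not part of the SDK, and the key lists are taken only from the examples on this page.

```typescript
// Hypothetical helper, not part of @runanywhere/web.
// Key lists mirror the modelFiles objects shown in the examples above.
type Architecture = 'whisper' | 'zipformer' | 'paraformer'

const REQUIRED_FILES: Record<Architecture, readonly string[]> = {
  whisper: ['encoder', 'decoder', 'tokens'],
  zipformer: ['encoder', 'decoder', 'joiner', 'tokens'],
  paraformer: ['model', 'tokens'],
}

// Returns the keys that are absent or empty for the given architecture.
function missingModelFiles(
  arch: Architecture,
  modelFiles: Record<string, string | undefined>,
): string[] {
  return REQUIRED_FILES[arch].filter((key) => !modelFiles[key])
}

// Example: a Zipformer config that forgot the joiner file.
const missing = missingModelFiles('zipformer', {
  encoder: '/models/zipformer-encoder.onnx',
  decoder: '/models/zipformer-decoder.onnx',
  tokens: '/models/zipformer-tokens.txt',
})
console.log(missing) // ['joiner']
```

Running a check like this first may give a clearer error than a failed model load, though the SDK may already validate these keys itself.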

Audio Requirements

| Setting | Value | Description |
| --- | --- | --- |
| Format | Float32Array | PCM audio samples |
| Sample Rate | 16000 Hz | Required for all models |
| Channels | Mono | Single channel |
| Range | -1.0 to 1.0 | Normalized float values |
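Audio captured elsewhere often arrives as 16-bit integers, in stereo, or at 44.1/48 kHz, so it usually needs converting to match the requirements above. The following is a minimal sketch of that conversion using plain typed arrays. All three helper names are hypothetical and not part of the SDK. The resampler uses simple linear interpolation, which is an illustrative choice rather than anything the SDK prescribes.

```typescript
// Hypothetical helpers, not part of @runanywhere/web.
const TARGET_SAMPLE_RATE = 16000

// Convert 16-bit signed PCM to floats in the range [-1.0, 1.0).
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length)
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 32768
  return out
}

// Average interleaved stereo frames (L, R, L, R, ...) into a single mono channel.
function stereoToMono(interleaved: Float32Array): Float32Array {
  const frames = Math.floor(interleaved.length / 2)
  const out = new Float32Array(frames)
  for (let i = 0; i < frames; i++) {
    out[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2
  }
  return out
}

// Resample mono audio to 16 kHz with linear interpolation.
// When downsampling, a production pipeline would low-pass filter first to limit aliasing.
function resampleTo16k(input: Float32Array, inputRate: number): Float32Array {
  if (inputRate === TARGET_SAMPLE_RATE) return input
  const outLength = Math.floor((input.length * TARGET_SAMPLE_RATE) / inputRate)
  const out = new Float32Array(outLength)
  const step = inputRate / TARGET_SAMPLE_RATE
  for (let i = 0; i < outLength; i++) {
    const pos = i * step
    const left = Math.min(Math.floor(pos), input.length - 1)
    const right = Math.min(left + 1, input.length - 1)
    const frac = pos - left
    out[i] = input[left] * (1 - frac) + input[right] * frac
  }
  return out
}
```

A 48 kHz stereo Int16 buffer would go through `int16ToFloat32`, then `stereoToMono`, then `resampleTo16k(mono, 48000)` before being handed to the model.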

Model Properties

After loading a model, check its properties:
console.log('Model loaded:', STT.isModelLoaded)
console.log('Model ID:', STT.modelId)
console.log('Model type:', STT.currentModelType)

Switching Models

Unload the current model before loading a new one:
// Unload current model
await STT.unloadModel()

// Load a different model
await STT.loadModel({
  modelId: 'whisper-base',
  type: STTModelType.Whisper,
  modelFiles: { encoder: '...', decoder: '...', tokens: '...' },
})
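The unload-then-load sequence above can be wrapped in one helper so the order is never forgotten. This is a minimal sketch under stated assumptions: `SttLike` is a hypothetical structural interface covering only the members used on this page (`isModelLoaded`, `unloadModel`, `loadModel`), and `switchModel` is not an SDK function. The real SDK typings may differ.

```typescript
// Minimal structural view of the STT object as it is used on this page.
// Hypothetical: only these three members are assumed to exist.
interface SttLike<Config> {
  readonly isModelLoaded: boolean
  unloadModel(): Promise<unknown>
  loadModel(config: Config): Promise<unknown>
}

// Hypothetical helper: unload whatever is currently loaded, then load the next model.
async function switchModel<Config>(stt: SttLike<Config>, next: Config): Promise<void> {
  if (stt.isModelLoaded) {
    await stt.unloadModel()
  }
  await stt.loadModel(next)
}
```

If the `STT` export matches this shape, a call would look like `await switchModel(STT, nextConfig)`, where `nextConfig` is the same object that would be passed to `STT.loadModel`.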
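
The table below lists a recommended model for each use case:

| Use Case | Recommended Model | Size | Notes |
| --- | --- | --- | --- |
| Quick English | Whisper Tiny EN | ~75 MB | Fastest, English only |
| General English | Whisper Base EN | ~150 MB | Better quality |
| Multilingual | Whisper Small | ~500 MB | Supports many languages |
| Real-time | Zipformer | ~30 MB | Best for streaming |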
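The recommendations above can be encoded as a lookup so an app can choose a model at runtime, for example skipping a large download on a constrained connection. This is a sketch only: the `UseCase` keys, the `RECOMMENDED` map, and the `recommend` helper are hypothetical names, and the sizes are the approximate figures from the table.

```typescript
// Hypothetical lookup built from the table above; not an SDK API.
type UseCase = 'quick-english' | 'general-english' | 'multilingual' | 'real-time'

interface Recommendation {
  model: string
  approxSizeMB: number
}

const RECOMMENDED: Record<UseCase, Recommendation> = {
  'quick-english': { model: 'Whisper Tiny EN', approxSizeMB: 75 },
  'general-english': { model: 'Whisper Base EN', approxSizeMB: 150 },
  multilingual: { model: 'Whisper Small', approxSizeMB: 500 },
  'real-time': { model: 'Zipformer', approxSizeMB: 30 },
}

// Returns the recommendation, or undefined when it exceeds the download budget.
function recommend(useCase: UseCase, maxSizeMB = Infinity): Recommendation | undefined {
  const rec = RECOMMENDED[useCase]
  return rec.approxSizeMB <= maxSizeMB ? rec : undefined
}
```

Returning `undefined` instead of throwing leaves the fallback decision (a smaller model, or no STT at all) to the caller.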

Clean Up

Release STT resources when no longer needed:
STT.cleanup()
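To make sure cleanup runs even when transcription code throws, the call can go in a `finally` block. This is a minimal sketch: `Cleanable` and `withCleanup` are hypothetical names, and it assumes `cleanup()` can be called synchronously, as in the line above.

```typescript
// Hypothetical structural type: anything exposing a cleanup() method.
interface Cleanable {
  cleanup(): void
}

// Runs the given work, then always releases resources, whether it succeeded or threw.
async function withCleanup<T>(stt: Cleanable, work: () => Promise<T>): Promise<T> {
  try {
    return await work()
  } finally {
    stt.cleanup()
  }
}
```

Assuming `STT` exposes `cleanup()` as shown, usage would be `await withCleanup(STT, async () => { /* load and transcribe */ })`.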