Early Beta — The Web SDK is in early beta. APIs may change between releases.

Overview

The Speech-to-Text (STT) API transcribes audio to text using on-device models compiled to WebAssembly. All transcription runs locally in the browser, so audio never leaves the device and transcription works offline once the model files are available.

Basic Usage

import { STT, STTModelType } from '@runanywhere/web'

// Load a Whisper model
await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000,
})

// Transcribe audio
const result = await STT.transcribe(audioFloat32Array)
console.log('Transcription:', result.text)
console.log('Confidence:', result.confidence)

Setup

Before transcribing, load an STT model:
import { STT, STTModelType } from '@runanywhere/web'

await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000,
  language: 'en',
})

API Reference

STT.loadModel

Load an STT model for transcription.
await STT.loadModel(config: STTModelConfig): Promise<void>

STTModelConfig

interface STTModelConfig {
  /** Unique model identifier */
  modelId: string

  /** Model architecture type */
  type: STTModelType

  /** Model file paths */
  modelFiles: STTWhisperFiles | STTZipformerFiles | STTParaformerFiles

  /** Audio sample rate (default: 16000) */
  sampleRate?: number

  /** Language code (e.g., 'en', 'es') */
  language?: string
}

enum STTModelType {
  Whisper,
  Zipformer,
  Paraformer,
}

Model File Interfaces

// Whisper models
interface STTWhisperFiles {
  encoder: string
  decoder: string
  tokens: string
}

// Zipformer models
interface STTZipformerFiles {
  encoder: string
  decoder: string
  joiner: string
  tokens: string
}

// Paraformer models
interface STTParaformerFiles {
  model: string
  tokens: string
}
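
Because each architecture expects a different set of files, it can help to check a config before passing it to STT.loadModel. The following is a minimal sketch, not part of the SDK: the ModelKind type, the REQUIRED_FILES table, and the missingModelFiles helper are hypothetical names that mirror the interfaces above so the snippet compiles on its own.

```typescript
// Hypothetical local mirror of the three documented architectures.
type ModelKind = 'whisper' | 'zipformer' | 'paraformer'

// Required file keys per architecture, copied from the interfaces above.
const REQUIRED_FILES: Record<ModelKind, readonly string[]> = {
  whisper: ['encoder', 'decoder', 'tokens'],
  zipformer: ['encoder', 'decoder', 'joiner', 'tokens'],
  paraformer: ['model', 'tokens'],
}

/** Returns the required keys that are missing or empty in `files`. */
function missingModelFiles(
  kind: ModelKind,
  files: Record<string, string | undefined>
): string[] {
  return REQUIRED_FILES[kind].filter((key) => {
    const value = files[key]
    return typeof value !== 'string' || value.trim() === ''
  })
}

// Example: a Zipformer config that forgot the joiner file.
const missing = missingModelFiles('zipformer', {
  encoder: '/models/encoder.onnx',
  decoder: '/models/decoder.onnx',
  tokens: '/models/tokens.txt',
})
if (missing.length > 0) {
  console.warn('Missing model files:', missing.join(', '))
}
```

The file paths in the example are placeholders. The check only confirms that each key is present and non-empty; it does not verify that the files exist or load.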

STT.transcribe

Transcribe audio data to text.
await STT.transcribe(
  audioSamples: Float32Array,
  options?: STTTranscribeOptions
): Promise<STTTranscriptionResult>
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| audioSamples | Float32Array | PCM audio samples (16 kHz mono) |
| options | STTTranscribeOptions | Optional transcription settings |
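
Audio from other sources often arrives as 16-bit integers, in stereo, or at 44.1 or 48 kHz, so it needs converting before it matches the 16 kHz mono Float32Array described above. The helpers below are a rough sketch of that conversion in plain TypeScript. They are hypothetical utilities, not SDK functions, and the resampler uses simple linear interpolation with no anti-aliasing filter, which may be too crude for production audio.

```typescript
/** Convert 16-bit signed PCM to floats in the range [-1, 1). */
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length)
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768
  }
  return out
}

/** Average interleaved stereo frames (L, R, L, R, ...) into mono. */
function stereoToMono(interleaved: Float32Array): Float32Array {
  const frames = Math.floor(interleaved.length / 2)
  const out = new Float32Array(frames)
  for (let i = 0; i < frames; i++) {
    out[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2
  }
  return out
}

/** Resample mono audio by linear interpolation (no low-pass filtering). */
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate = 16000
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate))
  const out = new Float32Array(outLength)
  const step = fromRate / toRate
  for (let i = 0; i < outLength; i++) {
    const pos = i * step
    const left = Math.min(Math.floor(pos), input.length - 1)
    const right = Math.min(left + 1, input.length - 1)
    const frac = pos - left
    out[i] = input[left] * (1 - frac) + input[right] * frac
  }
  return out
}
```

For example, 48 kHz stereo 16-bit audio could be prepared with resampleLinear(stereoToMono(int16ToFloat32(pcm)), 48000) and the result passed to STT.transcribe.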

STTTranscriptionResult

interface STTTranscriptionResult {
  /** Main transcription text */
  text: string

  /** Overall confidence (0.0-1.0) */
  confidence: number

  /** Detected language code */
  detectedLanguage?: string

  /** Processing time in milliseconds */
  processingTimeMs: number

  /** Word-level timestamps (if available) */
  words?: STTWord[]
}
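
As an illustration of how these fields might be used together, the sketch below derives the audio duration and a real-time factor from processingTimeMs, and applies a confidence cutoff. The summarizeTranscription helper, its interfaces, and the 0.5 default threshold are assumptions made for this example, not SDK behavior.

```typescript
// Local mirror of the documented result fields this sketch reads.
interface ResultLike {
  text: string
  confidence: number
  processingTimeMs: number
}

interface TranscriptionSummary {
  audioDurationMs: number
  /** Processing time divided by audio duration; below 1 is faster than real time. */
  realTimeFactor: number
  /** True when the text is non-empty and confidence meets the threshold. */
  accepted: boolean
}

function summarizeTranscription(
  result: ResultLike,
  sampleCount: number,
  sampleRate = 16000,
  minConfidence = 0.5 // arbitrary cutoff chosen for illustration
): TranscriptionSummary {
  const audioDurationMs = (sampleCount / sampleRate) * 1000
  const realTimeFactor =
    audioDurationMs > 0 ? result.processingTimeMs / audioDurationMs : Infinity
  const accepted =
    result.text.trim().length > 0 && result.confidence >= minConfidence
  return { audioDurationMs, realTimeFactor, accepted }
}
```

A suitable threshold depends on the model and the audio, so treat the default as a starting point to tune.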

Examples

Transcribe from Microphone

import { STT, AudioCapture } from '@runanywhere/web'

const capture = new AudioCapture()
const chunks: Float32Array[] = []

// Collect audio chunks
capture.onAudioChunk((chunk) => {
  chunks.push(chunk)
})

// Start recording
await capture.start({ sampleRate: 16000 })

// Stop after 5 seconds
setTimeout(async () => {
  capture.stop()

  // Concatenate all chunks
  const totalLength = chunks.reduce((sum, c) => sum + c.length, 0)
  const allSamples = new Float32Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    allSamples.set(chunk, offset)
    offset += chunk.length
  }

  // Transcribe
  const result = await STT.transcribe(allSamples)
  console.log('You said:', result.text)
}, 5000)
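
The example above stops on a wall-clock timer. An alternative is to stop once a target number of samples has arrived, which makes the recorded length independent of timer drift. The ChunkBuffer class below is a hypothetical helper sketched for that purpose, not an SDK class. It copies each chunk on the assumption that the capture source may reuse its buffers.

```typescript
class ChunkBuffer {
  private chunks: Float32Array[] = []
  private length = 0
  private readonly targetSamples: number

  constructor(targetSamples: number) {
    this.targetSamples = targetSamples
  }

  /** Stores a copy of the chunk; returns true once the target is reached. */
  push(chunk: Float32Array): boolean {
    this.chunks.push(chunk.slice())
    this.length += chunk.length
    return this.length >= this.targetSamples
  }

  /** Merges all collected chunks into one array and resets the buffer. */
  flush(): Float32Array {
    const merged = new Float32Array(this.length)
    let offset = 0
    for (const chunk of this.chunks) {
      merged.set(chunk, offset)
      offset += chunk.length
    }
    this.chunks = []
    this.length = 0
    return merged
  }
}

// Five seconds of audio at 16 kHz.
const buffer = new ChunkBuffer(5 * 16000)
```

Inside the onAudioChunk callback, the code would call buffer.push(chunk) and, when it returns true, call capture.stop() and pass buffer.flush() to STT.transcribe.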

With Language Setting

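import { STT, STTModelType } from '@runanywhere/web'

// English transcription (assumes the model from Setup, loaded with language: 'en')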
const english = await STT.transcribe(audioSamples)
console.log(english.text)

// Load a multilingual model and transcribe Spanish
await STT.loadModel({
  modelId: 'whisper-base',
  type: STTModelType.Whisper,
  modelFiles: { encoder: '...', decoder: '...', tokens: '...' },
  language: 'es',
})

const spanish = await STT.transcribe(spanishAudioSamples)
console.log(spanish.text)

Available Model Architectures

| Architecture | Use Case | Speed | Quality |
| --- | --- | --- | --- |
| Whisper | General-purpose | Medium | Best |
| Zipformer | Streaming | Fast | Good |
| Paraformer | Low-latency | Fastest | Good |

Error Handling

import { STT, SDKError, SDKErrorCode } from '@runanywhere/web'

try {
  const result = await STT.transcribe(audioSamples)
  console.log('Transcription:', result.text)
} catch (err) {
  if (err instanceof SDKError) {
    switch (err.code) {
      case SDKErrorCode.NotInitialized:
        console.error('Initialize the SDK first')
        break
      case SDKErrorCode.ModelNotLoaded:
        console.error('Load an STT model first')
        break
      default:
        console.error('STT error:', err.message)
    }
  } else {
    // Not an SDK error: rethrow instead of silently swallowing it
    throw err
  }
}