transcribe()

Early Beta: The Web SDK is in early beta. APIs may change between releases.

Overview

The Speech-to-Text (STT) API transcribes audio to text using on-device models compiled to WebAssembly. All transcription runs locally in the browser, which keeps audio on the device and allows offline use.

Basic Usage

import { STT, STTModelType } from '@runanywhere/web'

// Load a Whisper model
await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000,
})

// Transcribe audio (audioFloat32Array is a Float32Array of 16 kHz mono PCM samples)
const result = await STT.transcribe(audioFloat32Array)
console.log('Transcription:', result.text)
console.log('Confidence:', result.confidence)

Setup

Before transcribing, load an STT model:

import { STT, STTModelType } from '@runanywhere/web'

await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: {
    encoder: '/models/whisper-tiny-encoder.onnx',
    decoder: '/models/whisper-tiny-decoder.onnx',
    tokens: '/models/whisper-tiny-tokens.txt',
  },
  sampleRate: 16000,
  language: 'en',
})

API Reference

`STT.loadModel`

Load an STT model for transcription.

STT.loadModel(config: STTModelConfig): Promise<void>

STTModelConfig

interface STTModelConfig {
  /** Unique model identifier */
  modelId: string

  /** Model architecture type */
  type: STTModelType

  /** Model file paths */
  modelFiles: STTWhisperFiles | STTZipformerFiles | STTParaformerFiles

  /** Audio sample rate (default: 16000) */
  sampleRate?: number

  /** Language code (e.g., 'en', 'es') */
  language?: string
}

enum STTModelType {
  Whisper,
  Zipformer,
  Paraformer,
}
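Because `sampleRate` defaults to 16000, audio recorded at another rate (48000 Hz is common for microphones) probably has to be resampled before it is transcribed. The source does not say whether the SDK resamples internally, so the following is only a minimal sketch under that assumption. `resampleLinear` is a hypothetical helper, not part of `@runanywhere/web`.

```typescript
// Hypothetical helper, not part of @runanywhere/web.
// Resamples mono PCM from `fromRate` to `toRate` by linear interpolation.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError('Sample rates must be positive')
  }
  if (fromRate === toRate || input.length === 0) {
    return input.slice()
  }

  const ratio = fromRate / toRate
  const outputLength = Math.floor((input.length * toRate) / fromRate)
  const output = new Float32Array(outputLength)
  const lastIndex = input.length - 1

  for (let i = 0; i < outputLength; i++) {
    const position = i * ratio
    // Clamp both neighbours so rounding can never read past the input.
    const left = Math.min(Math.floor(position), lastIndex)
    const right = Math.min(left + 1, lastIndex)
    const weight = position - left
    output[i] = input[left] * (1 - weight) + input[right] * weight
  }
  return output
}

// Example: one second of 48 kHz audio becomes 16000 samples.
const oneSecondAt48k = new Float32Array(48000)
const resampled = resampleLinear(oneSecondAt48k, 48000, 16000)
console.log(resampled.length) // 16000
```

Linear interpolation applies no low-pass filter, so downsampling this way can introduce aliasing. It is fine for a quick experiment, but a filtered resampler is the safer choice for production audio.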

Model File Interfaces

// Whisper models
interface STTWhisperFiles {
  encoder: string
  decoder: string
  tokens: string
}

// Zipformer models
interface STTZipformerFiles {
  encoder: string
  decoder: string
  joiner: string
  tokens: string
}

// Paraformer models
interface STTParaformerFiles {
  model: string
  tokens: string
}
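Each architecture expects a different set of files, so a misconfigured `modelFiles` object is an easy mistake to make. The sketch below checks a config before it is passed to `STT.loadModel`. It uses a local string union as a stand-in for `STTModelType`, and `missingModelFiles` and the example paths are hypothetical, not part of the SDK.

```typescript
// Stand-in for STTModelType so the sketch is self-contained.
type ModelKind = 'whisper' | 'zipformer' | 'paraformer'

// Required file keys per architecture, taken from the interfaces above.
const REQUIRED_FILES: Record<ModelKind, readonly string[]> = {
  whisper: ['encoder', 'decoder', 'tokens'],
  zipformer: ['encoder', 'decoder', 'joiner', 'tokens'],
  paraformer: ['model', 'tokens'],
}

// Hypothetical helper: returns the required keys that are missing or empty.
function missingModelFiles(
  kind: ModelKind,
  files: Record<string, string | undefined>
): string[] {
  return REQUIRED_FILES[kind].filter((key) => {
    const path = files[key]
    return typeof path !== 'string' || path.trim() === ''
  })
}

// Example: a Zipformer config that forgot the joiner (paths are illustrative).
const missing = missingModelFiles('zipformer', {
  encoder: '/models/encoder.onnx',
  decoder: '/models/decoder.onnx',
  tokens: '/models/tokens.txt',
})
console.log(missing) // ['joiner']
```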

`STT.transcribe`

Transcribe audio data to text.

STT.transcribe(
  audioSamples: Float32Array,
  options?: STTTranscribeOptions
): Promise<STTTranscriptionResult>

Parameters:

Parameter	Type	Description
`audioSamples`	`Float32Array`	PCM audio samples (16 kHz, mono)
`options`	`STTTranscribeOptions`	Optional transcription settings
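Audio from WAV files or raw streams often arrives as interleaved 16-bit integers rather than mono floats. The sketch below shows one way to convert it to the `Float32Array` that `audioSamples` expects. It assumes the SDK wants samples normalized to roughly [-1, 1], which is the Web Audio convention but is not stated in this reference. `int16ToMonoFloat32` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper, not part of @runanywhere/web.
// Converts interleaved 16-bit PCM to mono Float32 samples in [-1, 1).
function int16ToMonoFloat32(pcm: Int16Array, channels: number): Float32Array {
  if (!Number.isInteger(channels) || channels < 1) {
    throw new RangeError('channels must be a positive integer')
  }
  // Any trailing partial frame is ignored.
  const frames = Math.floor(pcm.length / channels)
  const mono = new Float32Array(frames)

  for (let frame = 0; frame < frames; frame++) {
    let sum = 0
    for (let ch = 0; ch < channels; ch++) {
      sum += pcm[frame * channels + ch]
    }
    // Average the channels, then scale by 32768 (magnitude of the lowest int16).
    mono[frame] = sum / channels / 32768
  }
  return mono
}

// Example: two stereo frames downmixed to two mono samples.
const stereo = new Int16Array([16384, 16384, -32768, 0])
const mono = int16ToMonoFloat32(stereo, 2)
console.log(Array.from(mono)) // [0.5, -0.5]
```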

STTTranscriptionResult

interface STTTranscriptionResult {
  /** Main transcription text */
  text: string

  /** Overall confidence (0.0-1.0) */
  confidence: number

  /** Detected language code */
  detectedLanguage?: string

  /** Processing time in milliseconds */
  processingTimeMs: number

  /** Word-level timestamps (if available) */
  words?: STTWord[]
}
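The `confidence` and `processingTimeMs` fields are easier to act on once they are related to the audio that was transcribed. The sketch below derives a real-time factor and a low-confidence flag. `summarizeResult` is a hypothetical helper, the local interface only mirrors the fields it needs, and the 0.5 default threshold is an arbitrary illustration rather than an SDK recommendation.

```typescript
// Local copy of the documented result fields used by this sketch.
interface TranscriptionSummaryInput {
  text: string
  confidence: number
  processingTimeMs: number
}

// Hypothetical helper: relates processing time to the audio duration.
function summarizeResult(
  result: TranscriptionSummaryInput,
  sampleCount: number,
  sampleRate = 16000,
  minConfidence = 0.5 // arbitrary illustrative threshold
) {
  const audioMs = (sampleCount / sampleRate) * 1000
  return {
    text: result.text.trim(),
    audioMs,
    // Below 1 means transcription finished faster than the audio's duration.
    realTimeFactor: audioMs > 0 ? result.processingTimeMs / audioMs : NaN,
    lowConfidence: result.confidence < minConfidence,
  }
}

// Example: 32000 samples at 16 kHz is 2 seconds of audio.
const summary = summarizeResult(
  { text: ' hello world ', confidence: 0.9, processingTimeMs: 500 },
  32000
)
console.log(summary.realTimeFactor) // 0.25
```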

Examples

Transcribe from Microphone

import { STT, AudioCapture } from '@runanywhere/web'

const capture = new AudioCapture()
const chunks: Float32Array[] = []

// Collect audio chunks
capture.onAudioChunk((chunk) => {
  chunks.push(chunk)
})

// Start recording
await capture.start({ sampleRate: 16000 })

// Stop after 5 seconds
setTimeout(async () => {
  capture.stop()

  // Concatenate all chunks
  const totalLength = chunks.reduce((sum, c) => sum + c.length, 0)
  const allSamples = new Float32Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    allSamples.set(chunk, offset)
    offset += chunk.length
  }

  // Transcribe
  const result = await STT.transcribe(allSamples)
  console.log('You said:', result.text)
}, 5000)
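The concatenation step in the example above can be pulled out into a small reusable function, which also makes it easy to test without a microphone. `concatChunks` is a hypothetical helper name, not part of the SDK.

```typescript
// Hypothetical helper: merges captured chunks into one contiguous buffer.
function concatChunks(chunks: readonly Float32Array[]): Float32Array {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0)
  const merged = new Float32Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    merged.set(chunk, offset)
    offset += chunk.length
  }
  return merged
}

// Example: two chunks of 2 and 1 samples become one buffer of 3 samples.
const allSamples = concatChunks([
  new Float32Array([0.25, 0.5]),
  new Float32Array([0.75]),
])
console.log(allSamples.length) // 3
```

The merged buffer can then be passed to `STT.transcribe` in place of the inline loop.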

With Language Setting

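import { STT, STTModelType } from '@runanywhere/web'

// English transcription, using the model loaded in Setup with language: 'en'
const english = await STT.transcribe(audioSamples)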
console.log(english.text)

// Load a multilingual model and transcribe Spanish
await STT.loadModel({
  modelId: 'whisper-base',
  type: STTModelType.Whisper,
  modelFiles: { encoder: '...', decoder: '...', tokens: '...' },
  language: 'es',
})

const spanish = await STT.transcribe(spanishAudioSamples)
console.log(spanish.text)

Available Model Architectures

Architecture	Use Case	Speed	Quality
Whisper	General-purpose	Medium	Best
Zipformer	Streaming	Fast	Good
Paraformer	Low-latency	Fastest	Good

Error Handling

import { STT, SDKError, SDKErrorCode } from '@runanywhere/web'

try {
  const result = await STT.transcribe(audioSamples)
  console.log('Transcription:', result.text)
} catch (err) {
  if (err instanceof SDKError) {
    switch (err.code) {
      case SDKErrorCode.NotInitialized:
        console.error('Initialize the SDK first')
        break
      case SDKErrorCode.ModelNotLoaded:
        console.error('Load an STT model first')
        break
      default:
        console.error('STT error:', err.message)
    }
  } else {
    // Not an SDK error: rethrow so unexpected failures are not silently swallowed
    throw err
  }
}

Related

STT Streaming

Real-time streaming transcription

STT Options

Advanced configuration

VAD

Voice Activity Detection

Voice Agent

Complete voice pipeline
