Overview

The Speech-to-Text (STT) API allows you to transcribe audio files or raw audio data to text using on-device Whisper models. All transcription happens locally on the device for privacy and offline capability.

Basic Usage

import { RunAnywhere } from '@runanywhere/core'

// Transcribe raw audio samples (most common approach)
const result = await RunAnywhere.transcribeBuffer(audioSamples, {
  language: 'en',
})

console.log('Transcription:', result.text)
Or transcribe base64-encoded audio:
// Transcribe base64-encoded float32 PCM audio
const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
})

console.log('Transcription:', result.text)

Setup

Before transcribing, you need to download and load an STT model:
import { RunAnywhere, ModelCategory, SDKEnvironment } from '@runanywhere/core'
import { ONNX, ModelArtifactType } from '@runanywhere/onnx'

// 1. Initialize SDK and register ONNX backend
await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
ONNX.register()

// 2. Add Whisper model
await ONNX.addModel({
  id: 'whisper-tiny-en',
  name: 'Whisper Tiny English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 75_000_000,
})

// 3. Download model
await RunAnywhere.downloadModel('whisper-tiny-en', (progress) => {
  console.log(`Download: ${(progress.progress * 100).toFixed(1)}%`)
})

// 4. Load model
const modelInfo = await RunAnywhere.getModelInfo('whisper-tiny-en')
await RunAnywhere.loadSTTModel(modelInfo.localPath, 'whisper')

API Reference

transcribeFile

transcribeFile() is declared in the SDK but not yet implemented. Calling it will throw: "transcribeFile not yet implemented with rac_* API". Use transcribe() or transcribeBuffer() instead.
await RunAnywhere.transcribeFile(
  audioPath: string,
  options?: STTOptions
): Promise<STTResult>
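Until then, a file can be transcribed by reading it yourself and passing the contents to transcribe(), as the recording example further down does. A minimal sketch, assuming react-native-fs is installed and the file's contents are in a form the native transcribe() accepts:

import { RunAnywhere } from '@runanywhere/core'
import RNFS from 'react-native-fs'

// Hypothetical helper (not part of the SDK): read an audio file from disk and
// hand its base64 contents to transcribe().
async function transcribeFromPath(audioPath: string) {
  const audioBase64 = await RNFS.readFile(audioPath, 'base64')
  return RunAnywhere.transcribe(audioBase64, { language: 'en' })
}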

transcribe

Transcribe base64-encoded audio data.
await RunAnywhere.transcribe(
  audioData: string,
  options?: STTOptions
): Promise<STTResult>
Parameters:
Parameter    Type          Description
audioData    string        Base64-encoded float32 PCM audio data
options      STTOptions    Optional transcription settings

transcribeBuffer

Transcribe raw float32 audio samples directly.
await RunAnywhere.transcribeBuffer(
  samples: Float32Array,
  options?: STTOptions
): Promise<STTResult>
Parameters:
Parameter    Type          Description
samples      Float32Array  Float32 PCM audio samples
options      STTOptions    Optional transcription settings
transcribeBuffer converts the Float32Array to base64 internally and calls the native transcribe() method. The sample rate defaults to 16000 Hz.
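For reference, the conversion roughly corresponds to viewing the float32 samples as raw bytes and base64-encoding them. A sketch of the idea, not the SDK's actual implementation:

// Sketch: encode a Float32Array as base64 so it can be passed to transcribe().
function float32ToBase64(samples: Float32Array): string {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength)
  let binary = ''
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i])
  }
  // btoa may require a polyfill depending on your React Native JS runtime.
  return btoa(binary)
}

// Roughly equivalent to transcribeBuffer(audioSamples, { language: 'en' }):
const result = await RunAnywhere.transcribe(float32ToBase64(audioSamples), {
  language: 'en',
  sampleRate: 16000,
})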

STT Options

interface STTOptions {
  /** Language code (e.g., 'en', 'es', 'fr') */
  language?: string

  /** Enable punctuation in output */
  punctuation?: boolean

  /** Enable speaker diarization */
  diarization?: boolean

  /** Enable word-level timestamps */
  wordTimestamps?: boolean

  /** Audio sample rate (default: 16000) */
  sampleRate?: number
}
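For example, punctuation and word-level timestamps can be requested alongside an explicit sample rate (whether each option is honored depends on the loaded model):

const result = await RunAnywhere.transcribeBuffer(audioSamples, {
  language: 'en',
  punctuation: true,
  wordTimestamps: true,
  sampleRate: 16000, // match the sample rate of your audio
})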

STT Result

interface STTResult {
  /** Main transcription text */
  text: string

  /** Segments with timing information */
  segments: STTSegment[]

  /** Detected language code */
  language?: string

  /** Overall confidence (0.0 - 1.0) */
  confidence: number

  /** Audio duration in seconds */
  duration: number

  /** Alternative transcriptions */
  alternatives: STTAlternative[]
}

interface STTSegment {
  /** Segment text */
  text: string

  /** Start time in seconds */
  startTime: number

  /** End time in seconds */
  endTime: number

  /** Speaker ID (if diarization enabled) */
  speakerId?: string

  /** Segment confidence */
  confidence: number
}
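The segments array can be used to build a timed transcript, for example:

// Print each segment with its timing. speakerId is only present when
// diarization was enabled in the options.
for (const segment of result.segments) {
  const speaker = segment.speakerId ? `[${segment.speakerId}] ` : ''
  console.log(
    `${speaker}${segment.startTime.toFixed(2)}s-${segment.endTime.toFixed(2)}s: ${segment.text}`
  )
}
console.log(`Overall confidence: ${(result.confidence * 100).toFixed(0)}%`)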

Examples

Transcribe Base64 Audio

// Simple transcription from base64 float32 PCM
const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
})
console.log(result.text)

// With language hint
const spanish = await RunAnywhere.transcribe(spanishAudioBase64, {
  language: 'es',
})
console.log(spanish.text)

Transcribe Float32 Buffer

// Transcribe from Float32Array samples
const samples = new Float32Array(audioBuffer)
const result = await RunAnywhere.transcribeBuffer(samples, {
  language: 'en',
})
console.log(result.text)

React Native Recording + Transcription

Using the SDK’s built-in RunAnywhere.Audio API for recording:
VoiceRecorder.tsx
import React, { useState, useCallback } from 'react'
import { View, Button, Text } from 'react-native'
import { RunAnywhere } from '@runanywhere/core'
import RNFS from 'react-native-fs'

export function VoiceRecorder() {
  const [isRecording, setIsRecording] = useState(false)
  const [transcription, setTranscription] = useState('')

  const startRecording = useCallback(async () => {
    setIsRecording(true)
    setTranscription('')
    await RunAnywhere.Audio.requestPermission()
    await RunAnywhere.Audio.startRecording()
  }, [])

  const stopRecording = useCallback(async () => {
    setIsRecording(false)
    const recording = await RunAnywhere.Audio.stopRecording()
    // recording.uri  - path to the recorded WAV file
    // recording.durationMs - duration in milliseconds

    // Read the file and transcribe
    try {
      const audioBase64 = await RNFS.readFile(recording.uri, 'base64')
      const result = await RunAnywhere.transcribe(audioBase64, {
        language: 'en',
      })
      setTranscription(result.text)
    } catch (error) {
      setTranscription('Error: ' + (error as Error).message)
    }
  }, [])

  return (
    <View style={{ padding: 16 }}>
      <Button
        title={isRecording ? 'Stop Recording' : 'Start Recording'}
        onPress={isRecording ? stopRecording : startRecording}
      />
      {transcription ? <Text style={{ marginTop: 16 }}>{transcription}</Text> : null}
    </View>
  )
}

Transcribe Buffer from Microphone

// If you have raw audio samples (e.g., from a real-time stream)
const audioSamples = new Float32Array(rawSamples)

const result = await RunAnywhere.transcribeBuffer(audioSamples, {
  language: 'en',
})

console.log(result.text)
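If samples arrive in chunks from a streaming source, they can be concatenated into a single buffer before calling transcribeBuffer. A minimal sketch, where recordedChunks is assumed to be an array of Float32Array chunks you have collected:

// Merge recorded Float32Array chunks into one contiguous buffer.
function concatChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0)
  const merged = new Float32Array(total)
  let offset = 0
  for (const chunk of chunks) {
    merged.set(chunk, offset)
    offset += chunk.length
  }
  return merged
}

const result = await RunAnywhere.transcribeBuffer(concatChunks(recordedChunks), {
  language: 'en',
})
console.log(result.text)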

Supported Audio Formats

Format     Extension   Supported
WAV        .wav        Yes
MP3        .mp3        Yes
M4A        .m4a        Yes
FLAC       .flac       Yes
Raw PCM    -           Yes (via buffer methods)
For best results, use 16 kHz mono WAV files. The SDK converts other formats automatically, but audio that is already 16 kHz mono PCM skips the conversion step and transcribes faster.
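If you have 16-bit PCM WAV data in memory and want to use transcribeBuffer directly, the samples can be converted to float32 by hand. A minimal sketch, assuming a standard 44-byte WAV header and mono 16 kHz audio:

// Convert 16-bit little-endian PCM (after the 44-byte WAV header) to
// normalized float32 samples in the range [-1, 1].
function wavToFloat32(wavBytes: Uint8Array): Float32Array {
  const pcm = new DataView(wavBytes.buffer, wavBytes.byteOffset + 44)
  const samples = new Float32Array(Math.floor(pcm.byteLength / 2))
  for (let i = 0; i < samples.length; i++) {
    samples[i] = pcm.getInt16(i * 2, true) / 32768
  }
  return samples
}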

Available Models

Model            Size     Languages      Quality   Speed
whisper-tiny     ~75MB    Multilingual   Good      Very Fast
whisper-tiny.en  ~75MB    English        Better    Very Fast
whisper-base     ~150MB   Multilingual   Better    Fast
whisper-small    ~500MB   Multilingual   Great     Medium
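Registering any of these models follows the same flow as the Setup section. A sketch for whisper-base; the id, URL, and memoryRequirement below are placeholders, not published values:

import { RunAnywhere, ModelCategory } from '@runanywhere/core'
import { ONNX, ModelArtifactType } from '@runanywhere/onnx'

// Placeholder id/URL: substitute the actual archive for the model you use.
await ONNX.addModel({
  id: 'whisper-base',
  name: 'Whisper Base',
  url: 'https://example.com/path/to/sherpa-onnx-whisper-base.tar.gz',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 150_000_000,
})

await RunAnywhere.downloadModel('whisper-base', () => {})
const info = await RunAnywhere.getModelInfo('whisper-base')
await RunAnywhere.loadSTTModel(info.localPath, 'whisper')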

Error Handling

import { isSDKError, SDKErrorCode } from '@runanywhere/core'

try {
  const result = await RunAnywhere.transcribe(audioBase64)
} catch (error) {
  if (isSDKError(error)) {
    switch (error.code) {
      case SDKErrorCode.notInitialized:
        console.error('SDK not initialized')
        break
      case SDKErrorCode.modelNotLoaded:
        console.error('Load an STT model first')
        break
      case SDKErrorCode.sttFailed:
        console.error('Transcription failed:', error.message)
        break
    }
  }
}