Overview

The Speech-to-Text (STT) API allows you to transcribe audio files or raw audio data to text using on-device Whisper models. All transcription happens locally on the device for privacy and offline capability.

Basic Usage

import { RunAnywhere } from '@runanywhere/core'

// Transcribe an audio file
const result = await RunAnywhere.transcribeFile('/path/to/audio.wav', {
  language: 'en',
})

console.log('Transcription:', result.text)
console.log('Confidence:', result.confidence)
console.log('Duration:', result.duration, 'seconds')

Setup

Before transcribing, you need to download and load an STT model:
import { RunAnywhere, ModelCategory, SDKEnvironment } from '@runanywhere/core'
import { ONNX, ModelArtifactType } from '@runanywhere/onnx'

// 1. Initialize SDK and register ONNX backend
await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
ONNX.register()

// 2. Add Whisper model
await ONNX.addModel({
  id: 'whisper-tiny-en',
  name: 'Whisper Tiny English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 75_000_000,
})

// 3. Download model
await RunAnywhere.downloadModel('whisper-tiny-en', (progress) => {
  console.log(`Download: ${(progress.progress * 100).toFixed(1)}%`)
})

// 4. Load model
const modelInfo = await RunAnywhere.getModelInfo('whisper-tiny-en')
await RunAnywhere.loadSTTModel(modelInfo.localPath, 'whisper')

API Reference

transcribeFile

Transcribe audio from a file path.
await RunAnywhere.transcribeFile(
  audioPath: string,
  options?: STTOptions
): Promise<STTResult>
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| audioPath | string | Absolute path to the audio file |
| options | STTOptions | Optional transcription settings |

transcribe

Transcribe base64-encoded audio data.
await RunAnywhere.transcribe(
  audioData: string,
  options?: STTOptions
): Promise<STTResult>
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| audioData | string | Base64-encoded float32 PCM audio data |
| options | STTOptions | Optional transcription settings |
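Since transcribe expects base64-encoded float32 PCM, the raw samples have to be serialized first. A minimal encoding sketch (the helper name is ours, not part of the SDK; it assumes a Node-style Buffer global, which React Native may need a polyfill such as 'buffer' or 'base64-js' to provide):

```typescript
// Hypothetical helper (not part of the SDK): encode float32 PCM samples as
// base64 for transcribe(). Assumes a Node-style Buffer global; in React
// Native you may need a polyfill such as 'buffer' or 'base64-js'.
function encodeFloat32ToBase64(samples: Float32Array): string {
  // View the float32 payload as raw bytes, then base64-encode those bytes.
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength)
  return Buffer.from(bytes).toString('base64')
}

// Usage sketch:
// const base64 = encodeFloat32ToBase64(micSamples)
// const result = await RunAnywhere.transcribe(base64, { language: 'en' })
```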

transcribeBuffer

Transcribe raw float32 audio samples.
await RunAnywhere.transcribeBuffer(
  samples: number[],
  sampleRate: number,
  options?: STTOptions
): Promise<STTResult>
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| samples | number[] | Float32 audio samples (-1.0 to 1.0) |
| sampleRate | number | Sample rate in Hz (e.g., 16000) |
| options | STTOptions | Optional transcription settings |

STT Options

interface STTOptions {
  /** Language code (e.g., 'en', 'es', 'fr') */
  language?: string

  /** Enable punctuation in output */
  punctuation?: boolean

  /** Enable speaker diarization */
  diarization?: boolean

  /** Enable word-level timestamps */
  wordTimestamps?: boolean

  /** Audio sample rate (default: 16000) */
  sampleRate?: number
}

STT Result

interface STTResult {
  /** Main transcription text */
  text: string

  /** Segments with timing information */
  segments: STTSegment[]

  /** Detected language code */
  language?: string

  /** Overall confidence (0.0 - 1.0) */
  confidence: number

  /** Audio duration in seconds */
  duration: number

  /** Alternative transcriptions */
  alternatives: STTAlternative[]
}

interface STTSegment {
  /** Segment text */
  text: string

  /** Start time in seconds */
  startTime: number

  /** End time in seconds */
  endTime: number

  /** Speaker ID (if diarization enabled) */
  speakerId?: string

  /** Segment confidence */
  confidence: number
}
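When diarization is enabled, each segment carries a speakerId, so a per-speaker transcript can be built directly from the segments array. A small sketch (the local Segment type just mirrors the STTSegment fields used here; the helper name is ours):

```typescript
// Sketch: format a per-speaker transcript from transcription segments.
// The local Segment type mirrors the STTSegment fields used below;
// speakerId is only present when diarization is enabled.
interface Segment {
  text: string
  startTime: number
  endTime: number
  speakerId?: string
}

function formatBySpeaker(segments: Segment[]): string[] {
  return segments.map(
    (s) =>
      `${s.speakerId ?? 'speaker'} [${s.startTime.toFixed(1)}s-${s.endTime.toFixed(1)}s]: ${s.text}`
  )
}
```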

Examples

Transcribe Audio File

// Simple transcription
const result = await RunAnywhere.transcribeFile('/path/to/recording.wav')
console.log(result.text)

// With language hint
const spanish = await RunAnywhere.transcribeFile('/path/to/spanish.wav', {
  language: 'es',
})
console.log(spanish.text)

With Word Timestamps

const result = await RunAnywhere.transcribeFile(audioPath, {
  language: 'en',
  wordTimestamps: true,
})

// Display words with timing
for (const segment of result.segments) {
  console.log(
    `[${segment.startTime.toFixed(2)}s - ${segment.endTime.toFixed(2)}s]: ${segment.text}`
  )
}

React Native Recording + Transcription

VoiceRecorder.tsx
import React, { useState, useCallback } from 'react'
import { View, Button, Text } from 'react-native'
import { RunAnywhere } from '@runanywhere/core'
import AudioRecord from 'react-native-audio-record' // Example audio library

export function VoiceRecorder() {
  const [isRecording, setIsRecording] = useState(false)
  const [transcription, setTranscription] = useState('')

  const startRecording = useCallback(async () => {
    setIsRecording(true)
    setTranscription('')

    // Configure audio recording
    AudioRecord.init({
      sampleRate: 16000,
      channels: 1,
      bitsPerSample: 16,
      audioSource: 6, // Android MediaRecorder.AudioSource.VOICE_RECOGNITION
      wavFile: 'recording.wav',
    })

    AudioRecord.start()
  }, [])

  const stopRecording = useCallback(async () => {
    setIsRecording(false)
    const audioPath = await AudioRecord.stop()

    // Transcribe the recording
    try {
      const result = await RunAnywhere.transcribeFile(audioPath, {
        language: 'en',
      })
      setTranscription(result.text)
    } catch (error) {
      setTranscription('Error: ' + (error as Error).message)
    }
  }, [])

  return (
    <View style={{ padding: 16 }}>
      <Button
        title={isRecording ? 'Stop Recording' : 'Start Recording'}
        onPress={isRecording ? stopRecording : startRecording}
      />
      {transcription !== '' && <Text style={{ marginTop: 16 }}>{transcription}</Text>}
    </View>
  )
}

Transcribe Buffer from Microphone

// If you have raw audio samples (e.g., from a real-time stream)
const audioSamples: number[] = [] // Float32 samples from microphone

const result = await RunAnywhere.transcribeBuffer(
  audioSamples,
  16000, // Sample rate
  { language: 'en' }
)

console.log(result.text)

Supported Audio Formats

| Format | Extension | Supported |
| --- | --- | --- |
| WAV | .wav | Yes |
| MP3 | .mp3 | Yes |
| M4A | .m4a | Yes |
| FLAC | .flac | Yes |
| Raw PCM | - | Yes (via buffer methods) |

For best results, use 16kHz mono WAV files. The SDK converts other formats automatically, but audio that is already 16kHz mono WAV skips the conversion step and transcribes fastest.
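Many recorders, including the react-native-audio-record configuration above, capture 16-bit signed PCM, while transcribeBuffer expects float32 samples in [-1.0, 1.0]. A conversion sketch (the helper name is ours, not an SDK API):

```typescript
// Sketch (helper name is ours): convert 16-bit signed PCM, as produced by
// typical recorder configurations, into the [-1.0, 1.0] float range that
// transcribeBuffer() expects.
function int16ToFloat32(pcm: Int16Array): number[] {
  const out = new Array<number>(pcm.length)
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768 // 32768 = |Int16 min|, so output stays within [-1.0, 1.0)
  }
  return out
}

// Usage sketch:
// const samples = int16ToFloat32(rawPcmFromRecorder)
// const result = await RunAnywhere.transcribeBuffer(samples, 16000, { language: 'en' })
```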

Available Models

| Model | Size | Languages | Quality | Speed |
| --- | --- | --- | --- | --- |
| whisper-tiny | ~75MB | Multi | Good | Very Fast |
| whisper-tiny.en | ~75MB | English | Better | Very Fast |
| whisper-base | ~150MB | Multi | Better | Fast |
| whisper-small | ~500MB | Multi | Great | Medium |

Error Handling

import { isSDKError, SDKErrorCode } from '@runanywhere/core'

try {
  const result = await RunAnywhere.transcribeFile(audioPath)
} catch (error) {
  if (isSDKError(error)) {
    switch (error.code) {
      case SDKErrorCode.notInitialized:
        console.error('SDK not initialized')
        break
      case SDKErrorCode.modelNotLoaded:
        console.error('Load an STT model first')
        break
      case SDKErrorCode.sttFailed:
        console.error('Transcription failed:', error.message)
        break
    }
  }
}
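If transcription can race model loading, one pattern is to retry once after ensuring the model is loaded. A generic control-flow sketch (not an SDK API; both callbacks are injected, and a production version would inspect error.code, e.g. SDKErrorCode.modelNotLoaded, instead of retrying on every failure):

```typescript
// Generic retry sketch (not an SDK API): run a transcription, and if it
// fails, load the model once and retry. A production version would check
// the error code (e.g. SDKErrorCode.modelNotLoaded) before retrying.
async function withModel<T>(
  ensureLoaded: () => Promise<void>,
  run: () => Promise<T>
): Promise<T> {
  try {
    return await run()
  } catch {
    await ensureLoaded() // assumed to download/load the STT model
    return await run() // single retry after the model is in place
  }
}

// Usage sketch:
// const result = await withModel(
//   () => RunAnywhere.loadSTTModel(modelInfo.localPath, 'whisper'),
//   () => RunAnywhere.transcribeFile(audioPath, { language: 'en' })
// )
```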