Overview

The Speech-to-Text (STT) API allows you to transcribe audio files or raw audio data to text using on-device Whisper models. All transcription happens locally on the device for privacy and offline capability.

Basic Usage

import { RunAnywhere } from '@runanywhere/core'

// Transcribe an audio file
const result = await RunAnywhere.transcribeFile('/path/to/audio.wav', {
  language: 'en',
})

console.log('Transcription:', result.text)
console.log('Confidence:', result.confidence)
console.log('Duration:', result.duration, 'seconds')

Setup

Before transcribing, you need to download and load an STT model:
import { RunAnywhere, ModelCategory, SDKEnvironment } from '@runanywhere/core'
import { ONNX, ModelArtifactType } from '@runanywhere/onnx'

// 1. Initialize SDK and register ONNX backend
await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
ONNX.register()

// 2. Add Whisper model
await ONNX.addModel({
  id: 'whisper-tiny-en',
  name: 'Whisper Tiny English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 75_000_000,
})

// 3. Download model
await RunAnywhere.downloadModel('whisper-tiny-en', (progress) => {
  console.log(`Download: ${(progress.progress * 100).toFixed(1)}%`)
})

// 4. Load model
const modelInfo = await RunAnywhere.getModelInfo('whisper-tiny-en')
await RunAnywhere.loadSTTModel(modelInfo.localPath, 'whisper')

API Reference

transcribeFile

Transcribe audio from a file path.
await RunAnywhere.transcribeFile(
  audioPath: string,
  options?: STTOptions
): Promise<STTResult>
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| audioPath | string | Absolute path to the audio file |
| options | STTOptions | Optional transcription settings |

transcribe

Transcribe base64-encoded audio data.
await RunAnywhere.transcribe(
  audioData: string,
  options?: STTOptions
): Promise<STTResult>
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| audioData | string | Base64-encoded float32 PCM audio data |
| options | STTOptions | Optional transcription settings |
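Since transcribe expects base64-encoded float32 PCM, the raw samples have to be serialized first. A minimal encoding sketch (the helper name is ours, not part of the SDK; it assumes a Node-style Buffer global, which React Native may need a polyfill such as 'buffer' or 'base64-js' to provide):

```typescript
// Hypothetical helper (not part of the SDK): encode float32 PCM samples as
// base64 for transcribe(). Assumes a Node-style Buffer global; in React
// Native you may need a polyfill such as 'buffer' or 'base64-js'.
function encodeFloat32ToBase64(samples: Float32Array): string {
  // View the float32 payload as raw bytes, then base64-encode those bytes.
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength)
  return Buffer.from(bytes).toString('base64')
}

// Usage sketch:
// const base64 = encodeFloat32ToBase64(micSamples)
// const result = await RunAnywhere.transcribe(base64, { language: 'en' })
```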

transcribeBuffer

Transcribe raw float32 audio samples.
await RunAnywhere.transcribeBuffer(
  samples: number[],
  sampleRate: number,
  options?: STTOptions
): Promise<STTResult>
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| samples | number[] | Float32 audio samples (-1.0 to 1.0) |
| sampleRate | number | Sample rate in Hz (e.g., 16000) |
| options | STTOptions | Optional transcription settings |

STT Options

interface STTOptions {
  /** Language code (e.g., 'en', 'es', 'fr') */
  language?: string

  /** Enable punctuation in output */
  punctuation?: boolean

  /** Enable speaker diarization */
  diarization?: boolean

  /** Enable word-level timestamps */
  wordTimestamps?: boolean

  /** Audio sample rate (default: 16000) */
  sampleRate?: number
}

STT Result

interface STTResult {
  /** Main transcription text */
  text: string

  /** Segments with timing information */
  segments: STTSegment[]

  /** Detected language code */
  language?: string

  /** Overall confidence (0.0 - 1.0) */
  confidence: number

  /** Audio duration in seconds */
  duration: number

  /** Alternative transcriptions */
  alternatives: STTAlternative[]
}

interface STTSegment {
  /** Segment text */
  text: string

  /** Start time in seconds */
  startTime: number

  /** End time in seconds */
  endTime: number

  /** Speaker ID (if diarization enabled) */
  speakerId?: string

  /** Segment confidence */
  confidence: number
}
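When diarization is enabled, each segment carries a speakerId, so a per-speaker transcript can be built directly from the segments array. A small sketch (the local Segment type just mirrors the STTSegment fields used here; the helper name is ours):

```typescript
// Sketch: format a per-speaker transcript from transcription segments.
// The local Segment type mirrors the STTSegment fields used below;
// speakerId is only present when diarization is enabled.
interface Segment {
  text: string
  startTime: number
  endTime: number
  speakerId?: string
}

function formatBySpeaker(segments: Segment[]): string[] {
  return segments.map(
    (s) =>
      `${s.speakerId ?? 'speaker'} [${s.startTime.toFixed(1)}s-${s.endTime.toFixed(1)}s]: ${s.text}`
  )
}
```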

Examples

Transcribe Audio File

// Simple transcription
const result = await RunAnywhere.transcribeFile('/path/to/recording.wav')
console.log(result.text)

// With language hint
const spanish = await RunAnywhere.transcribeFile('/path/to/spanish.wav', {
  language: 'es',
})
console.log(spanish.text)

With Word Timestamps

const result = await RunAnywhere.transcribeFile(audioPath, {
  language: 'en',
  wordTimestamps: true,
})

// Display words with timing
for (const segment of result.segments) {
  console.log(
    `[${segment.startTime.toFixed(2)}s - ${segment.endTime.toFixed(2)}s]: ${segment.text}`
  )
}

React Native Recording + Transcription

VoiceRecorder.tsx
import React, { useState, useCallback } from 'react'
import { View, Button, Text } from 'react-native'
import { RunAnywhere } from '@runanywhere/core'
import AudioRecord from 'react-native-audio-record' // Example audio library

export function VoiceRecorder() {
  const [isRecording, setIsRecording] = useState(false)
  const [transcription, setTranscription] = useState('')

  const startRecording = useCallback(async () => {
    setIsRecording(true)
    setTranscription('')

    // Configure audio recording
    AudioRecord.init({
      sampleRate: 16000,
      channels: 1,
      bitsPerSample: 16,
      audioSource: 6, // Android MediaRecorder.AudioSource.VOICE_RECOGNITION
      wavFile: 'recording.wav',
    })

    AudioRecord.start()
  }, [])

  const stopRecording = useCallback(async () => {
    setIsRecording(false)
    const audioPath = await AudioRecord.stop()

    // Transcribe the recording
    try {
      const result = await RunAnywhere.transcribeFile(audioPath, {
        language: 'en',
      })
      setTranscription(result.text)
    } catch (error) {
      setTranscription('Error: ' + (error as Error).message)
    }
  }, [])

  return (
    <View style={{ padding: 16 }}>
      <Button
        title={isRecording ? 'Stop Recording' : 'Start Recording'}
        onPress={isRecording ? stopRecording : startRecording}
      />
      {transcription !== '' && <Text style={{ marginTop: 16 }}>{transcription}</Text>}
    </View>
  )
}

Transcribe Buffer from Microphone

// If you have raw audio samples (e.g., from a real-time stream)
const audioSamples: number[] = [] // Float32 samples from microphone

const result = await RunAnywhere.transcribeBuffer(
  audioSamples,
  16000, // Sample rate
  { language: 'en' }
)

console.log(result.text)

Supported Audio Formats

| Format | Extension | Supported |
| --- | --- | --- |
| WAV | .wav | Yes |
| MP3 | .mp3 | Yes |
| M4A | .m4a | Yes |
| FLAC | .flac | Yes |
| Raw PCM | - | Yes (via buffer methods) |

For best results, use 16kHz mono WAV files. The SDK converts other formats automatically, but audio that is already 16kHz mono WAV skips the conversion step and transcribes fastest.
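Many recorders, including the react-native-audio-record configuration above, capture 16-bit signed PCM, while transcribeBuffer expects float32 samples in [-1.0, 1.0]. A conversion sketch (the helper name is ours, not an SDK API):

```typescript
// Sketch (helper name is ours): convert 16-bit signed PCM, as produced by
// typical recorder configurations, into the [-1.0, 1.0] float range that
// transcribeBuffer() expects.
function int16ToFloat32(pcm: Int16Array): number[] {
  const out = new Array<number>(pcm.length)
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768 // 32768 = |Int16 min|, so output stays within [-1.0, 1.0)
  }
  return out
}

// Usage sketch:
// const samples = int16ToFloat32(rawPcmFromRecorder)
// const result = await RunAnywhere.transcribeBuffer(samples, 16000, { language: 'en' })
```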

Available Models

| Model | Size | Languages | Quality | Speed |
| --- | --- | --- | --- | --- |
| whisper-tiny | ~75MB | Multi | Good | Very Fast |
| whisper-tiny.en | ~75MB | English | Better | Very Fast |
| whisper-base | ~150MB | Multi | Better | Fast |
| whisper-small | ~500MB | Multi | Great | Medium |

Error Handling

import { isSDKError, SDKErrorCode } from '@runanywhere/core'

try {
  const result = await RunAnywhere.transcribeFile(audioPath)
} catch (error) {
  if (isSDKError(error)) {
    switch (error.code) {
      case SDKErrorCode.notInitialized:
        console.error('SDK not initialized')
        break
      case SDKErrorCode.modelNotLoaded:
        console.error('Load an STT model first')
        break
      case SDKErrorCode.sttFailed:
        console.error('Transcription failed:', error.message)
        break
    }
  }
}
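If transcription can race model loading, one pattern is to retry once after ensuring the model is loaded. A generic control-flow sketch (not an SDK API; both callbacks are injected, and a production version would inspect error.code, e.g. SDKErrorCode.modelNotLoaded, instead of retrying on every failure):

```typescript
// Generic retry sketch (not an SDK API): run a transcription, and if it
// fails, load the model once and retry. A production version would check
// the error code (e.g. SDKErrorCode.modelNotLoaded) before retrying.
async function withModel<T>(
  ensureLoaded: () => Promise<void>,
  run: () => Promise<T>
): Promise<T> {
  try {
    return await run()
  } catch {
    await ensureLoaded() // assumed to download/load the STT model
    return await run() // single retry after the model is in place
  }
}

// Usage sketch:
// const result = await withModel(
//   () => RunAnywhere.loadSTTModel(modelInfo.localPath, 'whisper'),
//   () => RunAnywhere.transcribeFile(audioPath, { language: 'en' })
// )
```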