STT Options

Overview

The STT API provides various options to customize transcription behavior, from language selection to word-level timestamps.

Options Reference

interface STTOptions {
  /** Language code for transcription */
  language?: string

  /** Enable punctuation in output */
  punctuation?: boolean

  /** Enable speaker diarization (multi-speaker) */
  diarization?: boolean

  /** Enable word-level timestamps */
  wordTimestamps?: boolean

  /** Audio sample rate in Hz (default: 16000) */
  sampleRate?: number
}
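If you want defaults handled in one place, you can resolve the options before each call. This is a minimal sketch. `resolveOptions` is a hypothetical helper, not part of the SDK. It applies only the documented `sampleRate` default of 16000 Hz and adds some basic validation.

```typescript
// Mirrors the documented STTOptions interface.
interface STTOptions {
  language?: string
  punctuation?: boolean
  diarization?: boolean
  wordTimestamps?: boolean
  sampleRate?: number
}

// Hypothetical helper (not an SDK API): fills in the documented sampleRate
// default and rejects obviously invalid values before they reach the SDK.
function resolveOptions(options: STTOptions = {}): STTOptions {
  const sampleRate = options.sampleRate ?? 16000
  if (!Number.isInteger(sampleRate) || sampleRate <= 0) {
    throw new RangeError(`sampleRate must be a positive integer, got ${sampleRate}`)
  }
  if (options.language !== undefined && !/^[a-z]{2}$/.test(options.language)) {
    throw new RangeError(`language must be a two-letter code such as 'en', got '${options.language}'`)
  }
  return { ...options, sampleRate }
}
```

For example, `resolveOptions({ language: 'en' })` returns `{ language: 'en', sampleRate: 16000 }`.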

Language Support

Specify a two-letter language code to improve accuracy and speed:

// English
const english = await RunAnywhere.transcribe(audioBase64, { language: 'en' })

// Spanish
const spanish = await RunAnywhere.transcribe(audioBase64, { language: 'es' })

// French
const french = await RunAnywhere.transcribe(audioBase64, { language: 'fr' })

// Omit language to auto-detect it (slower)
const auto = await RunAnywhere.transcribe(audioBase64)
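Suppose you transcribe many clips from the same speaker. One approach is to let the first call auto-detect the language, then reuse that language for later calls. This sketch makes an assumption based on the `result.language` field shown under Combining Options: that the result reports the detected language code. `createLanguagePinningTranscriber` and the minimal types below are hypothetical.

```typescript
// Minimal shapes for this sketch; the SDK's real types likely carry more fields.
interface TranscribeOptions {
  language?: string
}
interface TranscriptionResult {
  text: string
  language: string // assumed to hold the detected code when auto-detecting
}
type TranscribeFn = (
  audioBase64: string,
  options?: TranscribeOptions,
) => Promise<TranscriptionResult>

// Hypothetical wrapper: the first call without a language auto-detects it;
// later calls without an explicit language reuse the detected code.
function createLanguagePinningTranscriber(transcribe: TranscribeFn): TranscribeFn {
  let pinned: string | undefined
  return async (audioBase64, options = {}) => {
    const language = options.language ?? pinned
    const result = await transcribe(
      audioBase64,
      language === undefined ? options : { ...options, language },
    )
    // Pin only a language the model detected, not one the caller forced.
    if (language === undefined && result.language) {
      pinned = result.language
    }
    return result
  }
}
```

To use it, wrap the SDK call. This assumes its signature matches `TranscribeFn`: `createLanguagePinningTranscriber((audio, options) => RunAnywhere.transcribe(audio, options))`.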

Supported Languages

Code	Language	Code	Language
`en`	English	`ja`	Japanese
`es`	Spanish	`ko`	Korean
`fr`	French	`pt`	Portuguese
`de`	German	`ru`	Russian
`it`	Italian	`zh`	Chinese
`nl`	Dutch	`ar`	Arabic
`pl`	Polish	`hi`	Hindi

Language-specific models (e.g., whisper-tiny.en) support only their target language, but they are typically more accurate and faster for that language than the multilingual equivalent.
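Apps often know the user's locale rather than a bare language code. This minimal sketch assumes locales follow the usual `language-REGION` or `language_REGION` form. The hypothetical `resolveLanguage` helper reduces a locale to a code from the table above. If the language isn't in the table, it returns `undefined`, so the call falls back to auto-detection.

```typescript
// Language codes from the "Supported Languages" table above.
const SUPPORTED_LANGUAGES = new Set([
  'en', 'es', 'fr', 'de', 'it', 'nl', 'pl',
  'ja', 'ko', 'pt', 'ru', 'zh', 'ar', 'hi',
])

// Hypothetical helper: maps a locale such as 'pt-BR' or 'en_US' to a
// supported two-letter code, or returns undefined for auto-detection.
function resolveLanguage(locale: string): string | undefined {
  const primary = locale.trim().toLowerCase().split(/[-_]/)[0]
  return SUPPORTED_LANGUAGES.has(primary) ? primary : undefined
}
```

For example, `resolveLanguage('pt-BR')` returns `'pt'`. `resolveLanguage('sv-SE')` returns `undefined` because Swedish is not in the table.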

Punctuation

Control whether the output includes punctuation and capitalization:

// Without punctuation (default for some models)
const noPunct = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  punctuation: false,
})
// "hello how are you today"

// With punctuation
const withPunct = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  punctuation: true,
})
// "Hello, how are you today?"

Word Timestamps

Get timing information for each word:

const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  wordTimestamps: true,
})

console.log('Transcription:', result.text)

// Each segment carries its start and end time in seconds
for (const segment of result.segments) {
  console.log(`[${segment.startTime.toFixed(2)}s - ${segment.endTime.toFixed(2)}s] ${segment.text}`)
}
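Each segment carries its own start time, so the "Search" use case comes down to filtering segments. Here is a sketch with a hypothetical `findMoments` helper. The `Segment` type below mirrors only the fields used in the loop above.

```typescript
// Minimal segment shape, matching the fields used in the loop above.
interface Segment {
  startTime: number // seconds
  endTime: number // seconds
  text: string
}

// Hypothetical helper: returns the start time of every segment whose text
// contains the query (case-insensitive), so a player can seek to it.
function findMoments(segments: Segment[], query: string): number[] {
  const needle = query.trim().toLowerCase()
  if (needle === '') return []
  return segments
    .filter((segment) => segment.text.toLowerCase().includes(needle))
    .map((segment) => segment.startTime)
}
```

Substring matching still finds `today` in `today?` when punctuation is enabled.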

Use Cases

Subtitles/Captions: Sync text with video
Karaoke: Highlight words as they’re spoken
Search: Jump to specific moments in audio
Accessibility: Provide live captions for deaf and hard-of-hearing users
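Karaoke-style highlighting and live captions both need the segment that is active at the current playback position. This minimal sketch assumes segments are sorted by `startTime` and do not overlap. The hypothetical `activeSegmentIndex` helper uses binary search, so each lookup is O(log n). If your model exposes per-word timings, the same lookup should work on those entries.

```typescript
interface Segment {
  startTime: number // seconds
  endTime: number // seconds
  text: string
}

// Hypothetical helper: returns the index of the segment playing at `time`
// (start inclusive, end exclusive), or -1 if `time` falls in a gap.
// Assumes segments are sorted by startTime and do not overlap.
function activeSegmentIndex(segments: Segment[], time: number): number {
  let lo = 0
  let hi = segments.length - 1
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    const segment = segments[mid]
    if (time < segment.startTime) {
      hi = mid - 1
    } else if (time >= segment.endTime) {
      lo = mid + 1
    } else {
      return mid
    }
  }
  return -1
}
```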

Example: Subtitle Generator

interface Subtitle {
  start: number
  end: number
  text: string
}

async function generateSubtitles(audioBase64: string): Promise<Subtitle[]> {
  const result = await RunAnywhere.transcribe(audioBase64, {
    language: 'en',
    wordTimestamps: true,
  })

  return result.segments.map((segment) => ({
    start: segment.startTime,
    end: segment.endTime,
    text: segment.text.trim(),
  }))
}

// Convert to SRT format
function toSRT(subtitles: Subtitle[]): string {
  return subtitles
    .map((sub, i) => {
      const start = formatSRTTime(sub.start)
      const end = formatSRTTime(sub.end)
      return `${i + 1}\n${start} --> ${end}\n${sub.text}\n`
    })
    .join('\n')
}

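function formatSRTTime(seconds: number): string {
  // Work in whole milliseconds: (seconds % 1) * 1000 can land just below an
  // integer (1.001 % 1 is slightly less than 0.001) and floor to the wrong value
  const totalMs = Math.round(seconds * 1000)
  const h = Math.floor(totalMs / 3600000)
  const m = Math.floor((totalMs % 3600000) / 60000)
  const s = Math.floor((totalMs % 60000) / 1000)
  const ms = totalMs % 1000
  const pad = (n: number, width = 2) => n.toString().padStart(width, '0')
  // SRT timestamps use a comma before the milliseconds: HH:MM:SS,mmm
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`
}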
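Browsers (via the HTML `<track>` element) read WebVTT rather than SRT. The two formats differ mainly in two ways: WebVTT starts with a `WEBVTT` header, and it uses `.` instead of `,` before the milliseconds. A small variant can therefore convert the same subtitles. `toVTT` and `formatVTTTime` are hypothetical helpers. The `Subtitle` shape is repeated so the sketch stands alone.

```typescript
interface Subtitle {
  start: number // seconds
  end: number // seconds
  text: string
}

// Formats seconds as HH:MM:SS.mmm, working in whole milliseconds to avoid
// floating-point drift.
function formatVTTTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000)
  const h = Math.floor(totalMs / 3600000)
  const m = Math.floor((totalMs % 3600000) / 60000)
  const s = Math.floor((totalMs % 60000) / 1000)
  const ms = totalMs % 1000
  const pad = (n: number, width = 2) => n.toString().padStart(width, '0')
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`
}

// Hypothetical helper: converts subtitles to a WebVTT document
// (header, blank line, then cues separated by blank lines).
function toVTT(subtitles: Subtitle[]): string {
  const cues = subtitles.map(
    (sub) => `${formatVTTTime(sub.start)} --> ${formatVTTTime(sub.end)}\n${sub.text}`,
  )
  return `WEBVTT\n\n${cues.join('\n\n')}\n`
}
```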

Speaker Diarization

Identify different speakers in the audio:

const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  diarization: true,
})

// Segments include speaker IDs
for (const segment of result.segments) {
  console.log(`[Speaker ${segment.speakerId}]: ${segment.text}`)
}

Speaker diarization is computationally expensive and may not be available on all models. Check model documentation for support.
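Diarized output often splits one speaker's turn across several segments. This sketch merges consecutive segments from the same speaker. It assumes segments arrive in chronological order and expose `speakerId`, as in the loop above. The type of `speakerId` is not documented, so `string | number` is an assumption. `groupBySpeaker` is a hypothetical helper.

```typescript
interface DiarizedSegment {
  startTime: number // seconds
  endTime: number // seconds
  text: string
  speakerId: string | number // assumed type
}

interface SpeakerTurn {
  speakerId: string | number
  startTime: number
  endTime: number
  text: string
}

// Hypothetical helper: merges consecutive segments from the same speaker
// into a single turn, extending its end time and joining the text.
function groupBySpeaker(segments: DiarizedSegment[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = []
  for (const segment of segments) {
    const last = turns[turns.length - 1]
    if (last && last.speakerId === segment.speakerId) {
      last.endTime = segment.endTime
      last.text = `${last.text} ${segment.text.trim()}`
    } else {
      turns.push({
        speakerId: segment.speakerId,
        startTime: segment.startTime,
        endTime: segment.endTime,
        text: segment.text.trim(),
      })
    }
  }
  return turns
}
```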

Sample Rate

Specify the audio sample rate if different from the default:

// samples: Float32Array of mono PCM audio
// Standard 16kHz (default, recommended for STT)
const standard = await RunAnywhere.transcribeBuffer(samples)

// Higher quality 44.1kHz (will be downsampled internally)
const highQuality = await RunAnywhere.transcribeBuffer(samples, {
  sampleRate: 44100,
})

transcribeBuffer() accepts a Float32Array of PCM audio samples. The sample rate defaults to 16000 Hz if not specified in options.
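Many recorders produce 16-bit little-endian PCM bytes rather than floats. This sketch shows the conversion. It assumes `transcribeBuffer()` expects samples normalized to [-1, 1], which is the usual convention for Float32 PCM but is not stated above. `pcm16leToFloat32` is a hypothetical helper.

```typescript
// Hypothetical helper: converts 16-bit signed little-endian PCM bytes into a
// Float32Array scaled to [-1, 1) by dividing each sample by 32768.
function pcm16leToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  const sampleCount = Math.floor(bytes.byteLength / 2) // ignore a trailing odd byte
  const out = new Float32Array(sampleCount)
  for (let i = 0; i < sampleCount; i++) {
    // getInt16(offset, true) reads a little-endian signed 16-bit sample
    out[i] = view.getInt16(i * 2, true) / 32768
  }
  return out
}
```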

For best results, record audio at 16kHz mono. Higher sample rates will be downsampled, which adds processing overhead.
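You may prefer to send 16 kHz mono audio yourself instead of relying on internal downsampling. This sketch shows one way to do the conversion. Both helpers are hypothetical. `resampleLinear` is a naive linear interpolator with no anti-aliasing filter, so a production resampler should low-pass filter first.

```typescript
// Hypothetical helper: averages interleaved stereo samples [L, R, L, R, ...] into mono.
function stereoToMono(interleaved: Float32Array): Float32Array {
  const mono = new Float32Array(Math.floor(interleaved.length / 2))
  for (let i = 0; i < mono.length; i++) {
    mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2
  }
  return mono
}

// Hypothetical helper: resamples mono audio by linear interpolation.
// No anti-aliasing filter is applied, so content above the target Nyquist
// frequency (8 kHz for 16 kHz output) may alias.
function resampleLinear(input: Float32Array, fromRate: number, toRate = 16000): Float32Array {
  if (fromRate <= 0 || toRate <= 0) throw new RangeError('sample rates must be positive')
  if (fromRate === toRate) return input // already at the target rate; no copy
  const outLength = Math.floor((input.length * toRate) / fromRate)
  const output = new Float32Array(outLength)
  const ratio = fromRate / toRate
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio
    const left = Math.floor(pos)
    const right = Math.min(left + 1, input.length - 1)
    const frac = pos - left
    output[i] = input[left] * (1 - frac) + input[right] * frac
  }
  return output
}
```

For example, `resampleLinear(stereoToMono(samples), 44100)` yields 16 kHz mono audio for `transcribeBuffer()`.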

Model Loading Options

Load, check, and unload the STT model:

// Load STT model with specific type
await RunAnywhere.loadSTTModel(modelPath, 'whisper')

// Check if model is loaded
const isLoaded = await RunAnywhere.isSTTModelLoaded()

// Unload when done
await RunAnywhere.unloadSTTModel()
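To avoid leaving a model resident in memory, you can scope loading and unloading around a unit of work. This sketch assumes the three calls above have the shapes declared in `STTModelApi`; their exact types are not documented. `withSTTModel` is hypothetical, and it only unloads a model that it loaded itself.

```typescript
// The subset of the API used above; parameter and return types are assumptions.
interface STTModelApi {
  loadSTTModel(modelPath: string, modelType: string): Promise<unknown>
  isSTTModelLoaded(): Promise<boolean>
  unloadSTTModel(): Promise<unknown>
}

// Hypothetical helper: loads the model if needed, runs `work`, and unloads
// afterward (even if `work` throws) only when this call did the loading.
async function withSTTModel<T>(
  api: STTModelApi,
  modelPath: string,
  work: () => Promise<T>,
): Promise<T> {
  const alreadyLoaded = await api.isSTTModelLoaded()
  if (!alreadyLoaded) {
    await api.loadSTTModel(modelPath, 'whisper')
  }
  try {
    return await work()
  } finally {
    if (!alreadyLoaded) {
      await api.unloadSTTModel()
    }
  }
}
```

Example usage: `await withSTTModel(RunAnywhere, modelPath, async () => (await RunAnywhere.transcribe(audioBase64, { language: 'en' })).text)`.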

Combining Options

// Full-featured transcription
const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  punctuation: true,
  wordTimestamps: true,
})

// Access all result data
console.log('Text:', result.text)
console.log('Language:', result.language)
console.log('Confidence:', result.confidence)
console.log('Duration:', result.duration, 'seconds')
console.log('Segments:', result.segments.length)

// Alternatives (if available)
if (result.alternatives && result.alternatives.length > 0) {
  console.log('Alternatives:')
  result.alternatives.forEach((alt, i) => {
    console.log(`  ${i + 1}. ${alt.text} (confidence: ${alt.confidence})`)
  })
}
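You can use the `confidence` and `alternatives` fields to drive a simple review step. This sketch assumes confidences are on a 0 to 1 scale and that `alternatives` may be absent. `pickTranscript` and the 0.6 threshold are illustrative, not SDK defaults.

```typescript
// Minimal shapes based on the fields logged above; assumed, not the SDK's declared types.
interface Candidate {
  text: string
  confidence: number
}
interface TranscriptionResult extends Candidate {
  alternatives?: Candidate[]
}

// Hypothetical helper: chooses the highest-confidence candidate (the primary
// text wins ties) and flags it for human review below `minConfidence`.
function pickTranscript(
  result: TranscriptionResult,
  minConfidence = 0.6,
): Candidate & { needsReview: boolean } {
  let best: Candidate = { text: result.text, confidence: result.confidence }
  for (const alt of result.alternatives ?? []) {
    if (alt.confidence > best.confidence) {
      best = alt
    }
  }
  return { ...best, needsReview: best.confidence < minConfidence }
}
```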

Performance vs Accuracy

Option	Impact on Speed	Impact on Accuracy
`language` specified	Faster	Better
`wordTimestamps`	Slower	Same
`diarization`	Much slower	Same
`punctuation`	Minimal	Same

For best performance, always specify the language option rather than relying on auto-detection.
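To check these trade-offs on your own device, time the same clip under different option sets and compare the real-time factor (RTF). RTF is processing time divided by audio duration, and a value below 1 means faster than real time. This is a minimal sketch. `measureRTF` is hypothetical, and it uses `Date.now()`, which has millisecond resolution.

```typescript
// Hypothetical benchmarking helper: times `run` and reports the real-time
// factor for a clip lasting `audioSeconds` seconds.
async function measureRTF(
  run: () => Promise<unknown>,
  audioSeconds: number,
): Promise<{ elapsedMs: number; rtf: number }> {
  if (!(audioSeconds > 0)) throw new RangeError('audioSeconds must be positive')
  const start = Date.now()
  await run()
  const elapsedMs = Date.now() - start
  return { elapsedMs, rtf: elapsedMs / 1000 / audioSeconds }
}
```

For example, compare `measureRTF(() => RunAnywhere.transcribe(audioBase64, { language: 'en' }), clipSeconds)` against the same call with `diarization: true`. Here `clipSeconds` is the clip's known duration.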

Related

Transcribe: Basic transcription
STT Streaming: Real-time transcription
VAD: Voice Activity Detection
