# STT Options

> Configure Speech-to-Text transcription

## Overview

The STT API accepts an options object that controls transcription behavior, including the language, punctuation, speaker diarization, word-level timestamps, and the input sample rate.

## Options Reference

```typescript theme={null}
interface STTOptions {
  /** Language code for transcription */
  language?: string

  /** Enable punctuation in output */
  punctuation?: boolean

  /** Enable speaker diarization (multi-speaker) */
  diarization?: boolean

  /** Enable word-level timestamps */
  wordTimestamps?: boolean

  /** Audio sample rate in Hz (default: 16000) */
  sampleRate?: number
}
```

## Language Support

Specify a language code to improve accuracy and speed, or omit it to let the model auto-detect the language:

```typescript theme={null}
// English
const english = await RunAnywhere.transcribe(audioBase64, { language: 'en' })

// Spanish
const spanish = await RunAnywhere.transcribe(audioBase64, { language: 'es' })

// French
const french = await RunAnywhere.transcribe(audioBase64, { language: 'fr' })

// Auto-detect (slower)
const auto = await RunAnywhere.transcribe(audioBase64)
```

### Supported Languages

| Code | Language | Code | Language   |
| ---- | -------- | ---- | ---------- |
| `en` | English  | `ja` | Japanese   |
| `es` | Spanish  | `ko` | Korean     |
| `fr` | French   | `pt` | Portuguese |
| `de` | German   | `ru` | Russian    |
| `it` | Italian  | `zh` | Chinese    |
| `nl` | Dutch    | `ar` | Arabic     |
| `pl` | Polish   | `hi` | Hindi      |
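
If you derive the language from the device locale (for example `en-US` or `pt_BR`), a small helper can map it onto one of the codes in the table above and fall back to auto-detection otherwise. This is a minimal sketch: `toSTTLanguage` is a hypothetical helper, not part of the SDK, and the list is copied by hand from the table.

```typescript
// Codes from the Supported Languages table (copied by hand; keep in sync with the docs)
const SUPPORTED_LANGUAGES = [
  'en', 'es', 'fr', 'de', 'it', 'nl', 'pl',
  'ja', 'ko', 'pt', 'ru', 'zh', 'ar', 'hi',
] as const

type SupportedLanguage = (typeof SUPPORTED_LANGUAGES)[number]

// Hypothetical helper: maps a locale such as "en-US", "pt_BR" or "ZH" to a supported
// language code. Returns undefined for unsupported locales so the caller can omit
// `language` and fall back to auto-detection.
function toSTTLanguage(locale: string): SupportedLanguage | undefined {
  const primary = locale.trim().toLowerCase().split(/[-_]/)[0]
  return (SUPPORTED_LANGUAGES as readonly string[]).includes(primary)
    ? (primary as SupportedLanguage)
    : undefined
}
```

Pass the returned code as `language` only when it is defined; otherwise leave the option out so the model auto-detects.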

<Note>
  Language-specific models (e.g., `whisper-tiny.en`) support only that language, but are typically
  more accurate and faster for it than their multilingual counterparts.
</Note>
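
Whisper's English-only checkpoints use the `.en` suffix shown in the note. The sketch below picks between an English-only and a multilingual variant, assuming your model IDs follow that `<base>` / `<base>.en` pattern; `pickWhisperModel`, `isEnglishOnlyModel`, and the `whisper-tiny` base ID are illustrative, not SDK names.

```typescript
// Hypothetical helper: use the English-only variant when transcribing English,
// otherwise (other languages or auto-detect) use the multilingual base model.
// Assumes IDs like "whisper-tiny" and "whisper-tiny.en".
function pickWhisperModel(baseId: string, language?: string): string {
  const isEnglish = language?.toLowerCase().startsWith('en') ?? false
  return isEnglish ? `${baseId}.en` : baseId
}

// True for English-only model IDs, which should not be used for other languages
function isEnglishOnlyModel(modelId: string): boolean {
  return modelId.endsWith('.en')
}
```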

## Punctuation

Add punctuation to transcription output:

```typescript theme={null}
// Without punctuation (default for some models)
const noPunct = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  punctuation: false,
})
// "hello how are you today"

// With punctuation
const withPunct = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  punctuation: true,
})
// "Hello, how are you today?"
```

## Word Timestamps

Get timing information for each word:

```typescript theme={null}
const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  wordTimestamps: true,
})

console.log('Transcription:', result.text)

// Each segment includes its start and end time in seconds
for (const segment of result.segments) {
  console.log(`[${segment.startTime.toFixed(2)}s - ${segment.endTime.toFixed(2)}s] ${segment.text}`)
}
```
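
Because each segment carries start and end times, you can search a transcript and jump to the matching moments in the audio. A minimal sketch, assuming segments expose `text`, `startTime`, and `endTime` (seconds) as in the example above; `TimedSegment` and `findMatches` are illustrative names.

```typescript
// Shape assumed from the loop above; only the fields used here are declared
interface TimedSegment {
  text: string
  startTime: number // seconds
  endTime: number // seconds
}

// Returns the start time (seconds) of every segment whose text contains `query`,
// ignoring case. Seek your player to one of these times to jump to a match.
function findMatches(segments: TimedSegment[], query: string): number[] {
  const needle = query.trim().toLowerCase()
  if (needle === '') return []
  return segments
    .filter((segment) => segment.text.toLowerCase().includes(needle))
    .map((segment) => segment.startTime)
}
```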

### Use Cases

* **Subtitles/Captions**: Sync text with video
* **Karaoke**: Highlight words as they're spoken
* **Search**: Jump to specific moments in audio
* **Accessibility**: Display captions in sync with the audio for deaf and hard-of-hearing users
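
For the karaoke and caption cases, you need the segment that contains the player's current time. A sketch using binary search, assuming segments are sorted by `startTime` and do not overlap; `activeSegmentIndex` is a hypothetical helper, and the same lookup works for any list of timed entries.

```typescript
// Shape assumed from the Word Timestamps example
interface TimedSegment {
  text: string
  startTime: number // seconds
  endTime: number // seconds
}

// Returns the index of the segment being spoken at `time` (seconds), or -1 if none.
// Binary search: assumes segments are sorted by startTime and do not overlap.
function activeSegmentIndex(segments: TimedSegment[], time: number): number {
  let lo = 0
  let hi = segments.length - 1
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    const segment = segments[mid]
    if (time < segment.startTime) {
      hi = mid - 1
    } else if (time >= segment.endTime) {
      lo = mid + 1
    } else {
      return mid // startTime <= time < endTime
    }
  }
  return -1 // before the first segment, after the last, or in a gap
}
```

Call it from your player's time-update callback and highlight the returned segment.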

### Example: Subtitle Generator

```typescript theme={null}
interface Subtitle {
  start: number
  end: number
  text: string
}

async function generateSubtitles(audioBase64: string): Promise<Subtitle[]> {
  const result = await RunAnywhere.transcribe(audioBase64, {
    language: 'en',
    wordTimestamps: true,
  })

  return result.segments.map((segment) => ({
    start: segment.startTime,
    end: segment.endTime,
    text: segment.text.trim(),
  }))
}

// Convert to SRT format
function toSRT(subtitles: Subtitle[]): string {
  return subtitles
    .map((sub, i) => {
      const start = formatSRTTime(sub.start)
      const end = formatSRTTime(sub.end)
      return `${i + 1}\n${start} --> ${end}\n${sub.text}\n`
    })
    .join('\n')
}

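function formatSRTTime(seconds: number): string {
  // Work in whole milliseconds: `seconds % 1` drifts in floating point
  // (e.g. 2.3 % 1 === 0.2999999999999998, which would print 299 ms)
  const totalMs = Math.round(seconds * 1000)
  const h = Math.floor(totalMs / 3_600_000)
  const m = Math.floor((totalMs % 3_600_000) / 60_000)
  const s = Math.floor((totalMs % 60_000) / 1000)
  const ms = totalMs % 1000
  const pad = (n: number, width = 2) => n.toString().padStart(width, '0')
  // SRT timestamps look like 00:01:02,345
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`
}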
```
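
HTML5 `<track>` elements expect WebVTT rather than SRT. The formats differ mainly in the `WEBVTT` header and the `.` millisecond separator. A self-contained sketch restating the same `Subtitle` shape; `toWebVTT` and `formatVTTTime` are not SDK functions.

```typescript
interface Subtitle {
  start: number // seconds
  end: number // seconds
  text: string
}

// WebVTT timestamps use "." before the milliseconds (SRT uses ",")
function formatVTTTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000) // integer ms avoids float drift
  const h = Math.floor(totalMs / 3_600_000)
  const m = Math.floor((totalMs % 3_600_000) / 60_000)
  const s = Math.floor((totalMs % 60_000) / 1000)
  const ms = totalMs % 1000
  const pad = (n: number, width = 2) => n.toString().padStart(width, '0')
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`
}

// A WebVTT file starts with "WEBVTT", followed by cues separated by blank lines
function toWebVTT(subtitles: Subtitle[]): string {
  const cues = subtitles.map(
    (sub) => `${formatVTTTime(sub.start)} --> ${formatVTTTime(sub.end)}\n${sub.text}`,
  )
  return ['WEBVTT', ...cues].join('\n\n') + '\n'
}
```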

## Speaker Diarization

Identify different speakers in the audio:

```typescript theme={null}
const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  diarization: true,
})

// Segments include speaker IDs
for (const segment of result.segments) {
  console.log(`[Speaker ${segment.speakerId}]: ${segment.text}`)
}
```

<Warning>
  Speaker diarization is computationally expensive and may not be available on all models. Check
  model documentation for support.
</Warning>
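
For chat-style or meeting transcripts, it is often easier to render consecutive segments from the same speaker as one turn. A sketch, assuming diarized segments carry `speakerId`, `text`, `startTime`, and `endTime` as in the examples above (the exact type of `speakerId` may differ by model); `toSpeakerTurns` is a hypothetical helper.

```typescript
// Fields assumed from the diarization example; speakerId's type is an assumption
interface DiarizedSegment {
  speakerId: string | number
  text: string
  startTime: number // seconds
  endTime: number // seconds
}

interface SpeakerTurn {
  speakerId: string | number
  startTime: number
  endTime: number
  text: string
}

// Merges consecutive segments from the same speaker into a single turn
function toSpeakerTurns(segments: DiarizedSegment[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = []
  for (const segment of segments) {
    const last = turns[turns.length - 1]
    if (last && last.speakerId === segment.speakerId) {
      // Same speaker keeps talking: extend the current turn
      last.endTime = segment.endTime
      last.text = `${last.text} ${segment.text.trim()}`
    } else {
      // Speaker changed (or first segment): start a new turn
      turns.push({
        speakerId: segment.speakerId,
        startTime: segment.startTime,
        endTime: segment.endTime,
        text: segment.text.trim(),
      })
    }
  }
  return turns
}
```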

## Sample Rate

Specify the audio sample rate if different from the default:

```typescript theme={null}
// `samples` is a Float32Array of mono PCM audio
// 16 kHz (default, recommended for STT): no sampleRate needed
const standard = await RunAnywhere.transcribeBuffer(samples)

// 44.1 kHz audio: declare the rate so it is downsampled internally
const highRate = await RunAnywhere.transcribeBuffer(samples, {
  sampleRate: 44100,
})
```

<Note>
  `transcribeBuffer()` accepts a `Float32Array` of PCM audio samples. The sample rate defaults to
  16000 Hz if not specified in options.
</Note>
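
Many recording libraries deliver signed 16-bit integer PCM rather than floats. A minimal conversion sketch, assuming the samples are already in an `Int16Array` and that `transcribeBuffer()` expects floats in the range [-1, 1] (the usual convention for `Float32Array` PCM; confirm against the SDK reference). `int16ToFloat32` is a hypothetical helper.

```typescript
// Converts signed 16-bit PCM samples to floats in [-1, 1).
// Dividing by 32768 maps -32768 to exactly -1 and 32767 to just under 1.
function int16ToFloat32(input: Int16Array): Float32Array {
  const output = new Float32Array(input.length)
  for (let i = 0; i < input.length; i++) {
    output[i] = input[i] / 32768
  }
  return output
}
```

You can then pass the converted buffer to `transcribeBuffer()`, setting `sampleRate` if the recording is not 16 kHz.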

<Tip>
  For best results, record audio at 16kHz mono. Higher sample rates will be downsampled, which adds
  processing overhead.
</Tip>
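
If your recorder only produces stereo, you can downmix to mono before transcribing. A sketch assuming interleaved stereo samples (L, R, L, R, ...) in a `Float32Array`; it simply averages the two channels. `downmixStereoToMono` is a hypothetical helper.

```typescript
// Averages interleaved stereo frames [L0, R0, L1, R1, ...] into one mono channel.
// A trailing unpaired sample (odd-length input) is dropped.
function downmixStereoToMono(interleaved: Float32Array): Float32Array {
  const frames = Math.floor(interleaved.length / 2)
  const mono = new Float32Array(frames)
  for (let i = 0; i < frames; i++) {
    mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2
  }
  return mono
}
```

The mono buffer keeps the original sample rate, so pass that rate as `sampleRate` if it is not 16 kHz.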

## Model Loading Options

Load, check, and unload STT models:

```typescript theme={null}
// Load STT model with specific type
await RunAnywhere.loadSTTModel(modelPath, 'whisper')

// Check if model is loaded
const isLoaded = await RunAnywhere.isSTTModelLoaded()

// Unload when done
await RunAnywhere.unloadSTTModel()
```
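
To make sure a model you loaded is released even if transcription throws, you can wrap these calls in a helper. A sketch, assuming `loadSTTModel`, `isSTTModelLoaded`, and `unloadSTTModel` behave as shown above; `withSTTModel` and the `STTModelController` interface are hypothetical. The interface is structural, so `RunAnywhere` itself should be accepted if its signatures are compatible. Note that `isSTTModelLoaded()` does not say *which* model is loaded, so the helper assumes any loaded model is the one you want.

```typescript
// Structural subset of the calls shown above; these signatures are assumptions
interface STTModelController {
  loadSTTModel(modelPath: string, modelType: string): Promise<unknown>
  isSTTModelLoaded(): Promise<boolean>
  unloadSTTModel(): Promise<unknown>
}

// Loads the model if nothing is loaded, runs `work`, and unloads afterwards
// (even when `work` throws), but only if this helper did the loading.
async function withSTTModel<T>(
  stt: STTModelController,
  modelPath: string,
  modelType: string,
  work: () => Promise<T>,
): Promise<T> {
  const alreadyLoaded = await stt.isSTTModelLoaded()
  if (!alreadyLoaded) {
    await stt.loadSTTModel(modelPath, modelType)
  }
  try {
    return await work()
  } finally {
    if (!alreadyLoaded) {
      await stt.unloadSTTModel()
    }
  }
}
```

For example, `withSTTModel(RunAnywhere, modelPath, 'whisper', () => RunAnywhere.transcribe(audioBase64, { language: 'en' }))` would transcribe once and release the model afterwards.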

## Combining Options

```typescript theme={null}
// Full-featured transcription
const result = await RunAnywhere.transcribe(audioBase64, {
  language: 'en',
  punctuation: true,
  wordTimestamps: true,
})

// Access all result data
console.log('Text:', result.text)
console.log('Language:', result.language)
console.log('Confidence:', result.confidence)
console.log('Duration:', result.duration, 'seconds')
console.log('Segments:', result.segments.length)

// Alternatives (if available)
if (result.alternatives.length > 0) {
  console.log('Alternatives:')
  result.alternatives.forEach((alt, i) => {
    console.log(`  ${i + 1}. ${alt.text} (confidence: ${alt.confidence})`)
  })
}
```
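
When alternatives are returned, you may want to use the highest-confidence candidate and flag results that still look unreliable. A sketch assuming `text` and `confidence` on both the result and each alternative, as logged above, with confidence on a 0 to 1 scale (an assumption; check your model). `pickBestTranscript` and the default 0.5 threshold are illustrative.

```typescript
// Minimal shapes assumed from the fields logged above
interface Candidate {
  text: string
  confidence: number // assumed 0..1
}

interface TranscriptionLike extends Candidate {
  alternatives?: Candidate[]
}

// Returns the highest-confidence candidate among the primary result and its
// alternatives (ties keep the primary result), plus a low-confidence flag.
function pickBestTranscript(
  result: TranscriptionLike,
  minConfidence = 0.5,
): { text: string; confidence: number; lowConfidence: boolean } {
  const candidates: Candidate[] = [result, ...(result.alternatives ?? [])]
  const best = candidates.reduce((a, b) => (b.confidence > a.confidence ? b : a))
  return {
    text: best.text,
    confidence: best.confidence,
    lowConfidence: best.confidence < minConfidence,
  }
}
```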

## Performance vs Accuracy

| Option               | Impact on Speed                                          | Impact on Accuracy                                  |
| -------------------- | -------------------------------------------------------- | --------------------------------------------------- |
| `language` specified | <Icon icon="circle-check" color="#22c55e" /> Faster      | <Icon icon="circle-check" color="#22c55e" /> Better |
| `wordTimestamps`     | <Icon icon="circle-xmark" color="#ef4444" /> Slower      | <Icon icon="minus" color="#6b7280" /> Same          |
| `diarization`        | <Icon icon="circle-xmark" color="#ef4444" /> Much slower | <Icon icon="minus" color="#6b7280" /> Same          |
| `punctuation`        | <Icon icon="minus" color="#6b7280" /> Minimal            | <Icon icon="minus" color="#6b7280" /> Same          |

<Tip>
  For best performance, always specify the `language` option rather than relying on auto-detection.
</Tip>
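
One way to apply this table is to define option presets per use case, always setting `language` and enabling the slower features only where they are needed. A sketch: `STTOptions` is restated from the Options Reference so the block stands alone, and the preset names and `presetOptions` are illustrative, not part of the SDK.

```typescript
// Restated from the Options Reference above
interface STTOptions {
  language?: string
  punctuation?: boolean
  diarization?: boolean
  wordTimestamps?: boolean
  sampleRate?: number
}

type STTPreset = 'fast' | 'subtitles' | 'meeting'

// Punctuation has minimal cost, so every preset enables it; timestamps and
// diarization are opt-in because they slow transcription down.
const PRESETS: Record<STTPreset, Omit<STTOptions, 'language'>> = {
  fast: { punctuation: true },
  subtitles: { punctuation: true, wordTimestamps: true },
  meeting: { punctuation: true, diarization: true },
}

function presetOptions(preset: STTPreset, language: string): STTOptions {
  return { language, ...PRESETS[preset] }
}
```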

## Related

<CardGroup cols={2}>
  <Card title="Transcribe" icon="file-audio" href="/react-native/stt/transcribe">
    Basic transcription
  </Card>

  <Card title="STT Streaming" icon="microphone" href="/react-native/stt/stream">
    Real-time transcription
  </Card>

  <Card title="VAD" icon="waveform-lines" href="/react-native/vad">
    Voice Activity Detection
  </Card>
</CardGroup>
