## Overview

The STT API provides various options to customize transcription behavior, from language selection to word-level timestamps.
## Options Reference

```ts
interface STTOptions {
  /** Language code for transcription */
  language?: string
  /** Enable punctuation in output */
  punctuation?: boolean
  /** Enable speaker diarization (multi-speaker) */
  diarization?: boolean
  /** Enable word-level timestamps */
  wordTimestamps?: boolean
  /** Audio sample rate in Hz (default: 16000) */
  sampleRate?: number
}
```
## Language Support

Specify the language code to improve accuracy:

```ts
// English
const english = await RunAnywhere.transcribeFile(path, { language: 'en' })

// Spanish
const spanish = await RunAnywhere.transcribeFile(path, { language: 'es' })

// French
const french = await RunAnywhere.transcribeFile(path, { language: 'fr' })

// Auto-detect (slower)
const auto = await RunAnywhere.transcribeFile(path)
```
### Supported Languages

| Code | Language   | Code | Language   |
| ---- | ---------- | ---- | ---------- |
| `en` | English    | `ja` | Japanese   |
| `es` | Spanish    | `ko` | Korean     |
| `fr` | French     | `pt` | Portuguese |
| `de` | German     | `ru` | Russian    |
| `it` | Italian    | `zh` | Chinese    |
| `nl` | Dutch      | `ar` | Arabic     |
| `pl` | Polish     | `hi` | Hindi      |
Language-specific models (e.g., `whisper-tiny.en`) support only that one language, but are faster and more accurate for it.
## Punctuation

Add punctuation to transcription output:

```ts
// Without punctuation (default for some models)
const noPunct = await RunAnywhere.transcribeFile(path, {
  language: 'en',
  punctuation: false,
})
// "hello how are you today"

// With punctuation
const withPunct = await RunAnywhere.transcribeFile(path, {
  language: 'en',
  punctuation: true,
})
// "Hello, how are you today?"
```
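When a model emits unpunctuated text (as in the first example above), a lightweight post-processing step can at least tidy the output. This is a hypothetical fallback sketch, not part of the SDK; real punctuation restoration requires model support via the `punctuation` option:

```ts
// Hypothetical fallback: capitalize the first letter and add a terminal
// period to unpunctuated transcript text. Cosmetic only -- it cannot
// recover commas or question marks the way a punctuation-aware model can.
function tidyTranscript(text: string): string {
  const trimmed = text.trim()
  if (trimmed.length === 0) return trimmed
  const capitalized = trimmed[0].toUpperCase() + trimmed.slice(1)
  return /[.!?]$/.test(capitalized) ? capitalized : capitalized + '.'
}
```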
## Word Timestamps

Get timing information for each word:

```ts
const result = await RunAnywhere.transcribeFile(audioPath, {
  language: 'en',
  wordTimestamps: true,
})

console.log('Transcription:', result.text)

// Each segment contains word-level timing
for (const segment of result.segments) {
  console.log(`[${segment.startTime.toFixed(2)}s - ${segment.endTime.toFixed(2)}s] ${segment.text}`)
}
```
### Use Cases

- **Subtitles/Captions**: Sync text with video
- **Karaoke**: Highlight words as they're spoken
- **Search**: Jump to specific moments in audio
- **Accessibility**: Show words as they're spoken
### Example: Subtitle Generator

```ts
interface Subtitle {
  start: number
  end: number
  text: string
}

async function generateSubtitles(audioPath: string): Promise<Subtitle[]> {
  const result = await RunAnywhere.transcribeFile(audioPath, {
    language: 'en',
    wordTimestamps: true,
  })

  return result.segments.map((segment) => ({
    start: segment.startTime,
    end: segment.endTime,
    text: segment.text.trim(),
  }))
}

// Convert to SRT format
function toSRT(subtitles: Subtitle[]): string {
  return subtitles
    .map((sub, i) => {
      const start = formatSRTTime(sub.start)
      const end = formatSRTTime(sub.end)
      return `${i + 1}\n${start} --> ${end}\n${sub.text}\n`
    })
    .join('\n')
}

function formatSRTTime(seconds: number): string {
  const h = Math.floor(seconds / 3600)
  const m = Math.floor((seconds % 3600) / 60)
  const s = Math.floor(seconds % 60)
  const ms = Math.floor((seconds % 1) * 1000)
  return `${h.toString().padStart(2, '0')}:${m.toString().padStart(2, '0')}:${s.toString().padStart(2, '0')},${ms.toString().padStart(3, '0')}`
}
```
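The Search use case can be sketched the same way: a pure helper that scans the segment array for a query and returns the matching timestamp. This assumes the `startTime`/`endTime`/`text` segment shape shown earlier:

```ts
interface TimedSegment {
  startTime: number
  endTime: number
  text: string
}

// Return the start time (in seconds) of the first segment whose text
// contains the query (case-insensitive), or null if nothing matches.
function findMoment(segments: TimedSegment[], query: string): number | null {
  const q = query.toLowerCase()
  for (const segment of segments) {
    if (segment.text.toLowerCase().includes(q)) return segment.startTime
  }
  return null
}
```

A player could seek directly to the returned time to implement "jump to the moment this word was spoken."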
## Speaker Diarization

Identify different speakers in the audio:

```ts
const result = await RunAnywhere.transcribeFile(audioPath, {
  language: 'en',
  diarization: true,
})

// Segments include speaker IDs
for (const segment of result.segments) {
  console.log(`[Speaker ${segment.speakerId}]: ${segment.text}`)
}
```
Speaker diarization is computationally expensive and may not be available on all models. Check model documentation for support.
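Consecutive segments from the same speaker can be merged into one transcript line per speaker turn, which reads better than one line per segment. A minimal sketch, assuming segments carry the `speakerId` field shown above:

```ts
interface SpeakerSegment {
  speakerId: number
  text: string
}

// Merge consecutive same-speaker segments into one line per speaker turn.
function mergeBySpeaker(segments: SpeakerSegment[]): string[] {
  const lines: string[] = []
  let current: { id: number; parts: string[] } | null = null
  for (const seg of segments) {
    if (current && current.id === seg.speakerId) {
      current.parts.push(seg.text)
    } else {
      if (current) lines.push(`Speaker ${current.id}: ${current.parts.join(' ')}`)
      current = { id: seg.speakerId, parts: [seg.text] }
    }
  }
  if (current) lines.push(`Speaker ${current.id}: ${current.parts.join(' ')}`)
  return lines
}
```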
## Sample Rate

Specify the audio sample rate if different from the default:

```ts
// Standard 16kHz (recommended for STT)
const standard = await RunAnywhere.transcribeBuffer(samples, 16000)

// Higher quality 44.1kHz (will be downsampled internally)
const highQuality = await RunAnywhere.transcribeBuffer(samples, 44100, {
  sampleRate: 44100,
})
```

For best results, record audio at 16kHz mono. Higher sample rates will be downsampled, which adds processing overhead.
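If you control the capture pipeline, you can downsample yourself before transcription and skip the internal resampling step. A minimal linear-interpolation sketch (the SDK's internal resampler may differ, and a production pipeline should low-pass filter first to avoid aliasing):

```ts
// Resample mono PCM audio from one sample rate to another using linear
// interpolation. No anti-aliasing filter is applied, so this is a rough
// sketch suitable for speech, not a high-fidelity resampler.
function resample(samples: Float32Array, fromRate: number, toRate: number): Float32Array {
  const ratio = fromRate / toRate
  const outLength = Math.floor(samples.length / ratio)
  const out = new Float32Array(outLength)
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio
    const left = Math.floor(pos)
    const right = Math.min(left + 1, samples.length - 1)
    const frac = pos - left
    out[i] = samples[left] * (1 - frac) + samples[right] * frac
  }
  return out
}
```

For example, `resample(buffer, 44100, 16000)` converts 44.1kHz capture to the 16kHz the STT models expect.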
## Model Loading Options

Configure model loading:

```ts
// Load STT model with specific type
await RunAnywhere.loadSTTModel(modelPath, 'whisper')

// Check if model is loaded
const isLoaded = await RunAnywhere.isSTTModelLoaded()

// Unload when done
await RunAnywhere.unloadSTTModel()
```
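To guarantee the model is unloaded even when transcription throws, the load/unload pair can be wrapped in a try/finally. A generic sketch (the loader, unloader, and work function are passed in, so nothing here assumes the SDK's exact signatures):

```ts
// Run `work` between a load and a guaranteed unload, even on error.
async function withModel<T>(
  load: () => Promise<void>,
  unload: () => Promise<void>,
  work: () => Promise<T>,
): Promise<T> {
  await load()
  try {
    return await work()
  } finally {
    await unload()
  }
}
```

With the SDK this might be invoked as `withModel(() => RunAnywhere.loadSTTModel(modelPath, 'whisper'), () => RunAnywhere.unloadSTTModel(), () => RunAnywhere.transcribeFile(audioPath))`.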
## Combining Options

```ts
// Full-featured transcription
const result = await RunAnywhere.transcribeFile(audioPath, {
  language: 'en',
  punctuation: true,
  wordTimestamps: true,
})

// Access all result data
console.log('Text:', result.text)
console.log('Language:', result.language)
console.log('Confidence:', result.confidence)
console.log('Duration:', result.duration, 'seconds')
console.log('Segments:', result.segments.length)

// Alternatives (if available)
if (result.alternatives.length > 0) {
  console.log('Alternatives:')
  result.alternatives.forEach((alt, i) => {
    console.log(`${i + 1}. ${alt.text} (confidence: ${alt.confidence})`)
  })
}
```
| Option                | Impact on Speed | Impact on Accuracy |
| --------------------- | --------------- | ------------------ |
| `language` specified  | Faster          | Better             |
| `wordTimestamps`      | Slower          | Same               |
| `diarization`         | Much slower     | Same               |
| `punctuation`         | Minimal         | Same               |
For best performance, always specify the `language` option rather than relying on auto-detection.