Early Beta — The Web SDK is in early beta. APIs may change between releases.
Overview
Streaming STT provides real-time transcription as audio is being captured, without waiting for the full recording to complete. This enables live captioning, real-time voice interfaces, and interactive dictation.
Basic Usage
import { STT } from '@runanywhere/web'

// Create a streaming session
const session = STT.createStreamingSession()

// Feed audio chunks as they arrive from the microphone
function onAudioChunk(samples: Float32Array) {
  session.acceptWaveform(samples)

  // Check for partial results
  const result = session.getResult()
  if (result.text) {
    console.log('Partial:', result.text)
  }
}

// When done speaking
session.inputFinished()
const finalResult = session.getResult()
console.log('Final:', finalResult.text)

// Clean up
session.destroy()
API Reference
STT.createStreamingSession
Create a new streaming transcription session.
STT.createStreamingSession(options?: STTTranscribeOptions): STTStreamingSession
STTStreamingSession
interface STTStreamingSession {
  /** Feed audio samples into the session */
  acceptWaveform(samples: Float32Array, sampleRate?: number): void

  /** Signal that no more audio will be provided */
  inputFinished(): void

  /** Get the current transcription result */
  getResult(): { text: string; isEndpoint: boolean }

  /** Reset the session for a new utterance */
  reset(): void

  /** Release all resources */
  destroy(): void
}
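acceptWaveform() takes a Float32Array, the same sample format the Web Audio API produces (values in the [-1, 1] range). If your audio arrives as 16-bit integer PCM instead, for example over a WebSocket, a small conversion helper bridges the gap. This helper is a sketch and is not part of @runanywhere/web:

```typescript
// Convert 16-bit PCM samples to the Float32Array format that
// acceptWaveform() expects (values in the [-1, 1] range).
// Hypothetical helper -- not part of @runanywhere/web.
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length)
  for (let i = 0; i < pcm.length; i++) {
    // 0x8000 = 32768, the magnitude of the most negative 16-bit value
    out[i] = pcm[i] / 0x8000
  }
  return out
}
```

The converted array can then be passed straight to session.acceptWaveform(samples, 16000), assuming the source audio is already at the desired sample rate.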
Examples
Live Microphone Transcription
import { STT, AudioCapture } from '@runanywhere/web'

const capture = new AudioCapture()
const session = STT.createStreamingSession()

// Feed microphone audio into the streaming session
capture.onAudioChunk((samples) => {
  session.acceptWaveform(samples, 16000)
  const result = session.getResult()
  if (result.text) {
    const transcript = document.getElementById('transcript')
    if (transcript) transcript.textContent = result.text
  }
  if (result.isEndpoint) {
    console.log('Endpoint detected:', result.text)
    session.reset() // Ready for next utterance
  }
})

// Start capturing
await capture.start({ sampleRate: 16000 })

// Stop when done
// capture.stop()
// session.destroy()
React Component
import { useState, useCallback, useRef, useEffect } from 'react'
import { STT, AudioCapture, STTStreamingSession } from '@runanywhere/web'

export function LiveTranscription() {
  const [transcript, setTranscript] = useState('')
  const [isListening, setIsListening] = useState(false)
  const captureRef = useRef<AudioCapture | null>(null)
  const sessionRef = useRef<STTStreamingSession | null>(null)

  const startListening = useCallback(async () => {
    const capture = new AudioCapture()
    const session = STT.createStreamingSession()
    captureRef.current = capture
    sessionRef.current = session

    capture.onAudioChunk((samples) => {
      session.acceptWaveform(samples, 16000)
      const result = session.getResult()
      if (result.text) {
        setTranscript(result.text)
      }
    })

    await capture.start({ sampleRate: 16000 })
    setIsListening(true)
  }, [])

  const stopListening = useCallback(() => {
    captureRef.current?.stop()
    captureRef.current = null
    sessionRef.current?.inputFinished()
    const finalResult = sessionRef.current?.getResult()
    if (finalResult?.text) {
      setTranscript(finalResult.text)
    }
    sessionRef.current?.destroy()
    sessionRef.current = null
    setIsListening(false)
  }, [])

  // Release the microphone and session if the component unmounts mid-stream
  useEffect(() => {
    return () => {
      captureRef.current?.stop()
      sessionRef.current?.destroy()
    }
  }, [])

  return (
    <div>
      <button onClick={isListening ? stopListening : startListening}>
        {isListening ? 'Stop' : 'Start Listening'}
      </button>
      <p>{transcript || 'Speak to see transcription...'}</p>
    </div>
  )
}
Session Lifecycle
1. Create session: call STT.createStreamingSession() to create a new session.
2. Feed audio: call acceptWaveform() with each audio chunk from the microphone.
3. Read results: call getResult() at any time to get the current partial transcription.
4. Reset or finish: call reset() to start a new utterance, or inputFinished() when done.
5. Clean up: call destroy() to release all resources.
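The lifecycle above can be sketched as a single helper. The STTStreamingSession interface is re-declared here only so the snippet is self-contained; the session is passed in, so the flow works with any conforming implementation:

```typescript
// Lifecycle sketch against the STTStreamingSession interface
// documented above (re-declared for self-containment).
interface STTStreamingSession {
  acceptWaveform(samples: Float32Array, sampleRate?: number): void
  inputFinished(): void
  getResult(): { text: string; isEndpoint: boolean }
  reset(): void
  destroy(): void
}

function transcribeChunks(
  session: STTStreamingSession,
  chunks: Float32Array[],
  sampleRate = 16000
): string {
  try {
    // Feed audio: one acceptWaveform() call per chunk
    for (const chunk of chunks) {
      session.acceptWaveform(chunk, sampleRate)
    }
    // Finish: signal end of input, then read the final result
    session.inputFinished()
    return session.getResult().text
  } finally {
    // Clean up: always release resources, even on error
    session.destroy()
  }
}
```

Wrapping destroy() in a finally block ensures resources are released even if a callback throws mid-stream.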