Early Beta — The Web SDK is in early beta. APIs may change between releases.

Overview

Streaming STT provides real-time transcription as audio is being captured, without waiting for the full recording to complete. This enables live captioning, real-time voice interfaces, and interactive dictation.

Basic Usage

import { STT } from '@runanywhere/web'

// Create a streaming session
const session = STT.createStreamingSession()

// Feed audio chunks as they arrive from the microphone
function onAudioChunk(samples: Float32Array) {
  session.acceptWaveform(samples)

  // Check for partial results
  const result = session.getResult()
  if (result.text) {
    console.log('Partial:', result.text)
  }
}

// When done speaking
session.inputFinished()
const finalResult = session.getResult()
console.log('Final:', finalResult.text)

// Clean up
session.destroy()

API Reference

STT.createStreamingSession

Create a new streaming transcription session.
STT.createStreamingSession(options?: STTTranscribeOptions): STTStreamingSession

STTStreamingSession

interface STTStreamingSession {
  /** Feed audio samples into the session */
  acceptWaveform(samples: Float32Array, sampleRate?: number): void

  /** Signal that no more audio will be provided */
  inputFinished(): void

  /** Get the current transcription result */
  getResult(): { text: string; isEndpoint: boolean }

  /** Reset the session for a new utterance */
  reset(): void

  /** Release all resources */
  destroy(): void
}
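The `sampleRate` parameter on `acceptWaveform()` matters because browsers often capture audio at 48 kHz while speech models typically expect 16 kHz. You can pass the capture rate through, or resample before feeding the session. The helper below is an illustrative sketch, not part of the SDK; it does naive linear-interpolation resampling, which is usually adequate for speech (a production pipeline would low-pass filter first to avoid aliasing):

```typescript
// Naive linear-interpolation resampler for mono Float32Array audio.
// Converts samples from `fromRate` to `toRate`.
function resample(samples: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return samples
  const ratio = fromRate / toRate
  const outLength = Math.floor(samples.length / ratio)
  const out = new Float32Array(outLength)
  for (let i = 0; i < outLength; i++) {
    // Fractional position in the source buffer for output sample i
    const pos = i * ratio
    const left = Math.floor(pos)
    const right = Math.min(left + 1, samples.length - 1)
    const frac = pos - left
    out[i] = samples[left] * (1 - frac) + samples[right] * frac
  }
  return out
}

// A 480-sample chunk captured at 48 kHz becomes 160 samples at 16 kHz
const chunk48k = new Float32Array(480)
const chunk16k = resample(chunk48k, 48000, 16000)
console.log(chunk16k.length) // 160
```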

Examples

Live Microphone Transcription

import { STT, AudioCapture } from '@runanywhere/web'

const capture = new AudioCapture()
const session = STT.createStreamingSession()

// Feed microphone audio into the streaming session
capture.onAudioChunk((samples) => {
  session.acceptWaveform(samples, 16000)

  const result = session.getResult()
  if (result.text) {
    document.getElementById('transcript')!.textContent = result.text
  }

  if (result.isEndpoint) {
    console.log('Endpoint detected:', result.text)
    session.reset() // Ready for next utterance
  }
})

// Start capturing
await capture.start({ sampleRate: 16000 })

// Stop when done
// capture.stop()
// session.destroy()

React Component

LiveTranscription.tsx
import { useState, useCallback, useRef, useEffect } from 'react'
import { STT, AudioCapture, STTStreamingSession } from '@runanywhere/web'

export function LiveTranscription() {
  const [transcript, setTranscript] = useState('')
  const [isListening, setIsListening] = useState(false)
  const captureRef = useRef<AudioCapture | null>(null)
  const sessionRef = useRef<STTStreamingSession | null>(null)

  const startListening = useCallback(async () => {
    const capture = new AudioCapture()
    const session = STT.createStreamingSession()
    captureRef.current = capture
    sessionRef.current = session

    capture.onAudioChunk((samples) => {
      session.acceptWaveform(samples, 16000)
      const result = session.getResult()
      if (result.text) {
        setTranscript(result.text)
      }
    })

    await capture.start({ sampleRate: 16000 })
    setIsListening(true)
  }, [])

  const stopListening = useCallback(() => {
    captureRef.current?.stop()
    sessionRef.current?.inputFinished()

    const finalResult = sessionRef.current?.getResult()
    if (finalResult?.text) {
      setTranscript(finalResult.text)
    }

    sessionRef.current?.destroy()
    sessionRef.current = null
    captureRef.current = null
    setIsListening(false)
  }, [])

  // Release the microphone and session if the component unmounts mid-capture
  useEffect(() => {
    return () => {
      captureRef.current?.stop()
      sessionRef.current?.destroy()
    }
  }, [])

  return (
    <div>
      <button onClick={isListening ? stopListening : startListening}>
        {isListening ? 'Stop' : 'Start Listening'}
      </button>
      <p>{transcript || 'Speak to see transcription...'}</p>
    </div>
  )
}

Session Lifecycle

1. Create session: call STT.createStreamingSession() to create a new session.
2. Feed audio: call acceptWaveform() with each audio chunk from the microphone.
3. Read results: call getResult() to get the partial transcription at any time.
4. Reset or finish: call reset() to start a new utterance, or inputFinished() when done.
5. Clean up: call destroy() to release all resources.
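The lifecycle above can be walked end to end. Since the real engine is not available here, this sketch uses a minimal in-memory stand-in that matches the STTStreamingSession interface; in application code you would call STT.createStreamingSession() instead of the mock factory:

```typescript
// Shape copied from the API reference above
interface STTStreamingSession {
  acceptWaveform(samples: Float32Array, sampleRate?: number): void
  inputFinished(): void
  getResult(): { text: string; isEndpoint: boolean }
  reset(): void
  destroy(): void
}

// Hypothetical stand-in: counts chunks instead of transcribing,
// but enforces the same lifecycle ordering as a real session.
function createMockSession(): STTStreamingSession {
  let chunks = 0
  let finished = false
  let destroyed = false
  return {
    acceptWaveform(_samples) {
      if (destroyed) throw new Error('session destroyed')
      chunks += 1
    },
    inputFinished() { finished = true },
    getResult() {
      return { text: chunks > 0 ? `heard ${chunks} chunk(s)` : '', isEndpoint: finished }
    },
    reset() { chunks = 0; finished = false },
    destroy() { destroyed = true },
  }
}

// 1. Create  2. Feed  3. Read  4. Finish  5. Destroy
const session = createMockSession()
session.acceptWaveform(new Float32Array(1600), 16000)
session.acceptWaveform(new Float32Array(1600), 16000)
console.log(session.getResult().text)       // "heard 2 chunk(s)"
session.inputFinished()
console.log(session.getResult().isEndpoint) // true
session.destroy()
```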