Streaming STT
Stream audio for real-time transcription as the user speaks. Ideal for voice assistants and live captioning.

Overview

Streaming STT processes audio in chunks, providing partial transcriptions that update as more audio arrives. This creates a responsive experience where users see their words appear in real time.

Basic Usage

// Ensure STT model is loaded
await RunAnywhere.loadSTTModel('sherpa-onnx-whisper-tiny.en');

// Create audio stream from microphone
final audioStream = microphoneManager.audioStream;

// Process audio chunks
await for (final audioChunk in audioStream) {
  // Feed audio to STT (returns partial transcription)
  final partial = await RunAnywhere.transcribeChunk(audioChunk);
  print('Partial: $partial');
}

With Voice Session

The easiest way to use streaming STT is through the Voice Agent, which handles VAD and audio capture automatically:

final session = await RunAnywhere.startVoiceSession();

session.events.listen((event) {
  if (event is VoiceSessionTranscribed) {
    print('Transcription: ${event.text}');
  }
});

See Voice Agent for the complete voice pipeline.

Real-Time Transcription Widget

import 'dart:async';

import 'package:flutter/material.dart';
// AudioRecorder and RecordConfig are assumed to come from the `record` package.
import 'package:record/record.dart';

class LiveTranscriptionWidget extends StatefulWidget {
  @override
  _LiveTranscriptionWidgetState createState() => _LiveTranscriptionWidgetState();
}

class _LiveTranscriptionWidgetState extends State<LiveTranscriptionWidget> {
  final _recorder = AudioRecorder();
  String _partialText = '';
  String _finalText = '';
  bool _isListening = false;
  StreamSubscription? _audioSubscription;

  Future<void> _startListening() async {
    if (!await _recorder.hasPermission()) return;

    setState(() {
      _isListening = true;
      _partialText = '';
    });

    // Start recording with streaming
    final stream = await _recorder.startStream(
      const RecordConfig(
        encoder: AudioEncoder.pcm16bits,
        sampleRate: 16000,
        numChannels: 1,
      ),
    );

    // Process audio chunks as they arrive, using the same
    // transcribeChunk API shown in Basic Usage above
    _audioSubscription = stream.listen((chunk) async {
      final partial = await RunAnywhere.transcribeChunk(chunk);
      if (!mounted) return;
      setState(() => _partialText = partial);
    });
  }

  Future<void> _stopListening() async {
    await _audioSubscription?.cancel();
    await _recorder.stop();

    setState(() {
      _isListening = false;
      _finalText = _partialText;
      _partialText = '';
    });
  }

  @override
  Widget build(BuildContext context) {
    return Column(
      children: [
        // Show partial transcription with typing indicator
        Container(
          padding: EdgeInsets.all(16),
          child: Text(
            _isListening ? '$_partialText|' : _finalText,
            style: TextStyle(
              fontSize: 18,
              color: _isListening ? Colors.grey : Colors.black,
            ),
          ),
        ),

        // Recording button
        IconButton(
          icon: Icon(_isListening ? Icons.stop : Icons.mic),
          iconSize: 48,
          color: _isListening ? Colors.red : Colors.blue,
          onPressed: _isListening ? _stopListening : _startListening,
        ),
      ],
    );
  }

  @override
  void dispose() {
    _audioSubscription?.cancel();
    _recorder.dispose();
    super.dispose();
  }
}
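
To try the widget, mount it in a minimal app, loading the model first as in Basic Usage:

Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized();
  // The STT model must be loaded before transcription starts.
  await RunAnywhere.loadSTTModel('sherpa-onnx-whisper-tiny.en');
  runApp(
    MaterialApp(
      home: Scaffold(
        body: Center(child: LiveTranscriptionWidget()),
      ),
    ),
  );
}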

Tips for Streaming STT

Accumulate audio in buffers of 100-500 ms for the best accuracy/latency tradeoff (see the buffering sketch after this list).
Use VAD (Voice Activity Detection) to detect the end of speech and finalize transcriptions; the sketch below uses a crude energy threshold as a stand-in.
Handle network interruptions and audio glitches gracefully, for example by retrying failed chunks (second sketch below).
Show a visual indicator (waveform, pulsing dot) to confirm audio is being captured (third sketch below).
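
A minimal sketch of the first two tips, assuming the transcribeChunk API from Basic Usage accepts raw PCM bytes, and assuming 16 kHz mono PCM16 audio (so 300 ms ≈ 9,600 bytes). The energy threshold is a crude stand-in for a real VAD:

import 'dart:async';
import 'dart:math' as math;
import 'dart:typed_data';

/// Buffers incoming PCM16 chunks into ~300 ms windows before transcribing,
/// and counts consecutive quiet windows to approximate end-of-speech.
class ChunkedTranscriber {
  // 16,000 samples/s * 2 bytes/sample * 0.3 s = 9,600 bytes per window.
  static const int _windowBytes = 9600;
  static const double _silenceRms = 500; // tune for your mic and environment
  static const int _quietWindowsToFinalize = 3; // ~0.9 s of silence

  final BytesBuilder _buffer = BytesBuilder();
  int _quietWindows = 0;

  final void Function(String partial) onPartial;
  final void Function() onEndOfSpeech;

  ChunkedTranscriber({required this.onPartial, required this.onEndOfSpeech});

  Future<void> addChunk(Uint8List chunk) async {
    _buffer.add(chunk);
    if (_buffer.length < _windowBytes) return;

    final window = _buffer.takeBytes(); // drains the accumulated audio
    final partial = await RunAnywhere.transcribeChunk(window);
    onPartial(partial);

    // Crude end-of-speech detection: several quiet windows in a row.
    if (_rms(window) < _silenceRms) {
      if (++_quietWindows >= _quietWindowsToFinalize) {
        _quietWindows = 0;
        onEndOfSpeech();
      }
    } else {
      _quietWindows = 0;
    }
  }

  // Root-mean-square amplitude of PCM16 samples (host byte order).
  static double _rms(Uint8List bytes) {
    final samples =
        Int16List.view(bytes.buffer, bytes.offsetInBytes, bytes.length ~/ 2);
    if (samples.isEmpty) return 0;
    var sumSquares = 0.0;
    for (final s in samples) {
      sumSquares += s * s;
    }
    return math.sqrt(sumSquares / samples.length);
  }
}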
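
For retrying failed chunks, a small wrapper, assuming transcribeChunk throws on transient failures:

/// Retries a flaky transcription call with a short linear backoff,
/// dropping the chunk rather than stalling the stream if all attempts fail.
Future<String?> transcribeWithRetry(Uint8List chunk, {int maxAttempts = 3}) async {
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await RunAnywhere.transcribeChunk(chunk);
    } catch (_) {
      if (attempt == maxAttempts) return null; // give up on this chunk
      await Future.delayed(Duration(milliseconds: 100 * attempt));
    }
  }
  return null; // unreachable, but satisfies the analyzer
}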
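
And for the capture indicator, one way to build a pulsing dot with plain Flutter animation (no extra packages):

import 'package:flutter/material.dart';

/// A red dot that pulses while recording, confirming audio is being captured.
class PulsingDot extends StatefulWidget {
  const PulsingDot({super.key});

  @override
  State<PulsingDot> createState() => _PulsingDotState();
}

class _PulsingDotState extends State<PulsingDot>
    with SingleTickerProviderStateMixin {
  late final AnimationController _controller = AnimationController(
    vsync: this,
    duration: const Duration(milliseconds: 700),
  )..repeat(reverse: true); // fade out and back in while mounted

  @override
  Widget build(BuildContext context) {
    return FadeTransition(
      opacity: Tween(begin: 0.3, end: 1.0).animate(_controller),
      child: Container(
        width: 12,
        height: 12,
        decoration: const BoxDecoration(
          color: Colors.red,
          shape: BoxShape.circle,
        ),
      ),
    );
  }

  @override
  void dispose() {
    _controller.dispose();
    super.dispose();
  }
}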

See Also

Voice Agent – the complete voice pipeline, with VAD and audio capture handled for you.