Stream TTS audio as it’s generated for faster time-to-first-audio, especially with longer text.

Overview

Streaming TTS starts playing audio before the entire synthesis is complete. This is particularly useful for:
  • Long text passages
  • Voice assistants responding in real time
  • Reducing perceived latency

Basic Concept
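
When autoPlayTTS is enabled, the Voice Agent pipeline synthesizes and plays each response as it is generated, so playback can begin without waiting for the full reply: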

// The Voice Agent pipeline handles streaming TTS automatically
final session = await RunAnywhere.startVoiceSession(
  config: VoiceSessionConfig(
    autoPlayTTS: true,  // Automatically plays synthesized audio
  ),
);

session.events.listen((event) {
  if (event is VoiceSessionSpeaking) {
    print('Playing audio response...');
  }
});

Chunked Synthesis

For manual control, synthesize text in chunks:
Future<void> speakInChunks(String longText) async {
  // Split into sentences
  final sentences = longText.split(RegExp(r'(?<=[.!?])\s+'));

  for (final sentence in sentences) {
    final result = await RunAnywhere.synthesize(sentence);
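    // playAudio stands in for your app's own playback helper for the synthesized audio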
    await playAudio(result);
  }
}
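
To cut the gap between sentences, you can also overlap synthesis with playback: while one sentence is playing, start synthesizing the next. A minimal sketch, reusing the playAudio helper above and assuming the SDK allows a synthesize call to run while audio is playing:
Future<void> speakInChunksPipelined(String longText) async {
  // Split into sentences
  final sentences = longText.split(RegExp(r'(?<=[.!?])\s+'));
  if (sentences.isEmpty) return;

  // Start synthesizing the first sentence
  var nextSynthesis = RunAnywhere.synthesize(sentences.first);

  for (var i = 0; i < sentences.length; i++) {
    // Wait for the current sentence's audio
    final result = await nextSynthesis;

    // Kick off synthesis of the following sentence before playback begins
    if (i + 1 < sentences.length) {
      nextSynthesis = RunAnywhere.synthesize(sentences[i + 1]);
    }

    // Play the current sentence while the next one is being synthesized
    await playAudio(result);
  }
}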

With Voice Agent Pipeline

The Voice Agent provides the best streaming TTS experience:
// Initialize all components
await RunAnywhere.loadSTTModel('sherpa-onnx-whisper-tiny.en');
await RunAnywhere.loadModel('smollm2-360m-q8_0');
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium');

// Start session with auto-play
final session = await RunAnywhere.startVoiceSession(
  config: VoiceSessionConfig(
    autoPlayTTS: true,
    continuousMode: true,
  ),
);

// The pipeline automatically:
// 1. Detects speech (VAD)
// 2. Transcribes audio (STT)
// 3. Generates response (LLM)
// 4. Synthesizes and plays audio (TTS)
session.events.listen((event) {
  switch (event) {
    case VoiceSessionTranscribed(:final text):
      print('User: $text');
    case VoiceSessionResponded(:final text):
      print('AI: $text');
    case VoiceSessionSpeaking():
      print('Playing response...');
    case VoiceSessionTurnCompleted():
      print('Ready for next turn');
  }
});

Latency Optimization Tips

Load the TTS voice during app startup or idle time, not when the user first needs it.
// In app initialization
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium');

Smaller voice models synthesize faster. Choose based on your quality/speed tradeoff.
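For example, if a lower-quality variant of the same Piper voice is available in your build (the '-low' voice ID below is illustrative, not guaranteed to be packaged), it will synthesize faster at some cost in audio quality:
// Hypothetical smaller, faster variant of the voice used above
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-low');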

For very long responses, synthesize and play sentence by sentence rather than waiting for the complete response, as in the Chunked Synthesis example above.

Start playback as soon as you have enough audio buffered (typically 100-200 ms).
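As a rough rule of thumb (the 22,050 Hz sample rate below is an assumption; use whatever rate your voice model actually outputs), a 150 ms buffer works out to:
// Minimum samples to buffer before starting playback, assuming 22,050 Hz mono output
const sampleRate = 22050;
final minBufferedSamples = (sampleRate * 0.15).round(); // about 3,300 samples for 150 ms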

See Also