Overview
Streaming TTS starts playing audio before the entire synthesis is complete. This is particularly useful for:
- Long text passages
- Voice assistants responding in real-time
- Reducing perceived latency
Basic Concept
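The idea can be sketched with a Python generator. This is a minimal illustration, not any SDK's API: `fake_synthesize`, `synthesize_stream`, and `play_stream` are hypothetical names, and the "audio" is just placeholder bytes. The point is the ordering: each piece is played as soon as it is synthesized, before the next piece is even requested.

```python
from typing import Iterable, Iterator, List


def fake_synthesize(piece: str) -> bytes:
    """Hypothetical stand-in for a TTS call: one zero byte per character."""
    return bytes(len(piece))


def synthesize_stream(pieces: Iterable[str], log: List[str]) -> Iterator[bytes]:
    """Lazily synthesize one piece at a time, yielding audio as it is ready."""
    for piece in pieces:
        log.append(f"synth:{piece}")
        yield fake_synthesize(piece)


def play_stream(chunks: Iterable[bytes], log: List[str]) -> int:
    """Consume audio chunks as they arrive; returns total bytes 'played'."""
    total = 0
    for chunk in chunks:
        # A real player would write the chunk to an audio device here.
        log.append(f"play:{len(chunk)}")
        total += len(chunk)
    return total
```

Because the generator is lazy, the log interleaves synthesis and playback instead of listing all synthesis steps first.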
Chunked Synthesis
For manual control, split the text into chunks and synthesize each chunk separately.
With Voice Agent Pipeline
The Voice Agent provides the best streaming TTS experience.
Latency Optimization Tips
Preload Voice Model
Load the TTS voice during app startup or idle time, not when the user first needs it.
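One possible way to preload is to start the load on a background thread at startup and block only if the voice is requested before loading finishes. The sketch below assumes a caller-supplied `loader` callable; `VoicePreloader` is a hypothetical helper, not part of any SDK.

```python
import threading
from typing import Callable, Optional


class VoicePreloader:
    """Loads a voice once on a background thread and hands it out when ready."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader  # hypothetical: whatever call loads your TTS voice
        self._voice: Optional[object] = None
        self._error: Optional[BaseException] = None
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        """Begin loading in the background; repeated calls are no-ops."""
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._run, daemon=True)
                self._thread.start()

    def _run(self) -> None:
        try:
            self._voice = self._loader()
        except Exception as exc:  # surfaced to the caller in get()
            self._error = exc
        finally:
            self._ready.set()

    def get(self, timeout: Optional[float] = None) -> object:
        """Return the loaded voice, waiting for the load to finish if needed."""
        self.start()
        if not self._ready.wait(timeout):
            raise TimeoutError("voice model is still loading")
        if self._error is not None:
            raise self._error
        return self._voice
```

Call `start()` during app startup or idle time, then `get()` when the first synthesis request arrives. If loading has already finished, `get()` returns immediately.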
Use Faster Voices
Smaller voice models synthesize faster. Choose based on your quality/speed tradeoff.
Sentence-Level Streaming
For very long responses, synthesize and play sentence by sentence rather than waiting for the
complete response.
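A rough sketch of sentence-level streaming: a producer thread synthesizes sentences into a small bounded queue while the caller's thread plays them in order. `synthesize` and `play` are assumed caller-supplied callables, and `split_sentences` and `stream_by_sentence` are hypothetical helper names. The regex splitter is deliberately naive.

```python
import queue
import re
import threading
from typing import Callable, List

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter; abbreviations such as 'Dr.' will be split too."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def stream_by_sentence(
    text: str,
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
    max_buffered: int = 2,
) -> int:
    """Synthesize ahead on a worker thread while playing; returns sentences played."""
    audio_queue: "queue.Queue[object]" = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream

    def producer() -> None:
        try:
            for sentence in split_sentences(text):
                # Blocks when the queue is full, so synthesis stays at most
                # max_buffered sentences ahead of playback.
                audio_queue.put(synthesize(sentence))
        finally:
            audio_queue.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    played = 0
    while True:
        item = audio_queue.get()
        if item is done:
            break
        play(item)  # the first sentence plays while later ones are synthesized
        played += 1
    worker.join()
    return played
```

The bounded queue keeps memory use flat for very long responses, at the cost of stalling synthesis when playback falls behind.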
Buffer Management
Start playback as soon as you have enough audio buffered (typically 100-200ms).
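As an illustration, the buffering threshold can be converted from milliseconds to bytes and applied to a chunk stream. The sketch assumes raw 16-bit mono PCM at 16 kHz, which may not match your voice's output format; `pcm_bytes_for_ms` and `prebuffered` are hypothetical helpers.

```python
from typing import Iterable, Iterator, List


def pcm_bytes_for_ms(
    duration_ms: int,
    sample_rate: int = 16000,   # assumed sample rate; use your voice's rate
    channels: int = 1,          # assumed mono
    bytes_per_sample: int = 2,  # assumed 16-bit PCM
) -> int:
    """Number of raw PCM bytes needed to hold duration_ms of audio."""
    frames = sample_rate * duration_ms // 1000
    return frames * channels * bytes_per_sample


def prebuffered(chunks: Iterable[bytes], min_start_bytes: int) -> Iterator[bytes]:
    """Hold back audio until min_start_bytes are available, then pass chunks through."""
    pending: List[bytes] = []
    pending_size = 0
    started = False
    for chunk in chunks:
        if started:
            yield chunk
            continue
        pending.append(chunk)
        pending_size += len(chunk)
        if pending_size >= min_start_bytes:
            started = True
            yield b"".join(pending)
            pending.clear()
    # Short utterances may never reach the threshold; flush what is left.
    if not started and pending:
        yield b"".join(pending)
```

With these assumptions, 150 ms of audio is `pcm_bytes_for_ms(150)` = 4800 bytes, so playback would begin once roughly that much audio has been synthesized.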