Overview
The Text-to-Speech (TTS) API converts text to spoken audio using on-device neural voice synthesis with Piper TTS. All synthesis happens locally on the device.
Basic Usage
import { RunAnywhere } from '@runanywhere/core'
// Synthesize speech
const result = await RunAnywhere.synthesize('Hello, welcome to the RunAnywhere SDK.', {
  rate: 1.0,
  pitch: 1.0,
  volume: 1.0,
})
console.log('Duration:', result.duration, 'seconds')
console.log('Sample rate:', result.sampleRate)
// result.audio contains base64-encoded float32 PCM
Setup
Before synthesizing, download and load a TTS model:
import { RunAnywhere, ModelCategory, SDKEnvironment } from '@runanywhere/core'
import { ONNX, ModelArtifactType } from '@runanywhere/onnx'
// 1. Initialize SDK and register ONNX backend
await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
ONNX.register()
// 2. Add TTS model
await ONNX.addModel({
  id: 'piper-en-lessac',
  name: 'Piper English (Lessac)',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-lessac-medium.tar.gz',
  modality: ModelCategory.SpeechSynthesis,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 65_000_000,
})
// 3. Download model
await RunAnywhere.downloadModel('piper-en-lessac', (progress) => {
  console.log(`Download: ${(progress.progress * 100).toFixed(1)}%`)
})
// 4. Load model
const modelInfo = await RunAnywhere.getModelInfo('piper-en-lessac')
await RunAnywhere.loadTTSModel(modelInfo.localPath, 'piper')
API Reference
synthesize
Convert text to audio data.
RunAnywhere.synthesize(
  text: string,
  options?: TTSConfiguration
): Promise<TTSResult>
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to synthesize |
| options | TTSConfiguration | Optional voice settings |
Configuration
interface TTSConfiguration {
  /** Voice identifier */
  voice?: string
  /** Speech rate (0.5 - 2.0, default: 1.0) */
  rate?: number
  /** Pitch adjustment (0.5 - 2.0, default: 1.0) */
  pitch?: number
  /** Volume (0.0 - 1.0, default: 1.0) */
  volume?: number
}
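The ranges documented above can be enforced before calling synthesize. The page does not say how the SDK treats out-of-range values, so the sketch below assumes that clamping to the documented range (and falling back to the default for missing or non-finite values) is an acceptable policy. `normalizeTTSOptions` and `clamp` are hypothetical helpers, not SDK exports, and the interface is a local copy so the snippet stands alone.

```typescript
// Local copy of the documented option shape so this sketch is self-contained.
interface TTSConfiguration {
  voice?: string
  rate?: number
  pitch?: number
  volume?: number
}

// Clamp a value into [min, max]; use the fallback when unset or not finite.
function clamp(value: number | undefined, min: number, max: number, fallback: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback
  return Math.min(max, Math.max(min, value))
}

// Hypothetical helper (assumption): normalize options to the documented ranges.
function normalizeTTSOptions(options: TTSConfiguration = {}): TTSConfiguration {
  return {
    ...options,
    rate: clamp(options.rate, 0.5, 2.0, 1.0),
    pitch: clamp(options.pitch, 0.5, 2.0, 1.0),
    volume: clamp(options.volume, 0.0, 1.0, 1.0),
  }
}
```

The normalized object can then be passed as the second argument to `RunAnywhere.synthesize`.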
Result
interface TTSResult {
  /** Base64-encoded audio (float32 PCM) */
  audio: string
  /** Audio sample rate in Hz */
  sampleRate: number
  /** Number of audio samples */
  numSamples: number
  /** Audio duration in seconds */
  duration: number
}
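The fields of the result are related arithmetically, which is handy for sanity checks. The sketch below assumes mono audio, 4 bytes per float32 sample, padded base64, and that `duration` equals `numSamples / sampleRate`; none of these is stated explicitly on this page. The function names are hypothetical.

```typescript
// Assumption: duration is simply the sample count divided by the sample rate.
function durationFromSamples(numSamples: number, sampleRate: number): number {
  if (sampleRate <= 0) throw new RangeError('sampleRate must be positive')
  return numSamples / sampleRate
}

// Assumption: mono float32 PCM, so each sample occupies 4 bytes.
function expectedPcmBytes(numSamples: number): number {
  return numSamples * 4
}

// Padded base64 emits 4 characters for every 3 bytes, rounding up.
function expectedBase64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4
}
```

For example, one second of audio at a 22050 Hz sample rate would be 22050 samples, 88200 bytes of PCM, and 117600 base64 characters under these assumptions.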
Examples
Basic Synthesis
const result = await RunAnywhere.synthesize('Hello, world!')
console.log('Audio duration:', result.duration, 'seconds')
console.log('Sample rate:', result.sampleRate, 'Hz')
console.log('Samples:', result.numSamples)
With Voice Options
// Slower and lower pitch
const slow = await RunAnywhere.synthesize('This is spoken slowly with a lower pitch.', {
  rate: 0.75,
  pitch: 0.8,
  volume: 1.0,
})
// Faster and higher pitch
const fast = await RunAnywhere.synthesize('This is spoken quickly with a higher pitch!', {
  rate: 1.5,
  pitch: 1.2,
  volume: 1.0,
})
Play Audio
TTSPlayer.tsx
import React, { useState, useCallback } from 'react'
import { View, Button, TextInput, Text } from 'react-native'
import { RunAnywhere } from '@runanywhere/core'
import Sound from 'react-native-sound' // Example audio playback library
export function TTSPlayer() {
  const [text, setText] = useState('')
  const [isPlaying, setIsPlaying] = useState(false)
  const [duration, setDuration] = useState<number | null>(null)
  const handleSpeak = useCallback(async () => {
    if (!text.trim()) return
    setIsPlaying(true)
    try {
      const result = await RunAnywhere.synthesize(text, {
        rate: 1.0,
        pitch: 1.0,
      })
      setDuration(result.duration)
      // Convert base64 to audio and play.
      // base64ToArrayBuffer is not imported above; supply your own helper.
      // Note: react-native-sound loads audio from a file path or URL, so the
      // decoded PCM may first need to be written to a file (for example, as WAV).
      const audioBuffer = base64ToArrayBuffer(result.audio)
      const sound = new Sound(audioBuffer, '', (error) => {
        if (error) {
          console.error('Failed to load sound', error)
          setIsPlaying(false)
          return
        }
        sound.play(() => {
          setIsPlaying(false)
          sound.release()
        })
      })
    } catch (error) {
      console.error('Synthesis failed:', error)
      setIsPlaying(false)
    }
  }, [text])
  return (
    <View style={{ padding: 16 }}>
      <TextInput
        value={text}
        onChangeText={setText}
        placeholder="Enter text to speak..."
        multiline
        style={{ borderWidth: 1, padding: 12, minHeight: 80 }}
      />
      <Button
        title={isPlaying ? 'Speaking...' : 'Speak'}
        onPress={handleSpeak}
        disabled={isPlaying || !text.trim()}
      />
      {/* Compare against null so a duration of 0 is not rendered as a bare number */}
      {duration !== null && (
        <Text style={{ marginTop: 8, color: '#666' }}>Duration: {duration.toFixed(2)}s</Text>
      )}
    </View>
  )
}
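The component above calls a `base64ToArrayBuffer` helper that is not defined on this page. A minimal sketch of one possible implementation follows; it avoids `atob`, which may not be available in every React Native runtime. It assumes standard (non-URL-safe) base64 and, for `base64ToFloat32`, that the PCM bytes are little-endian like the host platform. Both function names are illustrative, not SDK exports.

```typescript
const BASE64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Decode a standard base64 string into raw bytes without relying on atob.
function base64ToArrayBuffer(base64: string): ArrayBuffer {
  // Drop whitespace and '=' padding; the output length follows from what remains.
  const clean = base64.replace(/[\s=]/g, '')
  const out = new ArrayBuffer(Math.floor((clean.length * 3) / 4))
  const bytes = new Uint8Array(out)
  let accumulator = 0 // holds bits that have not been emitted yet
  let bits = 0 // number of pending bits in the accumulator
  let offset = 0
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i))
    if (value === -1) throw new Error(`Invalid base64 character at index ${i}`)
    accumulator = ((accumulator << 6) | value) & 0xffff
    bits += 6
    if (bits >= 8) {
      bits -= 8
      bytes[offset++] = (accumulator >> bits) & 0xff
    }
  }
  return out
}

// View the decoded bytes as float32 samples (assumes platform byte order matches the data).
function base64ToFloat32(base64: string): Float32Array {
  const buffer = base64ToArrayBuffer(base64)
  if (buffer.byteLength % 4 !== 0) {
    throw new RangeError('Decoded length is not a multiple of 4 bytes')
  }
  return new Float32Array(buffer)
}
```

With this in place, `base64ToFloat32(result.audio)` would yield the samples as a `Float32Array`, ready to be handed to whatever playback path the app uses.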
Using System TTS
For simpler playback using the platform’s built-in TTS:
// Use system TTS (AVSpeechSynthesizer on iOS, Android TTS)
await RunAnywhere.speak('Hello from system TTS!', {
  rate: 1.0,
  pitch: 1.0,
  volume: 1.0,
})
// Check if currently speaking
const speaking = await RunAnywhere.isSpeaking()
// Stop playback
await RunAnywhere.stopSpeaking()
Get Available Voices
const voices = await RunAnywhere.availableTTSVoices()
for (const voice of voices) {
  console.log(`${voice.id}: ${voice.name} (${voice.language})`)
}
// Use a specific voice
await RunAnywhere.synthesize('Hello!', {
  voice: 'en-US-female-1',
})
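Rather than hard-coding a voice identifier, the list returned above can be filtered by language. The sketch below assumes each voice exposes `id`, `name`, and `language` fields, as the logging loop suggests; the `TTSVoice` type name, the `pickVoice` helper, and the sample data are made up for illustration.

```typescript
// Assumed voice shape, inferred from the fields logged in the example above.
interface TTSVoice {
  id: string
  name: string
  language: string
}

// Hypothetical helper: return the first voice whose language tag starts with the prefix.
function pickVoice(voices: TTSVoice[], languagePrefix: string): TTSVoice | undefined {
  const wanted = languagePrefix.toLowerCase()
  return voices.find((voice) => voice.language.toLowerCase().startsWith(wanted))
}

// Illustrative data only; real identifiers come from availableTTSVoices().
const sampleVoices: TTSVoice[] = [
  { id: 'voice-a', name: 'Voice A', language: 'de-DE' },
  { id: 'voice-b', name: 'Voice B', language: 'en-US' },
]
const english = pickVoice(sampleVoices, 'en')
```

If a match is found, its `id` can be passed as the `voice` option; if the result is `undefined`, omitting `voice` leaves the choice to the SDK.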
Converting Audio for Playback
The synthesized audio is base64-encoded float32 PCM. Here’s how to convert it:
// Convert base64 audio to playable format
function convertTTSAudio(
  audioContext: AudioContext,
  base64Audio: string,
  sampleRate: number
): AudioBuffer {
  // Decode base64 to binary
  const binary = atob(base64Audio)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  // Convert to float32 array (the byte length must be a multiple of 4)
  const float32 = new Float32Array(bytes.buffer)
  // Create AudioBuffer (Web Audio API)
  const audioBuffer = audioContext.createBuffer(1, float32.length, sampleRate)
  audioBuffer.getChannelData(0).set(float32)
  return audioBuffer
}
// Play with Web Audio
async function playAudio(result: TTSResult) {
  // Reuse a single AudioContext for both conversion and playback
  const audioContext = new AudioContext()
  const audioBuffer = convertTTSAudio(audioContext, result.audio, result.sampleRate)
  const source = audioContext.createBufferSource()
  source.buffer = audioBuffer
  source.connect(audioContext.destination)
  source.start()
}
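Where the Web Audio API is not available, another option is to wrap the samples in a WAV container that file-based players can read. The following is a minimal sketch that converts mono float32 samples to 16-bit PCM with a standard 44-byte RIFF/WAVE header. `encodeWav` is a hypothetical helper, the audio is assumed to be single-channel, and writing the bytes to disk is left to whichever file library the app already uses.

```typescript
// Encode mono float32 samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2
  const dataSize = samples.length * bytesPerSample
  const buffer = new ArrayBuffer(44 + dataSize)
  const view = new DataView(buffer)

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i))
    }
  }

  writeAscii(0, 'RIFF')
  view.setUint32(4, 36 + dataSize, true) // remaining file size after this field
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  view.setUint32(16, 16, true) // fmt chunk size for PCM
  view.setUint16(20, 1, true) // audio format 1 = integer PCM
  view.setUint16(22, 1, true) // channel count (mono, an assumption)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * bytesPerSample, true) // byte rate
  view.setUint16(32, bytesPerSample, true) // block align
  view.setUint16(34, 16, true) // bits per sample
  writeAscii(36, 'data')
  view.setUint32(40, dataSize, true)

  for (let i = 0; i < samples.length; i++) {
    // Clip to [-1, 1], then scale to the signed 16-bit range.
    const clipped = Math.max(-1, Math.min(1, samples[i] ?? 0))
    const scaled = clipped < 0 ? clipped * 0x8000 : clipped * 0x7fff
    view.setInt16(44 + i * bytesPerSample, scaled, true)
  }
  return new Uint8Array(buffer)
}
```

Converting to 16-bit halves the size of the data and is widely supported, at the cost of some precision compared with the original float32 samples.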
Voice Options Explained
| Option | Range | Default | Effect |
|---|---|---|---|
| rate | 0.5 - 2.0 | 1.0 | Speech speed (1.0 = normal) |
| pitch | 0.5 - 2.0 | 1.0 | Voice pitch (1.0 = normal) |
| volume | 0.0 - 1.0 | 1.0 | Audio volume (1.0 = full) |
// Different presets
const presets = {
  normal: { rate: 1.0, pitch: 1.0, volume: 1.0 },
  slow: { rate: 0.7, pitch: 1.0, volume: 1.0 },
  fast: { rate: 1.4, pitch: 1.0, volume: 1.0 },
  deep: { rate: 1.0, pitch: 0.7, volume: 1.0 },
  high: { rate: 1.0, pitch: 1.3, volume: 1.0 },
  quiet: { rate: 1.0, pitch: 1.0, volume: 0.5 },
}
const result = await RunAnywhere.synthesize(text, presets.slow)
Error Handling
import { isSDKError, SDKErrorCode } from '@runanywhere/core'
try {
  const result = await RunAnywhere.synthesize(text)
} catch (error) {
  if (isSDKError(error)) {
    switch (error.code) {
      case SDKErrorCode.notInitialized:
        console.error('SDK not initialized')
        break
      case SDKErrorCode.modelNotLoaded:
        console.error('Load a TTS model first')
        break
      case SDKErrorCode.ttsFailed:
        console.error('Synthesis failed:', error.message)
        break
      default:
        console.error('Unexpected SDK error:', error.message)
    }
  } else {
    // Not an SDK error: rethrow so it is not silently swallowed
    throw error
  }
}