Overview

The Text-to-Speech (TTS) API converts text to spoken audio using on-device neural voice synthesis with Piper TTS. All synthesis happens locally on the device.

Basic Usage

import { RunAnywhere } from '@runanywhere/core'

// Synthesize speech
const result = await RunAnywhere.synthesize('Hello, welcome to the RunAnywhere SDK.', {
  rate: 1.0,
  pitch: 1.0,
  volume: 1.0,
})

console.log('Duration:', result.duration, 'seconds')
console.log('Sample rate:', result.sampleRate)
// result.audio contains base64-encoded float32 PCM

Setup

Before synthesizing, download and load a TTS model:
import { RunAnywhere, ModelCategory, SDKEnvironment } from '@runanywhere/core'
import { ONNX, ModelArtifactType } from '@runanywhere/onnx'

// 1. Initialize SDK and register ONNX backend
await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
ONNX.register()

// 2. Add TTS model
await ONNX.addModel({
  id: 'piper-en-lessac',
  name: 'Piper English (Lessac)',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-lessac-medium.tar.gz',
  modality: ModelCategory.SpeechSynthesis,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 65_000_000,
})

// 3. Download model
await RunAnywhere.downloadModel('piper-en-lessac', (progress) => {
  console.log(`Download: ${(progress.progress * 100).toFixed(1)}%`)
})

// 4. Load model
const modelInfo = await RunAnywhere.getModelInfo('piper-en-lessac')
await RunAnywhere.loadTTSModel(modelInfo.localPath, 'piper')

API Reference

synthesize

Convert text to audio data.
await RunAnywhere.synthesize(
  text: string,
  options?: TTSConfiguration
): Promise<TTSResult>
Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `text` | `string` | Text to synthesize |
| `options` | `TTSConfiguration` | Optional voice settings |

Configuration

interface TTSConfiguration {
  /** Voice identifier */
  voice?: string

  /** Speech rate (0.5 - 2.0, default: 1.0) */
  rate?: number

  /** Pitch adjustment (0.5 - 2.0, default: 1.0) */
  pitch?: number

  /** Volume (0.0 - 1.0, default: 1.0) */
  volume?: number
}
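The documented ranges can be enforced before calling `synthesize`. A minimal sketch, assuming nothing beyond the interface above (`clampTTSConfig` and `clamp` are local helpers, not part of the SDK; the interface is repeated so the example is self-contained):

```typescript
// Repeated from the Configuration section for a self-contained example
interface TTSConfiguration {
  voice?: string
  rate?: number
  pitch?: number
  volume?: number
}

// Constrain a value to [min, max]
const clamp = (value: number, min: number, max: number) =>
  Math.min(max, Math.max(min, value))

// Clamp each option to its documented range, leaving unset options undefined
function clampTTSConfig(config: TTSConfiguration): TTSConfiguration {
  return {
    ...config,
    rate: config.rate !== undefined ? clamp(config.rate, 0.5, 2.0) : undefined,
    pitch: config.pitch !== undefined ? clamp(config.pitch, 0.5, 2.0) : undefined,
    volume: config.volume !== undefined ? clamp(config.volume, 0.0, 1.0) : undefined,
  }
}
```

Passing the clamped object to `synthesize` avoids out-of-range values reaching the engine.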

Result

interface TTSResult {
  /** Base64-encoded audio (float32 PCM) */
  audio: string

  /** Audio sample rate in Hz */
  sampleRate: number

  /** Number of audio samples */
  numSamples: number

  /** Audio duration in seconds */
  duration: number
}
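For mono float32 PCM, `duration` follows directly from `numSamples` and `sampleRate`, which is useful for sanity-checking a result or sizing playback buffers. A small sketch (`expectedDuration` is a local helper, not an SDK function):

```typescript
// duration (seconds) = numSamples / sampleRate for single-channel PCM
function expectedDuration(numSamples: number, sampleRate: number): number {
  return numSamples / sampleRate
}

// e.g. 22050 samples at 22050 Hz is one second of audio
```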

Examples

Basic Synthesis

const result = await RunAnywhere.synthesize('Hello, world!')

console.log('Audio duration:', result.duration, 'seconds')
console.log('Sample rate:', result.sampleRate, 'Hz')
console.log('Samples:', result.numSamples)

With Voice Options

// Slower and lower pitch
const slow = await RunAnywhere.synthesize('This is spoken slowly with a lower pitch.', {
  rate: 0.75,
  pitch: 0.8,
  volume: 1.0,
})

// Faster and higher pitch
const fast = await RunAnywhere.synthesize('This is spoken quickly with a higher pitch!', {
  rate: 1.5,
  pitch: 1.2,
  volume: 1.0,
})

Play Audio

TTSPlayer.tsx
import React, { useState, useCallback } from 'react'
import { View, Button, TextInput, Text } from 'react-native'
import { RunAnywhere } from '@runanywhere/core'
import Sound from 'react-native-sound' // Example audio playback library

export function TTSPlayer() {
  const [text, setText] = useState('')
  const [isPlaying, setIsPlaying] = useState(false)
  const [duration, setDuration] = useState<number | null>(null)

  const handleSpeak = useCallback(async () => {
    if (!text.trim()) return

    setIsPlaying(true)
    try {
      const result = await RunAnywhere.synthesize(text, {
        rate: 1.0,
        pitch: 1.0,
      })

      setDuration(result.duration)

      // Decode base64 to raw PCM and hand it to the player. Note:
      // react-native-sound loads audio from a file path or URL, so a real
      // app would write this PCM to a WAV file first (see "Converting
      // Audio for Playback" below).
      const audioBuffer = base64ToArrayBuffer(result.audio)
      const sound = new Sound(audioBuffer, '', (error) => {
        if (error) {
          console.error('Failed to load sound', error)
          setIsPlaying(false)
          return
        }
        sound.play(() => {
          setIsPlaying(false)
          sound.release()
        })
      })
    } catch (error) {
      console.error('Synthesis failed:', error)
      setIsPlaying(false)
    }
  }, [text])

  return (
    <View style={{ padding: 16 }}>
      <TextInput
        value={text}
        onChangeText={setText}
        placeholder="Enter text to speak..."
        multiline
        style={{ borderWidth: 1, padding: 12, minHeight: 80 }}
      />
      <Button
        title={isPlaying ? 'Speaking...' : 'Speak'}
        onPress={handleSpeak}
        disabled={isPlaying || !text.trim()}
      />
      {duration !== null && (
        <Text style={{ marginTop: 8, color: '#666' }}>Duration: {duration.toFixed(2)}s</Text>
      )}
    </View>
  )
}
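The component above calls `base64ToArrayBuffer` without defining it. A minimal implementation might look like this (it relies on the global `atob`, which may need a polyfill in some React Native environments):

```typescript
// Decode a base64 string into an ArrayBuffer of raw bytes
function base64ToArrayBuffer(base64: string): ArrayBuffer {
  const binary = atob(base64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return bytes.buffer
}
```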

Using System TTS

For simpler playback, use the platform's built-in TTS engine:
// Use system TTS (AVSpeechSynthesizer on iOS, Android TTS)
await RunAnywhere.speak('Hello from system TTS!', {
  rate: 1.0,
  pitch: 1.0,
  volume: 1.0,
})

// Check if currently speaking
const speaking = await RunAnywhere.isSpeaking()

// Stop playback
await RunAnywhere.stopSpeaking()

Get Available Voices

const voices = await RunAnywhere.availableTTSVoices()

for (const voice of voices) {
  console.log(`${voice.id}: ${voice.name} (${voice.language})`)
}

// Use a specific voice
await RunAnywhere.synthesize('Hello!', {
  voice: 'en-US-female-1',
})
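To select a voice by language rather than a hard-coded ID, you can filter the voice list. A sketch assuming only the `{ id, name, language }` shape shown in the loop above (`TTSVoice` and `pickVoice` are local names, not SDK exports):

```typescript
// Assumed voice shape, matching the fields logged above
interface TTSVoice {
  id: string
  name: string
  language: string
}

// Return the first voice whose language starts with the given prefix,
// e.g. 'en' matches 'en-US' and 'en-GB'
function pickVoice(voices: TTSVoice[], language: string): TTSVoice | undefined {
  return voices.find((v) =>
    v.language.toLowerCase().startsWith(language.toLowerCase())
  )
}
```

The result's `id` can then be passed as the `voice` option to `synthesize`.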

Converting Audio for Playback

The synthesized audio is base64-encoded float32 PCM. Here’s how to convert it:
// Convert base64 audio to a playable AudioBuffer
function convertTTSAudio(
  base64Audio: string,
  sampleRate: number,
  audioContext: AudioContext
): AudioBuffer {
  // Decode base64 to binary
  const binary = atob(base64Audio)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }

  // Reinterpret the bytes as little-endian float32 samples
  const float32 = new Float32Array(bytes.buffer)

  // Create a mono AudioBuffer (Web Audio API)
  const audioBuffer = audioContext.createBuffer(1, float32.length, sampleRate)
  audioBuffer.getChannelData(0).set(float32)

  return audioBuffer
}

// Play with Web Audio
async function playAudio(result: TTSResult) {
  const audioContext = new AudioContext()
  // Browsers may keep the context suspended until a user gesture
  await audioContext.resume()

  const audioBuffer = convertTTSAudio(result.audio, result.sampleRate, audioContext)

  const source = audioContext.createBufferSource()
  source.buffer = audioBuffer
  source.connect(audioContext.destination)
  source.start()
}
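Outside a Web Audio environment, playback libraries (including react-native-sound, used earlier) typically expect a file rather than raw samples. The float32 PCM can be wrapped in a minimal 16-bit WAV container and written to disk; a sketch (`encodeWav` is a local helper, not an SDK function):

```typescript
// Encode mono float32 PCM as a 16-bit PCM WAV byte array
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const headerSize = 44
  const dataSize = samples.length * 2 // 2 bytes per int16 sample
  const buffer = new ArrayBuffer(headerSize + dataSize)
  const view = new DataView(buffer)

  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i))
  }

  // RIFF/WAVE header
  writeString(0, 'RIFF')
  view.setUint32(4, 36 + dataSize, true) // remaining chunk size
  writeString(8, 'WAVE')
  writeString(12, 'fmt ')
  view.setUint32(16, 16, true)           // fmt chunk size
  view.setUint16(20, 1, true)            // audio format: PCM
  view.setUint16(22, 1, true)            // channels: mono
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * 2, true) // byte rate
  view.setUint16(32, 2, true)            // block align
  view.setUint16(34, 16, true)           // bits per sample
  writeString(36, 'data')
  view.setUint32(40, dataSize, true)

  // Convert float32 [-1, 1] to little-endian int16
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]))
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true)
  }

  return new Uint8Array(buffer)
}
```

The resulting bytes can be written to a temporary file with your file-system library of choice and handed to any file-based audio player.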

Voice Options Explained

| Option | Range | Default | Effect |
| --- | --- | --- | --- |
| `rate` | 0.5 - 2.0 | 1.0 | Speech speed (1.0 = normal) |
| `pitch` | 0.5 - 2.0 | 1.0 | Voice pitch (1.0 = normal) |
| `volume` | 0.0 - 1.0 | 1.0 | Audio volume (1.0 = full) |
// Different presets
const presets = {
  normal: { rate: 1.0, pitch: 1.0, volume: 1.0 },
  slow: { rate: 0.7, pitch: 1.0, volume: 1.0 },
  fast: { rate: 1.4, pitch: 1.0, volume: 1.0 },
  deep: { rate: 1.0, pitch: 0.7, volume: 1.0 },
  high: { rate: 1.0, pitch: 1.3, volume: 1.0 },
  quiet: { rate: 1.0, pitch: 1.0, volume: 0.5 },
}

const result = await RunAnywhere.synthesize(text, presets.slow)

Error Handling

import { isSDKError, SDKErrorCode } from '@runanywhere/core'

try {
  const result = await RunAnywhere.synthesize(text)
} catch (error) {
  if (isSDKError(error)) {
    switch (error.code) {
      case SDKErrorCode.notInitialized:
        console.error('SDK not initialized')
        break
      case SDKErrorCode.modelNotLoaded:
        console.error('Load a TTS model first')
        break
      case SDKErrorCode.ttsFailed:
        console.error('Synthesis failed:', error.message)
        break
      default:
        console.error('TTS error:', error.message)
        break
    }
  }
}