Overview
The RunAnywhere Swift SDK is a production-grade, on-device AI SDK for Apple platforms. It lets developers run AI models directly on Apple devices without requiring network connectivity for inference, ensuring minimal latency and maximum privacy for your users. The SDK provides a unified interface to multiple AI capabilities:
- LLM – Text generation with streaming support and structured output
- STT – Speech-to-text transcription with multiple backends
- TTS – Neural and system voice synthesis
- VAD – Real-time voice activity detection
Key Capabilities
- Multi-backend architecture – Choose from LlamaCPP (GGUF models), ONNX Runtime, or Apple Foundation Models
- Metal acceleration – GPU-accelerated inference on Apple Silicon
- Event-driven design – Subscribe to SDK events for reactive UI updates
- Production-ready – Built-in analytics, logging, device registration, and model lifecycle management
Core Philosophy
On-Device First
All AI inference runs locally, ensuring low latency and data privacy. Once models are
downloaded, no network connection is required for inference.
Plugin Architecture
Backend engines are optional modules—include only what you need. This keeps your app binary size
minimal.
Privacy by Design
Audio and text data never leaves the device unless explicitly configured. Only anonymous
analytics are collected by default.
Event-Driven
Subscribe to SDK events for reactive UI updates and observability. Track generation progress,
model loading, and errors in real-time.
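As a rough sketch of what event-driven consumption can look like, the snippet below iterates over an assumed async event stream. The `RunAnywhereSDK` type, its `events` property, and the case names are illustrative assumptions, not the confirmed API surface.

```swift
import RunAnywhere

// Hypothetical sketch: type, property, and case names below are
// assumptions standing in for the SDK's real event API.
func observeSDKEvents(_ sdk: RunAnywhereSDK) async {
    for await event in sdk.events {
        switch event {
        case .modelLoadProgress(let fraction):
            print("Loading model: \(Int(fraction * 100))%")
        case .generationToken(let token):
            print(token, terminator: "")   // stream tokens as they arrive
        case .error(let error):
            print("SDK error: \(error)")
        default:
            break
        }
    }
}
```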
Features
Language Models (LLM)
- On-device text generation with streaming support
- Structured output generation with the Generatable protocol
- System prompts and customizable generation parameters
- Support for thinking/reasoning models with token extraction
- Multiple framework backends (LlamaCPP, Apple Foundation Models)
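To make the streaming model concrete, here is a minimal sketch; `LLMSession`, `generateStream`, and the options initializer are assumed names, so check the SDK reference for the real generation API.

```swift
import RunAnywhere

// Hypothetical sketch: `LLMSession` and `generateStream` are assumed names.
func summarize(_ text: String, with session: LLMSession) async throws -> String {
    var output = ""
    let stream = session.generateStream(
        prompt: "Summarize: \(text)",
        options: .init(systemPrompt: "You are concise.", maxTokens: 256)
    )
    for try await token in stream {
        output += token   // tokens arrive incrementally
    }
    return output
}
```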
Speech-to-Text (STT)
- Real-time streaming transcription
- Batch audio transcription
- Multi-language support
- Whisper-based models via ONNX Runtime
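A batch transcription call might look like the sketch below; `STTService` and `transcribe(audioURL:language:)` are assumed names used for illustration only.

```swift
import Foundation
import RunAnywhere

// Hypothetical sketch of batch transcription; names are assumptions.
func transcribeRecording(at url: URL, using stt: STTService) async throws {
    let result = try await stt.transcribe(audioURL: url, language: "en")
    print("Transcript: \(result.text)")
}
```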
Text-to-Speech (TTS)
- Neural voice synthesis with ONNX models
- System voices via AVSpeechSynthesizer
- Streaming audio generation for long text
- Customizable voice, pitch, rate, and volume
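The customizable parameters above might be passed along the lines of this sketch; `TTSService` and its option labels are assumptions, not the confirmed API.

```swift
import RunAnywhere

// Hypothetical sketch: `TTSService` and the option labels are assumed
// names illustrating the voice, rate, pitch, and volume parameters.
func speak(_ text: String, using tts: TTSService) async throws {
    try await tts.speak(
        text,
        options: .init(voice: "default", rate: 1.0, pitch: 1.0, volume: 0.8)
    )
}
```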
Voice Activity Detection (VAD)
- Energy-based speech detection
- Configurable sensitivity thresholds
- Real-time audio stream processing
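The energy-based approach itself is straightforward: compute RMS energy over a frame of PCM samples and compare it with a sensitivity threshold. The self-contained function below shows only that core technique, not the SDK's actual API.

```swift
import Foundation

// Energy-based speech detection: RMS energy of a PCM frame vs. a threshold.
func isSpeech(frame: [Float], threshold: Float = 0.02) -> Bool {
    guard !frame.isEmpty else { return false }
    let meanSquare = frame.reduce(0) { $0 + $1 * $1 } / Float(frame.count)
    return meanSquare.squareRoot() > threshold
}
```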
Voice Agent Pipeline
- Full VAD → STT → LLM → TTS orchestration
- Complete voice conversation flow
- Streaming and batch processing modes
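Conceptually, driving a full conversation turn could look like the sketch below; `VoiceAgent`, `conversation()`, and the turn fields are assumed names for the orchestration, not the SDK's documented types.

```swift
import RunAnywhere

// Hypothetical sketch of the VAD → STT → LLM → TTS loop; all names assumed.
func runVoiceConversation(agent: VoiceAgent) async throws {
    for try await turn in agent.conversation() {
        print("User: \(turn.transcript)")     // STT output after VAD gating
        print("Assistant: \(turn.response)")  // LLM reply, spoken via TTS
    }
}
```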
Model Management
- Automatic model discovery and catalog sync
- Download with progress tracking (download, extract, validate stages)
- In-memory model storage with file system caching
- Framework-specific model assignment
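Staged progress reporting might be consumed as in this sketch; `ModelManager`, `download(modelID:)`, and the stage cases are illustrative assumptions mirroring the download, extract, and validate stages listed above.

```swift
import RunAnywhere

// Hypothetical sketch of staged download progress; names are assumptions.
func fetchModel(id: String, manager: ModelManager) async throws {
    for try await progress in manager.download(modelID: id) {
        switch progress.stage {
        case .downloading: print("Downloading: \(Int(progress.fraction * 100))%")
        case .extracting:  print("Extracting…")
        case .validating:  print("Validating…")
        }
    }
}
```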
System Requirements
| Platform | Minimum Version |
|---|---|
| iOS | 17.0+ |
| macOS | 14.0+ |
| tvOS | 17.0+ |
| watchOS | 10.0+ |
Some optional modules have higher runtime requirements:
- Apple Foundation Models (RunAnywhereAppleAI): iOS 26+ / macOS 26+ at runtime
SDK Modules
| Module | Purpose |
|---|---|
| RunAnywhere | Core SDK (required) |
| RunAnywhereLlamaCPP | LLM text generation with GGUF models |
| RunAnywhereONNX | STT/TTS/VAD via ONNX Runtime |
| RunAnywhereAppleAI | LLM via Apple Foundation Models |
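Because backends are separate SwiftPM products, a typical manifest pulls in only the modules an app needs. The sketch below assumes a standard Swift Package Manager setup; the repository URL and version are placeholders, not confirmed coordinates.

```swift
// swift-tools-version: 5.9
import PackageDescription

// Sketch of a manifest that includes the core SDK plus one backend.
// Replace the URL and version with the SDK's published coordinates.
let package = Package(
    name: "MyApp",
    platforms: [.iOS(.v17), .macOS(.v14)],
    dependencies: [
        .package(url: "https://github.com/RunAnywhere/runanywhere-swift", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                .product(name: "RunAnywhere", package: "runanywhere-swift"),
                // Add backends selectively, e.g. only the LlamaCPP engine:
                .product(name: "RunAnywhereLlamaCPP", package: "runanywhere-swift")
            ]
        )
    ]
)
```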