Early Beta — The Web SDK is in early beta (v0.1.x). APIs may change between releases. We’d
love your feedback — report issues or share ideas on
GitHub.
Overview
The RunAnywhere Web SDK is a production-grade, on-device AI SDK for the browser. It compiles the same C++ inference engine used by the iOS and Android SDKs to WebAssembly, enabling developers to run LLMs, Speech-to-Text, Text-to-Speech, Vision, and Voice AI directly in the browser: private, offline-capable, and with zero server dependencies.

The SDK is split into three npm packages by backend:

| Package | Description |
|---|---|
| `@runanywhere/web` | Core SDK: initialization, model management, events, `VoicePipeline` |
| `@runanywhere/web-llamacpp` | LLM & VLM inference via llama.cpp WASM (`TextGeneration`, `VLMWorkerBridge`, `VideoCapture`) |
| `@runanywhere/web-onnx` | STT, TTS & VAD via sherpa-onnx WASM (`AudioCapture`, `AudioPlayback`, `VAD`) |
- LLM — Text generation with streaming support via llama.cpp WASM
- STT — Speech-to-text transcription with Whisper and sherpa-onnx
- TTS — Neural voice synthesis with Piper TTS via sherpa-onnx
- VAD — Real-time voice activity detection with Silero VAD
- VLM — Vision language models for image understanding
- Tool Calling — Function calling and structured JSON output
Key Capabilities
- Three focused packages — Core SDK + LlamaCpp backend (LLM/VLM) + ONNX backend (STT/TTS/VAD), install only what you need
- Zero runtime dependencies — Everything is self-contained via WebAssembly
- TypeScript-first — Full type safety with comprehensive type definitions
- Privacy by default — All inference runs in-browser via WASM, no data leaves the device
- Persistent storage — Models cached in OPFS (Origin Private File System) across sessions
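Because the packages are independent, installation is per backend; a typical setup with npm might look like this (package names from the table above):

```shell
# Core SDK (always required)
npm install @runanywhere/web

# Add the llama.cpp backend for LLM / VLM
npm install @runanywhere/web-llamacpp

# Add the sherpa-onnx backend for STT / TTS / VAD
npm install @runanywhere/web-onnx
```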
Core Philosophy
On-Device First
All AI inference runs locally in the browser via WebAssembly. Once models are downloaded, no
network connection is required for inference. Audio, text, and images never leave the device.
Modular Package Architecture
The Web SDK splits functionality across three packages by inference backend:
`@runanywhere/web` (core), `@runanywhere/web-llamacpp` (LLM/VLM via llama.cpp WASM), and `@runanywhere/web-onnx` (STT/TTS/VAD via sherpa-onnx WASM). This lets you install only the backends you need.
Privacy by Design
All data stays in the browser. No server calls, no API keys required for inference. Model files
are stored in the browser’s sandboxed OPFS storage.
Platform Parity
The Web SDK compiles the same C++ core as the iOS and Android SDKs to WebAssembly. Identical
inference logic, consistent results across all platforms.
Features
Language Models (LLM)
- On-device text generation with streaming support
- llama.cpp backend compiled to WASM (Liquid AI LFM2, Llama, Mistral, Qwen, SmolLM, and other GGUF models)
- Configurable system prompts, temperature, top-k/top-p, and max tokens
- Token streaming with async iterators and cancellation
- Result metrics: `tokensUsed`, `tokensPerSecond`, `latencyMs`
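Token streaming is exposed through standard async iterators, so generation output can be consumed and cancelled with ordinary language constructs. The sketch below shows the consumption pattern with a stand-in token source; `fakeTokenStream` and `consumeStream` are illustrative names, not the SDK's `TextGeneration` API:

```typescript
// Stand-in token source: the SDK's token stream has the same
// AsyncIterable<string> shape, so the consumption loop is identical.
async function* fakeTokenStream(): AsyncIterable<string> {
  for (const token of ["On-device ", "inference ", "keeps ", "data ", "local."]) {
    yield token;
  }
}

// Accumulate tokens, honouring an optional AbortSignal for cancellation.
async function consumeStream(
  stream: AsyncIterable<string>,
  signal?: AbortSignal,
): Promise<string> {
  let text = "";
  for await (const token of stream) {
    if (signal?.aborted) break; // cooperative cancellation
    text += token;
  }
  return text;
}

consumeStream(fakeTokenStream()).then((text) => console.log(text));
// logs "On-device inference keeps data local."
```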
Speech-to-Text (STT)
- Offline speech recognition via whisper.cpp and sherpa-onnx (WASM)
- Multiple model architectures: Whisper, Zipformer, Paraformer
- Batch transcription from `Float32Array` audio data
- Real-time streaming transcription sessions
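Batch transcription consumes raw PCM samples as a `Float32Array`. As a sketch, here is how to build a one-second 16 kHz mono buffer of the shape such an API expects (the 16 kHz rate is an assumption; check the requirements of the model you load):

```typescript
// 1 second of 16 kHz mono PCM in [-1, 1]; a 440 Hz sine tone stands in
// for captured microphone audio.
function makeSineBuffer(sampleRate = 16_000, freqHz = 440, seconds = 1): Float32Array {
  const samples = new Float32Array(Math.round(sampleRate * seconds));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = Math.sin((2 * Math.PI * freqHz * i) / sampleRate);
  }
  return samples;
}

const audio = makeSineBuffer(); // 16,000 samples, ready to hand to batch transcription
```

In a real application the buffer would come from `AudioCapture` or the Web Audio API rather than being synthesized.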
Text-to-Speech (TTS)
- Neural voice synthesis via sherpa-onnx Piper TTS (WASM)
- Multiple voice models with configurable speed and speaker
- PCM audio output (`Float32Array`) with sample rate metadata
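The synthesizer returns float PCM plus a sample rate. If you need 16-bit integer PCM (for example, to write a WAV file), the standard conversion is plain TypeScript, independent of the SDK:

```typescript
// Convert float PCM in [-1, 1] to 16-bit signed integer PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

The resulting `Int16Array` can then be prefixed with a WAV header using the sample rate reported alongside the synthesized audio.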
Voice Activity Detection (VAD)
- Silero VAD model via sherpa-onnx (WASM)
- Real-time speech/silence detection from audio streams
- Speech segment extraction with configurable thresholds
- Callback-based speech activity events
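Threshold-based speech segment extraction can be pictured with a much simpler energy detector. The sketch below illustrates the thresholding idea only; it is not the Silero model the SDK actually runs:

```typescript
interface Segment { start: number; end: number } // frame indices, end exclusive

// Mark contiguous runs of frames whose RMS energy exceeds a threshold.
function extractSpeechSegments(frames: Float32Array[], threshold = 0.1): Segment[] {
  const segments: Segment[] = [];
  let start = -1;
  frames.forEach((frame, i) => {
    const rms = Math.sqrt(frame.reduce((acc, s) => acc + s * s, 0) / frame.length);
    const isSpeech = rms > threshold;
    if (isSpeech && start < 0) start = i; // segment opens
    if (!isSpeech && start >= 0) {        // segment closes
      segments.push({ start, end: i });
      start = -1;
    }
  });
  if (start >= 0) segments.push({ start, end: frames.length });
  return segments;
}
```

A neural VAD replaces the RMS check with a per-frame speech probability, but the segment bookkeeping is the same shape.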
Voice Pipeline
- Full STT -> LLM (streaming) -> TTS orchestration
- Callback-driven state transitions (transcription, generation, synthesis)
- Cancellation support for in-progress turns
- Multi-model coexistence via the `coexist` flag
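The callback-driven turn lifecycle can be modeled as a small state machine. The state names and transition table below are an illustration of the STT -> LLM -> TTS flow, not the SDK's actual internal types:

```typescript
type PipelineState = "idle" | "transcribing" | "generating" | "synthesizing";

// Legal transitions in one voice turn; any in-progress state may return
// to "idle" on cancellation.
const transitions: Record<PipelineState, PipelineState[]> = {
  idle: ["transcribing"],
  transcribing: ["generating", "idle"],
  generating: ["synthesizing", "idle"],
  synthesizing: ["idle"],
};

function canTransition(from: PipelineState, to: PipelineState): boolean {
  return transitions[from].includes(to);
}
```

Each transition corresponds to one of the pipeline's callbacks (transcription finished, generation finished, synthesis finished), which is why cancellation simply routes any state back to idle.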
Vision Language Models (VLM)
- Multimodal image+text inference via llama.cpp with mtmd backend
- Camera integration with the `VideoCapture` class
- Runs in a dedicated Web Worker via `VLMWorkerBridge` for a responsive UI
- Supports Liquid AI LFM2-VL, Qwen2-VL, SmolVLM, and LLaVA architectures
Tool Calling & Structured Output
- Function calling with typed tool definitions and parameter schemas
- Automatic tool orchestration loop (generate -> parse -> execute -> continue)
- JSON schema-guided generation with WASM-powered validation
- Supports default XML format and LFM2 Pythonic format
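A typed tool definition pairs a parameter schema with an executor, and the orchestration loop dispatches parsed calls against it. The shapes below are illustrative stand-ins; consult the API reference for the SDK's actual tool types:

```typescript
// Hypothetical tool shape: name, schema, and an executor.
interface ToolDef {
  name: string;
  description: string;
  parameters: Record<string, { type: string; description: string }>;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

const getWeather: ToolDef = {
  name: "get_weather",
  description: "Look up current weather for a city",
  parameters: { city: { type: "string", description: "City name" } },
  execute: async (args) => `Sunny in ${args.city}`, // stub result
};

// One step of the generate -> parse -> execute -> continue loop:
// given a parsed tool call, run the matching tool and return its result
// (which would be fed back into the next generation turn).
async function dispatch(
  tools: ToolDef[],
  call: { name: string; args: Record<string, unknown> },
): Promise<string> {
  const tool = tools.find((t) => t.name === call.name);
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);
  return tool.execute(call.args);
}
```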
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| Browser | Chrome 96+ / Edge 96+ | Chrome 120+ / Edge 120+ |
| WebAssembly | Required | Required |
| SharedArrayBuffer | Optional (single-threaded fallback) | Enabled via COOP/COEP headers |
| OPFS | Required for model storage | Supported in all modern browsers |
| RAM | 2GB | 4GB+ for larger models |
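Multi-threaded WASM needs the page to be cross-origin isolated. As a sketch, here is how the required headers can be served from a Vite dev server (Vite is an assumption here; any server that can set response headers works the same way):

```typescript
// vite.config.ts: serve the COOP/COEP headers that enable SharedArrayBuffer
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    headers: {
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "require-corp",
    },
  },
});
```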
Cross-Origin Isolation headers (COOP and COEP) are required for multi-threaded WASM via `SharedArrayBuffer`. Without them, the SDK falls back to single-threaded mode. See Configuration for setup details.
SDK Architecture
Key Differences from Native SDKs
| Aspect | Native SDKs (iOS/Android/RN) | Web SDK |
|---|---|---|
| Package | Multiple packages per backend | Three packages: web, web-llamacpp, web-onnx |
| Runtime | Native code | WebAssembly |
| Storage | File system | OPFS (browser sandbox) |
| Audio | Platform APIs | Web Audio API |
| GPU | Metal / Vulkan | WebGPU (when available) |
| Threading | OS threads | SharedArrayBuffer + COOP/COEP |
| Install | npm + native build | npm only |
Example App
A full-featured starter application is included in the SDK repository:

- Web Starter App — Chat, Vision, and Voice demos with React + Vite