Early Beta — The Web SDK is in early beta (v0.1.x). APIs may change between releases. We’d love your feedback — report issues or share ideas on GitHub.

Overview

The RunAnywhere Web SDK is a production-grade, on-device AI SDK for the browser. It compiles the same C++ inference engine used by the iOS and Android SDKs to WebAssembly, enabling developers to run LLMs, Speech-to-Text, Text-to-Speech, Vision, and Voice AI directly in the browser — private, offline-capable, and with zero server dependencies.

The SDK is split into three npm packages by backend:
| Package | Description |
| --- | --- |
| @runanywhere/web | Core SDK: initialization, model management, events, VoicePipeline |
| @runanywhere/web-llamacpp | LLM & VLM inference via llama.cpp WASM (TextGeneration, VLMWorkerBridge, VideoCapture) |
| @runanywhere/web-onnx | STT, TTS & VAD via sherpa-onnx WASM (AudioCapture, AudioPlayback, VAD) |
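With a typical npm setup, installing the core package plus only the backends you need might look like this:

```shell
# Core SDK (always required)
npm install @runanywhere/web

# LLM / VLM backend (llama.cpp WASM)
npm install @runanywhere/web-llamacpp

# STT / TTS / VAD backend (sherpa-onnx WASM)
npm install @runanywhere/web-onnx
```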

Key Capabilities

  • Three focused packages — Core SDK + LlamaCpp backend (LLM/VLM) + ONNX backend (STT/TTS/VAD), install only what you need
  • Zero runtime dependencies — Everything is self-contained via WebAssembly
  • TypeScript-first — Full type safety with comprehensive type definitions
  • Privacy by default — All inference runs in-browser via WASM, no data leaves the device
  • Persistent storage — Models cached in OPFS (Origin Private File System) across sessions

Core Philosophy

All AI inference runs locally in the browser via WebAssembly. Once models are downloaded, no network connection is required for inference. Audio, text, and images never leave the device.
The Web SDK splits functionality across three packages by inference backend: @runanywhere/web (core), @runanywhere/web-llamacpp (LLM/VLM via llama.cpp WASM), and @runanywhere/web-onnx (STT/TTS/VAD via sherpa-onnx WASM). This lets you install only the backends you need.
All data stays in the browser. No server calls, no API keys required for inference. Model files are stored in the browser’s sandboxed OPFS storage.
The Web SDK compiles the same C++ core as the iOS and Android SDKs to WebAssembly. Identical inference logic, consistent results across all platforms.

Features

Language Models (LLM)

  • On-device text generation with streaming support
  • llama.cpp backend compiled to WASM (Liquid AI LFM2, Llama, Mistral, Qwen, SmolLM, and other GGUF models)
  • Configurable system prompts, temperature, top-k/top-p, and max tokens
  • Token streaming with async iterators and cancellation
  • Result metrics: tokensUsed, tokensPerSecond, latencyMs
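Consuming a token stream with async iterators and cancelling mid-generation might look like the sketch below. Note that `generateStream` here is a stub standing in for the SDK's streaming API; the beta SDK's actual method names and signatures may differ.

```typescript
// Sketch: token streaming via async iteration, with cooperative
// cancellation through an AbortSignal. The generator is a stand-in
// for the SDK's real streaming call.
async function* generateStream(
  prompt: string,
  signal: AbortSignal
): AsyncGenerator<string> {
  const tokens = ["On", "-device", " AI", " in", " the", " browser"];
  for (const token of tokens) {
    if (signal.aborted) return; // stop producing once cancelled
    yield token;
  }
}

async function run(): Promise<string> {
  const controller = new AbortController();
  let text = "";
  let count = 0;
  for await (const token of generateStream("Say hi", controller.signal)) {
    text += token;
    count += 1;
    if (count === 3) controller.abort(); // cancel after three tokens
  }
  return text;
}
```

The same `for await` loop shape applies whether the stream completes naturally or is cancelled partway through a turn.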

Speech-to-Text (STT)

  • Offline speech recognition via whisper.cpp and sherpa-onnx (WASM)
  • Multiple model architectures: Whisper, Zipformer, Paraformer
  • Batch transcription from Float32Array audio data
  • Real-time streaming transcription sessions
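Batch transcription takes Float32Array samples. If your audio arrives as signed 16-bit PCM (for example, decoded from a WAV file), a common preprocessing step, which is not part of the SDK itself, is to normalize it:

```typescript
// Convert signed 16-bit PCM samples to a normalized Float32Array
// in [-1, 1], the format batch transcription expects.
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    // Divide by 32768 so the most negative sample maps exactly to -1.0.
    out[i] = pcm[i] / 32768;
  }
  return out;
}
```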

Text-to-Speech (TTS)

  • Neural voice synthesis via sherpa-onnx Piper TTS (WASM)
  • Multiple voice models with configurable speed and speaker
  • PCM audio output (Float32Array) with sample rate metadata
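Since synthesis returns raw PCM (a Float32Array plus a sample rate), saving or sharing the audio usually means wrapping it in a container. The helper below is illustrative, not an SDK API: it packs mono PCM into a minimal 16-bit WAV byte array.

```typescript
// Wrap Float32Array PCM (with sample rate metadata) in a minimal
// mono 16-bit WAV container.
function float32ToWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const dataSize = samples.length * 2; // 16-bit = 2 bytes per sample
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // PCM format
  view.setUint16(22, 1, true);              // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const c = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, c < 0 ? c * 32768 : c * 32767, true);
  }
  return new Uint8Array(buffer);
}
```

For in-page playback, the same Float32Array can instead be copied into an AudioBuffer via the Web Audio API, skipping the container entirely.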

Voice Activity Detection (VAD)

  • Silero VAD model via sherpa-onnx (WASM)
  • Real-time speech/silence detection from audio streams
  • Speech segment extraction with configurable thresholds
  • Callback-based speech activity events
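To illustrate what "speech segment extraction with configurable thresholds" means in practice, here is a small sketch that merges per-frame speech/silence decisions into segments, discarding bursts shorter than a minimum duration. This is only the thresholding idea; the SDK's actual detector runs the Silero model on raw audio.

```typescript
// Turn per-frame speech flags into speech segments, dropping any run
// shorter than `minSpeechFrames`. Frame indices; `end` is exclusive.
interface Segment { start: number; end: number }

function extractSegments(frames: boolean[], minSpeechFrames = 2): Segment[] {
  const segments: Segment[] = [];
  let start = -1;
  for (let i = 0; i <= frames.length; i++) {
    const speech = i < frames.length && frames[i];
    if (speech && start < 0) start = i;          // speech run begins
    if (!speech && start >= 0) {                 // speech run ends
      if (i - start >= minSpeechFrames) segments.push({ start, end: i });
      start = -1;
    }
  }
  return segments;
}
```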

Voice Pipeline

  • Full STT -> LLM (streaming) -> TTS orchestration
  • Callback-driven state transitions (transcription, generation, synthesis)
  • Cancellation support for in-progress turns
  • Multi-model coexistence via coexist flag

Vision Language Models (VLM)

  • Multimodal image+text inference via llama.cpp with mtmd backend
  • Camera integration with VideoCapture class
  • Runs in a dedicated Web Worker via VLMWorkerBridge for responsive UI
  • Supports Liquid AI LFM2-VL, Qwen2-VL, SmolVLM, and LLaVA architectures
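Camera frames captured from a canvas arrive as RGBA pixel data, while vision models typically consume packed RGB. A small conversion helper, not an SDK API, might look like this:

```typescript
// Drop the alpha channel from RGBA pixel data (4 bytes per pixel) to
// produce the packed RGB layout (3 bytes per pixel) vision models expect.
function rgbaToRgb(rgba: Uint8ClampedArray): Uint8Array {
  const pixels = rgba.length / 4;
  const rgb = new Uint8Array(pixels * 3);
  for (let i = 0; i < pixels; i++) {
    rgb[i * 3] = rgba[i * 4];         // R
    rgb[i * 3 + 1] = rgba[i * 4 + 1]; // G
    rgb[i * 3 + 2] = rgba[i * 4 + 2]; // B
  }
  return rgb;
}
```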

Tool Calling & Structured Output

  • Function calling with typed tool definitions and parameter schemas
  • Automatic tool orchestration loop (generate -> parse -> execute -> continue)
  • JSON schema-guided generation with WASM-powered validation
  • Supports default XML format and LFM2 Pythonic format
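A typed tool definition with a JSON-schema-style parameter spec, plus the execute step of the generate -> parse -> execute -> continue loop, might be sketched as below. The interface shapes are assumptions for illustration; the SDK's real tool-definition types may differ.

```typescript
// Sketch: a tool definition with a JSON-schema-like parameter spec,
// and the dispatch step that runs a tool call parsed from model output.
interface ToolDef {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
  execute: (args: Record<string, unknown>) => string;
}

// Hypothetical example tool, not shipped with the SDK.
const getWeather: ToolDef = {
  name: "get_weather",
  description: "Look up the current weather for a city",
  parameters: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
  execute: (args) => `Sunny in ${args.city}`,
};

// One iteration of the orchestration loop: find the named tool and run it.
function dispatch(
  tools: ToolDef[],
  call: { name: string; args: Record<string, unknown> }
): string {
  const tool = tools.find((t) => t.name === call.name);
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);
  return tool.execute(call.args);
}
```

In the full loop, the tool's return value is fed back into the conversation and generation continues until the model stops requesting tools.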

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| Browser | Chrome 96+ / Edge 96+ | Chrome 120+ / Edge 120+ |
| WebAssembly | Required | Required |
| SharedArrayBuffer | For multi-threaded | Requires COOP/COEP headers |
| OPFS | For model storage | All modern browsers |
| RAM | 2GB | 4GB+ for larger models |
Cross-Origin Isolation headers (COOP and COEP) are required for multi-threaded WASM via SharedArrayBuffer. Without them, the SDK falls back to single-threaded mode. See Configuration for setup details.
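As an example, a Vite dev server can send the two required headers with a config fragment like the one below. Vite is shown only as an illustrative choice; any server that can set these two response headers will enable cross-origin isolation.

```typescript
// vite.config.ts: serve every response with the COOP/COEP headers that
// unlock SharedArrayBuffer (and therefore multi-threaded WASM).
export default {
  server: {
    headers: {
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "require-corp",
    },
  },
};
```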

SDK Architecture

Key Differences from Native SDKs

| Aspect | Native SDKs (iOS/Android/RN) | Web SDK |
| --- | --- | --- |
| Package | Multiple packages per backend | Three packages: web, web-llamacpp, web-onnx |
| Runtime | Native code | WebAssembly |
| Storage | File system | OPFS (browser sandbox) |
| Audio | Platform APIs | Web Audio API |
| GPU | Metal / Vulkan | WebGPU (when available) |
| Threading | OS threads | SharedArrayBuffer + COOP/COEP |
| Install | npm + native build | npm only |

Example App

A full-featured starter application is included in the SDK repository.

Next Steps