Early Beta — The Web SDK is in early beta. APIs may change between releases.
Overview
This guide covers SDK initialization options, backend registration, model management, events, browser capabilities, and audio utilities.
SDK Initialization
Basic Initialization
import { RunAnywhere, SDKEnvironment } from '@runanywhere/web'
import { LlamaCPP } from '@runanywhere/web-llamacpp'
import { ONNX } from '@runanywhere/web-onnx'
// Step 1: Initialize core SDK
await RunAnywhere.initialize({
  environment: SDKEnvironment.Development,
  debug: true,
})
// Step 2: Register backends (loads WASM automatically)
await LlamaCPP.register() // LLM + VLM
await ONNX.register() // STT + TTS + VAD
Full Configuration
interface SDKInitOptions {
  /** SDK environment */
  environment?: SDKEnvironment // Development | Staging | Production
  /** Enable debug logging */
  debug?: boolean
  /** API key for authentication (optional) */
  apiKey?: string
  /** Base URL for API requests */
  baseURL?: string
  /** Acceleration preference */
  acceleration?: AccelerationPreference // 'auto' | 'webgpu' | 'cpu'
  /** Custom URL for WebGPU WASM glue */
  webgpuWasmUrl?: string
}
Backend Registration
After initializing the core SDK, register the inference backends you need:
import { LlamaCPP } from '@runanywhere/web-llamacpp'
import { ONNX } from '@runanywhere/web-onnx'
// LlamaCpp: LLM text generation + VLM vision
await LlamaCPP.register()
console.log('LlamaCpp registered:', LlamaCPP.isRegistered)
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'
// ONNX (sherpa-onnx): STT + TTS + VAD
await ONNX.register()
Backend registration loads WASM binaries and can take a few seconds. Always await the register
calls before using any inference APIs. Registration is idempotent — calling it multiple times is
safe.
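Idempotent registration is often built by caching the in-flight promise so that repeated calls share one load. The sketch below illustrates that general pattern only; `createRegistrar` and `loadWasm` are hypothetical names, and nothing here is taken from the SDK's actual implementation.

```typescript
// Hypothetical sketch of an idempotent async registrar.
// `loadWasm` stands in for whatever expensive work registration performs.
type Loader = () => Promise<void>

function createRegistrar(loadWasm: Loader) {
  let pending: Promise<void> | null = null
  let registered = false

  return {
    // Deliberately not `async`: every call returns the same cached promise.
    register(): Promise<void> {
      if (pending === null) {
        pending = loadWasm().then(
          () => {
            registered = true
          },
          (err) => {
            // Clear the cache on failure so a later call can retry.
            pending = null
            throw err
          },
        )
      }
      return pending
    },
    get isRegistered(): boolean {
      return registered
    },
  }
}
```

With this shape, two components that both call `register()` during startup trigger a single load and can each await the same promise.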
Environment Modes
| Environment | Enum Value | Description | Logging |
|---|---|---|---|
| Development | SDKEnvironment.Development | Local development, full debugging | Debug |
| Staging | SDKEnvironment.Staging | Testing with real services | Info |
| Production | SDKEnvironment.Production | Production deployment | Warning |
Logging
import { SDKLogger, LogLevel } from '@runanywhere/web'
SDKLogger.level = LogLevel.Debug // Trace | Debug | Info | Warning | Error | Fatal
SDKLogger.enabled = true
Log Levels
| Level | Description | Use Case |
|---|---|---|
| Trace | Very detailed tracing | Deep debug |
| Debug | Detailed debugging info | Development |
| Info | General information | Staging |
| Warning | Potential issues | Production |
| Error | Errors and failures | Production |
| Fatal | Critical failures | Always |
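Level-based loggers usually emit a message only when its severity is at or above the configured level. The sketch below shows that filtering with local stand-ins for the level names and the per-environment defaults from the Environment Modes table. The numeric ordering and the helper `shouldLog` are assumptions for illustration, not SDK APIs.

```typescript
// Local stand-ins that mirror the level names in the table above.
// The numeric ordering is an assumption; the SDK's real values may differ.
const Level = { Trace: 0, Debug: 1, Info: 2, Warning: 3, Error: 4, Fatal: 5 } as const
type LevelName = keyof typeof Level
type EnvName = 'Development' | 'Staging' | 'Production'

// Default level per environment, as listed in the Environment Modes table.
const defaultLevel: Record<EnvName, LevelName> = {
  Development: 'Debug',
  Staging: 'Info',
  Production: 'Warning',
}

// A message is emitted when its severity is at or above the threshold.
function shouldLog(message: LevelName, threshold: LevelName): boolean {
  return Level[message] >= Level[threshold]
}
```

Under these assumptions, a `Debug` message is emitted in Development but suppressed in Staging and Production.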
Events
EventBus
The SDK provides a typed event system for monitoring SDK activities:
import { EventBus } from '@runanywhere/web'
// Subscribe to model download progress
const unsubscribe = EventBus.shared.on('model.downloadProgress', (evt) => {
  console.log(`Model: ${evt.modelId}, Progress: ${((evt.progress ?? 0) * 100).toFixed(0)}%`)
})
EventBus.shared.on('model.loadCompleted', (evt) => {
  console.log(`Model loaded: ${evt.modelId}`)
})
// Clean up
unsubscribe()
Event properties are directly on the event object (e.g., evt.modelId, evt.progress), not
nested under evt.data.
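To make the subscribe/unsubscribe contract and the flat event shape concrete, here is a minimal typed event bus. It is a sketch under stated assumptions: `TinyEventBus` and the `Events` map are hypothetical, and only the two fields this guide mentions (`modelId`, `progress`) are modeled. The SDK's real `EventBus` may be implemented quite differently.

```typescript
// Hypothetical event map; payload fields sit directly on the event object.
type Events = {
  'model.downloadProgress': { modelId: string; progress?: number }
  'model.loadCompleted': { modelId: string }
}

class TinyEventBus {
  // Handlers are stored untyped internally; the public methods keep callers type-safe.
  private handlers = new Map<string, Set<(evt: any) => void>>()

  // Registers a handler and returns a function that removes exactly that handler.
  on<K extends keyof Events>(type: K, handler: (evt: Events[K]) => void): () => void {
    const existing = this.handlers.get(type)
    const set = existing ?? new Set<(evt: any) => void>()
    if (!existing) this.handlers.set(type, set)
    set.add(handler)
    return () => {
      set.delete(handler)
    }
  }

  // Delivers the event object to every handler registered for this type.
  emit<K extends keyof Events>(type: K, evt: Events[K]): void {
    this.handlers.get(type)?.forEach((handler) => handler(evt))
  }
}
```

Keeping the function returned by `on()` and calling it on teardown, as the snippet above does with `unsubscribe()`, prevents handlers from piling up across component remounts.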
Event Types
| Event | Description |
|---|---|
| model.downloadProgress | Model download progress (modelId, progress) |
| model.downloadCompleted | Model download finished |
| model.loadCompleted | Model loaded into memory |
| model.unloaded | Model unloaded |
| generation.started | Text generation started |
| generation.completed | Text generation completed |
| generation.failed | Text generation failed |
Model Sources
Models in RunAnywhere are typically sourced from HuggingFace, although direct URLs to other hosts are also accepted. The SDK provides a model registry that resolves compact model definitions into full download URLs and manages the complete lifecycle: registration -> download -> storage -> loading.
How It Works
When you register a model with a repo field, the SDK constructs the download URL automatically:
https://huggingface.co/{repo}/resolve/main/{filename}
For example, repo: 'LiquidAI/LFM2-350M-GGUF' with files: ['LFM2-350M-Q4_K_M.gguf'] resolves to:
https://huggingface.co/LiquidAI/LFM2-350M-GGUF/resolve/main/LFM2-350M-Q4_K_M.gguf
CompactModelDef
The registerModels API accepts an array of compact model definitions:
import { ModelCategory, LLMFramework } from '@runanywhere/web'
interface CompactModelDef {
  /** Unique identifier for the model */
  id: string
  /** Human-readable model name */
  name: string
  /** Inference backend */
  framework: LLMFramework // LLMFramework.LlamaCpp | LLMFramework.ONNX
  /** Model category (determines which engine handles it) */
  modality: ModelCategory
  // ModelCategory.Language — LLM text generation
  // ModelCategory.Multimodal — VLM image + text
  // ModelCategory.SpeechRecognition — STT
  // ModelCategory.SpeechSynthesis — TTS
  // ModelCategory.Audio — VAD
  /** HuggingFace repo path (e.g., 'LiquidAI/LFM2-350M-GGUF') */
  repo?: string
  /** Model files in the repo. First file = primary, rest = additional (e.g., mmproj for VLM) */
  files?: string[]
  /** Direct URL (alternative to repo + files) */
  url?: string
  /** 'archive' for tar.gz bundles (STT/TTS), omit for individual GGUF files */
  artifactType?: 'archive'
  /** Estimated memory requirement in bytes */
  memoryRequirement?: number
}
URL Resolution Rules
| Config | URL Pattern | Use Case |
|---|---|---|
| repo + files | https://huggingface.co/{repo}/resolve/main/{file} | Most models (LLM, VLM) |
| url only | Used as-is | Direct links, non-HF sources |
| url + artifactType: 'archive' | Used as-is, extracted after download | STT/TTS model bundles |
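The resolution rules can be sketched as a small pure function. This is an illustration of the rules as documented here, not the SDK's resolver: `resolveDownloadUrls` and `SourceDef` are hypothetical names, and the choice to prefer `repo` + `files` when a definition also carries a `url` is an assumption.

```typescript
// Minimal local shape: only the fields that matter for URL resolution.
interface SourceDef {
  repo?: string
  files?: string[]
  url?: string
}

function resolveDownloadUrls(def: SourceDef): string[] {
  // repo + files: one URL per file, primary file first.
  // Assumption: this branch wins if `url` is also present.
  if (def.repo && def.files && def.files.length > 0) {
    return def.files.map((file) => `https://huggingface.co/${def.repo}/resolve/main/${file}`)
  }
  // url only: used as-is (archives would be extracted after download).
  if (def.url) {
    return [def.url]
  }
  throw new Error('Model definition needs either repo + files or url')
}
```

For a two-file VLM definition, the function returns two URLs in the same order as `files`, so the primary model file stays first and the mmproj file second.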
Model Management
Models are registered through RunAnywhere.registerModels; download, load, and delete operations use ModelManager. Both are exported from @runanywhere/web.
Register Models
import { RunAnywhere, ModelCategory, LLMFramework } from '@runanywhere/web'
RunAnywhere.registerModels([
  // LLM: Liquid AI LFM2
  {
    id: 'lfm2-350m-q4_k_m',
    name: 'LFM2 350M Q4_K_M',
    repo: 'LiquidAI/LFM2-350M-GGUF',
    files: ['LFM2-350M-Q4_K_M.gguf'],
    framework: LLMFramework.LlamaCpp,
    modality: ModelCategory.Language,
    memoryRequirement: 250_000_000,
  },
  // VLM: Liquid AI LFM2-VL (two files: model + mmproj)
  {
    id: 'lfm2-vl-450m-q4_0',
    name: 'LFM2-VL 450M Q4_0',
    repo: 'runanywhere/LFM2-VL-450M-GGUF',
    files: ['LFM2-VL-450M-Q4_0.gguf', 'mmproj-LFM2-VL-450M-Q8_0.gguf'],
    framework: LLMFramework.LlamaCpp,
    modality: ModelCategory.Multimodal,
    memoryRequirement: 500_000_000,
  },
  // STT: Whisper (archive bundle from direct URL)
  {
    id: 'sherpa-onnx-whisper-tiny.en',
    name: 'Whisper Tiny English (ONNX)',
    url: 'https://huggingface.co/runanywhere/sherpa-onnx-whisper-tiny.en/resolve/main/sherpa-onnx-whisper-tiny.en.tar.gz',
    framework: LLMFramework.ONNX,
    modality: ModelCategory.SpeechRecognition,
    memoryRequirement: 105_000_000,
    artifactType: 'archive' as const,
  },
  // TTS: Piper (archive bundle)
  {
    id: 'vits-piper-en_US-lessac-medium',
    name: 'Piper TTS US English (Lessac)',
    url: 'https://huggingface.co/runanywhere/vits-piper-en_US-lessac-medium/resolve/main/vits-piper-en_US-lessac-medium.tar.gz',
    framework: LLMFramework.ONNX,
    modality: ModelCategory.SpeechSynthesis,
    memoryRequirement: 65_000_000,
    artifactType: 'archive' as const,
  },
  // VAD: Silero (single ONNX file)
  {
    id: 'silero-vad-v5',
    name: 'Silero VAD v5',
    url: 'https://huggingface.co/runanywhere/silero-vad-v5/resolve/main/silero_vad.onnx',
    files: ['silero_vad.onnx'],
    framework: LLMFramework.ONNX,
    modality: ModelCategory.Audio,
    memoryRequirement: 5_000_000,
  },
])
Available Models on HuggingFace
LLM Models
| Model | HuggingFace Repo | Size | Notes |
|---|---|---|---|
| LFM2 350M | LiquidAI/LFM2-350M-GGUF | ~250MB | Liquid AI, ultra-compact |
| LFM2 1.2B Tool | LiquidAI/LFM2-1.2B-Tool-GGUF | ~800MB | Liquid AI, tool-calling optimized |
| Qwen 2.5 0.5B | Qwen/Qwen2.5-0.5B-Instruct-GGUF | ~400MB | Multilingual |
VLM Models
| Model | HuggingFace Repo | Size | Notes |
|---|---|---|---|
| LFM2-VL 450M | runanywhere/LFM2-VL-450M-GGUF | ~500MB | Liquid AI, smallest VLM |
| SmolVLM 500M | runanywhere/SmolVLM-500M-Instruct-GGUF | ~500MB | HuggingFace SmolVLM |
| Qwen2-VL 2B | runanywhere/Qwen2-VL-2B-Instruct-GGUF | ~1.5GB | Higher quality |
STT / TTS / VAD Models
| Model | HuggingFace Repo | Size | Notes |
|---|---|---|---|
| Whisper Tiny EN | runanywhere/sherpa-onnx-whisper-tiny.en | ~105MB | Archive bundle |
| Piper TTS (Lessac) | runanywhere/vits-piper-en_US-lessac-medium | ~65MB | Archive bundle |
| Silero VAD v5 | runanywhere/silero-vad-v5 | ~5MB | Single ONNX file |
Download and Load
import { ModelManager, ModelCategory, EventBus } from '@runanywhere/web'
// Track download progress
EventBus.shared.on('model.downloadProgress', (evt) => {
  console.log(`Downloading ${evt.modelId}: ${((evt.progress ?? 0) * 100).toFixed(0)}%`)
})
// Download to OPFS (persists across sessions)
await ModelManager.downloadModel('lfm2-350m-q4_k_m')
// Load into memory for inference
await ModelManager.loadModel('lfm2-350m-q4_k_m')
// Check loaded models
const allModels = ModelManager.getModels()
const loaded = ModelManager.getLoadedModel(ModelCategory.Language)
console.log('Loaded:', loaded?.id)
Multi-Model Loading with coexist
By default, loading a new model unloads any previously loaded model. For the voice pipeline (which needs STT + LLM + TTS + VAD simultaneously), pass coexist: true:
// Load all 4 voice models without unloading each other
await ModelManager.loadModel('silero-vad-v5', { coexist: true })
await ModelManager.loadModel('sherpa-onnx-whisper-tiny.en', { coexist: true })
await ModelManager.loadModel('lfm2-350m-q4_k_m', { coexist: true })
await ModelManager.loadModel('vits-piper-en_US-lessac-medium', { coexist: true })
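The difference between the default behavior and `coexist: true` can be modeled with a few lines of bookkeeping. This sketch reads the description above literally (a default load unloads everything previously loaded); `LoadedModels` is a hypothetical class, not the SDK's `ModelManager`, and the real unloading rules may be narrower, for example scoped per category.

```typescript
// Hypothetical bookkeeping for loaded model ids; not the SDK's ModelManager.
class LoadedModels {
  private ids: string[] = []

  // Loads `id` and returns the ids that were unloaded as a result.
  load(id: string, options: { coexist?: boolean } = {}): string[] {
    const others = this.ids.filter((other) => other !== id)
    if (options.coexist) {
      // Keep everything that is already loaded and add the new model.
      this.ids = [...others, id]
      return []
    }
    // Default: the new model replaces whatever was loaded before.
    this.ids = [id]
    return others
  }

  list(): string[] {
    return [...this.ids]
  }
}
```

Under this model, a single load call without `coexist` in the middle of the voice pipeline setup would unload the models loaded before it, which is why all four calls above pass the flag.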
Storage (OPFS)
Downloaded models are persisted in the browser’s Origin Private File System (OPFS). This means:
- Models survive page refreshes and browser restarts
- Each origin (domain) has its own isolated storage
- The SDK auto-detects previously downloaded models on page load
- If storage quota is exceeded, the SDK auto-evicts least-recently-used models
Large model downloads (>200MB) can crash the browser tab on memory-constrained devices. OPFS
writes buffer data in memory before committing. If the tab crashes mid-download, refresh and
retry; the SDK can resume partial downloads. Start with smaller models (LFM2 350M at ~250MB)
before attempting larger ones (Qwen2-VL 2B at ~1.5GB).
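Least-recently-used eviction can be sketched as a pure selection step: sort stored models by last use and drop the oldest until the incoming download fits. The names `StoredModel` and `pickEvictions` and the quota parameter are hypothetical; how the SDK tracks usage and measures quota is not described in this guide.

```typescript
// Hypothetical record of a model persisted in storage.
interface StoredModel {
  id: string
  sizeBytes: number
  lastUsedMs: number // timestamp of the most recent load
}

// Returns the ids to evict, oldest first, so that `incomingBytes` fits within `quotaBytes`.
function pickEvictions(stored: StoredModel[], incomingBytes: number, quotaBytes: number): string[] {
  let usedBytes = stored.reduce((sum, model) => sum + model.sizeBytes, 0)
  const evicted: string[] = []
  // Copy before sorting so the caller's array is left untouched.
  const oldestFirst = [...stored].sort((a, b) => a.lastUsedMs - b.lastUsedMs)
  for (const model of oldestFirst) {
    if (usedBytes + incomingBytes <= quotaBytes) break
    evicted.push(model.id)
    usedBytes -= model.sizeBytes
  }
  return evicted
}
```

If the incoming model is larger than the whole quota, this sketch evicts everything and still does not fit, so a caller would need to check the result against the quota before starting the download.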
Delete Models
import { ModelManager } from '@runanywhere/web'
// Delete a specific model from OPFS
await ModelManager.deleteModel('lfm2-350m-q4_k_m')
Audio Utilities
Audio utilities (AudioCapture, AudioPlayback) are in @runanywhere/web-onnx, while video
utilities (VideoCapture) are in @runanywhere/web-llamacpp. Don’t mix up the import sources.
AudioCapture (Microphone)
AudioCapture is in @runanywhere/web-onnx. Configuration is passed to the constructor, and callbacks are passed to start():
import { AudioCapture } from '@runanywhere/web-onnx'
const capture = new AudioCapture({ sampleRate: 16000 })
await capture.start(
  (chunk: Float32Array) => {
    // Process audio samples (e.g., feed to VAD)
  },
  (level: number) => {
    // Audio level 0.0-1.0 (for UI visualization)
  }
)
// Stop when done
capture.stop()
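A 0.0-1.0 level like the one passed to the second callback is commonly derived from the root mean square (RMS) of a chunk. The helper below is a sketch of that calculation for use on your own chunks; `rmsLevel` is a hypothetical name, and this guide does not state how the SDK computes its level value.

```typescript
// Computes the RMS of a chunk of samples in the range [-1, 1], clamped to [0, 1].
function rmsLevel(chunk: Float32Array): number {
  if (chunk.length === 0) return 0
  const sumOfSquares = chunk.reduce((sum, sample) => sum + sample * sample, 0)
  return Math.min(1, Math.sqrt(sumOfSquares / chunk.length))
}
```

RMS tracks perceived loudness better than peak amplitude, which makes it a reasonable input for a level meter or a crude silence check.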
AudioPlayback (Speaker)
AudioPlayback is in @runanywhere/web-onnx:
import { AudioPlayback } from '@runanywhere/web-onnx'
const player = new AudioPlayback({ sampleRate: 22050 })
// audioFloat32Array: a Float32Array of audio samples (e.g., TTS output), defined elsewhere
await player.play(audioFloat32Array, 22050)
// Clean up resources
player.dispose()
VideoCapture (Camera)
VideoCapture is in @runanywhere/web-llamacpp:
import { VideoCapture } from '@runanywhere/web-llamacpp'
const camera = new VideoCapture({ facingMode: 'environment' }) // or 'user' for selfie
await camera.start()
// Add the video preview to the DOM
document.getElementById('preview')!.appendChild(camera.videoElement)
// Capture a frame (downscaled to 256px max dimension)
const frame = camera.captureFrame(256)
// frame.rgbPixels: Uint8Array (RGB, no alpha)
// frame.width, frame.height: actual dimensions
// Check state
console.log('Is capturing:', camera.isCapturing)
camera.stop()
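The two properties of a captured frame noted above, a capped maximum dimension and RGB pixels without alpha, can be illustrated with two small helpers. Both `fitWithin` and `rgbaToRgb` are hypothetical functions written for this sketch; they show one plausible way to get from canvas-style RGBA data to that layout and are not the SDK's implementation.

```typescript
// Scales dimensions down so the longer side is at most `maxDim`, preserving aspect ratio.
// Frames that are already small enough are left unchanged.
function fitWithin(width: number, height: number, maxDim: number): { width: number; height: number } {
  const scale = Math.min(1, maxDim / Math.max(width, height))
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  }
}

// Drops the alpha channel from RGBA pixel data (4 bytes per pixel -> 3 bytes per pixel).
function rgbaToRgb(rgba: Uint8Array | Uint8ClampedArray): Uint8Array {
  const pixelCount = Math.floor(rgba.length / 4)
  const rgb = new Uint8Array(pixelCount * 3)
  for (let p = 0; p < pixelCount; p++) {
    // Copy the R, G, B bytes of pixel p; the fourth (alpha) byte is skipped.
    rgb.set(rgba.subarray(p * 4, p * 4 + 3), p * 3)
  }
  return rgb
}
```

With a cap of 256, a 1280x720 camera frame becomes 256x144, so its RGB buffer holds 256 * 144 * 3 bytes.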
Acceleration
GPU Acceleration
The SDK auto-detects WebGPU availability when LlamaCPP.register() is called:
import { LlamaCPP } from '@runanywhere/web-llamacpp'
await LlamaCPP.register()
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'
| Mode | Description |
|---|---|
| webgpu | WebGPU detected and WASM loaded successfully |
| cpu | CPU-only WASM (WebGPU not available or failed to load) |
If the WebGPU WASM file returns a 404, the SDK gracefully falls back to CPU mode. This is normal
behavior — check LlamaCPP.accelerationMode to confirm which mode is active.
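The two table rows and the 404 fallback amount to a small decision rule, sketched below. `ProbeResult` and `resolveAccelerationMode` are hypothetical: the sketch assumes the caller has already checked for WebGPU support and knows the HTTP status of the WebGPU WASM request, which may not match how the SDK performs detection internally.

```typescript
type AccelerationMode = 'webgpu' | 'cpu'

// Hypothetical inputs to the decision; gathered by the caller beforehand.
interface ProbeResult {
  hasWebGPU: boolean // whether the browser exposes WebGPU
  webgpuWasmStatus: number // HTTP status returned for the WebGPU WASM file
}

function resolveAccelerationMode(probe: ProbeResult): AccelerationMode {
  // No WebGPU in this browser: CPU-only WASM.
  if (!probe.hasWebGPU) return 'cpu'
  // WebGPU is present but the WASM file failed to load (e.g., a 404): fall back to CPU.
  const wasmLoaded = probe.webgpuWasmStatus >= 200 && probe.webgpuWasmStatus < 300
  return wasmLoaded ? 'webgpu' : 'cpu'
}
```

Because the fallback is silent, logging the resolved mode once at startup makes it easier to spot a misconfigured `webgpuWasmUrl`.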