After initializing the core SDK, register the inference backends you need:
```ts
import { LlamaCPP } from '@runanywhere/web-llamacpp'
import { ONNX } from '@runanywhere/web-onnx'

// LlamaCpp: LLM text generation + VLM vision
await LlamaCPP.register()
console.log('LlamaCpp registered:', LlamaCPP.isRegistered)
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'

// ONNX (sherpa-onnx): STT + TTS + VAD
await ONNX.register()
```
Backend registration loads WASM binaries and can take a few seconds. Always `await` the `register()` calls before using any inference APIs. Registration is idempotent: calling it multiple times is safe.
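Because registration is idempotent but slow, it can be convenient to make sure concurrent callers share a single in-flight registration. The memoizing wrapper below is a sketch, not part of the SDK; `registerOnce` is a name invented here:

```ts
// Memoize an async initializer so repeated and concurrent calls
// share one in-flight promise. Sketch only; not an SDK API.
function registerOnce<T>(register: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined
  return () => (pending ??= register())
}

// Usage with the real backends from the section above:
// const registerLlama = registerOnce(() => LlamaCPP.register())
// await Promise.all([registerLlama(), registerLlama()]) // one WASM load
```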
All models in RunAnywhere are sourced from HuggingFace. The SDK provides a model registry that resolves compact model definitions into full download URLs and manages the complete lifecycle: registration -> download -> storage -> loading.
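As an illustration of what "resolving compact model definitions into full download URLs" means (this is not the SDK's actual registry code, and the repo paths below are invented for the sketch):

```ts
// Hypothetical shape of a compact model definition; the real SDK's
// registry schema may differ.
interface CompactModel {
  id: string
  repo: string // HuggingFace repo (illustrative value below)
  file: string // filename inside the repo
}

const registry: CompactModel[] = [
  { id: 'lfm2-350m-q4_k_m', repo: 'example-org/lfm2-350m-gguf', file: 'lfm2-350m-q4_k_m.gguf' },
]

// Resolve a compact model id to a full HuggingFace download URL.
function resolveDownloadUrl(id: string): string {
  const model = registry.find((m) => m.id === id)
  if (!model) throw new Error(`Unknown model: ${id}`)
  return `https://huggingface.co/${model.repo}/resolve/main/${model.file}`
}
```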
By default, loading a new model unloads any previously loaded model. For the voice pipeline (which needs STT + LLM + TTS + VAD simultaneously), pass `coexist: true`:
```ts
// Load all 4 voice models without unloading each other
await ModelManager.loadModel('silero-vad-v5', { coexist: true })
await ModelManager.loadModel('sherpa-onnx-whisper-tiny.en', { coexist: true })
await ModelManager.loadModel('lfm2-350m-q4_k_m', { coexist: true })
await ModelManager.loadModel('vits-piper-en_US-lessac-medium', { coexist: true })
```
Downloaded models are persisted in the browser’s Origin Private File System (OPFS). This means:
- Models survive page refreshes and browser restarts
- Each origin (domain) has its own isolated storage
- The SDK auto-detects previously downloaded models on page load
- If the storage quota is exceeded, the SDK auto-evicts least-recently-used models
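Before starting a large download you can inspect the origin's quota with the standard `navigator.storage.estimate()` API. The `fitsInQuota` helper below is an illustrative wrapper, not an SDK function; it takes the estimate as a parameter so it is easy to test outside the browser:

```ts
// Check whether a model of `modelBytes` fits in the remaining
// origin storage quota, keeping some headroom free.
function fitsInQuota(
  estimate: { usage?: number; quota?: number },
  modelBytes: number,
  headroom = 0.1 // keep 10% of quota free
): boolean {
  const usage = estimate.usage ?? 0
  const quota = estimate.quota ?? 0
  return usage + modelBytes <= quota * (1 - headroom)
}

// In the browser:
// const estimate = await navigator.storage.estimate()
// if (!fitsInQuota(estimate, 250 * 1024 * 1024)) console.warn('model may trigger eviction')
```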
Large model downloads (>200 MB) can crash the browser tab on memory-constrained devices, because OPFS writes buffer data in memory before committing to disk. If the tab crashes mid-download, refresh and retry; the SDK can resume partial downloads. Start with smaller models (LFM2 350M at ~250 MB) before attempting larger ones (Qwen2-VL 2B at ~1.5 GB).
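One way to reduce the risk of crashes is to gate model choice on the `navigator.deviceMemory` hint (non-standard, but available in Chromium browsers). The helper, the thresholds, and the `qwen2-vl-2b` id below are assumptions for this sketch, not SDK behavior:

```ts
// Pick a model tier from the device-memory hint (in GiB).
// Thresholds and model ids are illustrative; tune for your own app.
function pickModel(deviceMemoryGiB: number | undefined): string {
  const mem = deviceMemoryGiB ?? 4 // assume mid-range when the hint is missing
  // The ~250 MB model is safe almost everywhere; reserve the ~1.5 GB
  // VLM for devices reporting 8 GiB or more.
  return mem >= 8 ? 'qwen2-vl-2b' : 'lfm2-350m-q4_k_m'
}

// In the browser (Chromium):
// const id = pickModel((navigator as any).deviceMemory)
```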
Audio utilities (`AudioCapture`, `AudioPlayback`) are in `@runanywhere/web-onnx`, while video utilities (`VideoCapture`) are in `@runanywhere/web-llamacpp`. Don't mix up the import sources.
```ts
import { AudioPlayback } from '@runanywhere/web-onnx'

const player = new AudioPlayback({ sampleRate: 22050 })
await player.play(audioFloat32Array, 22050)

// Clean up resources
player.dispose()
```
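To exercise `AudioPlayback` without loading a TTS model, you can synthesize a test tone. The generator below is plain TypeScript written for this guide, not an SDK utility:

```ts
// Generate `seconds` of a sine tone as Float32Array PCM samples,
// the format AudioPlayback consumes.
function sineTone(freqHz: number, seconds: number, sampleRate = 22050): Float32Array {
  const samples = new Float32Array(Math.floor(seconds * sampleRate))
  for (let i = 0; i < samples.length; i++) {
    // 0.5 amplitude to leave headroom and avoid clipping
    samples[i] = Math.sin((2 * Math.PI * freqHz * i) / sampleRate) * 0.5
  }
  return samples
}

// const tone = sineTone(440, 1)
// await player.play(tone, 22050)
```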
```ts
import { VideoCapture } from '@runanywhere/web-llamacpp'

const camera = new VideoCapture({ facingMode: 'environment' }) // or 'user' for selfie
await camera.start()

// Add the video preview to the DOM
document.getElementById('preview')!.appendChild(camera.videoElement)

// Capture a frame (downscaled to 256px max dimension)
const frame = camera.captureFrame(256)
// frame.rgbPixels: Uint8Array (RGB, no alpha)
// frame.width, frame.height: actual dimensions

// Check state
console.log('Is capturing:', camera.isCapturing)

camera.stop()
```
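Since `frame.rgbPixels` has no alpha channel, drawing a captured frame on a canvas requires expanding it to RGBA first. A minimal conversion (written for this guide, not an SDK helper):

```ts
// Expand tightly packed RGB bytes into RGBA (alpha = 255) so the
// result can back an ImageData for canvas rendering.
function rgbToRgba(rgb: Uint8Array): Uint8ClampedArray {
  const pixels = rgb.length / 3
  const rgba = new Uint8ClampedArray(pixels * 4)
  for (let i = 0; i < pixels; i++) {
    rgba[i * 4] = rgb[i * 3]         // R
    rgba[i * 4 + 1] = rgb[i * 3 + 1] // G
    rgba[i * 4 + 2] = rgb[i * 3 + 2] // B
    rgba[i * 4 + 3] = 255            // A (opaque)
  }
  return rgba
}

// const imageData = new ImageData(rgbToRgba(frame.rgbPixels), frame.width, frame.height)
// canvasCtx.putImageData(imageData, 0, 0)
```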
The SDK auto-detects WebGPU availability when `LlamaCPP.register()` is called:
```ts
import { LlamaCPP } from '@runanywhere/web-llamacpp'

await LlamaCPP.register()
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'
```
| Mode | Description |
| --- | --- |
| `webgpu` | WebGPU detected and WASM loaded successfully |
| `cpu` | CPU-only WASM (WebGPU not available or failed to load) |
If the WebGPU WASM file returns a 404, the SDK gracefully falls back to CPU mode. This is normal behavior; check `LlamaCPP.accelerationMode` to confirm which mode is active.
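If you want to know before registration whether WebGPU can even be attempted (for example, to show a UI hint), the standard feature check is `'gpu' in navigator`. The helper below takes a navigator-like object as a parameter so it can also run outside the browser; note that `LlamaCPP.register()` already performs its own detection:

```ts
// Feature-detect WebGPU on a navigator-like object.
// Only useful for pre-registration UI hints; the SDK detects
// WebGPU itself during register().
function hasWebGPU(nav: { gpu?: unknown }): boolean {
  return 'gpu' in nav && nav.gpu != null
}

// In the browser: hasWebGPU(navigator as any)
```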