Early Beta — The Web SDK is in early beta. APIs may change between releases.

Overview

This guide covers SDK initialization options, backend registration, model management, events, browser capabilities, and audio utilities.

SDK Initialization

Basic Initialization

import { RunAnywhere, SDKEnvironment } from '@runanywhere/web'
import { LlamaCPP } from '@runanywhere/web-llamacpp'
import { ONNX } from '@runanywhere/web-onnx'

// Step 1: Initialize core SDK
await RunAnywhere.initialize({
  environment: SDKEnvironment.Development,
  debug: true,
})

// Step 2: Register backends (loads WASM automatically)
await LlamaCPP.register() // LLM + VLM
await ONNX.register() // STT + TTS + VAD

Full Configuration

interface SDKInitOptions {
  /** SDK environment */
  environment?: SDKEnvironment // Development | Staging | Production

  /** Enable debug logging */
  debug?: boolean

  /** API key for authentication (optional) */
  apiKey?: string

  /** Base URL for API requests */
  baseURL?: string

  /** Acceleration preference */
  acceleration?: AccelerationPreference // 'auto' | 'webgpu' | 'cpu'

  /** Custom URL for WebGPU WASM glue */
  webgpuWasmUrl?: string
}
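Putting these options together, a full configuration might look like the sketch below. The types are mirrored locally so the snippet stands alone; in real code they come from `@runanywhere/web`, and the enum string values and the WASM path are illustrative assumptions, not the SDK's actual values.

```typescript
// Local mirrors of the documented types (illustrative; import from
// '@runanywhere/web' in real code).
enum SDKEnvironment {
  Development = 'development',
  Staging = 'staging',
  Production = 'production',
}
type AccelerationPreference = 'auto' | 'webgpu' | 'cpu'

interface SDKInitOptions {
  environment?: SDKEnvironment
  debug?: boolean
  apiKey?: string
  baseURL?: string
  acceleration?: AccelerationPreference
  webgpuWasmUrl?: string
}

// A production-style configuration. The webgpuWasmUrl is a hypothetical
// self-hosted path for the WebGPU WASM glue.
const options: SDKInitOptions = {
  environment: SDKEnvironment.Production,
  debug: false,
  acceleration: 'webgpu',
  webgpuWasmUrl: '/wasm/llamacpp-webgpu.wasm',
}

console.log(options.acceleration) // → webgpu
```

In real code this object is passed straight to `RunAnywhere.initialize(options)`.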

Backend Registration

After initializing the core SDK, register the inference backends you need:
import { LlamaCPP } from '@runanywhere/web-llamacpp'
import { ONNX } from '@runanywhere/web-onnx'

// LlamaCpp: LLM text generation + VLM vision
await LlamaCPP.register()
console.log('LlamaCpp registered:', LlamaCPP.isRegistered)
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'

// ONNX (sherpa-onnx): STT + TTS + VAD
await ONNX.register()
Backend registration loads WASM binaries and can take a few seconds. Always await the register calls before using any inference APIs. Registration is idempotent — calling it multiple times is safe.

Environment Modes

| Environment | Enum Value | Description | Logging |
| --- | --- | --- | --- |
| Development | SDKEnvironment.Development | Local development, full debugging | Debug |
| Staging | SDKEnvironment.Staging | Testing with real services | Info |
| Production | SDKEnvironment.Production | Production deployment | Warning |
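The default log level per environment in the table above can be expressed as a small mapping. `defaultLogLevel` is a hypothetical helper written to illustrate the table, not an SDK export:

```typescript
type Env = 'development' | 'staging' | 'production'

// Default log level per environment, per the table above (sketch only;
// the SDK applies this mapping internally).
function defaultLogLevel(env: Env): 'Debug' | 'Info' | 'Warning' {
  if (env === 'development') return 'Debug'
  if (env === 'staging') return 'Info'
  return 'Warning'
}

console.log(defaultLogLevel('production')) // → Warning
```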

Logging

Configure Log Level

import { SDKLogger, LogLevel } from '@runanywhere/web'

SDKLogger.level = LogLevel.Debug // Trace | Debug | Info | Warning | Error | Fatal
SDKLogger.enabled = true

Log Levels

| Level | Description | Use Case |
| --- | --- | --- |
| Trace | Very detailed tracing | Deep debug |
| Debug | Detailed debugging info | Development |
| Info | General information | Staging |
| Warning | Potential issues | Production |
| Error | Errors and failures | Production |
| Fatal | Critical failures | Always |

Events

EventBus

The SDK provides a typed event system for monitoring SDK activities:
import { EventBus } from '@runanywhere/web'

// Subscribe to model download progress
const unsubscribe = EventBus.shared.on('model.downloadProgress', (evt) => {
  console.log(`Model: ${evt.modelId}, Progress: ${((evt.progress ?? 0) * 100).toFixed(0)}%`)
})

EventBus.shared.on('model.loadCompleted', (evt) => {
  console.log(`Model loaded: ${evt.modelId}`)
})

// Clean up
unsubscribe()
Event properties are directly on the event object (e.g., evt.modelId, evt.progress), not nested under evt.data.

Event Types

| Event | Description |
| --- | --- |
| model.downloadProgress | Model download progress (modelId, progress) |
| model.downloadCompleted | Model download finished |
| model.loadCompleted | Model loaded into memory |
| model.unloaded | Model unloaded |
| generation.started | Text generation started |
| generation.completed | Text generation completed |
| generation.failed | Text generation failed |
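Since model event properties live directly on the event object, a handler can format them without unwrapping. The interface and helper below are illustrative sketches of that payload shape, not the SDK's exported types:

```typescript
// Sketch of the model.* event payload shape described above
// (illustrative; not the SDK's actual exported type).
interface ModelEvent {
  modelId: string
  progress?: number
}

// Hypothetical formatting helper for a progress UI.
function formatProgress(evt: ModelEvent): string {
  return `${evt.modelId}: ${((evt.progress ?? 0) * 100).toFixed(0)}%`
}

console.log(formatProgress({ modelId: 'lfm2-350m-q4_k_m', progress: 0.42 }))
// → lfm2-350m-q4_k_m: 42%
```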

Model Sources

All models in RunAnywhere are sourced from HuggingFace. The SDK provides a model registry that resolves compact model definitions into full download URLs and manages the complete lifecycle: registration -> download -> storage -> loading.

How It Works

When you register a model with a repo field, the SDK constructs the download URL automatically:
https://huggingface.co/{repo}/resolve/main/{file}
For example, repo: 'LiquidAI/LFM2-350M-GGUF' with files: ['LFM2-350M-Q4_K_M.gguf'] resolves to:
https://huggingface.co/LiquidAI/LFM2-350M-GGUF/resolve/main/LFM2-350M-Q4_K_M.gguf
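That resolution rule is simple enough to sketch as a pure function. `resolveModelUrls` is a hypothetical helper for illustration, not part of the SDK API:

```typescript
// Sketch of the documented repo + files resolution rule
// (hypothetical helper, not an SDK export).
function resolveModelUrls(repo: string, files: string[]): string[] {
  return files.map((f) => `https://huggingface.co/${repo}/resolve/main/${f}`)
}

const urls = resolveModelUrls('LiquidAI/LFM2-350M-GGUF', ['LFM2-350M-Q4_K_M.gguf'])
console.log(urls[0])
// → https://huggingface.co/LiquidAI/LFM2-350M-GGUF/resolve/main/LFM2-350M-Q4_K_M.gguf
```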

CompactModelDef

The registerModels API accepts an array of compact model definitions:
import { ModelCategory, LLMFramework } from '@runanywhere/web'

interface CompactModelDef {
  /** Unique identifier for the model */
  id: string

  /** Human-readable model name */
  name: string

  /** Inference backend */
  framework: LLMFramework // LLMFramework.LlamaCpp | LLMFramework.ONNX

  /** Model category (determines which engine handles it) */
  modality: ModelCategory
  // ModelCategory.Language         — LLM text generation
  // ModelCategory.Multimodal       — VLM image + text
  // ModelCategory.SpeechRecognition — STT
  // ModelCategory.SpeechSynthesis   — TTS
  // ModelCategory.Audio             — VAD

  /** HuggingFace repo path (e.g., 'LiquidAI/LFM2-350M-GGUF') */
  repo?: string

  /** Model files in the repo. First file = primary, rest = additional (e.g., mmproj for VLM) */
  files?: string[]

  /** Direct URL (alternative to repo + files) */
  url?: string

  /** 'archive' for tar.gz bundles (STT/TTS), omit for individual GGUF files */
  artifactType?: 'archive'

  /** Estimated memory requirement in bytes */
  memoryRequirement?: number
}

URL Resolution Rules

| Config | URL Pattern | Use Case |
| --- | --- | --- |
| repo + files | https://huggingface.co/{repo}/resolve/main/{file} | Most models (LLM, VLM) |
| url only | Used as-is | Direct links, non-HF sources |
| url + artifactType: 'archive' | Used as-is, extracted after download | STT/TTS model bundles |

Model Management

All model management operations use ModelManager from @runanywhere/web.

Register Models

import { RunAnywhere, ModelCategory, LLMFramework } from '@runanywhere/web'

RunAnywhere.registerModels([
  // LLM: Liquid AI LFM2
  {
    id: 'lfm2-350m-q4_k_m',
    name: 'LFM2 350M Q4_K_M',
    repo: 'LiquidAI/LFM2-350M-GGUF',
    files: ['LFM2-350M-Q4_K_M.gguf'],
    framework: LLMFramework.LlamaCpp,
    modality: ModelCategory.Language,
    memoryRequirement: 250_000_000,
  },

  // VLM: Liquid AI LFM2-VL (two files: model + mmproj)
  {
    id: 'lfm2-vl-450m-q4_0',
    name: 'LFM2-VL 450M Q4_0',
    repo: 'runanywhere/LFM2-VL-450M-GGUF',
    files: ['LFM2-VL-450M-Q4_0.gguf', 'mmproj-LFM2-VL-450M-Q8_0.gguf'],
    framework: LLMFramework.LlamaCpp,
    modality: ModelCategory.Multimodal,
    memoryRequirement: 500_000_000,
  },

  // STT: Whisper (archive bundle from direct URL)
  {
    id: 'sherpa-onnx-whisper-tiny.en',
    name: 'Whisper Tiny English (ONNX)',
    url: 'https://huggingface.co/runanywhere/sherpa-onnx-whisper-tiny.en/resolve/main/sherpa-onnx-whisper-tiny.en.tar.gz',
    framework: LLMFramework.ONNX,
    modality: ModelCategory.SpeechRecognition,
    memoryRequirement: 105_000_000,
    artifactType: 'archive' as const,
  },

  // TTS: Piper (archive bundle)
  {
    id: 'vits-piper-en_US-lessac-medium',
    name: 'Piper TTS US English (Lessac)',
    url: 'https://huggingface.co/runanywhere/vits-piper-en_US-lessac-medium/resolve/main/vits-piper-en_US-lessac-medium.tar.gz',
    framework: LLMFramework.ONNX,
    modality: ModelCategory.SpeechSynthesis,
    memoryRequirement: 65_000_000,
    artifactType: 'archive' as const,
  },

  // VAD: Silero (single ONNX file)
  {
    id: 'silero-vad-v5',
    name: 'Silero VAD v5',
    url: 'https://huggingface.co/runanywhere/silero-vad-v5/resolve/main/silero_vad.onnx',
    files: ['silero_vad.onnx'],
    framework: LLMFramework.ONNX,
    modality: ModelCategory.Audio,
    memoryRequirement: 5_000_000,
  },
])

Available Models on HuggingFace

LLM Models

| Model | HuggingFace Repo | Size | Notes |
| --- | --- | --- | --- |
| LFM2 350M | LiquidAI/LFM2-350M-GGUF | ~250MB | Liquid AI, ultra-compact |
| LFM2 1.2B Tool | LiquidAI/LFM2-1.2B-Tool-GGUF | ~800MB | Liquid AI, tool-calling optimized |
| Qwen 2.5 0.5B | Qwen/Qwen2.5-0.5B-Instruct-GGUF | ~400MB | Multilingual |

VLM Models

| Model | HuggingFace Repo | Size | Notes |
| --- | --- | --- | --- |
| LFM2-VL 450M | runanywhere/LFM2-VL-450M-GGUF | ~500MB | Liquid AI, smallest VLM |
| SmolVLM 500M | runanywhere/SmolVLM-500M-Instruct-GGUF | ~500MB | HuggingFace SmolVLM |
| Qwen2-VL 2B | runanywhere/Qwen2-VL-2B-Instruct-GGUF | ~1.5GB | Higher quality |

STT / TTS / VAD Models

| Model | HuggingFace Repo | Size | Notes |
| --- | --- | --- | --- |
| Whisper Tiny EN | runanywhere/sherpa-onnx-whisper-tiny.en | ~105MB | Archive bundle |
| Piper TTS (Lessac) | runanywhere/vits-piper-en_US-lessac-medium | ~65MB | Archive bundle |
| Silero VAD v5 | runanywhere/silero-vad-v5 | ~5MB | Single ONNX file |

Download and Load

import { ModelManager, ModelCategory, EventBus } from '@runanywhere/web'

// Track download progress
EventBus.shared.on('model.downloadProgress', (evt) => {
  console.log(`Downloading ${evt.modelId}: ${((evt.progress ?? 0) * 100).toFixed(0)}%`)
})

// Download to OPFS (persists across sessions)
await ModelManager.downloadModel('lfm2-350m-q4_k_m')

// Load into memory for inference
await ModelManager.loadModel('lfm2-350m-q4_k_m')

// Check loaded models
const allModels = ModelManager.getModels()
const loaded = ModelManager.getLoadedModel(ModelCategory.Language)
console.log('Loaded:', loaded?.id)

Multi-Model Loading with coexist

By default, loading a new model unloads any previously loaded model. For the voice pipeline (which needs STT + LLM + TTS + VAD simultaneously), pass coexist: true:
// Load all 4 voice models without unloading each other
await ModelManager.loadModel('silero-vad-v5', { coexist: true })
await ModelManager.loadModel('sherpa-onnx-whisper-tiny.en', { coexist: true })
await ModelManager.loadModel('lfm2-350m-q4_k_m', { coexist: true })
await ModelManager.loadModel('vits-piper-en_US-lessac-medium', { coexist: true })
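Sequencing those loads can be wrapped in a small helper so a failure identifies which model was loading. `loadAll` is a hypothetical sketch that takes the loader as a parameter; with the SDK you would pass `ModelManager.loadModel` as `load`:

```typescript
// Hypothetical helper: load several models with coexist: true so each
// load keeps the previously loaded models in memory.
async function loadAll(
  ids: string[],
  load: (id: string, opts: { coexist: boolean }) => Promise<void>
): Promise<string[]> {
  const loaded: string[] = []
  for (const id of ids) {
    // If this throws, `loaded` tells you which models made it in.
    await load(id, { coexist: true })
    loaded.push(id)
  }
  return loaded
}

// Demo with a no-op loader; real usage passes ModelManager.loadModel.
loadAll(['silero-vad-v5', 'lfm2-350m-q4_k_m'], async () => {}).then((ids) =>
  console.log(ids.join(','))
)
```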

Storage (OPFS)

Downloaded models are persisted in the browser’s Origin Private File System (OPFS). This means:
  • Models survive page refreshes and browser restarts
  • Each origin (domain) has its own isolated storage
  • The SDK auto-detects previously downloaded models on page load
  • If storage quota is exceeded, the SDK auto-evicts least-recently-used models
Large model downloads (>200MB) can crash the browser tab on memory-constrained devices. The OPFS write buffers data in memory before committing. If the tab crashes mid-download, refresh and retry — the SDK can resume partial downloads. Start with smaller models (LFM2 350M at ~250MB) before attempting larger ones (Qwen2-VL 2B at ~1.5GB).
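Before starting a large download, it can help to compare the model size against the remaining origin quota, which in the browser comes from `navigator.storage.estimate()`. `hasRoomFor` is a hypothetical helper, and the 1.2× headroom factor is an illustrative guess to account for the in-memory write buffering noted above:

```typescript
// Hypothetical pre-download check. `usage` and `quota` would come from
// navigator.storage.estimate() in the browser.
function hasRoomFor(
  modelBytes: number,
  usage: number,
  quota: number,
  headroom = 1.2 // illustrative margin for OPFS write buffering
): boolean {
  return usage + modelBytes * headroom <= quota
}

// e.g. a 250 MB model with 100 MB used of a 2 GB quota
console.log(hasRoomFor(250_000_000, 100_000_000, 2_000_000_000)) // → true
```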

Delete Models

import { ModelManager } from '@runanywhere/web'

// Delete a specific model from OPFS
await ModelManager.deleteModel('lfm2-350m-q4_k_m')

Audio Utilities

Audio utilities (AudioCapture, AudioPlayback) are in @runanywhere/web-onnx, while video utilities (VideoCapture) are in @runanywhere/web-llamacpp. Don’t mix up the import sources.

AudioCapture (Microphone)

AudioCapture is in @runanywhere/web-onnx. Configuration is passed to the constructor, and callbacks are passed to start():
import { AudioCapture } from '@runanywhere/web-onnx'

const capture = new AudioCapture({ sampleRate: 16000 })

await capture.start(
  (chunk: Float32Array) => {
    // Process audio samples (e.g., feed to VAD)
  },
  (level: number) => {
    // Audio level 0.0-1.0 (for UI visualization)
  }
)

// Stop when done
capture.stop()
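The level callback already reports a 0.0-1.0 value; if you want to derive a level yourself from the raw chunks (for instance to apply custom smoothing), a standard RMS computation works. `rmsLevel` is an illustrative helper, not an SDK export:

```typescript
// Root-mean-square level of an audio chunk, in roughly the 0.0-1.0 range
// for normalized samples (hypothetical helper).
function rmsLevel(chunk: Float32Array): number {
  let sum = 0
  for (let i = 0; i < chunk.length; i++) {
    sum += chunk[i] * chunk[i]
  }
  return Math.sqrt(sum / chunk.length)
}

console.log(rmsLevel(new Float32Array([0.5, -0.5, 0.5, -0.5]))) // → 0.5
```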

AudioPlayback (Speaker)

AudioPlayback is in @runanywhere/web-onnx:
import { AudioPlayback } from '@runanywhere/web-onnx'

const player = new AudioPlayback({ sampleRate: 22050 })

await player.play(audioFloat32Array, 22050)

// Clean up resources
player.dispose()

VideoCapture (Camera)

VideoCapture is in @runanywhere/web-llamacpp:
import { VideoCapture } from '@runanywhere/web-llamacpp'

const camera = new VideoCapture({ facingMode: 'environment' }) // or 'user' for selfie
await camera.start()

// Add the video preview to the DOM
document.getElementById('preview')!.appendChild(camera.videoElement)

// Capture a frame (downscaled to 256px max dimension)
const frame = camera.captureFrame(256)
// frame.rgbPixels: Uint8Array (RGB, no alpha)
// frame.width, frame.height: actual dimensions

// Check state
console.log('Is capturing:', camera.isCapturing)

camera.stop()
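Since `frame.rgbPixels` is packed RGB with no alpha channel, its byte length should be exactly `width * height * 3`. A small sanity-check sketch (`expectedRgbBytes` is a hypothetical helper):

```typescript
// Expected byte count for a packed RGB buffer (3 bytes per pixel, no alpha).
function expectedRgbBytes(width: number, height: number): number {
  return width * height * 3
}

// e.g. a 256x192 captured frame
console.log(expectedRgbBytes(256, 192)) // → 147456
```

In practice you would compare this against `frame.rgbPixels.length` before handing the buffer to a VLM.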

Acceleration

GPU Acceleration

The SDK auto-detects WebGPU availability when LlamaCPP.register() is called:
import { LlamaCPP } from '@runanywhere/web-llamacpp'

await LlamaCPP.register()
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'

| Mode | Description |
| --- | --- |
| webgpu | WebGPU detected and WASM loaded successfully |
| cpu | CPU-only WASM (WebGPU not available or failed to load) |
If the WebGPU WASM file returns a 404, the SDK gracefully falls back to CPU mode. This is normal behavior — check LlamaCPP.accelerationMode to confirm which mode is active.