After initializing the core SDK, register the inference backends you need:
```ts
import { LlamaCPP } from '@runanywhere/web-llamacpp'
import { ONNX } from '@runanywhere/web-onnx'

// LlamaCpp: LLM text generation + VLM vision
await LlamaCPP.register()
console.log('LlamaCpp registered:', LlamaCPP.isRegistered)
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'

// ONNX (sherpa-onnx): STT + TTS + VAD
await ONNX.register()
```
Backend registration loads WASM binaries and can take a few seconds. Always `await` the `register()` calls before using any inference APIs. Registration is idempotent: calling it multiple times is safe.
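Because registration is idempotent but slow, it can be convenient to make sure concurrent callers share a single in-flight registration. The memoizing wrapper below is a sketch, not part of the SDK; `registerOnce` is a name invented here:

```ts
// Memoize an async initializer so repeated and concurrent calls
// share one in-flight promise. Sketch only; not an SDK API.
function registerOnce<T>(register: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined
  return () => (pending ??= register())
}

// Usage with the real backends from the section above:
// const registerLlama = registerOnce(() => LlamaCPP.register())
// await Promise.all([registerLlama(), registerLlama()]) // one WASM load
```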
All models in RunAnywhere are sourced from HuggingFace. The SDK provides a model registry that resolves compact model definitions into full download URLs and manages the complete lifecycle: registration -> download -> storage -> loading.
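As an illustration of what "resolving compact model definitions into full download URLs" means (this is not the SDK's actual registry code, and the repo paths below are invented for the sketch):

```ts
// Hypothetical shape of a compact model definition; the real SDK's
// registry schema may differ.
interface CompactModel {
  id: string
  repo: string // HuggingFace repo (illustrative value below)
  file: string // filename inside the repo
}

const registry: CompactModel[] = [
  { id: 'lfm2-350m-q4_k_m', repo: 'example-org/lfm2-350m-gguf', file: 'lfm2-350m-q4_k_m.gguf' },
]

// Resolve a compact model id to a full HuggingFace download URL.
function resolveDownloadUrl(id: string): string {
  const model = registry.find((m) => m.id === id)
  if (!model) throw new Error(`Unknown model: ${id}`)
  return `https://huggingface.co/${model.repo}/resolve/main/${model.file}`
}
```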
By default, loading a new model unloads any previously loaded model. For the voice pipeline (which needs STT + LLM + TTS + VAD simultaneously), pass `coexist: true`:
```ts
// Load all 4 voice models without unloading each other
await ModelManager.loadModel('silero-vad-v5', { coexist: true })
await ModelManager.loadModel('sherpa-onnx-whisper-tiny.en', { coexist: true })
await ModelManager.loadModel('lfm2-350m-q4_k_m', { coexist: true })
await ModelManager.loadModel('vits-piper-en_US-lessac-medium', { coexist: true })
```
Downloaded models are persisted in the browser’s Origin Private File System (OPFS). This means:
- Models survive page refreshes and browser restarts
- Each origin (domain) has its own isolated storage
- The SDK auto-detects previously downloaded models on page load
- If the storage quota is exceeded, the SDK auto-evicts least-recently-used models
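Before starting a large download you can inspect the origin's quota with the standard `navigator.storage.estimate()` API. The `fitsInQuota` helper below is an illustrative wrapper, not an SDK function; it takes the estimate as a parameter so it is easy to test outside the browser:

```ts
// Check whether a model of `modelBytes` fits in the remaining
// origin storage quota, keeping some headroom free.
function fitsInQuota(
  estimate: { usage?: number; quota?: number },
  modelBytes: number,
  headroom = 0.1 // keep 10% of quota free
): boolean {
  const usage = estimate.usage ?? 0
  const quota = estimate.quota ?? 0
  return usage + modelBytes <= quota * (1 - headroom)
}

// In the browser:
// const estimate = await navigator.storage.estimate()
// if (!fitsInQuota(estimate, 250 * 1024 * 1024)) console.warn('model may trigger eviction')
```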
Large model downloads (>200 MB) can crash the browser tab on memory-constrained devices, because OPFS writes buffer data in memory before committing to disk. If the tab crashes mid-download, refresh and retry; the SDK can resume partial downloads. Start with smaller models (LFM2 350M at ~250 MB) before attempting larger ones (Qwen2-VL 2B at ~1.5 GB).
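One way to reduce the risk of crashes is to gate model choice on the `navigator.deviceMemory` hint (non-standard, but available in Chromium browsers). The helper, the thresholds, and the `qwen2-vl-2b` id below are assumptions for this sketch, not SDK behavior:

```ts
// Pick a model tier from the device-memory hint (in GiB).
// Thresholds and model ids are illustrative; tune for your own app.
function pickModel(deviceMemoryGiB: number | undefined): string {
  const mem = deviceMemoryGiB ?? 4 // assume mid-range when the hint is missing
  // The ~250 MB model is safe almost everywhere; reserve the ~1.5 GB
  // VLM for devices reporting 8 GiB or more.
  return mem >= 8 ? 'qwen2-vl-2b' : 'lfm2-350m-q4_k_m'
}

// In the browser (Chromium):
// const id = pickModel((navigator as any).deviceMemory)
```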
Audio utilities (`AudioCapture`, `AudioPlayback`) are in `@runanywhere/web-onnx`, while video utilities (`VideoCapture`) are in `@runanywhere/web-llamacpp`. Don't mix up the import sources.
```ts
import { AudioPlayback } from '@runanywhere/web-onnx'

const player = new AudioPlayback({ sampleRate: 22050 })
await player.play(audioFloat32Array, 22050)

// Clean up resources
player.dispose()
```
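To exercise `AudioPlayback` without loading a TTS model, you can synthesize a test tone. The generator below is plain TypeScript written for this guide, not an SDK utility:

```ts
// Generate `seconds` of a sine tone as Float32Array PCM samples,
// the format AudioPlayback consumes.
function sineTone(freqHz: number, seconds: number, sampleRate = 22050): Float32Array {
  const samples = new Float32Array(Math.floor(seconds * sampleRate))
  for (let i = 0; i < samples.length; i++) {
    // 0.5 amplitude to leave headroom and avoid clipping
    samples[i] = Math.sin((2 * Math.PI * freqHz * i) / sampleRate) * 0.5
  }
  return samples
}

// const tone = sineTone(440, 1)
// await player.play(tone, 22050)
```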
```ts
import { VideoCapture } from '@runanywhere/web-llamacpp'

const camera = new VideoCapture({ facingMode: 'environment' }) // or 'user' for selfie
await camera.start()

// Add the video preview to the DOM
document.getElementById('preview')!.appendChild(camera.videoElement)

// Capture a frame (downscaled to 256px max dimension)
const frame = camera.captureFrame(256)
// frame.rgbPixels: Uint8Array (RGB, no alpha)
// frame.width, frame.height: actual dimensions

// Check state
console.log('Is capturing:', camera.isCapturing)

camera.stop()
```
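Since `frame.rgbPixels` has no alpha channel, drawing a captured frame on a canvas requires expanding it to RGBA first. A minimal conversion (written for this guide, not an SDK helper):

```ts
// Expand tightly packed RGB bytes into RGBA (alpha = 255) so the
// result can back an ImageData for canvas rendering.
function rgbToRgba(rgb: Uint8Array): Uint8ClampedArray {
  const pixels = rgb.length / 3
  const rgba = new Uint8ClampedArray(pixels * 4)
  for (let i = 0; i < pixels; i++) {
    rgba[i * 4] = rgb[i * 3]         // R
    rgba[i * 4 + 1] = rgb[i * 3 + 1] // G
    rgba[i * 4 + 2] = rgb[i * 3 + 2] // B
    rgba[i * 4 + 3] = 255            // A (opaque)
  }
  return rgba
}

// const imageData = new ImageData(rgbToRgba(frame.rgbPixels), frame.width, frame.height)
// canvasCtx.putImageData(imageData, 0, 0)
```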
The SDK auto-detects WebGPU availability when `LlamaCPP.register()` is called:
```ts
import { LlamaCPP } from '@runanywhere/web-llamacpp'

await LlamaCPP.register()
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'
```
| Mode | Description |
| --- | --- |
| `webgpu` | WebGPU detected and WASM loaded successfully |
| `cpu` | CPU-only WASM (WebGPU not available or failed to load) |
If the WebGPU WASM file returns a 404, the SDK gracefully falls back to CPU mode. This is normal behavior; check `LlamaCPP.accelerationMode` to confirm which mode is active.
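If you want to know before registration whether WebGPU can even be attempted (for example, to show a UI hint), the standard feature check is `'gpu' in navigator`. The helper below takes a navigator-like object as a parameter so it can also run outside the browser; note that `LlamaCPP.register()` already performs its own detection:

```ts
// Feature-detect WebGPU on a navigator-like object.
// Only useful for pre-registration UI hints; the SDK detects
// WebGPU itself during register().
function hasWebGPU(nav: { gpu?: unknown }): boolean {
  return 'gpu' in nav && nav.gpu != null
}

// In the browser: hasWebGPU(navigator as any)
```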