Early Beta — The Web SDK is in early beta. APIs may change between releases.
Overview
This guide covers best practices for building performant, reliable, and user-friendly AI applications with the RunAnywhere Web SDK in the browser.
Model Selection
Choose the Right Model Size
| Model Size | RAM Required | Use Case | Speed |
|---|---|---|---|
| 360M-500M (Q4) | ~300-500MB | Quick responses, chat | Very Fast |
| 1B-3B (Q4) | 1-2GB | Balanced quality/speed | Fast |
| 7B (Q4) | 4-5GB | High quality | Slower |
Browser memory is more limited than native apps. Models larger than 2GB may cause tab crashes on
devices with limited RAM. Start with smaller models and test on target devices.
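As a sketch, the reported device memory (where the browser exposes it) can drive the initial choice. The helper and thresholds below are illustrative assumptions for this guide, not SDK-defined values:

```typescript
// Hypothetical helper: map reported device memory (GB) to a model tier
// from the table above. Thresholds are illustrative, not SDK-defined.
type ModelTier = 'small' | 'medium' | 'large'

function pickModelTier(deviceMemoryGB: number): ModelTier {
  if (deviceMemoryGB >= 8) return 'large'  // 7B Q4 (~4-5GB RAM)
  if (deviceMemoryGB >= 4) return 'medium' // 1B-3B Q4 (~1-2GB RAM)
  return 'small'                           // 360M-500M Q4 (~300-500MB RAM)
}
```

In the browser, navigator.deviceMemory (Chrome/Edge only, capped at 8) can feed this: pickModelTier((navigator as any).deviceMemory ?? 4).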
Quantization Trade-offs
| Quantization | Quality | Size | Speed |
|---|---|---|---|
| Q8_0 | Best | Largest | Slower |
| Q6_K | Great | Large | Fast |
| Q4_K_M | Good | Medium | Faster |
| Q4_0 | Acceptable | Small | Fastest |
For browser use, Q4_0 and Q4_K_M offer the best balance of quality and memory efficiency. Start
with smaller quantizations and only increase if output quality is insufficient.
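The same idea applies to quantization. A minimal sketch, assuming illustrative memory cutoffs (not SDK values), that follows the table's recommendations:

```typescript
// Illustrative helper: pick a quantization from the table above based on
// memory headroom. The cutoffs are assumptions for this sketch.
function pickQuantization(deviceMemoryGB: number, isMobile: boolean): string {
  if (isMobile || deviceMemoryGB < 4) return 'Q4_0' // smallest, fastest
  if (deviceMemoryGB < 8) return 'Q4_K_M'           // good quality, medium size
  return 'Q6_K'                                     // great quality if RAM allows
}
```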
Use Streaming for Better UX
import { TextGeneration } from '@runanywhere/web-llamacpp'
// Bad: User waits for entire response
const result = await TextGeneration.generate(prompt, { maxTokens: 500 })
document.getElementById('output')!.textContent = result.text
// Good: User sees response as it's generated
const { stream } = await TextGeneration.generateStream(prompt, { maxTokens: 500 })
let text = ''
for await (const token of stream) {
text += token
document.getElementById('output')!.textContent = text
}
Enable Cross-Origin Isolation
Multi-threaded WASM is significantly faster. Configure COOP/COEP headers on your server, then verify isolation at runtime:
import { LlamaCPP } from '@runanywhere/web-llamacpp'
// Check after backend registration
await LlamaCPP.register()
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'
if (!crossOriginIsolated) {
console.warn('Running in single-threaded mode. Add COOP/COEP headers for better performance.')
}
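For a Vite project, the headers can be set on the dev and preview servers. A minimal sketch (COEP: credentialless rather than require-corp, for the reasons covered in the hosted-IDE section):

```typescript
// vite.config.ts — enable cross-origin isolation in dev and preview.
// COEP: credentialless (not require-corp) keeps Vite's module serving working.
import { defineConfig } from 'vite'

const isolationHeaders = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'credentialless',
}

export default defineConfig({
  server: { headers: isolationHeaders },
  preview: { headers: isolationHeaders },
})
```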
Exclude WASM Packages from Vite Pre-Bundling
This is the most common gotcha with Vite:
export default defineConfig({
optimizeDeps: {
exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
},
})
Without this, Vite pre-bundles the packages, import.meta.url resolves into the pre-bundle cache instead of the package directory, and the WASM files won't be found.
Limit Token Generation
import { TextGeneration } from '@runanywhere/web-llamacpp'
// For quick responses
const { stream } = await TextGeneration.generateStream(prompt, {
maxTokens: 100,
temperature: 0.5,
})
// For detailed responses
const { stream: detailedStream } = await TextGeneration.generateStream(prompt, {
maxTokens: 500,
temperature: 0.7,
})
Batch DOM Updates
For fast token generation, throttle UI updates to avoid rendering bottlenecks:
let pending = ''
let frameId: number | null = null
function appendToken(token: string) {
pending += token
if (!frameId) {
frameId = requestAnimationFrame(() => {
document.getElementById('output')!.textContent = pending
frameId = null
})
}
}
Model Management
Use OPFS for Persistent Storage
Download models to OPFS so they persist across browser sessions:
import { RunAnywhere, ModelManager, ModelCategory, LLMFramework, EventBus } from '@runanywhere/web'
RunAnywhere.registerModels([
{
id: 'lfm2-350m-q4_k_m',
name: 'LFM2 350M',
repo: 'LiquidAI/LFM2-350M-GGUF',
files: ['LFM2-350M-Q4_K_M.gguf'],
framework: LLMFramework.LlamaCpp,
modality: ModelCategory.Language,
memoryRequirement: 250_000_000,
},
])
// First visit: download model
await ModelManager.downloadModel('lfm2-350m-q4_k_m')
await ModelManager.loadModel('lfm2-350m-q4_k_m')
// Subsequent visits: model loads from OPFS (no re-download)
Show Download Progress
import { EventBus } from '@runanywhere/web'
EventBus.shared.on('model.downloadProgress', (evt) => {
const percent = ((evt.progress ?? 0) * 100).toFixed(0)
document.getElementById('progress')!.textContent = `Downloading: ${percent}%`
})
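Beyond a raw percentage, an estimated time remaining improves perceived reliability for large downloads. A sketch of the pure calculations (the helper names are this guide's, not SDK APIs), which can be wired into the event handler above:

```typescript
// Sketch: derive a percent label and a rough ETA from progress samples.
// progress is a fraction in [0, 1]; times are in milliseconds.
function formatProgress(progress: number): string {
  return `Downloading: ${(progress * 100).toFixed(0)}%`
}

function estimateEtaMs(
  startedAtMs: number,
  nowMs: number,
  progress: number
): number | null {
  if (progress <= 0) return null               // no data yet
  const elapsed = nowMs - startedAtMs
  return (elapsed / progress) * (1 - progress) // remaining time at current rate
}
```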
Handle Large Model Downloads
Models over ~200MB can crash the browser tab, especially on memory-constrained devices. Mitigations:
// Check available memory before downloading
if ('deviceMemory' in navigator) {
const memoryGB = (navigator as any).deviceMemory
const modelSizeMB = 250
if (memoryGB < 4 && modelSizeMB > 200) {
console.warn('Low memory device — large model downloads may fail.')
}
}
// Monitor for download stalls (no progress for 30s may indicate trouble)
let lastProgress = 0
let lastTime = Date.now()
EventBus.shared.on('model.downloadProgress', (evt) => {
const progress = evt.progress ?? 0
if (progress > lastProgress) {
lastProgress = progress
lastTime = Date.now()
} else if (Date.now() - lastTime > 30000) {
console.warn('Download appears stalled. Tab may be running low on memory.')
}
})
If the browser tab crashes during a model download, the partial download is stored in OPFS. On the
next attempt, ModelManager.downloadModel() will resume from where it left off. Recommend
starting with smaller models (LFM2 350M at ~250MB) before attempting larger ones.
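The stall check above can be factored into a small, testable helper. A sketch with an injectable clock so the 30-second threshold (the same assumption as above) can be verified without real timers:

```typescript
// Sketch: track download progress events and report stalls.
// now() is injectable so the logic can be tested without real timers.
class StallDetector {
  private lastProgress = -1
  private lastChangeAt: number

  constructor(
    private thresholdMs: number = 30_000,
    private now: () => number = () => Date.now()
  ) {
    this.lastChangeAt = this.now()
  }

  // Feed each progress event; returns true when no forward progress
  // has been observed within the threshold window.
  update(progress: number): boolean {
    const t = this.now()
    if (progress > this.lastProgress) {
      this.lastProgress = progress
      this.lastChangeAt = t
      return false
    }
    return t - this.lastChangeAt > this.thresholdMs
  }
}
```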
Use coexist for Multi-Model Loading
When loading multiple models (e.g., for voice pipeline), pass coexist: true:
await ModelManager.loadModel('silero-vad-v5', { coexist: true })
await ModelManager.loadModel('sherpa-onnx-whisper-tiny.en', { coexist: true })
await ModelManager.loadModel('lfm2-350m-q4_k_m', { coexist: true })
await ModelManager.loadModel('vits-piper-en_US-lessac-medium', { coexist: true })
Idempotent SDK Initialization
Wrap initialization in a cached-promise pattern so it’s safe to call from multiple components:
let _initPromise: Promise<void> | null = null
export async function initSDK(): Promise<void> {
if (_initPromise) return _initPromise
_initPromise = (async () => {
await RunAnywhere.initialize({ environment: SDKEnvironment.Development, debug: true })
await LlamaCPP.register()
await ONNX.register()
RunAnywhere.registerModels(MODELS)
})()
return _initPromise
}
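One caveat with the cached-promise pattern: if initialization throws, the rejected promise stays cached and every later call replays the failure. A variant that clears the cache on failure so callers can retry (a sketch, written generically over any async init function):

```typescript
// Sketch: cache the in-flight promise, but reset on failure so a later
// call retries initialization instead of replaying the cached rejection.
function makeIdempotent(init: () => Promise<void>): () => Promise<void> {
  let promise: Promise<void> | null = null
  return () => {
    if (!promise) {
      promise = init().catch((err) => {
        promise = null // allow retry on the next call
        throw err
      })
    }
    return promise
  }
}
```

Usage: export const initSDK = makeIdempotent(async () => { /* initialize and register backends as above */ }).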
Hosted IDE & Iframe Environments
Replit, CodeSandbox, StackBlitz
These platforms run your app inside an iframe, which has important implications:
| Limitation | Impact | Workaround |
|---|---|---|
| No SharedArrayBuffer | WASM runs single-threaded (slower) | Access app directly, not via iframe preview |
| Memory constraints | Large models (>200MB) may crash | Use smaller models (350M-500M params, Q4_0) |
| COOP header conflict | COOP: same-origin breaks iframe embedding | SDK falls back gracefully; no action needed |
| Preview URL differs | CORS/COEP may behave differently | Test with the published URL, not the preview |
Do not use COEP: require-corp in hosted IDE environments. It will block Vite’s internal
/@fs/ module serving and cause “non-JavaScript MIME type” errors for worker scripts and WASM
glue files. Always use COEP: credentialless.
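On a custom Express/Node server, the same headers can be set in a small middleware registered before static and catch-all routes. A sketch using plain functions so it is easy to unit-test:

```typescript
// Sketch: Express-style middleware adding cross-origin isolation headers.
// Uses COEP: credentialless per the note above.
type HeaderRes = { setHeader: (name: string, value: string) => void }

function isolationHeaders(
  _req: unknown,
  res: HeaderRes,
  next: () => void
): void {
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin')
  res.setHeader('Cross-Origin-Embedder-Policy', 'credentialless')
  next()
}

// app.use(isolationHeaders) // register before static/catch-all routes
```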
SPA Routing and Static Assets
When using a custom Express/Node server with SPA catch-all routing, static asset routes must come before the catch-all. Otherwise, .wasm, .js, and worker files get served as index.html:
// WRONG ORDER — catch-all swallows WASM requests:
app.get('*', (req, res) => res.sendFile('index.html'))
app.use(express.static('dist')) // Never reached for .wasm files
// CORRECT ORDER — static files served first:
app.use(
express.static('dist', {
setHeaders: (res, path) => {
if (path.endsWith('.wasm')) {
res.setHeader('Content-Type', 'application/wasm')
}
},
})
)
app.get('*', (req, res) => {
if (!req.path.match(/\.(js|css|wasm|json|png|svg|woff2?)$/)) {
res.sendFile('index.html', { root: 'dist' })
} else {
res.status(404).end()
}
})
Browser-Specific Considerations
Handle Tab Visibility
Cancel in-progress generation when the tab is hidden:
import { TextGeneration } from '@runanywhere/web-llamacpp'
document.addEventListener('visibilitychange', () => {
if (document.hidden) {
TextGeneration.cancel()
}
})
Handle Memory Pressure
if ('deviceMemory' in navigator) {
const memory = (navigator as any).deviceMemory // GB
if (memory < 4) {
console.warn('Low memory device. Use smaller models.')
}
}
Safari Considerations
- OPFS has known reliability issues in Safari — test thoroughly
- WebGPU support in Safari is limited — check LlamaCPP.accelerationMode at runtime and expect CPU fallback
- Prefer Chrome/Edge for the best experience
Mobile Browser Considerations
- Mobile browsers have stricter memory limits
- Models larger than 1GB may cause tab crashes
- Use Q4_0 quantization for mobile
- Test on actual mobile devices
Camera Permission Handling
Provide clear error messages for camera permissions (important for VLM):
try {
const camera = new VideoCapture({ facingMode: 'environment' })
await camera.start()
} catch (err) {
const msg = (err as Error).message
if (msg.includes('NotAllowed') || msg.includes('Permission')) {
alert('Camera permission denied. Allow camera access for this site in your browser settings.')
} else if (msg.includes('NotFound')) {
alert('No camera found on this device.')
} else if (msg.includes('NotReadable')) {
alert('Camera is in use by another application.')
}
}
Error Handling
Always Handle Errors Gracefully
import { SDKError, SDKErrorCode } from '@runanywhere/web'
import { TextGeneration } from '@runanywhere/web-llamacpp'
async function generateSafely(prompt: string): Promise<string> {
try {
const { stream } = await TextGeneration.generateStream(prompt, { maxTokens: 200 })
let text = ''
for await (const token of stream) {
text += token
}
return text
} catch (err) {
if (err instanceof SDKError) {
switch (err.code) {
case SDKErrorCode.ModelNotLoaded:
return 'Please load a model first.'
case SDKErrorCode.GenerationCancelled:
return ''
default:
return 'Sorry, an error occurred. Please try again.'
}
}
return 'An unexpected error occurred.'
}
}
Security & Privacy
All Data Stays Local
The Web SDK runs entirely in the browser via WebAssembly. No data is sent to any server. This is a key advantage for privacy-sensitive applications.
Use Correct Environment Mode
// Development: Full logging
await RunAnywhere.initialize({ environment: SDKEnvironment.Development, debug: true })
// Production: Minimal logging
await RunAnywhere.initialize({ environment: SDKEnvironment.Production })
Vite Gotchas Summary
| Issue | Fix |
|---|---|
| WASM files not found | Add optimizeDeps.exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'] |
| Web Workers not bundling | Add worker: { format: 'es' } |
| WASM not served in dev | Add assetsInclude: ['**/*.wasm'] |
| VLM Worker TypeScript error | Use @ts-ignore above ?worker&url import |
| WASM missing in production build | Add copyWasmPlugin() to copy WASM to dist/assets/ |
| Single-threaded mode | Add COOP/COEP headers to server.headers |
| WASM served as HTML in prod | Ensure static file serving comes before SPA catch-all route |
| Worker “MIME type” error | Use COEP: credentialless (NOT require-corp); fix SPA catch-all route ordering |
| Camera “source width is 0” | Wait for loadedmetadata event before calling captureFrame() |
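The Vite-side fixes from the table combine into one config. A sketch; copyWasmPlugin is the project-specific helper referenced above (its import path below is illustrative, not a Vite builtin):

```typescript
// vite.config.ts — the Vite fixes from the table above, combined.
import { defineConfig } from 'vite'
// import { copyWasmPlugin } from './build/copy-wasm' // illustrative path

export default defineConfig({
  // plugins: [copyWasmPlugin()], // copy WASM to dist/assets/ for production
  optimizeDeps: {
    exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
  },
  worker: { format: 'es' },          // bundle Web Workers as ES modules
  assetsInclude: ['**/*.wasm'],      // serve WASM in dev
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless', // NOT require-corp
    },
  },
})
```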
Summary Checklist
- Install all three packages: @runanywhere/web, web-llamacpp, web-onnx
- Register backends with LlamaCPP.register() and ONNX.register()
- Choose appropriate model size for browser memory constraints
- Use streaming for better perceived performance
- Configure Cross-Origin Isolation headers (COEP: credentialless, NOT require-corp)
- Add optimizeDeps.exclude in Vite config for WASM packages
- Add copyWasmPlugin() to copy WASM files for production builds
- Serve static assets (.wasm, .js) BEFORE SPA catch-all routes
- Set Content-Type: application/wasm for .wasm files on custom servers
- Use OPFS for persistent model storage via ModelManager
- Use coexist: true when loading multiple models simultaneously
- Wait for loadedmetadata before calling VideoCapture.captureFrame()
- For voice pipeline: load all 4 models (VAD + STT + LLM + TTS) — VAD alone is not STT
- Handle all error cases gracefully including WASM memory crashes
- Show progress during model downloads via EventBus
- Batch DOM updates during fast token streaming
- Test on target browsers and devices (not just iframe previews)