# Best Practices

> Tips for building great browser AI experiences

<Note>**Early Beta** -- The Web SDK is in early beta. APIs may change between releases.</Note>

## Overview

This guide covers best practices for building performant, reliable, and user-friendly AI applications with the RunAnywhere Web SDK in the browser.

## Model Selection

### Choose the Right Model Size

| Model Size     | RAM Required | Use Case               | Speed     |
| -------------- | ------------ | ---------------------- | --------- |
| 360M-500M (Q4) | \~300-500MB  | Quick responses, chat  | Very Fast |
| 1B-3B (Q4)     | 1-2GB        | Balanced quality/speed | Fast      |
| 7B (Q4)        | 4-5GB        | High quality           | Slower    |

<Warning>
  Browser memory is more limited than native apps. Models larger than 2GB may cause tab crashes on
  devices with limited RAM. Start with smaller models and test on target devices.
</Warning>

### Quantization Trade-offs

| Quantization | Quality    | Size    | Speed   |
| ------------ | ---------- | ------- | ------- |
| Q8\_0        | Best       | Largest | Slower  |
| Q6\_K        | Great      | Large   | Fast    |
| Q4\_K\_M     | Good       | Medium  | Faster  |
| Q4\_0        | Acceptable | Small   | Fastest |

<Tip>
  For browser use, Q4\_0 and Q4\_K\_M offer the best balance of quality and memory efficiency. Start
  with a smaller quantization and move to a larger one only if output quality is insufficient.
</Tip>
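
To make the Size column concrete, here is a rough sketch that estimates file size from parameter count. The bits-per-weight figures are approximate values for llama.cpp quantization formats, and `estimateModelBytes` is a hypothetical helper, not part of the SDK. Treat the output as a ballpark only.

```typescript
type Quantization = 'Q8_0' | 'Q6_K' | 'Q4_K_M' | 'Q4_0'

// Approximate bits per weight for llama.cpp quantization formats.
// Q4_K_M mixes block types, so its figure varies slightly from model to model.
const BITS_PER_WEIGHT: Record<Quantization, number> = {
  Q8_0: 8.5,
  Q6_K: 6.5625,
  Q4_K_M: 4.85,
  Q4_0: 4.5,
}

// Rough estimate in bytes; real GGUF files add metadata and often keep
// some tensors at higher precision, so actual files tend to be larger.
function estimateModelBytes(parameterCount: number, quantization: Quantization): number {
  return (parameterCount * BITS_PER_WEIGHT[quantization]) / 8
}

const toMB = (bytes: number): number => Math.round(bytes / 1_000_000)

console.log(toMB(estimateModelBytes(350e6, 'Q4_0')))   // 197
console.log(toMB(estimateModelBytes(350e6, 'Q4_K_M'))) // 212
console.log(toMB(estimateModelBytes(350e6, 'Q8_0')))   // 372
```

For a 350M-parameter model, moving from Q4\_0 to Q8\_0 roughly doubles the download, which is why the smaller formats are the usual starting point in the browser.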

## Performance Optimization

### Use Streaming for Better UX

```typescript theme={null}
import { TextGeneration } from '@runanywhere/web-llamacpp'

// Bad: User waits for entire response
const result = await TextGeneration.generate(prompt, { maxTokens: 500 })
document.getElementById('output')!.textContent = result.text

// Good: User sees response as it's generated
const { stream } = await TextGeneration.generateStream(prompt, { maxTokens: 500 })
let text = ''
for await (const token of stream) {
  text += token
  document.getElementById('output')!.textContent = text
}
```

### Enable Cross-Origin Isolation

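Multi-threaded WASM is significantly faster than single-threaded WASM, but it depends on `SharedArrayBuffer`, which browsers only expose on cross-origin isolated pages. Configure COOP/COEP headers on your server, then verify the result at runtime: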

```typescript theme={null}
import { LlamaCPP } from '@runanywhere/web-llamacpp'

// Check after backend registration
await LlamaCPP.register()
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'

if (!crossOriginIsolated) {
  console.warn('Running in single-threaded mode. Add COOP/COEP headers for better performance.')
}
```
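
For a Vite project, the headers can be set in `vite.config.ts`. This is a minimal sketch that assumes the `credentialless` embedder policy this guide recommends. It only covers `vite dev` and `vite preview`, so your production host needs the same two headers configured separately.

```typescript
// vite.config.ts
import { defineConfig } from 'vite'

// Headers that make the page cross-origin isolated, which unlocks SharedArrayBuffer
const crossOriginIsolationHeaders = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'credentialless',
}

export default defineConfig({
  server: { headers: crossOriginIsolationHeaders }, // used by `vite dev`
  preview: { headers: crossOriginIsolationHeaders }, // used by `vite preview`
})
```

After restarting the dev server, `crossOriginIsolated` should evaluate to `true` in the browser console.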

### Exclude WASM Packages from Vite Pre-Bundling

This is the most common gotcha with Vite:

```typescript vite.config.ts theme={null}
import { defineConfig } from 'vite'

export default defineConfig({
  optimizeDeps: {
    exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
  },
})
```

Without this, `import.meta.url` resolves to the wrong paths and WASM files won't be found.

### Limit Token Generation

```typescript theme={null}
import { TextGeneration } from '@runanywhere/web-llamacpp'

// For quick responses
const { stream } = await TextGeneration.generateStream(prompt, {
  maxTokens: 100,
  temperature: 0.5,
})

// For detailed responses
const { stream: detailedStream } = await TextGeneration.generateStream(prompt, {
  maxTokens: 500,
  temperature: 0.7,
})
```

### Batch DOM Updates

For fast token generation, throttle UI updates to avoid rendering bottlenecks:

```typescript theme={null}
let pending = ''
let frameId: number | null = null

function appendToken(token: string) {
  pending += token
  if (!frameId) {
    frameId = requestAnimationFrame(() => {
      document.getElementById('output')!.textContent = pending
      frameId = null
    })
  }
}
```
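
If several components stream output, the same idea can be packaged as a small factory. The sketch below is illustrative: `createTokenBatcher` is a hypothetical helper, not an SDK API, and the scheduler is injected so the logic can run outside a browser.

```typescript
// Hypothetical helper: names and signatures are illustrative, not SDK APIs.
type Schedule = (callback: () => void) => void

function createTokenBatcher(render: (text: string) => void, schedule: Schedule) {
  let text = ''
  let scheduled = false

  return function append(token: string): void {
    text += token
    if (scheduled) return // a flush is already queued
    scheduled = true
    schedule(() => {
      scheduled = false
      render(text) // one render covers every token appended since scheduling
    })
  }
}

// Browser usage:
// const append = createTokenBatcher(
//   (text) => { document.getElementById('output')!.textContent = text },
//   (cb) => requestAnimationFrame(cb),
// )
// for await (const token of stream) append(token)
```

However many tokens arrive between two frames, the DOM is written at most once per frame.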

## Model Management

### Use OPFS for Persistent Storage

Download models to the Origin Private File System (OPFS) so they persist across browser sessions:

```typescript theme={null}
import { RunAnywhere, ModelManager, ModelCategory, LLMFramework } from '@runanywhere/web'

RunAnywhere.registerModels([
  {
    id: 'lfm2-350m-q4_k_m',
    name: 'LFM2 350M',
    repo: 'LiquidAI/LFM2-350M-GGUF',
    files: ['LFM2-350M-Q4_K_M.gguf'],
    framework: LLMFramework.LlamaCpp,
    modality: ModelCategory.Language,
    memoryRequirement: 250_000_000,
  },
])

// First visit: download model
await ModelManager.downloadModel('lfm2-350m-q4_k_m')
await ModelManager.loadModel('lfm2-350m-q4_k_m')

// Subsequent visits: model loads from OPFS (no re-download)
```

### Show Download Progress

```typescript theme={null}
import { EventBus } from '@runanywhere/web'

EventBus.shared.on('model.downloadProgress', (evt) => {
  const percent = ((evt.progress ?? 0) * 100).toFixed(0)
  document.getElementById('progress')!.textContent = `Downloading: ${percent}%`
})
```

### Handle Large Model Downloads

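Downloading models larger than \~200MB can crash the browser tab, especially on memory-constrained devices. Two mitigations are to check device memory before downloading and to watch for stalled downloads: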

```typescript theme={null}
import { EventBus } from '@runanywhere/web'

// Check available memory before downloading.
// navigator.deviceMemory is only exposed by Chromium-based browsers.
if ('deviceMemory' in navigator) {
  const memoryGB = (navigator as any).deviceMemory
  const modelSizeMB = 250

  if (memoryGB < 4 && modelSizeMB > 200) {
    console.warn('Low memory device: large model downloads may fail.')
  }
}

// Monitor for download stalls (no progress for 30s may indicate trouble)
let lastProgress = 0
let lastTime = Date.now()

EventBus.shared.on('model.downloadProgress', (evt) => {
  const progress = evt.progress ?? 0
  if (progress > lastProgress) {
    lastProgress = progress
    lastTime = Date.now()
  }
})

// Check on a timer: a stalled download may stop emitting progress events entirely
const stallTimer = setInterval(() => {
  if (lastProgress >= 1) {
    clearInterval(stallTimer) // download finished
  } else if (Date.now() - lastTime > 30000) {
    console.warn('Download appears stalled. Tab may be running low on memory.')
  }
}, 5000)
```

<Warning>
  If the browser tab crashes during a model download, the partial download is stored in OPFS. On the
  next attempt, `ModelManager.downloadModel()` resumes from where it left off. We recommend
  starting with a smaller model (LFM2 350M at \~250MB) before attempting larger ones.
</Warning>

### Use `coexist` for Multi-Model Loading

When loading multiple models (e.g., for a voice pipeline), pass `coexist: true` to each `loadModel()` call:

```typescript theme={null}
await ModelManager.loadModel('silero-vad-v5', { coexist: true })
await ModelManager.loadModel('sherpa-onnx-whisper-tiny.en', { coexist: true })
await ModelManager.loadModel('lfm2-350m-q4_k_m', { coexist: true })
await ModelManager.loadModel('vits-piper-en_US-lessac-medium', { coexist: true })
```

### Idempotent SDK Initialization

Wrap initialization in a cached-promise pattern so it's safe to call from multiple components:

```typescript theme={null}
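// Import paths are assumed from the package names used elsewhere in this guide
import { RunAnywhere, SDKEnvironment } from '@runanywhere/web'
import { LlamaCPP } from '@runanywhere/web-llamacpp'
import { ONNX } from '@runanywhere/web-onnx'
import { MODELS } from './models' // hypothetical module exporting your model definitions

let _initPromise: Promise<void> | null = null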

export async function initSDK(): Promise<void> {
  if (_initPromise) return _initPromise
  _initPromise = (async () => {
    await RunAnywhere.initialize({ environment: SDKEnvironment.Development, debug: true })
    await LlamaCPP.register()
    await ONNX.register()
    RunAnywhere.registerModels(MODELS)
  })()
  return _initPromise
}
```
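
One caveat with caching the promise: if initialization fails once (for example, a WASM file fails to load), every later call receives the same rejected promise. A possible refinement is to clear the cache on failure so callers can retry. The `memoizeAsync` helper below is a hypothetical sketch, not part of the SDK.

```typescript
// Hypothetical helper (not an SDK API): caches the in-flight or completed promise,
// but clears the cache if the task rejects so that a later call can retry.
function memoizeAsync<T>(task: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null
  return () => {
    if (cached === null) {
      cached = task().catch((err: unknown) => {
        cached = null // forget the failure so the next call starts over
        throw err
      })
    }
    return cached
  }
}

// Usage sketch:
// export const initSDK = memoizeAsync(async () => {
//   await RunAnywhere.initialize({ environment: SDKEnvironment.Development, debug: true })
//   await LlamaCPP.register()
//   await ONNX.register()
//   RunAnywhere.registerModels(MODELS)
// })
```

Concurrent callers still share a single in-flight initialization; only a rejection resets the cache.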

## Hosted IDE & Iframe Environments

### Replit, CodeSandbox, StackBlitz

These platforms run your app inside an iframe, which has important implications:

| Limitation             | Impact                                | Workaround                                   |
| ---------------------- | ------------------------------------- | -------------------------------------------- |
| No `SharedArrayBuffer` | WASM runs single-threaded (slower)    | Access app directly, not via iframe preview  |
| Memory constraints     | Large models (>200MB) may crash       | Use smaller models (350M-500M params, Q4\_0) |
| COOP header conflict   | `same-origin` breaks iframe embedding | SDK falls back gracefully; no action needed  |
| Preview URL differs    | CORS/COEP may behave differently      | Test with the published URL, not the preview |

<Warning>
  **Do not use `COEP: require-corp`** in hosted IDE environments. It will block Vite's internal
  `/@fs/` module serving and cause "non-JavaScript MIME type" errors for worker scripts and WASM
  glue files. Always use `COEP: credentialless`.
</Warning>

### SPA Routing and Static Assets

When using a custom Express/Node server with SPA catch-all routing, **static asset routes must come before the catch-all**. Otherwise, `.wasm`, `.js`, and worker files get served as `index.html`:

```typescript theme={null}
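import express from 'express'

const app = express()

// WRONG ORDER: the catch-all swallows WASM requests.
// ('*' is Express 4 syntax; Express 5 needs a named wildcard such as '/*splat'.)
app.get('*', (req, res) => res.sendFile('index.html', { root: 'dist' }))
app.use(express.static('dist')) // Never reached for .wasm files

// CORRECT ORDER: static files are served first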
app.use(
  express.static('dist', {
    setHeaders: (res, path) => {
      if (path.endsWith('.wasm')) {
        res.setHeader('Content-Type', 'application/wasm')
      }
    },
  })
)
app.get('*', (req, res) => {
  if (!req.path.match(/\.(js|css|wasm|json|png|svg|woff2?)$/)) {
    res.sendFile('index.html', { root: 'dist' })
  } else {
    res.status(404).end()
  }
})
```

## Browser-Specific Considerations

### Handle Tab Visibility

Cancel in-progress generation when the tab is hidden:

```typescript theme={null}
import { TextGeneration } from '@runanywhere/web-llamacpp'

document.addEventListener('visibilitychange', () => {
  if (document.hidden) {
    TextGeneration.cancel()
  }
})
```

### Handle Memory Pressure

```typescript theme={null}
if ('deviceMemory' in navigator) {
  const memory = (navigator as any).deviceMemory // GB
  if (memory < 4) {
    console.warn('Low memory device. Use smaller models.')
  }
}
```
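
The same signal can drive model selection instead of only logging a warning. The sketch below maps reported device memory to a size tier. The thresholds are assumptions loosely derived from the RAM column of the model size table, and `pickModelTier` is a hypothetical helper, so tune both against real devices.

```typescript
type ModelTier = 'small' | 'medium' | 'large'

// Illustrative thresholds only; adjust them based on testing on target devices.
function pickModelTier(deviceMemoryGB: number | undefined): ModelTier {
  // navigator.deviceMemory is undefined outside Chromium-based browsers,
  // so fall back to the smallest tier when nothing is known.
  if (deviceMemoryGB === undefined) return 'small'
  if (deviceMemoryGB >= 8) return 'large' // 7B (Q4), roughly 4-5GB
  if (deviceMemoryGB >= 4) return 'medium' // 1B-3B (Q4), roughly 1-2GB
  return 'small' // 360M-500M (Q4), roughly 300-500MB
}

// Browser usage:
// const tier = pickModelTier((navigator as any).deviceMemory)
```

Defaulting to the smallest tier when the API is missing keeps Safari and Firefox users on the safest option.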

### Safari Considerations

* OPFS has known reliability issues in Safari -- test thoroughly
* WebGPU is unavailable in Safari releases before Safari 26, so expect the CPU fallback on older versions
* Prefer Chrome/Edge for the best experience

### Mobile Browser Considerations

* Mobile browsers have stricter memory limits
* Models larger than 1GB may cause tab crashes
* Use Q4\_0 quantization for mobile
* Test on actual mobile devices

### Camera Permission Handling

Provide clear error messages for camera permissions (important for VLM):

```typescript theme={null}
try {
  const camera = new VideoCapture({ facingMode: 'environment' })
  await camera.start()
} catch (err) {
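  // Matching on the message assumes the thrown error includes the underlying DOMException name
  const msg = (err as Error).message
  if (msg.includes('NotAllowed') || msg.includes('Permission')) {
    alert('Camera permission denied. Allow camera access in your browser and OS privacy settings.')
  } else if (msg.includes('NotFound')) {
    alert('No camera found on this device.')
  } else if (msg.includes('NotReadable')) {
    alert('Camera is in use by another application.')
  } else {
    // Surface anything unrecognized instead of failing silently
    alert(`Could not start the camera: ${msg}`)
  }
}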
```

## Error Handling

### Always Handle Errors Gracefully

```typescript theme={null}
import { SDKError, SDKErrorCode } from '@runanywhere/web'
import { TextGeneration } from '@runanywhere/web-llamacpp'

async function generateSafely(prompt: string): Promise<string> {
  try {
    const { stream } = await TextGeneration.generateStream(prompt, { maxTokens: 200 })
    let text = ''
    for await (const token of stream) {
      text += token
    }
    return text
  } catch (err) {
    if (err instanceof SDKError) {
      switch (err.code) {
        case SDKErrorCode.ModelNotLoaded:
          return 'Please load a model first.'
        case SDKErrorCode.GenerationCancelled:
          return ''
        default:
          return 'Sorry, an error occurred. Please try again.'
      }
    }
    return 'An unexpected error occurred.'
  }
}
```

## Security & Privacy

### All Data Stays Local

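The Web SDK runs inference entirely in the browser via WebAssembly. Prompts, generated text, audio, and images are not sent to any server; the only network traffic is the model download itself. This is a key advantage for privacy-sensitive applications.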

### Use Correct Environment Mode

```typescript theme={null}
// Development: Full logging
await RunAnywhere.initialize({ environment: SDKEnvironment.Development, debug: true })

// Production: Minimal logging
await RunAnywhere.initialize({ environment: SDKEnvironment.Production })
```

## Vite Gotchas Summary

| Issue                            | Fix                                                                                |
| -------------------------------- | ---------------------------------------------------------------------------------- |
| WASM files not found             | Add `optimizeDeps.exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx']` |
| Web Workers not bundling         | Add `worker: { format: 'es' }`                                                     |
| WASM not served in dev           | Add `assetsInclude: ['**/*.wasm']`                                                 |
| VLM Worker TypeScript error      | Use `@ts-ignore` above `?worker&url` import                                        |
| WASM missing in production build | Add `copyWasmPlugin()` to copy WASM to `dist/assets/`                              |
| Single-threaded mode             | Add COOP/COEP headers to `server.headers`                                          |
| WASM served as HTML in prod      | Ensure static file serving comes **before** SPA catch-all route                    |
| Worker "MIME type" error         | Use `COEP: credentialless` (NOT `require-corp`); fix SPA catch-all route ordering  |
| Camera "source width is 0"       | Wait for `loadedmetadata` event before calling `captureFrame()`                    |
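
The camera row comes down to timing: a video element reports a width of 0 until its metadata has loaded. The sketch below waits for that point. `waitForMetadata` and `VideoLike` are hypothetical names, and how you obtain the underlying `<video>` element from `VideoCapture` is not covered here.

```typescript
// Minimal structural type so the helper accepts an HTMLVideoElement in the browser
// and a plain stub in tests.
interface VideoLike {
  readyState: number
  addEventListener(type: string, listener: () => void, options?: { once?: boolean }): void
}

const HAVE_METADATA = 1 // same value as HTMLMediaElement.HAVE_METADATA

// Resolves once the video's dimensions are known.
function waitForMetadata(video: VideoLike): Promise<void> {
  return new Promise((resolve) => {
    if (video.readyState >= HAVE_METADATA) {
      resolve() // metadata is already available, so there is nothing to wait for
      return
    }
    video.addEventListener('loadedmetadata', () => resolve(), { once: true })
  })
}

// Browser usage:
// await waitForMetadata(videoElement)
// const frame = camera.captureFrame()
```

Checking `readyState` first matters because the event may already have fired by the time the listener would be attached.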

## Summary Checklist

<Check>Install all three packages: `@runanywhere/web`, `@runanywhere/web-llamacpp`, `@runanywhere/web-onnx`</Check>
<Check>Register backends with `LlamaCPP.register()` and `ONNX.register()`</Check>
<Check>Choose appropriate model size for browser memory constraints</Check>
<Check>Use streaming for better perceived performance</Check>
<Check>Configure Cross-Origin Isolation headers (`COEP: credentialless`, NOT `require-corp`)</Check>
<Check>Add `optimizeDeps.exclude` in Vite config for WASM packages</Check>
<Check>Add `copyWasmPlugin()` to copy WASM files for production builds</Check>
<Check>Serve static assets (`.wasm`, `.js`) BEFORE SPA catch-all routes</Check>
<Check>Set `Content-Type: application/wasm` for `.wasm` files on custom servers</Check>
<Check>Use OPFS for persistent model storage via `ModelManager`</Check>
<Check>Use `coexist: true` when loading multiple models simultaneously</Check>
<Check>Wait for `loadedmetadata` before calling `VideoCapture.captureFrame()`</Check>
<Check>For a voice pipeline, load all four models (VAD + STT + LLM + TTS); VAD only detects speech and does not transcribe it</Check>
<Check>Handle all error cases gracefully including WASM memory crashes</Check>
<Check>Show progress during model downloads via `EventBus`</Check>
<Check>Batch DOM updates during fast token streaming</Check>
<Check>Test on target browsers and devices (not just iframe previews)</Check>

## Related

<CardGroup cols={2}>
  <Card title="Configuration" icon="gear" href="/web/configuration">
    SDK configuration
  </Card>

  <Card title="Error Handling" icon="triangle-exclamation" href="/web/error-handling">
    Handle errors gracefully
  </Card>

  <Card title="Quick Start" icon="rocket" href="/web/quick-start">
    Getting started guide
  </Card>
</CardGroup>
