# Best Practices

> Tips for building great browser AI experiences

**Early Beta** -- The Web SDK is in early beta. APIs may change between releases.

## Overview

This guide covers best practices for building performant, reliable, and user-friendly AI applications with the RunAnywhere Web SDK in the browser.

## Model Selection

### Choose the Right Model Size

| Model Size     | RAM Required | Use Case               | Speed     |
| -------------- | ------------ | ---------------------- | --------- |
| 360M-500M (Q4) | \~300-500MB  | Quick responses, chat  | Very Fast |
| 1B-3B (Q4)     | 1-2GB        | Balanced quality/speed | Fast      |
| 7B (Q4)        | 4-5GB        | High quality           | Slower    |

Browser memory is more limited than native apps. Models larger than 2GB may cause tab crashes on devices with limited RAM. Start with smaller models and test on target devices.

### Quantization Trade-offs

| Quantization | Quality    | Size    | Speed   |
| ------------ | ---------- | ------- | ------- |
| Q8\_0        | Best       | Largest | Slower  |
| Q6\_K        | Great      | Large   | Fast    |
| Q4\_K\_M     | Good       | Medium  | Faster  |
| Q4\_0        | Acceptable | Small   | Fastest |

For browser use, Q4\_0 and Q4\_K\_M offer the best balance of quality and memory efficiency. Start with smaller quantizations and only increase if output quality is insufficient.
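The model size table above can be turned into a simple selection rule. The following is a minimal sketch, not part of the RunAnywhere SDK: `pickModelTier`, `TIER_RAM_GB`, and the 25% RAM budget are all hypothetical names and assumptions chosen for illustration, and the RAM figures are the upper bounds from the table.

```typescript
// Hypothetical helper: not part of the RunAnywhere SDK.
type ModelTier = '360M-500M (Q4)' | '1B-3B (Q4)' | '7B (Q4)'

// Upper-bound RAM figures (in GB) taken from the model size table, largest first.
const TIER_RAM_GB: Array<[ModelTier, number]> = [
  ['7B (Q4)', 5],
  ['1B-3B (Q4)', 2],
  ['360M-500M (Q4)', 0.5],
]

// Assumption: a tab should spend at most a quarter of device RAM on model weights.
const RAM_BUDGET_FRACTION = 0.25

function pickModelTier(deviceMemoryGB: number | undefined): ModelTier {
  // navigator.deviceMemory is not available in every browser;
  // fall back to the smallest tier when the value is unknown.
  if (deviceMemoryGB === undefined) return '360M-500M (Q4)'

  const budgetGB = deviceMemoryGB * RAM_BUDGET_FRACTION
  for (const [tier, ramGB] of TIER_RAM_GB) {
    if (ramGB <= budgetGB) return tier
  }
  // Nothing fits the budget: still return the smallest tier and let the app warn.
  return '360M-500M (Q4)'
}
```

In a browser this could be called as `pickModelTier((navigator as any).deviceMemory)`. Because browsers cap the reported `deviceMemory` value, this rule will rarely select the 7B tier, which is consistent with the advice to start small and test on target devices.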

## Performance Optimization

### Use Streaming for Better UX

```typescript
import { TextGeneration } from '@runanywhere/web-llamacpp'

// Bad: User waits for entire response
const result = await TextGeneration.generate(prompt, { maxTokens: 500 })
document.getElementById('output')!.textContent = result.text

// Good: User sees response as it's generated
const { stream } = await TextGeneration.generateStream(prompt, { maxTokens: 500 })
let text = ''
for await (const token of stream) {
  text += token
  document.getElementById('output')!.textContent = text
}
```

### Enable Cross-Origin Isolation

Multi-threaded WASM is significantly faster than single-threaded WASM, but it depends on `SharedArrayBuffer`, which browsers only expose on cross-origin isolated pages. Always configure COOP/COEP headers, then verify the result at runtime:

```typescript
import { LlamaCPP } from '@runanywhere/web-llamacpp'

// Check after backend registration
await LlamaCPP.register()
console.log('Acceleration:', LlamaCPP.accelerationMode) // 'webgpu' or 'cpu'

if (!crossOriginIsolated) {
  console.warn('Running in single-threaded mode. Add COOP/COEP headers for better performance.')
}
```

### Exclude WASM Packages from Vite Pre-Bundling

This is the most common gotcha with Vite:

```typescript
// vite.config.ts
import { defineConfig } from 'vite'

export default defineConfig({
  optimizeDeps: {
    exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
  },
})
```

Without this, `import.meta.url` resolves to the wrong paths and WASM files won't be found.
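
The two Vite-related points above (cross-origin isolation headers and the pre-bundling exclusion) can be combined in one config. The following is a sketch under stated assumptions: `viteConfig` and `crossOriginIsolationHeaders` are hypothetical names, the object is written as a plain literal so it stays self-contained, and in a real project it would be passed to `defineConfig()` and exported as the default export of `vite.config.ts`. The `credentialless` value follows this guide's COEP recommendation; the `worker` and `assetsInclude` entries mirror the fixes listed in the Vite gotchas summary.

```typescript
// Headers that make the page cross-origin isolated, enabling SharedArrayBuffer.
const crossOriginIsolationHeaders = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'credentialless',
}

// Sketch of a vite.config.ts body; wrap in defineConfig() from 'vite' in a real project.
const viteConfig = {
  // Apply the headers to both the dev server and `vite preview`.
  server: { headers: crossOriginIsolationHeaders },
  preview: { headers: crossOriginIsolationHeaders },
  // Keep the WASM packages out of dependency pre-bundling.
  optimizeDeps: {
    exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
  },
  // Bundle workers as ES modules and treat .wasm files as static assets.
  worker: { format: 'es' as const },
  assetsInclude: ['**/*.wasm'],
}
```

These settings only affect Vite's own dev and preview servers. The production host has to send the same two headers itself, otherwise the deployed app falls back to single-threaded mode.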

### Limit Token Generation

Cap `maxTokens` to match the kind of response you need:

```typescript
import { TextGeneration } from '@runanywhere/web-llamacpp'

// For quick responses
const { stream } = await TextGeneration.generateStream(prompt, {
  maxTokens: 100,
  temperature: 0.5,
})

// For detailed responses
const { stream: detailedStream } = await TextGeneration.generateStream(prompt, {
  maxTokens: 500,
  temperature: 0.7,
})
```

### Batch DOM Updates

For fast token generation, throttle UI updates to avoid rendering bottlenecks:

```typescript
let pending = ''
let frameId: number | null = null

function appendToken(token: string) {
  pending += token
  // Schedule at most one DOM write per animation frame
  if (frameId === null) {
    frameId = requestAnimationFrame(() => {
      document.getElementById('output')!.textContent = pending
      frameId = null
    })
  }
}
```

## Model Management

### Use OPFS for Persistent Storage

Download models to OPFS so they persist across browser sessions:

```typescript
import { RunAnywhere, ModelManager, ModelCategory, LLMFramework } from '@runanywhere/web'

RunAnywhere.registerModels([
  {
    id: 'lfm2-350m-q4_k_m',
    name: 'LFM2 350M',
    repo: 'LiquidAI/LFM2-350M-GGUF',
    files: ['LFM2-350M-Q4_K_M.gguf'],
    framework: LLMFramework.LlamaCpp,
    modality: ModelCategory.Language,
    memoryRequirement: 250_000_000,
  },
])

// First visit: download model
await ModelManager.downloadModel('lfm2-350m-q4_k_m')
await ModelManager.loadModel('lfm2-350m-q4_k_m')

// Subsequent visits: model loads from OPFS (no re-download)
```

### Show Download Progress

```typescript
import { EventBus } from '@runanywhere/web'

EventBus.shared.on('model.downloadProgress', (evt) => {
  const percent = ((evt.progress ?? 0) * 100).toFixed(0)
  document.getElementById('progress')!.textContent = `Downloading: ${percent}%`
})
```

### Handle Large Model Downloads

Models over \~200MB can crash the browser tab, especially on memory-constrained devices.
Mitigations:

```typescript
import { EventBus } from '@runanywhere/web'

// Check available memory before downloading
// (navigator.deviceMemory is not available in every browser)
if ('deviceMemory' in navigator) {
  const memoryGB = (navigator as any).deviceMemory
  const modelSizeMB = 250
  if (memoryGB < 4 && modelSizeMB > 200) {
    console.warn('Low memory device; large model downloads may fail.')
  }
}

// Monitor for download stalls (no progress for 30s may indicate trouble)
let lastProgress = 0
let lastTime = Date.now()

EventBus.shared.on('model.downloadProgress', (evt) => {
  const progress = evt.progress ?? 0
  if (progress > lastProgress) {
    lastProgress = progress
    lastTime = Date.now()
  }
})

// Poll on a timer: a stalled download emits no progress events,
// so the stall check cannot live only inside the event handler.
const stallTimer = setInterval(() => {
  if (lastProgress >= 1) {
    clearInterval(stallTimer)
  } else if (Date.now() - lastTime > 30_000) {
    console.warn('Download appears stalled. Tab may be running low on memory.')
    lastTime = Date.now() // warn at most once per 30s
  }
}, 5_000)
```

If the browser tab crashes during a model download, the partial download is stored in OPFS. On the next attempt, `ModelManager.downloadModel()` will resume from where it left off. We recommend starting with smaller models (such as LFM2 350M at \~250MB) before attempting larger ones.

### Use `coexist` for Multi-Model Loading

When loading multiple models (e.g., for the voice pipeline), pass `coexist: true`:

```typescript
await ModelManager.loadModel('silero-vad-v5', { coexist: true })
await ModelManager.loadModel('sherpa-onnx-whisper-tiny.en', { coexist: true })
await ModelManager.loadModel('lfm2-350m-q4_k_m', { coexist: true })
await ModelManager.loadModel('vits-piper-en_US-lessac-medium', { coexist: true })
```

### Idempotent SDK Initialization

Wrap initialization in a cached-promise pattern so it's safe to call from multiple components:

```typescript
// Import paths for SDKEnvironment and ONNX are assumed from the package names used in this guide.
import { RunAnywhere, SDKEnvironment } from '@runanywhere/web'
import { LlamaCPP } from '@runanywhere/web-llamacpp'
import { ONNX } from '@runanywhere/web-onnx'

let _initPromise: Promise<void> | null = null

export async function initSDK(): Promise<void> {
  // Every caller after the first receives the same in-flight (or settled) promise
  if (_initPromise) return _initPromise

  _initPromise = (async () => {
    await RunAnywhere.initialize({ environment: SDKEnvironment.Development, debug: true })
    await LlamaCPP.register()
    await ONNX.register()
    RunAnywhere.registerModels(MODELS) // MODELS: your app's model definitions
  })()

  return _initPromise
}
```

## Hosted IDE & Iframe Environments

### Replit, CodeSandbox, StackBlitz

These platforms run your app inside an iframe, which has important implications:

| Limitation             | Impact                                | Workaround                                   |
| ---------------------- | ------------------------------------- | -------------------------------------------- |
| No `SharedArrayBuffer` | WASM runs single-threaded (slower)    | Access app directly, not via iframe preview  |
| Memory constraints     | Large models (>200MB) may crash       | Use smaller models (350M-500M params, Q4\_0) |
| COOP header conflict   | `same-origin` breaks iframe embedding | SDK falls back gracefully; no action needed  |
| Preview URL differs    | CORS/COEP may behave differently      | Test with the published URL, not the preview |

**Do not use `COEP: require-corp`** in hosted IDE environments. It will block Vite's internal `/@fs/` module serving and cause "non-JavaScript MIME type" errors for worker scripts and WASM glue files. Always use `COEP: credentialless`.

### SPA Routing and Static Assets

When using a custom Express/Node server with SPA catch-all routing, **static asset routes must come before the catch-all**.
Otherwise, `.wasm`, `.js`, and worker files get served as `index.html`:

```typescript
import express from 'express'

const app = express()

// WRONG ORDER: the catch-all swallows WASM requests.
// app.get('*', (req, res) => res.sendFile('index.html', { root: 'dist' }))
// app.use(express.static('dist')) // Never reached for .wasm files

// CORRECT ORDER: static files are served first.
app.use(
  express.static('dist', {
    setHeaders: (res, filePath) => {
      if (filePath.endsWith('.wasm')) {
        res.setHeader('Content-Type', 'application/wasm')
      }
    },
  })
)

// Express 4 wildcard syntax; Express 5 requires a named wildcard such as '/*splat'.
app.get('*', (req, res) => {
  // Only fall back to index.html for page routes, never for missing asset files
  if (!req.path.match(/\.(js|css|wasm|json|png|svg|woff2?)$/)) {
    res.sendFile('index.html', { root: 'dist' })
  } else {
    res.status(404).end()
  }
})
```

## Browser-Specific Considerations

### Handle Tab Visibility

Cancel in-progress generation when the tab is hidden:

```typescript
import { TextGeneration } from '@runanywhere/web-llamacpp'

document.addEventListener('visibilitychange', () => {
  if (document.hidden) {
    TextGeneration.cancel()
  }
})
```

### Handle Memory Pressure

```typescript
// navigator.deviceMemory is not available in every browser, so feature-detect it
if ('deviceMemory' in navigator) {
  const memory = (navigator as any).deviceMemory // GB
  if (memory < 4) {
    console.warn('Low memory device. Use smaller models.')
  }
}
```

### Safari Considerations

* OPFS has known reliability issues in Safari -- test thoroughly
* WebGPU is not available in Safari versions before Safari 26 -- check `LlamaCPP.accelerationMode` on the versions you target
* Prefer Chrome/Edge for the best experience

### Mobile Browser Considerations

* Mobile browsers have stricter memory limits
* Models larger than 1GB may cause tab crashes
* Use Q4\_0 quantization for mobile
* Test on actual mobile devices

### Camera Permission Handling

Provide clear error messages for camera permissions (important for VLM):

```typescript
// VideoCapture is the SDK's camera helper; import it from the SDK package that exports it.
try {
  const camera = new VideoCapture({ facingMode: 'environment' })
  await camera.start()
} catch (err) {
  // Browser camera failures are DOMExceptions whose `name` carries the error type
  // (e.g. NotAllowedError), so check the name as well as the message.
  const e = err as Error
  const msg = `${e.name} ${e.message}`
  if (msg.includes('NotAllowed') || msg.includes('Permission')) {
    alert('Camera permission denied. Check System Settings → Privacy & Security → Camera.')
  } else if (msg.includes('NotFound')) {
    alert('No camera found on this device.')
  } else if (msg.includes('NotReadable')) {
    alert('Camera is in use by another application.')
  } else {
    alert(`Camera error: ${e.message}`)
  }
}
```

## Error Handling

### Always Handle Errors Gracefully

```typescript
import { SDKError, SDKErrorCode } from '@runanywhere/web'
import { TextGeneration } from '@runanywhere/web-llamacpp'

async function generateSafely(prompt: string): Promise<string> {
  try {
    const { stream } = await TextGeneration.generateStream(prompt, { maxTokens: 200 })
    let text = ''
    for await (const token of stream) {
      text += token
    }
    return text
  } catch (err) {
    if (err instanceof SDKError) {
      switch (err.code) {
        case SDKErrorCode.ModelNotLoaded:
          return 'Please load a model first.'
        case SDKErrorCode.GenerationCancelled:
          return '' // Cancellation is expected, not an error
        default:
          return 'Sorry, an error occurred. Please try again.'
      }
    }
    return 'An unexpected error occurred.'
  }
}
```

## Security & Privacy

### All Data Stays Local

The Web SDK runs entirely in the browser via WebAssembly. No data is sent to any server. This is a key advantage for privacy-sensitive applications.

### Use Correct Environment Mode

```typescript
// Development: Full logging
await RunAnywhere.initialize({ environment: SDKEnvironment.Development, debug: true })

// Production: Minimal logging
await RunAnywhere.initialize({ environment: SDKEnvironment.Production })
```

## Vite Gotchas Summary

| Issue                            | Fix                                                                                |
| -------------------------------- | ---------------------------------------------------------------------------------- |
| WASM files not found             | Add `optimizeDeps.exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx']` |
| Web Workers not bundling         | Add `worker: { format: 'es' }`                                                     |
| WASM not served in dev           | Add `assetsInclude: ['**/*.wasm']`                                                 |
| VLM Worker TypeScript error      | Use `@ts-ignore` above `?worker&url` import                                        |
| WASM missing in production build | Add `copyWasmPlugin()` to copy WASM to `dist/assets/`                              |
| Single-threaded mode             | Add COOP/COEP headers to `server.headers`                                          |
| WASM served as HTML in prod      | Ensure static file serving comes **before** SPA catch-all route                    |
| Worker "MIME type" error         | Use `COEP: credentialless` (NOT `require-corp`); fix SPA catch-all route ordering  |
| Camera "source width is 0"       | Wait for `loadedmetadata` event before calling `captureFrame()`                    |

## Summary Checklist

* Install all three packages: `@runanywhere/web`, `@runanywhere/web-llamacpp`, `@runanywhere/web-onnx`
* Register backends with `LlamaCPP.register()` and `ONNX.register()`
* Choose appropriate model size for browser memory constraints
* Use streaming for better perceived performance
* Configure Cross-Origin Isolation headers (`COEP: credentialless`, NOT `require-corp`)
* Add `optimizeDeps.exclude` in Vite config for WASM packages
* Add `copyWasmPlugin()` to copy WASM files for production builds
* Serve static assets (`.wasm`, `.js`) BEFORE SPA catch-all routes
* Set `Content-Type: application/wasm` for `.wasm` files on custom servers
* Use OPFS for persistent model storage via `ModelManager`
* Use `coexist: true` when loading multiple models simultaneously
* Wait for `loadedmetadata` before calling `VideoCapture.captureFrame()`
* For voice pipeline: load all 4 models (VAD + STT + LLM + TTS); VAD alone is not STT
* Handle all error cases gracefully, including WASM memory crashes
* Show progress during model downloads via `EventBus`
* Batch DOM updates during fast token streaming
* Test on target browsers and devices (not just iframe previews)

## Related

* SDK configuration
* Handle errors gracefully
* Getting started guide