If you only need LLM text generation, you can skip @runanywhere/web-onnx. If you only need
STT/TTS/VAD, you can skip @runanywhere/web-llamacpp. The core @runanywhere/web package is
always required.
The starter app uses Vite. Here is the complete vite.config.ts that handles WASM serving, Cross-Origin Isolation, Web Workers, and production builds:
vite.config.ts
```typescript
import { defineConfig, type Plugin } from 'vite'
import react from '@vitejs/plugin-react'
import path from 'path'
import fs from 'fs'
import { fileURLToPath } from 'url'

const __dir = path.dirname(fileURLToPath(import.meta.url))

/**
 * Copies WASM binaries from @runanywhere npm packages into dist/assets/
 * for production builds. In dev mode Vite serves node_modules directly.
 */
function copyWasmPlugin(): Plugin {
  const llamacppWasm = path.resolve(__dir, 'node_modules/@runanywhere/web-llamacpp/wasm')
  const onnxWasm = path.resolve(__dir, 'node_modules/@runanywhere/web-onnx/wasm')

  return {
    name: 'copy-wasm',
    writeBundle(options) {
      const outDir = options.dir ?? path.resolve(__dir, 'dist')
      const assetsDir = path.join(outDir, 'assets')
      fs.mkdirSync(assetsDir, { recursive: true })

      // LlamaCpp WASM binaries (LLM/VLM)
      for (const file of [
        'racommons-llamacpp.wasm',
        'racommons-llamacpp.js',
        'racommons-llamacpp-webgpu.wasm',
        'racommons-llamacpp-webgpu.js',
      ]) {
        const src = path.join(llamacppWasm, file)
        if (fs.existsSync(src)) {
          fs.copyFileSync(src, path.join(assetsDir, file))
        }
      }

      // Sherpa-ONNX: copy all files in sherpa/ subdirectory (STT/TTS/VAD)
      const sherpaDir = path.join(onnxWasm, 'sherpa')
      const sherpaOut = path.join(assetsDir, 'sherpa')
      if (fs.existsSync(sherpaDir)) {
        fs.mkdirSync(sherpaOut, { recursive: true })
        for (const file of fs.readdirSync(sherpaDir)) {
          fs.copyFileSync(path.join(sherpaDir, file), path.join(sherpaOut, file))
        }
      }
    },
  }
}

export default defineConfig({
  plugins: [react(), copyWasmPlugin()],
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
  assetsInclude: ['**/*.wasm'],
  worker: { format: 'es' },
  optimizeDeps: {
    // CRITICAL: exclude WASM packages from pre-bundling so import.meta.url
    // resolves correctly for automatic WASM file discovery
    exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
  },
})
```
optimizeDeps.exclude is critical. Without excluding the WASM packages from Vite’s
pre-bundling, import.meta.url resolves to the wrong paths and WASM files won’t be found at
runtime. This is the most common cause of “WASM not loading” errors with Vite.
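This failure comes from how relative URLs resolve. A package that locates its binaries with `new URL(file, import.meta.url)` resolves them relative to wherever its JavaScript is served from. The sketch below reproduces that resolution with the standard `URL` constructor. It compares an unbundled module URL with one inside Vite's `node_modules/.vite/deps/` pre-bundle cache. The module URLs and the `../wasm/` relative path are hypothetical. The source does not say exactly which path the SDK uses.

```typescript
// Resolve a file name the way `new URL(file, import.meta.url)` does inside a module
function resolveFromModule(file: string, moduleUrl: string): string {
  return new URL(file, moduleUrl).href
}

// Hypothetical URLs for the same SDK module, served unbundled vs. pre-bundled
const unbundled = 'http://localhost:5173/node_modules/@runanywhere/web-llamacpp/dist/index.js'
const prebundled = 'http://localhost:5173/node_modules/.vite/deps/@runanywhere_web-llamacpp.js'

// Hypothetical relative location of the binary inside the package
const wasmFile = '../wasm/racommons-llamacpp.wasm'

// Lands in the package's own wasm/ directory, where the binary exists
console.log(resolveFromModule(wasmFile, unbundled))
// → http://localhost:5173/node_modules/@runanywhere/web-llamacpp/wasm/racommons-llamacpp.wasm

// Lands inside Vite's dependency cache, where no such file exists
console.log(resolveFromModule(wasmFile, prebundled))
// → http://localhost:5173/node_modules/.vite/wasm/racommons-llamacpp.wasm
```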
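The Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers in server.headers make the page cross-origin isolated. That enables SharedArrayBuffer, which multi-threaded WASM requires. Without them, the SDK falls back to single-threaded mode.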
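To check at runtime which mode you actually got, read the standard `crossOriginIsolated` global together with the presence of `SharedArrayBuffer`. This is a minimal sketch, not the SDK's internal check. The `detectThreadingMode` name and the `IsolationEnv` shape are hypothetical. The function takes its inputs as an argument so it can also run outside a browser.

```typescript
type ThreadingMode = 'multi-threaded' | 'single-threaded'

// Inputs the check needs, passed in explicitly so the logic is testable
interface IsolationEnv {
  crossOriginIsolated?: boolean
  hasSharedArrayBuffer: boolean
}

// Multi-threaded WASM needs both cross-origin isolation and SharedArrayBuffer
function detectThreadingMode(env: IsolationEnv): ThreadingMode {
  return env.crossOriginIsolated === true && env.hasSharedArrayBuffer
    ? 'multi-threaded'
    : 'single-threaded'
}

// Browser usage: read the real globals (crossOriginIsolated is undefined outside browsers)
const mode = detectThreadingMode({
  crossOriginIsolated: (globalThis as { crossOriginIsolated?: boolean }).crossOriginIsolated,
  hasSharedArrayBuffer: typeof SharedArrayBuffer !== 'undefined',
})
console.log(`WASM threading: ${mode}`)
```

If this logs `single-threaded` in Chrome with the dev server running, check the response headers of the HTML document in the Network panel first.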
Always use credentialless, NOT require-corp, for COEP. Using require-corp will break WASM
loading in most setups because it requires every sub-resource (including Vite’s /@fs/ served
files, CDN assets, fonts, and worker scripts) to include a Cross-Origin-Resource-Policy header.
In practice, require-corp causes silent failures where module scripts get blocked with
“non-JavaScript MIME type” errors. Use credentialless: it enables SharedArrayBuffer without
breaking cross-origin resource loading.
Iframe environments (Replit, CodeSandbox, StackBlitz): The Cross-Origin-Opener-Policy: same-origin header breaks iframe embedding because the parent and child frames are on different
origins. In these environments, SharedArrayBuffer will NOT be available regardless of your
header configuration. The SDK will fall back to single-threaded WASM mode, which still works
but is slower. This is an environment limitation, not a bug. When accessed directly (not in an
iframe), the headers work correctly.
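If you want the app to explain the slowdown to users, you can detect the embedded case by comparing `window.self` with `window.top`. This is a minimal sketch. The `isInIframe` helper and the warning text are illustrative, and the source does not say the SDK performs this check itself.

```typescript
// Minimal shape of the window properties this check needs
interface FrameWindow {
  self: unknown
  top: unknown
}

// True when the page is rendered inside another page's frame (e.g. an IDE preview pane)
function isInIframe(win: FrameWindow): boolean {
  try {
    return win.self !== win.top
  } catch {
    // If reading `top` throws, assume we are embedded
    return true
  }
}

// Browser usage: read `window` through globalThis so this also runs outside a browser
const currentWindow = (globalThis as { window?: FrameWindow }).window
if (currentWindow && isInIframe(currentWindow)) {
  console.warn(
    'Running inside an iframe: SharedArrayBuffer is likely unavailable. ' +
      'Open the app directly in its own tab for multi-threaded WASM.',
  )
}
```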
The sherpa-onnx WASM module is only loaded when you call ONNX.register(). If you only need LLM
text generation, you don’t need @runanywhere/web-onnx at all.
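Because the module loads only on registration, an app that offers speech as an optional feature can defer that cost until first use. Below is a minimal sketch of a memoized loader for that purpose. `loadOnce` is a hypothetical helper. The dynamic-import wiring in the comment assumes the package exports an `ONNX` object; only the `ONNX.register()` call itself comes from this guide.

```typescript
// Wrap an async loader so it runs at most once. Concurrent and later callers
// share one promise; a failed load is forgotten so the next call can retry.
function loadOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined
  return () => {
    if (!pending) {
      pending = load().catch((err: unknown) => {
        pending = undefined // allow a retry after a failed load
        throw err
      })
    }
    return pending
  }
}

// Hypothetical wiring (the `ONNX` export name is an assumption about
// @runanywhere/web-onnx):
//
//   const ensureSpeech = loadOnce(async () => {
//     const { ONNX } = await import('@runanywhere/web-onnx')
//     await ONNX.register()
//   })
//
//   // Await this only when the user first uses STT, TTS, or VAD
//   await ensureSpeech()
```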
Safari has known reliability issues with OPFS. Mobile browsers have memory constraints that limit
larger models. Chrome/Edge 120+ is recommended for the best experience.
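A quick way to see whether a browser exposes OPFS at all is to check for `navigator.storage.getDirectory`. This only detects the API. `getDirectory()` can still reject when called (for example, in some private-browsing modes), and the check says nothing about the Safari reliability issues or memory limits mentioned above. The helper and interface names are hypothetical.

```typescript
// Minimal shapes of the browser objects this check reads
interface StorageManagerLike {
  getDirectory?: unknown
}
interface NavigatorLike {
  storage?: StorageManagerLike
}

// True when the Origin Private File System entry point exists
function supportsOpfs(nav: NavigatorLike | undefined): boolean {
  return typeof nav?.storage?.getDirectory === 'function'
}

// Browser usage (navigator is read through globalThis so this also runs elsewhere)
const nav = (globalThis as { navigator?: NavigatorLike }).navigator
console.log(`OPFS available: ${supportsOpfs(nav)}`)
```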
Cause: Missing Cross-Origin Isolation headers.
Fix: Add the COOP/COEP headers to your server configuration. The SDK will still work in single-threaded mode without them, but performance will be reduced.
In iframe-based environments (Replit preview, CodeSandbox, StackBlitz), SharedArrayBuffer is
unavailable even with correct headers because COOP: same-origin conflicts with iframe embedding.
The SDK still works in single-threaded mode. Access the app directly (not through the iframe preview)
for full multi-threaded performance.
Cause: Bundler not configured correctly, or optimizeDeps.exclude missing for Vite.
Fix: For Vite, ensure @runanywhere/web-llamacpp and @runanywhere/web-onnx are in optimizeDeps.exclude. For other bundlers, configure .wasm as static assets.
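With a bundler other than Vite, one option is to run the same copy step as copyWasmPlugin() as a separate post-build script. The sketch below is a bundler-agnostic port of that plugin's logic, using the same file list and source directories. `copyRunAnywhereWasm` is a hypothetical name. Whether the SDK then finds the files under `<outDir>/assets/` depends on where your bundler places the SDK's JavaScript relative to them, so treat the output layout as an assumption to verify.

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'

// Same LlamaCpp file list as copyWasmPlugin() in vite.config.ts
const LLAMACPP_FILES = [
  'racommons-llamacpp.wasm',
  'racommons-llamacpp.js',
  'racommons-llamacpp-webgpu.wasm',
  'racommons-llamacpp-webgpu.js',
]

/**
 * Copies the SDK's WASM binaries into `<outDir>/assets`, mirroring copyWasmPlugin()
 * so it can run after any bundler. Returns paths relative to the assets directory.
 */
export function copyRunAnywhereWasm(projectRoot: string, outDir: string): string[] {
  const assetsDir = path.join(outDir, 'assets')
  fs.mkdirSync(assetsDir, { recursive: true })
  const copied: string[] = []

  // LlamaCpp WASM binaries (LLM/VLM); files that are not present are skipped
  const llamacppWasm = path.join(projectRoot, 'node_modules', '@runanywhere', 'web-llamacpp', 'wasm')
  for (const file of LLAMACPP_FILES) {
    const src = path.join(llamacppWasm, file)
    if (fs.existsSync(src)) {
      fs.copyFileSync(src, path.join(assetsDir, file))
      copied.push(file)
    }
  }

  // Sherpa-ONNX (STT/TTS/VAD): copy every regular file in wasm/sherpa
  const sherpaDir = path.join(projectRoot, 'node_modules', '@runanywhere', 'web-onnx', 'wasm', 'sherpa')
  if (fs.existsSync(sherpaDir)) {
    const sherpaOut = path.join(assetsDir, 'sherpa')
    fs.mkdirSync(sherpaOut, { recursive: true })
    for (const entry of fs.readdirSync(sherpaDir, { withFileTypes: true })) {
      if (!entry.isFile()) continue
      fs.copyFileSync(path.join(sherpaDir, entry.name), path.join(sherpaOut, entry.name))
      copied.push(path.join('sherpa', entry.name))
    }
  }
  return copied
}
```

For example, call `copyRunAnywhereWasm(process.cwd(), 'dist')` from a script that runs after the bundler, such as an npm `postbuild` script. The server must still serve these files with `Content-Type: application/wasm`.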
Cause: Your server has a SPA catch-all route (e.g., Express app.get('*', (req, res) => res.sendFile('index.html'))) that serves HTML for any unmatched path, including .wasm file requests. The WASM compiler then receives HTML bytes (3c 21 44 4f = <!DO…) instead of the binary, causing a cryptic error.
Error message:
CompileError: WebAssembly.instantiate(): expected magic word 00 61 73 6d, found 3c 21 44 4f
Fix: Ensure your server serves .wasm files with the correct MIME type before the SPA catch-all. For Express:
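```typescript
import express from 'express'

const app = express()

// Cross-Origin Isolation headers (same values as the Vite dev server) so that
// SharedArrayBuffer and multi-threaded WASM also work in production
app.use((_req, res, next) => {
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin')
  res.setHeader('Cross-Origin-Embedder-Policy', 'credentialless')
  next()
})

// Serve static assets BEFORE the catch-all: wasm files need the correct MIME type
app.use(
  express.static('dist/public', {
    setHeaders: (res, filePath) => {
      if (filePath.endsWith('.wasm')) {
        res.setHeader('Content-Type', 'application/wasm')
      }
    },
  }),
)

// SPA catch-all AFTER static files ('*' is Express 4 syntax; on Express 5 use '/{*splat}')
app.get('*', (req, res) => {
  // Only serve index.html for non-asset requests
  if (!req.path.match(/\.(js|css|wasm|json|png|jpg|svg|ico|woff2?)$/)) {
    res.sendFile('index.html', { root: 'dist/public' })
  } else {
    res.status(404).end()
  }
})

app.listen(Number(process.env.PORT ?? 3000))
```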
This is the #1 production deployment issue. The copyWasmPlugin() correctly copies .wasm
files to dist/assets/, but if your server’s catch-all route intercepts the request first, the
browser receives HTML instead of the WASM binary. Always serve static assets before SPA routing.
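To confirm from the browser which of the two your server returns, fetch the .wasm URL and compare its first four bytes with the WebAssembly magic number `00 61 73 6d` from the error message above. This is a diagnostic sketch. The helper names are hypothetical, and the `/assets/` path assumes Vite's default `base` together with the copyWasmPlugin() layout.

```typescript
// First four bytes of every WebAssembly binary: "\0asm"
const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d]

type BodyKind = 'wasm' | 'html' | 'unknown'

// Classify a response body by its leading bytes
function classifyBody(bytes: Uint8Array): BodyKind {
  if (bytes.length >= 4 && WASM_MAGIC.every((b, i) => bytes[i] === b)) return 'wasm'
  const head = new TextDecoder().decode(bytes.subarray(0, 64)).trimStart().toLowerCase()
  if (head.startsWith('<!doctype') || head.startsWith('<html')) return 'html'
  return 'unknown'
}

// Fetch a WASM URL from the deployed app and report what actually came back
async function checkWasmUrl(url: string) {
  const res = await fetch(url)
  const bytes = new Uint8Array(await res.arrayBuffer())
  return {
    status: res.status,
    contentType: res.headers.get('content-type'), // expect application/wasm
    kind: classifyBody(bytes), // expect 'wasm'; 'html' means the catch-all answered
  }
}

// Browser console usage:
//   console.log(await checkWasmUrl('/assets/racommons-llamacpp.wasm'))
```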
Cause: The VLM Web Worker script URL resolves to a path that returns HTML (same catch-all issue as above), or Vite’s ?worker&url import isn’t configured correctly.
Error message:
Failed to load module script: The server responded with a non-JavaScript MIME type of "text/html"
Fix:
Ensure worker: { format: 'es' } is in your Vite config
Ensure the catch-all route doesn’t intercept .js file requests (see fix above)
For the ?worker&url import, add a TypeScript declaration:
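A declaration along these lines types a `?worker&url` import as a plain string URL. The file name below is illustrative. Vite's bundled `vite/client` types may already declare this module pattern, in which case a `/// <reference types="vite/client" />` line in a `.d.ts` file is enough.

```typescript
// src/worker-url.d.ts (file name is illustrative)
// Lets `import workerUrl from './my-worker.ts?worker&url'` type-check as a string URL
declare module '*?worker&url' {
  const src: string
  export default src
}
```

The imported URL can then be passed to `new Worker(workerUrl, { type: 'module' })`, which matches the `worker: { format: 'es' }` setting. The `./my-worker.ts` path is hypothetical.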
Cause: Incognito/Private mode or browser eviction.
Fix: Ensure you are not in private browsing mode. Safari has known OPFS issues; Chrome/Edge is recommended.
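You can make eviction less likely by asking the browser to mark the origin's storage as persistent with the standard `StorageManager.persist()` API. Browsers decide whether to grant the request: some grant or refuse silently based on heuristics, and some prompt. It does not help in private browsing, where storage is discarded when the session ends. The helper name is hypothetical.

```typescript
// Minimal shape of navigator.storage used here
interface PersistableStorage {
  persisted?: () => Promise<boolean>
  persist?: () => Promise<boolean>
}

// Returns true if the origin's storage is (now) persistent, false otherwise
async function ensurePersistentStorage(storage: PersistableStorage | undefined): Promise<boolean> {
  if (!storage?.persisted || !storage.persist) return false // API not supported
  if (await storage.persisted()) return true // already granted earlier
  return storage.persist() // ask; the browser may grant, prompt, or refuse
}

// Browser usage, e.g. before the first model download:
//   const persistent = await ensurePersistentStorage(navigator.storage)
```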
Cause: Downloading models larger than ~200MB can exhaust available browser memory, especially on memory-constrained devices or when other tabs are open. The OPFS write operation buffers the entire model before committing.
Fix:
Close other browser tabs to free memory before downloading large models
Start with smaller models (350M-500M parameter models are typically under 300MB)
Monitor model.downloadProgress events to detect stalls
If the tab crashes during download, refresh and retry — OPFS supports resuming from partial downloads
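As a rough check on the size guidance above, a quantized model's download size is about parameters × bits per weight / 8 bytes. At an assumed 4 bits per weight, 350M parameters is about 175 MB and 500M is about 250 MB. The sketch pairs that estimate with the quota reported by the standard `navigator.storage.estimate()`. It checks storage quota, not RAM, so it cannot rule out the memory exhaustion described above. The bit width, the 10% headroom, and both helper names are assumptions.

```typescript
// Rough download size of a quantized model, ignoring metadata and any tensors
// kept at higher precision
function approxModelBytes(params: number, bitsPerWeight: number): number {
  return Math.ceil((params * bitsPerWeight) / 8)
}

// Subset of the object returned by navigator.storage.estimate()
interface StorageEstimateLike {
  quota?: number
  usage?: number
}

// True if the model fits in the remaining quota with some headroom left over.
// An unknown quota is treated as "fits" so this check never blocks on its own.
function fitsInQuota(modelBytes: number, estimate: StorageEstimateLike, headroom = 0.1): boolean {
  if (estimate.quota === undefined) return true
  const free = Math.max(0, estimate.quota - (estimate.usage ?? 0))
  return modelBytes <= free * (1 - headroom)
}

const mb = (bytes: number) => `${Math.round(bytes / 1e6)} MB`
console.log(mb(approxModelBytes(350e6, 4))) // 175 MB
console.log(mb(approxModelBytes(500e6, 4))) // 250 MB

// Browser usage:
//   const ok = fitsInQuota(approxModelBytes(500e6, 4), await navigator.storage.estimate())
```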
Cause: The SDK tries to load racommons-llamacpp-webgpu.wasm for GPU acceleration, but the file may not be available.
Fix: This is harmless. The SDK gracefully falls back to CPU mode. You can suppress the 404 by ensuring the WebGPU WASM files are copied to your assets directory.
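To find out up front whether the WebGPU build could be used at all, the standard check is `navigator.gpu` plus `requestAdapter()`, which resolves to `null` when no suitable GPU adapter exists. This is a sketch. The source does not describe how the SDK itself chooses between the WebGPU and CPU builds, and the `hasWebGpuAdapter` name is hypothetical.

```typescript
// Minimal shapes of the WebGPU entry points used here
interface GpuLike {
  requestAdapter: () => Promise<unknown>
}
interface NavigatorWithGpu {
  gpu?: GpuLike
}

// True only when WebGPU is exposed and the browser hands out an adapter
async function hasWebGpuAdapter(nav: NavigatorWithGpu | undefined): Promise<boolean> {
  if (!nav?.gpu) return false // not exposed (older browser, insecure context, ...)
  try {
    return (await nav.gpu.requestAdapter()) !== null // null: no suitable adapter
  } catch {
    return false
  }
}

// Browser usage:
//   if (!(await hasWebGpuAdapter(navigator))) console.info('WebGPU unavailable; CPU WASM will be used')
```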