The VLM (Vision Language Model) module enables multimodal inference — you can feed an image and a text prompt to get descriptions, answers, or analysis. It uses llama.cpp’s mtmd (multimodal) backend compiled to WebAssembly and runs inference in a dedicated Web Worker to keep the UI responsive.
Because inference runs in a dedicated Web Worker, you need to set up the worker bridge during SDK initialization:
runanywhere.ts
```typescript
import { RunAnywhere, SDKEnvironment } from '@runanywhere/web'
import { LlamaCPP, VLMWorkerBridge } from '@runanywhere/web-llamacpp'

// Vite bundles the worker as a standalone JS chunk and returns its URL
// @ts-ignore — Vite-specific ?worker&url query
import vlmWorkerUrl from './workers/vlm-worker?worker&url'

await RunAnywhere.initialize({ environment: SDKEnvironment.Development, debug: true })
await LlamaCPP.register()

// Wire up VLM worker
VLMWorkerBridge.shared.workerUrl = vlmWorkerUrl
RunAnywhere.setVLMLoader({
  get isInitialized() { return VLMWorkerBridge.shared.isInitialized },
  init: () => VLMWorkerBridge.shared.init(),
  loadModel: (params) => VLMWorkerBridge.shared.loadModel(params),
  unloadModel: () => VLMWorkerBridge.shared.unloadModel(),
})
```
workers/vlm-worker.ts
```typescript
import { startVLMWorkerRuntime } from '@runanywhere/web-llamacpp'

startVLMWorkerRuntime()
```
The ?worker&url import syntax is Vite-specific and not recognized by TypeScript, requiring a
@ts-ignore directive. For other bundlers, you may need to configure the worker URL differently.
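For bundlers that support the standard `new URL(..., import.meta.url)` pattern (webpack 5, Rspack, native ESM), a helper like the following sketch can resolve the worker URL instead. The helper name is illustrative, not part of the SDK, and whether your bundler emits the worker as a chunk depends on its static analysis:

```typescript
// Resolve a worker script URL against a module URL, mirroring what
// `new URL('./workers/vlm-worker.ts', import.meta.url)` does at runtime.
// (Illustrative helper — in app code you would pass import.meta.url.)
function resolveWorkerUrl(relativePath: string, moduleUrl: string): string {
  return new URL(relativePath, moduleUrl).toString()
}

// Example with an explicit base standing in for import.meta.url:
const url = resolveWorkerUrl('./workers/vlm-worker.js', 'https://app.example/src/main.js')
console.log(url) // https://app.example/src/workers/vlm-worker.js
```

In app code you would then assign the result to `VLMWorkerBridge.shared.workerUrl`; check your bundler's worker documentation to confirm the pattern is detected.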
“non-JavaScript MIME type” error for VLM worker: If you see Failed to load module script: The server responded with a non-JavaScript MIME type of "text/html", the worker URL is resolving
to your SPA’s index.html instead of the actual JavaScript file. This typically happens when:
Your server’s catch-all route intercepts .js file requests
The worker file isn’t included in the production build output
worker: { format: 'es' } is missing from your Vite config
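Covering the last point, a minimal Vite config sketch (merge these options into your existing `vite.config.ts` rather than replacing it):

```typescript
// vite.config.ts
import { defineConfig } from 'vite'

export default defineConfig({
  worker: {
    // Emit workers as ES modules so `?worker&url` imports load correctly
    format: 'es',
  },
})
```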
Use VLMWorkerBridge for the best user experience — inference runs in a Web Worker so the UI stays responsive:
```typescript
import { ModelManager, ModelCategory } from '@runanywhere/web'
import { VLMWorkerBridge, VideoCapture } from '@runanywhere/web-llamacpp'

// Ensure VLM model is downloaded and loaded
await ModelManager.downloadModel('lfm2-vl-450m-q4_0')
await ModelManager.loadModel('lfm2-vl-450m-q4_0')

// Capture a frame from the camera
const camera = new VideoCapture({ facingMode: 'environment' })
await camera.start()

// IMPORTANT: Wait for video to be ready before capturing
await new Promise<void>((resolve) => {
  const video = camera.videoElement
  if (video.videoWidth > 0) {
    resolve()
  } else {
    video.addEventListener('loadedmetadata', () => resolve(), { once: true })
  }
})

const frame = camera.captureFrame(256) // downscale to 256px max

// Process the frame
if (frame && VLMWorkerBridge.shared.isModelLoaded) {
  const result = await VLMWorkerBridge.shared.process(
    frame.rgbPixels,
    frame.width,
    frame.height,
    'Describe what you see briefly.',
    { maxTokens: 60, temperature: 0.7 }
  )
  console.log(result.text)
}

camera.stop()
```
Always wait for camera readiness before calling captureFrame(). The video stream takes time
to initialize after camera.start() resolves. If you call captureFrame() before the video has
valid dimensions, you’ll get Error: Failed to execute 'getImageData' on 'CanvasRenderingContext2D': The source width is 0. Wait for the loadedmetadata event or check
videoElement.videoWidth > 0.
VLMWorkerBridge.process() does NOT support systemPrompt in its options. The options only
accept maxTokens and temperature. To include system-level instructions, prepend them to the
prompt parameter directly:
```typescript
// WRONG — systemPrompt is silently ignored:
await VLMWorkerBridge.shared.process(pixels, w, h, 'Describe this.', {
  systemPrompt: 'You are a helpful assistant.', // ❌ Not supported
  maxTokens: 60,
})

// CORRECT — include instructions in the prompt:
const prompt = 'You are a helpful assistant. Describe what you see in this image.'
await VLMWorkerBridge.shared.process(pixels, w, h, prompt, { maxTokens: 60 })
```
The standalone VLMGenerationOptions type (below) does include systemPrompt, but that is for
the native VLM API, not for VLMWorkerBridge.process() which uses a simplified options type.
The VideoCapture class is in @runanywhere/web-llamacpp:
```typescript
import { VideoCapture } from '@runanywhere/web-llamacpp'

const camera = new VideoCapture({
  facingMode: 'environment', // or 'user' for selfie camera
})
await camera.start()

// Add the video preview to the DOM
document.getElementById('preview')!.appendChild(camera.videoElement)

// Capture a frame (downscaled to 256px max dimension)
const frame = camera.captureFrame(256)
// frame.rgbPixels: Uint8Array (RGB, no alpha)
// frame.width, frame.height: actual dimensions

// Check state
console.log('Capturing:', camera.isCapturing)

camera.stop()
```
The camera stream takes a moment to initialize after start() resolves. Always guard against zero-dimension frames:
```typescript
const camera = new VideoCapture({ facingMode: 'user' })
await camera.start()

// Option 1: Wait for loadedmetadata event
await new Promise<void>((resolve) => {
  const video = camera.videoElement
  if (video.videoWidth > 0 && video.videoHeight > 0) {
    resolve()
    return
  }
  video.addEventListener('loadedmetadata', () => resolve(), { once: true })
})

// Option 2: Guard in captureFrame calls
const frame = camera.captureFrame(256)
if (!frame || frame.width === 0 || frame.height === 0) {
  console.warn('Camera not ready yet, skipping frame')
  return
}
```
Calling captureFrame() before the video stream is fully initialized causes: Failed to execute 'getImageData' on 'CanvasRenderingContext2D': The source width is 0. This commonly happens in
React components that call captureFrame() immediately after start() without waiting for video
dimensions to be valid.
VLM models require two files: the main model GGUF and a multimodal projector (mmproj) GGUF. Register them using the files array — the first file is the main model, the second is the projector:
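A registration sketch follows; the `registerModel` method name and option shape are assumptions (consult your SDK version's model-registration API), and the URLs are placeholders:

```typescript
import { ModelManager, ModelCategory } from '@runanywhere/web'

// Hypothetical registration call — `registerModel` and its option names are
// illustrative. The key detail: files[0] is the main model GGUF,
// files[1] is the mmproj (multimodal projector) GGUF.
ModelManager.registerModel({
  id: 'lfm2-vl-450m-q4_0',
  category: ModelCategory.VLM,
  files: [
    { url: 'https://example.com/lfm2-vl-450m-q4_0.gguf' },   // main model
    { url: 'https://example.com/mmproj-lfm2-vl-450m.gguf' }, // projector
  ],
})
```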
VLM image encoding is computationally expensive in WASM and can occasionally trigger memory access out of bounds errors. Always wrap VLM calls in try/catch and handle these gracefully:
```typescript
try {
  const result = await VLMWorkerBridge.shared.process(/* ... */)
} catch (err) {
  const msg = (err as Error).message
  if (msg.includes('memory access out of bounds') || msg.includes('RuntimeError')) {
    // Recoverable — the next frame will usually work
    console.warn('WASM memory crash, will retry')
  }
}
```
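Building on that pattern, a small generic retry helper (illustrative, not part of the SDK) can make the recovery automatic, retrying only on the recoverable WASM errors and rethrowing everything else:

```typescript
// Retry an async operation on transient WASM crashes.
// Non-WASM errors propagate immediately; recoverable ones are
// retried up to `attempts` times before the last error is rethrown.
async function withWasmRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastErr: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (err) {
      const msg = (err as Error).message ?? ''
      const recoverable =
        msg.includes('memory access out of bounds') || msg.includes('RuntimeError')
      if (!recoverable) throw err
      lastErr = err
    }
  }
  throw lastErr
}
```

Usage: `const result = await withWasmRetry(() => VLMWorkerBridge.shared.process(pixels, w, h, prompt, { maxTokens: 60 }))`.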