Overview
This guide covers best practices for building performant, reliable, and user-friendly AI applications with the RunAnywhere SDK.
Model Selection
Choose the Right Model Size
| Model Size | RAM Required | Use Case | Speed |
|---|---|---|---|
| 360M–500M (Q8) | ~500MB | Quick responses, chat | Very Fast |
| 1B–3B (Q4/Q6) | 1–2GB | Balanced quality/speed | Fast |
| 7B (Q4) | 4–5GB | High quality | Slower |
// For chat applications - use smaller, faster models
await LlamaCPP.addModel({
  id: 'smollm2-360m',
  name: 'SmolLM2 360M',
  url: 'https://huggingface.co/.../SmolLM2-360M.Q8_0.gguf',
  memoryRequirement: 500_000_000,
})
// For quality-critical tasks - use larger models
await LlamaCPP.addModel({
  id: 'qwen-1.5b',
  name: 'Qwen 1.5B',
  url: 'https://huggingface.co/.../qwen-1.5b-q4_k_m.gguf',
  memoryRequirement: 1_500_000_000,
})
Start with smaller models during development for faster iteration, then switch to larger models once output quality becomes the priority.
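One way to wire this up is to key the model choice to the build type. A minimal sketch using React Native's `__DEV__` flag and the two model ids registered above; the helper name `loadDefaultModel` is illustrative (the error-handling section below assumes something like it exists):

// Hypothetical helper: small model in development, larger model in release builds.
const DEFAULT_MODEL_ID = __DEV__ ? 'smollm2-360m' : 'qwen-1.5b'

async function loadDefaultModel() {
  const info = await RunAnywhere.getModelInfo(DEFAULT_MODEL_ID)
  await RunAnywhere.loadModel(info!.localPath!)
}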
Quantization Trade-offs
| Quantization | Quality | Size | Speed |
|---|---|---|---|
| Q8_0 | Best | Largest | Slowest |
| Q6_K | Great | Large | Fast |
| Q4_K_M | Good | Medium | Faster |
| Q4_0 | Acceptable | Small | Fastest |
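If you expose a quality/speed setting to users, the table can be encoded directly. A minimal sketch; the type and function are illustrative, not part of the SDK:

// Hypothetical mapping from a user-facing preference to a quantization level.
type QuantPreference = 'quality' | 'balanced' | 'speed'

function pickQuantization(pref: QuantPreference): string {
  switch (pref) {
    case 'quality':
      return 'Q8_0' // best output, largest file, slowest inference
    case 'balanced':
      return 'Q4_K_M' // good output at roughly half the size
    case 'speed':
      return 'Q4_0' // smallest and fastest, acceptable quality
  }
}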
Memory Management
Unload Unused Models
// Unload LLM when not in use
await RunAnywhere.unloadModel()
// Unload STT when not needed
await RunAnywhere.unloadSTTModel()
// Unload TTS when done
await RunAnywhere.unloadTTSModel()
Handle App Lifecycle
import { useEffect } from 'react'
import { AppState, AppStateStatus } from 'react-native'
import { RunAnywhere } from '@runanywhere/core'

function useModelLifecycle(modelId: string) {
  useEffect(() => {
    const subscription = AppState.addEventListener('change', (state: AppStateStatus) => {
      if (state === 'background') {
        // Free memory when the app backgrounds
        RunAnywhere.unloadModel()
      } else if (state === 'active') {
        // Optionally reload `modelId` when the app returns;
        // this depends on your UX requirements
      }
    })
    return () => subscription.remove()
  }, [modelId])
}
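The hook then drops into any screen that owns a loaded model. A minimal usage sketch (the ChatScreen component is illustrative):

// Hypothetical usage: the model is unloaded whenever this screen's app backgrounds.
function ChatScreen() {
  useModelLifecycle('smollm2-360m')
  // ...render the chat UI...
  return null
}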
Check Memory Before Loading
async function safeLoadModel(modelId: string): Promise<boolean> {
  const modelInfo = await RunAnywhere.getModelInfo(modelId)
  const storage = await RunAnywhere.getStorageInfo()

  if (!modelInfo?.memoryRequired) {
    return false
  }

  // Check if we have enough free memory (with a 20% buffer)
  const requiredWithBuffer = modelInfo.memoryRequired * 1.2
  if (storage.freeSpace < requiredWithBuffer) {
    console.warn('Insufficient memory for model')
    return false
  }

  await RunAnywhere.loadModel(modelInfo.localPath!)
  return true
}
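Pairing this guard with a fallback keeps the app usable when the preferred model does not fit. A sketch, assuming the two model ids registered earlier:

// Hypothetical fallback chain: try the preferred model first, then a smaller one.
async function loadBestAvailableModel(): Promise<string | null> {
  const candidates = ['qwen-1.5b', 'smollm2-360m'] // largest first
  for (const id of candidates) {
    if (await safeLoadModel(id)) {
      return id
    }
  }
  return null // nothing fits; degrade gracefully (see Graceful Degradation below)
}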
Use Streaming for Better UX
// ❌ Bad: User waits for entire response
const result = await RunAnywhere.generate(prompt, { maxTokens: 500 })
setResponse(result.text)
// ✅ Good: User sees response as it's generated
const stream = await RunAnywhere.generateStream(prompt, { maxTokens: 500 })
for await (const token of stream.stream) {
  setResponse((prev) => prev + token)
}
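In a component, wrap the loop so UI state is cleaned up on every exit path. A minimal sketch using the same generateStream API shown above; the hook and state names are illustrative:

import { useState } from 'react'
import { RunAnywhere } from '@runanywhere/core'

// Hypothetical hook: stream tokens into state, clearing the busy flag on any exit.
function useStreamingGenerate() {
  const [response, setResponse] = useState('')
  const [busy, setBusy] = useState(false)

  async function run(prompt: string) {
    setResponse('')
    setBusy(true)
    try {
      const stream = await RunAnywhere.generateStream(prompt, { maxTokens: 500 })
      for await (const token of stream.stream) {
        setResponse((prev) => prev + token)
      }
    } finally {
      setBusy(false) // runs on success, error, or cancellation
    }
  }

  return { response, busy, run }
}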
Limit Token Generation
// For quick responses
const quick = await RunAnywhere.generate(prompt, {
  maxTokens: 100, // short responses
  temperature: 0.5, // more focused, deterministic output
})

// For detailed responses
const detailed = await RunAnywhere.generate(prompt, {
  maxTokens: 500,
  temperature: 0.7,
})
Pre-Download Models
Download models during onboarding for a better user experience:
async function downloadModels(onProgress: (percent: number) => void) {
  const models = ['smollm2-360m', 'whisper-tiny-en', 'piper-en-lessac']

  for (let i = 0; i < models.length; i++) {
    await RunAnywhere.downloadModel(models[i], (progress) => {
      const overallProgress = (i + progress.progress) / models.length
      onProgress(overallProgress * 100)
    })
  }
}
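A failed download for one model should not abort onboarding for the rest. A sketch of a more tolerant variant; the return shape is illustrative:

// Hypothetical variant: continue past individual failures and report them.
async function downloadModelsTolerant(models: string[]): Promise<string[]> {
  const failed: string[] = []
  for (const id of models) {
    try {
      await RunAnywhere.downloadModel(id, () => {})
    } catch {
      console.warn(`Download failed for ${id}`)
      failed.push(id) // retry these later, e.g. on the next app launch
    }
  }
  return failed
}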
Error Handling
Always Handle Errors Gracefully
// isSDKError and SDKErrorCode are assumed to come from the SDK's error helpers.
async function generateSafely(prompt: string, retried = false): Promise<string> {
  try {
    const result = await RunAnywhere.generate(prompt)
    return result.text
  } catch (error) {
    if (isSDKError(error)) {
      switch (error.code) {
        case SDKErrorCode.modelNotLoaded:
          // Try to load the model, but retry only once to avoid infinite recursion
          if (retried) {
            return 'Sorry, no model is available right now.'
          }
          await loadDefaultModel()
          return generateSafely(prompt, true)
        case SDKErrorCode.insufficientMemory:
          // Use a smaller model
          return 'I need to use a smaller model. Please try again.'
        case SDKErrorCode.generationCancelled:
          return '' // Expected; don't surface an error
        default:
          return 'Sorry, I encountered an error. Please try again.'
      }
    }
    return 'An unexpected error occurred.'
  }
}
Provide User Feedback
// Show loading states
const [status, setStatus] = useState<'idle' | 'loading' | 'generating' | 'error'>('idle')

async function generate() {
  setStatus('loading')
  try {
    const result = await RunAnywhere.generate(prompt)
    setStatus('idle')
    return result
  } catch (error) {
    setStatus('error')
    throw error
  }
}
User Experience
Show Progress During Downloads
import React, { useState } from 'react'
import { View, Text, ActivityIndicator, Button, StyleSheet } from 'react-native'
import { RunAnywhere, DownloadState } from '@runanywhere/core'

export function ModelDownloader({ modelId }: { modelId: string }) {
  const [progress, setProgress] = useState(0)
  const [status, setStatus] = useState<DownloadState>('queued')

  const download = async () => {
    await RunAnywhere.downloadModel(modelId, (p) => {
      setProgress(p.progress * 100)
      setStatus(p.state)
    })
  }

  return (
    <View style={styles.container}>
      <Button title="Download model" onPress={download} />
      {status === 'downloading' && (
        <>
          <ActivityIndicator />
          <Text>Downloading: {progress.toFixed(0)}%</Text>
        </>
      )}
      {status === 'extracting' && (
        <>
          <ActivityIndicator />
          <Text>Setting up model...</Text>
        </>
      )}
      {status === 'completed' && <Text>✅ Ready to use!</Text>}
    </View>
  )
}

const styles = StyleSheet.create({
  container: { padding: 16, alignItems: 'center' },
})
Add Typing Indicators
export function TypingIndicator() {
  return (
    <View style={styles.container}>
      <Text style={styles.dot}>•</Text>
      <Text style={styles.dot}>•</Text>
      <Text style={styles.dot}>•</Text>
    </View>
  )
}

const styles = StyleSheet.create({
  container: { flexDirection: 'row', padding: 8 },
  dot: {
    fontSize: 24,
    marginHorizontal: 2,
    opacity: 0.3,
  },
})
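Note that React Native styles have no animationDelay, so staggering the dots requires the Animated API. A minimal sketch with no extra animation libraries; the component name is illustrative:

import React, { useEffect, useRef } from 'react'
import { Animated, View, StyleSheet } from 'react-native'

export function AnimatedTypingIndicator() {
  // One animated opacity value per dot
  const dots = [
    useRef(new Animated.Value(0.3)).current,
    useRef(new Animated.Value(0.3)).current,
    useRef(new Animated.Value(0.3)).current,
  ]

  useEffect(() => {
    const animations = dots.map((value, i) =>
      Animated.sequence([
        Animated.delay(i * 200), // stagger each dot's start by 200ms
        Animated.loop(
          Animated.sequence([
            Animated.timing(value, { toValue: 1, duration: 300, useNativeDriver: true }),
            Animated.timing(value, { toValue: 0.3, duration: 300, useNativeDriver: true }),
          ])
        ),
      ])
    )
    animations.forEach((a) => a.start())
    return () => animations.forEach((a) => a.stop())
  }, [])

  return (
    <View style={animStyles.container}>
      {dots.map((value, i) => (
        <Animated.Text key={i} style={[animStyles.dot, { opacity: value }]}>
          •
        </Animated.Text>
      ))}
    </View>
  )
}

const animStyles = StyleSheet.create({
  container: { flexDirection: 'row', padding: 8 },
  dot: { fontSize: 24, marginHorizontal: 2 },
})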
Graceful Degradation
async function getAIResponse(prompt: string): Promise<string> {
  // Try on-device first
  if (await RunAnywhere.isModelLoaded()) {
    try {
      return (await RunAnywhere.generate(prompt)).text
    } catch {
      // Fall through to the fallback
    }
  }

  // Fall back to a simple canned response
  return "I'm still setting up. Please try again in a moment."
}
Security & Privacy
Never Log Sensitive Data
// ❌ Bad: Logging user prompts
console.log('User prompt:', prompt)
logger.debug('Generating response for:', prompt)
// ✅ Good: Log only metadata
logger.debug('Generating response', { promptLength: prompt.length })
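If diagnostics need more than a length, scrub the prompt down to its shape only. A small sketch; the helper name is illustrative:

// Hypothetical helper: log the shape of a prompt, never its content.
function promptMetadata(prompt: string) {
  return {
    promptLength: prompt.length,
    wordCount: prompt.trim().split(/\s+/).length,
  }
}

logger.debug('Generating response', promptMetadata(prompt))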
Use Development Mode Appropriately
// Development: full logging, no auth
await RunAnywhere.initialize({
  environment: SDKEnvironment.Development,
})

// Production: minimal logging, with auth
await RunAnywhere.initialize({
  environment: SDKEnvironment.Production,
  apiKey: process.env.RUNANYWHERE_API_KEY,
})
Disable Telemetry If Needed
// Telemetry runs only in Production mode;
// use Development mode to disable it entirely
await RunAnywhere.initialize({
  environment: SDKEnvironment.Development, // no telemetry
})
Testing
Use Smaller Models for Testing
// In tests, use the smallest available model
const testModel = 'smollm2-360m' // fast for CI/CD

beforeAll(async () => {
  await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
  LlamaCPP.register()
  // Download and load the small model for tests
})
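Filling in that placeholder with the download and load calls used earlier gives a complete setup. A sketch, assuming getModelInfo exposes a localPath as in the memory-check example:

// Hypothetical complete setup for integration tests.
beforeAll(async () => {
  await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
  LlamaCPP.register()

  await RunAnywhere.downloadModel(testModel, () => {}) // no-op progress callback in CI
  const info = await RunAnywhere.getModelInfo(testModel)
  await RunAnywhere.loadModel(info!.localPath!)
}, 120_000) // allow generous time for the download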
Mock for Unit Tests
// Create a mock for unit tests
jest.mock('@runanywhere/core', () => ({
  RunAnywhere: {
    generate: jest.fn().mockResolvedValue({
      text: 'Mock response',
      tokensUsed: 10,
      latencyMs: 100,
    }),
    chat: jest.fn().mockResolvedValue('Mock response'),
  },
}))
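With the mock in place, unit tests can exercise app logic without loading a model:

import { RunAnywhere } from '@runanywhere/core'

test('returns the mocked response text', async () => {
  const result = await RunAnywhere.generate('hello')
  expect(result.text).toBe('Mock response')
})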
Summary Checklist
- Choose an appropriate model size for your use case
- Use streaming for better perceived performance
- Unload models when not in use
- Handle all error cases gracefully
- Show progress during downloads and loading
- Pre-download models during onboarding
- Test on actual target devices
- Use Development mode to iterate quickly
- Never log sensitive user data
- Provide clear feedback for all operations