
Overview

This guide covers best practices for building performant, reliable, and user-friendly AI applications with the RunAnywhere SDK.

Model Selection

Choose the Right Model Size

| Model Size | RAM Required | Use Case | Speed |
| --- | --- | --- | --- |
| 360M–500M (Q8) | ~500MB | Quick responses, chat | Very Fast |
| 1B–3B (Q4/Q6) | 1–2GB | Balanced quality/speed | Fast |
| 7B (Q4) | 4–5GB | High quality | Slower |

```typescript
// For chat applications - use smaller, faster models
await LlamaCPP.addModel({
  id: 'smollm2-360m',
  name: 'SmolLM2 360M',
  url: 'https://huggingface.co/.../SmolLM2-360M.Q8_0.gguf',
  memoryRequirement: 500_000_000,
})

// For quality-critical tasks - use larger models
await LlamaCPP.addModel({
  id: 'qwen-1.5b',
  name: 'Qwen 1.5B',
  url: 'https://huggingface.co/.../qwen-1.5b-q4_k_m.gguf',
  memoryRequirement: 1_500_000_000,
})
```
Start with smaller models during development for faster iteration. Switch to larger models when quality becomes critical.
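One way to encode this is to key the model choice off the build environment. A minimal sketch (the model ids and sizes are illustrative, not canonical SDK values; in React Native you might pass the global `__DEV__` flag as the argument):

```typescript
// Illustrative model registry - ids and sizes are examples only.
interface ModelChoice {
  id: string
  memoryRequirement: number
}

const DEV_MODEL: ModelChoice = { id: 'smollm2-360m', memoryRequirement: 500_000_000 }
const PROD_MODEL: ModelChoice = { id: 'qwen-1.5b', memoryRequirement: 1_500_000_000 }

// Kept as a pure function so the selection policy is easy to unit-test.
function pickModel(isDev: boolean): ModelChoice {
  return isDev ? DEV_MODEL : PROD_MODEL
}
```

Centralizing the choice in one function also gives you a single place to add per-device overrides later.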

Quantization Trade-offs

| Quantization | Quality | Size | Speed |
| --- | --- | --- | --- |
| Q8_0 | Best | Largest | Slower |
| Q6_K | Great | Large | Fast |
| Q4_K_M | Good | Medium | Faster |
| Q4_0 | Acceptable | Small | Fastest |
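The trade-offs above can be turned into a simple selection policy: pick the highest-quality quantization that fits your memory budget. A sketch, assuming a ~1B-parameter model (the byte sizes are rough illustrative estimates, not measured values):

```typescript
// Approximate footprints for a ~1B-parameter model at each quantization.
// These numbers are illustrative estimates only.
const QUANT_SIZES: ReadonlyArray<{ quant: string; bytes: number }> = [
  { quant: 'Q8_0', bytes: 1_300_000_000 },
  { quant: 'Q6_K', bytes: 1_000_000_000 },
  { quant: 'Q4_K_M', bytes: 800_000_000 },
  { quant: 'Q4_0', bytes: 700_000_000 },
]

// Entries are ordered best-quality first, so the first fit wins;
// fall back to the smallest quantization if nothing fits.
function pickQuant(budgetBytes: number): string {
  const fit = QUANT_SIZES.find((q) => q.bytes <= budgetBytes)
  return fit ? fit.quant : QUANT_SIZES[QUANT_SIZES.length - 1].quant
}
```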

Memory Management

Unload Unused Models

```typescript
// Unload LLM when not in use
await RunAnywhere.unloadModel()

// Unload STT when not needed
await RunAnywhere.unloadSTTModel()

// Unload TTS when done
await RunAnywhere.unloadTTSModel()
```

Handle App Lifecycle

```typescript
import { useEffect } from 'react'
import { AppState, AppStateStatus } from 'react-native'
import { RunAnywhere } from '@runanywhere/core'

function useModelLifecycle(modelId: string) {
  useEffect(() => {
    const subscription = AppState.addEventListener('change', (state: AppStateStatus) => {
      if (state === 'background') {
        // Free memory when app backgrounds
        RunAnywhere.unloadModel()
      } else if (state === 'active') {
        // Optionally reload when app returns
        // This depends on your UX requirements
      }
    })

    return () => subscription.remove()
  }, [modelId])
}
```

Check Memory Before Loading

```typescript
async function safeLoadModel(modelId: string): Promise<boolean> {
  const modelInfo = await RunAnywhere.getModelInfo(modelId)
  const storage = await RunAnywhere.getStorageInfo()

  if (!modelInfo?.memoryRequired) {
    return false
  }

  // Require headroom beyond the model's stated footprint (20% buffer).
  // getStorageInfo() reports free storage, which serves here as a coarse
  // proxy for whether the device can accommodate the model.
  const requiredWithBuffer = modelInfo.memoryRequired * 1.2

  if (storage.freeSpace < requiredWithBuffer) {
    console.warn(`Insufficient free space to load model ${modelId}`)
    return false
  }

  await RunAnywhere.loadModel(modelInfo.localPath!)
  return true
}
```

Performance Optimization

Use Streaming for Better UX

```typescript
// ❌ Bad: User waits for entire response
const result = await RunAnywhere.generate(prompt, { maxTokens: 500 })
setResponse(result.text)

// ✅ Good: User sees response as it's generated
const stream = await RunAnywhere.generateStream(prompt, { maxTokens: 500 })
for await (const token of stream.stream) {
  setResponse((prev) => prev + token)
}
```

Limit Token Generation

```typescript
// For quick responses
const quick = await RunAnywhere.generate(prompt, {
  maxTokens: 100, // Short responses finish sooner
  temperature: 0.5, // More focused output (temperature does not affect speed)
})

// For detailed responses
const detailed = await RunAnywhere.generate(prompt, {
  maxTokens: 500,
  temperature: 0.7,
})
```

Pre-Download Models

Download models during onboarding for a better user experience:
OnboardingScreen.tsx
```typescript
async function downloadModels(onProgress: (percent: number) => void) {
  const models = ['smollm2-360m', 'whisper-tiny-en', 'piper-en-lessac']

  for (let i = 0; i < models.length; i++) {
    await RunAnywhere.downloadModel(models[i], (progress) => {
      const overallProgress = (i + progress.progress) / models.length
      onProgress(overallProgress * 100)
    })
  }
}
```

Error Handling

Always Handle Errors Gracefully

```typescript
async function generateSafely(prompt: string, retried = false): Promise<string> {
  try {
    const result = await RunAnywhere.generate(prompt)
    return result.text
  } catch (error) {
    if (isSDKError(error)) {
      switch (error.code) {
        case SDKErrorCode.modelNotLoaded:
          // Load the model and retry once; the flag prevents infinite recursion
          if (!retried) {
            await loadDefaultModel()
            return generateSafely(prompt, true)
          }
          return 'Sorry, the model could not be loaded. Please try again.'
        case SDKErrorCode.insufficientMemory:
          // Use a smaller model
          return 'I need to use a smaller model. Please try again.'
        case SDKErrorCode.generationCancelled:
          return '' // Expected, don't show error
        default:
          return 'Sorry, I encountered an error. Please try again.'
      }
    }
    return 'An unexpected error occurred.'
  }
}
```

Provide User Feedback

```typescript
// Show loading states
const [status, setStatus] = useState<'idle' | 'loading' | 'generating' | 'error'>('idle')

async function generate() {
  setStatus('loading')
  try {
    const result = await RunAnywhere.generate(prompt)
    setStatus('idle')
    return result
  } catch (error) {
    setStatus('error')
    throw error
  }
}
```

User Experience

Show Progress During Downloads

DownloadProgress.tsx
```typescript
import React, { useState } from 'react'
import { View, Text, Button, ActivityIndicator, StyleSheet } from 'react-native'
import { RunAnywhere, DownloadState } from '@runanywhere/core'

export function ModelDownloader({ modelId }: { modelId: string }) {
  const [progress, setProgress] = useState(0)
  const [status, setStatus] = useState<DownloadState>('queued')

  const download = async () => {
    await RunAnywhere.downloadModel(modelId, (p) => {
      setProgress(p.progress * 100)
      setStatus(p.state)
    })
  }

  return (
    <View style={styles.container}>
      {status === 'queued' && <Button title="Download model" onPress={download} />}
      {status === 'downloading' && (
        <>
          <ActivityIndicator />
          <Text>Downloading: {progress.toFixed(0)}%</Text>
        </>
      )}
      {status === 'extracting' && (
        <>
          <ActivityIndicator />
          <Text>Setting up model...</Text>
        </>
      )}
      {status === 'completed' && <Text>✅ Ready to use!</Text>}
    </View>
  )
}

const styles = StyleSheet.create({
  container: { alignItems: 'center', padding: 16 },
})
```

Add Typing Indicators

TypingIndicator.tsx
```typescript
import React from 'react'
import { View, Text, StyleSheet } from 'react-native'

export function TypingIndicator() {
  return (
    <View style={styles.container}>
      <Text style={styles.dot}>●</Text>
      <Text style={[styles.dot, styles.dot2]}>●</Text>
      <Text style={[styles.dot, styles.dot3]}>●</Text>
    </View>
  )
}

// React Native styles don't support CSS animationDelay; stagger the dot
// opacities statically here, or drive them with the Animated API for a
// real pulsing effect.
const styles = StyleSheet.create({
  container: { flexDirection: 'row', padding: 8 },
  dot: {
    fontSize: 24,
    marginHorizontal: 2,
    opacity: 0.3,
  },
  dot2: { opacity: 0.5 },
  dot3: { opacity: 0.7 },
})
```

Graceful Degradation

```typescript
async function getAIResponse(prompt: string): Promise<string> {
  // Try on-device first
  if (await RunAnywhere.isModelLoaded()) {
    try {
      return (await RunAnywhere.generate(prompt)).text
    } catch {
      // Fall through to fallback
    }
  }

  // Fallback to simpler response
  return "I'm still setting up. Please try again in a moment."
}
```

Security & Privacy

Never Log Sensitive Data

```typescript
// ❌ Bad: Logging user prompts
console.log('User prompt:', prompt)
logger.debug('Generating response for:', prompt)

// ✅ Good: Log only metadata
logger.debug('Generating response', { promptLength: prompt.length })
```
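One way to make the safe pattern hard to get wrong is to route all prompt logging through a helper that can only ever return metadata. A minimal sketch (the helper name and fields are illustrative, not part of the SDK):

```typescript
// Derive log-safe metadata from a prompt without ever exposing its content.
function promptMetadata(prompt: string): { chars: number; words: number } {
  const trimmed = prompt.trim()
  return {
    chars: prompt.length,
    words: trimmed === '' ? 0 : trimmed.split(/\s+/).length,
  }
}

// Usage: logger.debug('Generating response', promptMetadata(prompt))
```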

Use Development Mode Appropriately

```typescript
// Development: Full logging, no auth
await RunAnywhere.initialize({
  environment: SDKEnvironment.Development,
})

// Production: Minimal logging, with auth
await RunAnywhere.initialize({
  environment: SDKEnvironment.Production,
  apiKey: process.env.RUNANYWHERE_API_KEY,
})
```

Disable Telemetry If Needed

```typescript
// Telemetry is only sent in Production mode,
// so Development mode disables all telemetry
await RunAnywhere.initialize({
  environment: SDKEnvironment.Development, // No telemetry
})
```

Testing

Use Smaller Models for Testing

```typescript
// In tests, use the smallest available model
const testModel = 'smollm2-360m' // Fast for CI/CD

beforeAll(async () => {
  await RunAnywhere.initialize({ environment: SDKEnvironment.Development })
  LlamaCPP.register()
  // Download and load small model for tests
})
```

Mock for Unit Tests

```typescript
// Create a mock for unit tests
jest.mock('@runanywhere/core', () => ({
  RunAnywhere: {
    generate: jest.fn().mockResolvedValue({
      text: 'Mock response',
      tokensUsed: 10,
      latencyMs: 100,
    }),
    chat: jest.fn().mockResolvedValue('Mock response'),
  },
}))
```

Summary Checklist

- Choose appropriate model size for your use case
- Use streaming for better perceived performance
- Unload models when not in use
- Handle all error cases gracefully
- Show progress during downloads and loading
- Pre-download models during onboarding
- Test on actual target devices
- Use Development mode to iterate quickly
- Never log sensitive user data
- Provide clear feedback for all operations