> ## Documentation Index
> Fetch the complete documentation index at: https://docs.runanywhere.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Best Practices

> Recommendations for optimal SDK usage

## Memory Management

<Tip>
  On-device AI models are memory-intensive. Proper memory management is critical for app stability.
</Tip>

### Load Only What You Need

```kotlin theme={null}
// ❌ Don't load multiple large models
RunAnywhere.loadLLMModel("model-3b")
RunAnywhere.loadSTTModel("whisper-large")

// ✅ Load one LLM at a time, use smaller models
RunAnywhere.loadLLMModel("model-0.5b")
```

### Unload When Not Needed

```kotlin theme={null}
// Unload models when switching tasks or backgrounding
override fun onStop() {
    super.onStop()
    lifecycleScope.launch {
        RunAnywhere.unloadLLMModel()
        RunAnywhere.unloadSTTModel()
    }
}
```

### Monitor Memory Before Loading

```kotlin theme={null}
val modelInfo = RunAnywhere.model(modelId)
val requiredMemory = modelInfo?.downloadSize ?: 0

val activityManager = getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
val memoryInfo = ActivityManager.MemoryInfo()
activityManager.getMemoryInfo(memoryInfo)

if (memoryInfo.availMem < requiredMemory * 1.5) {
    showWarning("Low memory - performance may be affected")
}
```

## Performance Optimization

### Use Quantized Models

| Model Type | Size     | Performance | Quality    |
| ---------- | -------- | ----------- | ---------- |
| Q8 (8-bit) | Larger   | Slower      | Best       |
| Q4 (4-bit) | Smaller  | Faster      | Good       |
| Q2 (2-bit) | Smallest | Fastest     | Acceptable |

```kotlin theme={null}
// ✅ Use Q4 models for mobile devices
val model = RunAnywhere.registerModel(
    name = "Qwen 0.5B Q4",
    url = "...qwen2.5-0.5b-instruct-q4_0.gguf",
    framework = InferenceFramework.LLAMA_CPP
)
```

### Use Streaming for Better UX

```kotlin theme={null}
// ✅ Stream tokens for perceived faster responses
RunAnywhere.generateStream(prompt)
    .collect { token ->
        textView.append(token)
    }

// ❌ Waiting for full response feels slow
val result = RunAnywhere.generate(prompt)
textView.text = result.text
```

### Set Appropriate Token Limits

```kotlin theme={null}
// ✅ Match maxTokens to your use case
val shortAnswer = LLMGenerationOptions(maxTokens = 50)   // Quick Q&A
val mediumAnswer = LLMGenerationOptions(maxTokens = 200) // General chat
val longForm = LLMGenerationOptions(maxTokens = 500)     // Stories/articles
```

## App Lifecycle

### Handle Background/Foreground

```kotlin theme={null}
class MyApplication : Application(), LifecycleEventObserver {

    override fun onCreate() {
        super.onCreate()
        ProcessLifecycleOwner.get().lifecycle.addObserver(this)
        RunAnywhere.initialize(environment = SDKEnvironment.PRODUCTION)
    }

    override fun onStateChanged(source: LifecycleOwner, event: Lifecycle.Event) {
        when (event) {
            Lifecycle.Event.ON_STOP -> {
                // App backgrounded - release resources
                CoroutineScope(Dispatchers.Main).launch {
                    RunAnywhere.stopVoiceSession()
                    RunAnywhere.cleanup()
                }
            }
            Lifecycle.Event.ON_START -> {
                // App foregrounded - reinitialize if needed
            }
            else -> {}
        }
    }
}
```

### Preload Models at Launch

```kotlin theme={null}
class SplashActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        lifecycleScope.launch {
            // Show loading UI
            showLoading()

            // Preload commonly used models
            val models = listOf("qwen-0.5b", "whisper-tiny")

            models.forEach { modelId ->
                if (!RunAnywhere.isModelDownloaded(modelId)) {
                    RunAnywhere.downloadModel(modelId).collect { progress ->
                        updateProgress(modelId, progress.progress)
                    }
                }
            }

            // Pre-load the primary model
            RunAnywhere.loadLLMModel("qwen-0.5b")

            // Navigate to main screen
            startActivity(Intent(this@SplashActivity, MainActivity::class.java))
            finish()
        }
    }
}
```

## Error Handling

### Always Handle Errors

```kotlin theme={null}
// ✅ Comprehensive error handling
lifecycleScope.launch {
    try {
        val result = RunAnywhere.generate(prompt)
        showResponse(result.text)
    } catch (e: SDKError) {
        when (e.category) {
            ErrorCategory.MODEL -> promptModelDownload()
            ErrorCategory.STORAGE -> promptStorageCleanup()
            else -> showGenericError(e.message)
        }
    }
}
```

### Provide User Feedback

```kotlin theme={null}
// ✅ Show meaningful progress
RunAnywhere.downloadModel(modelId).collect { progress ->
    when (progress.state) {
        DownloadState.DOWNLOADING -> {
            progressBar.progress = (progress.progress * 100).toInt()
            statusText.text = "Downloading AI model..."
        }
        DownloadState.EXTRACTING -> {
            statusText.text = "Preparing model..."
        }
        DownloadState.COMPLETED -> {
            statusText.text = "Ready!"
        }
        DownloadState.ERROR -> {
            showError("Download failed: ${progress.error}")
        }
    }
}
```

## Testing

### Test on Real Devices

```kotlin theme={null}
// Simulators don't accurately reflect:
// - Memory constraints
// - CPU performance
// - Thermal throttling
// Always test on physical devices before release
```

### Measure Performance

```kotlin theme={null}
val result = RunAnywhere.generate(prompt, options)

Log.d("Performance", """
    Model: ${result.modelUsed}
    Tokens: ${result.tokensUsed}
    Speed: ${result.tokensPerSecond} tok/s
    Latency: ${result.latencyMs}ms
    TTFT: ${result.timeToFirstTokenMs}ms
""".trimIndent())
```

## Security

### Protect API Keys

```kotlin theme={null}
// ❌ Don't hardcode API keys
RunAnywhere.initialize(apiKey = "sk-12345...")

// ✅ Use BuildConfig or secure storage
RunAnywhere.initialize(apiKey = BuildConfig.RUNANYWHERE_API_KEY)
```

### Clear Sensitive Data

```kotlin theme={null}
// Clear conversation history when appropriate
RunAnywhere.clearVoiceConversation()

// Reset SDK to clear all state
RunAnywhere.reset()
```

## Android-Specific Gotchas

### Initialization Order

The SDK initialization on Android requires a strict sequence:

1. `AndroidPlatformContext.initialize(this)` — sets up Android storage paths
2. `RunAnywhere.initialize(environment = SDKEnvironment.DEVELOPMENT)` — SDK init
3. `CppBridgeModelPaths.setBaseDirectory(path)` — model storage path
4. `LlamaCPP.register(priority = 100)` — LLM/VLM backend
5. `ONNX.register(priority = 100)` — STT/TTS backend
6. `ModelService.registerDefaultModels()` — register model definitions

### LlamaCPP VLM Registration May Fail

Wrap `LlamaCPP.register()` in a try/catch. VLM native registration may fail if the `.so` library doesn't include `nativeRegisterVlm`, but LLM text generation still works:

```kotlin theme={null}
try {
    LlamaCPP.register(priority = 100)
} catch (e: Exception) {
    Log.w("SDK", "VLM registration failed, LLM still works: ${e.message}")
}
```

### isVLMModelLoaded is a Property

Unlike other model state checks which are suspend functions (`isLLMModelLoaded()`, `isSTTModelLoaded()`, `isTTSVoiceLoaded()`), `isVLMModelLoaded` is a direct property access — not a suspend function.

### JitPack Repository Required

The RunAnywhere SDK has transitive dependencies (`android-vad`, `PRDownloader`) hosted on JitPack. Add `maven { url = uri("https://jitpack.io") }` to your `settings.gradle.kts` repositories.

### Audio Format for STT

STT requires 16kHz mono PCM 16-bit audio. TTS output is WAV format. The voice pipeline assumes a 22050 Hz sample rate for TTS playback.

### VLM Image Path Workaround

`VLMImage.fromFilePath()` requires a file path, not a content URI. Images from the photo picker must be saved to a temporary file first:

```kotlin theme={null}
val bitmap = /* decode from content URI */
val tempFile = File(cacheDir, "vlm_temp.jpg")
bitmap.compress(Bitmap.CompressFormat.JPEG, 90, tempFile.outputStream())
val vlmImage = VLMImage.fromFilePath(tempFile.absolutePath)
```

## Summary Checklist

* [ ] Use quantized models (Q4) for mobile devices
* [ ] Unload models when backgrounding the app
* [ ] Use streaming for long text generation
* [ ] Handle all error categories appropriately
* [ ] Test on physical devices, not simulators
* [ ] Preload commonly used models at app startup
* [ ] Monitor memory before loading large models
* [ ] Secure API keys using BuildConfig or secure storage