Overview
The RunAnywhere Kotlin SDK is a production-grade, on-device AI SDK for Android. It enables developers to run AI models directly on Android devices without requiring network connectivity for inference, ensuring minimal latency and maximum privacy for your users.
The SDK provides a unified interface to multiple AI capabilities:
LLM
Text generation with streaming support via Kotlin Flows
STT
Speech-to-text transcription with Whisper models
TTS
Neural voice synthesis with Sherpa-ONNX
VLM
Vision language models for image understanding
Tool Calling
Function calling with typed tool definitions
VAD
Real-time voice activity detection
Key Capabilities
- Multi-backend architecture – Choose from LlamaCPP (GGUF models) or ONNX Runtime
- GPU acceleration – Hardware-accelerated inference on supported devices
- Kotlin-first – Built with Coroutines and Flows for async operations
- Production-ready – Built-in analytics, logging, and model lifecycle management
Core Philosophy
On-Device First
All AI inference runs locally, ensuring low latency and data privacy. Once models are
downloaded, no network connection is required for inference.
Modular Architecture
Backend engines are optional modules—include only what you need. This keeps your APK size
minimal.
Privacy by Design
Audio and text data never leave the device unless explicitly configured. Only anonymous
analytics are collected by default.
Platform Parity
The API mirrors the iOS Swift SDK for consistent cross-platform development.
Features
Language Models (LLM)
- On-device text generation with streaming support
- Kotlin Flow-based token streaming
- System prompts and customizable generation parameters
- Support for thinking/reasoning models
- LlamaCPP backend for GGUF models
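To make the Flow-based streaming idea concrete, here is a minimal sketch of how token streaming with a Kotlin Flow typically looks from the caller's side. It does not use the SDK: `fakeTokenStream` is a hypothetical stand-in that emits the words of a canned reply, and the only dependency assumed is `kotlinx-coroutines-core`. The SDK's real entry points and types may differ.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a model (not an SDK function):
// emits the words of a canned reply one at a time.
fun fakeTokenStream(reply: String): Flow<String> = flow {
    for (token in reply.split(" ")) {
        delay(10) // simulate per-token generation time
        emit(token)
    }
}

fun main() = runBlocking {
    val text = StringBuilder()
    // collect suspends until the flow completes; each token arrives as it is emitted.
    fakeTokenStream("Hello from a local model").collect { token ->
        if (text.isNotEmpty()) text.append(' ')
        text.append(token) // a UI would update incrementally here
    }
    println(text)
}
```

Because a Flow is cold, nothing is generated until `collect` is called, and cancelling the collecting coroutine stops the producer at its next suspension point.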
Speech-to-Text (STT)
- Real-time streaming transcription
- Batch audio transcription
- Multi-language support
- Whisper-based models via ONNX Runtime
Text-to-Speech (TTS)
- Neural voice synthesis with Sherpa-ONNX
- System voices via Android TTS
- Streaming audio generation for long text
- Customizable voice, pitch, rate, and volume
Voice Activity Detection (VAD)
- Energy-based speech detection with Silero VAD
- Configurable sensitivity thresholds
- Real-time audio stream processing
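As a rough illustration of energy-based detection with a configurable threshold, the sketch below computes the root-mean-square energy of a PCM frame and compares it with a sensitivity value. This is only the general technique, written against the Kotlin standard library; it is not Silero VAD and not the SDK's implementation. `EnergyVad`, `rmsEnergy`, and the default threshold of 0.02 are hypothetical.

```kotlin
import kotlin.math.sqrt

/** Root-mean-square energy of one frame of 16-bit PCM samples, normalized to 0.0..1.0. */
fun rmsEnergy(frame: ShortArray): Double {
    if (frame.isEmpty()) return 0.0
    var sumSquares = 0.0
    for (sample in frame) {
        val normalized = sample / 32768.0
        sumSquares += normalized * normalized
    }
    return sqrt(sumSquares / frame.size)
}

/** Hypothetical detector: a frame counts as speech when its RMS energy reaches the threshold. */
class EnergyVad(private val threshold: Double = 0.02) {
    fun isSpeech(frame: ShortArray): Boolean = rmsEnergy(frame) >= threshold
}

fun main() {
    val vad = EnergyVad(threshold = 0.02)
    val silence = ShortArray(320)                   // 20 ms at 16 kHz, all zeros
    val loud = ShortArray(320) { 8000.toShort() }   // constant-amplitude frame
    println(vad.isSpeech(silence))
    println(vad.isSpeech(loud))
}
```

Lowering the threshold makes the detector more sensitive to quiet speech at the cost of more false positives from background noise.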
Vision Language Models (VLM)
- Multimodal image + text inference with Kotlin Flow streaming
- VLMImage from file paths with photo picker integration
- Multi-file model registration with GGUF files
- Cancellation support and state checking
Tool Calling
- Register typed tool definitions via RunAnywhereToolCalling
- Automatic tool execution with configurable limits
- Multi-tool chaining for complex workflows
- ToolValue factory methods with string and number support
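The pattern behind typed tool definitions with an execution limit can be sketched in plain Kotlin. Everything below is hypothetical and deliberately named differently from the SDK: `ArgValue`, `ToolSpec`, and `ToolRegistry` are illustrative types, not `ToolValue` or `RunAnywhereToolCalling`, whose actual signatures should be taken from the SDK reference.

```kotlin
// Hypothetical value type for tool arguments and results (not the SDK's ToolValue).
sealed interface ArgValue {
    data class Str(val value: String) : ArgValue
    data class Num(val value: Double) : ArgValue

    companion object {
        // Factory methods for the two supported kinds of value.
        fun of(value: String): ArgValue = Str(value)
        fun of(value: Double): ArgValue = Num(value)
    }
}

// Hypothetical tool definition: a name, a description, and a handler.
data class ToolSpec(
    val name: String,
    val description: String,
    val handler: (Map<String, ArgValue>) -> ArgValue,
)

// Hypothetical registry that refuses to run more than maxCalls tools.
class ToolRegistry(private val maxCalls: Int = 3) {
    private val tools = mutableMapOf<String, ToolSpec>()
    private var callsMade = 0

    fun register(tool: ToolSpec) {
        tools[tool.name] = tool
    }

    fun execute(name: String, args: Map<String, ArgValue>): ArgValue {
        check(callsMade < maxCalls) { "Tool call limit of $maxCalls reached" }
        val tool = tools[name] ?: throw IllegalArgumentException("Unknown tool: $name")
        callsMade++
        return tool.handler(args)
    }
}

fun main() {
    val registry = ToolRegistry(maxCalls = 2)
    registry.register(
        ToolSpec("add", "Adds two numbers") { args ->
            val a = (args["a"] as ArgValue.Num).value
            val b = (args["b"] as ArgValue.Num).value
            ArgValue.of(a + b)
        }
    )
    val result = registry.execute("add", mapOf("a" to ArgValue.of(2.0), "b" to ArgValue.of(3.0)))
    println(result)
}
```

A call limit like `maxCalls` guards against a model that keeps requesting tools in a loop; chaining several tools simply means feeding one result back into the next `execute` call until the limit is reached.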
Voice Agent Pipeline
- Full VAD → STT → LLM → TTS orchestration
- Complete voice conversation flow
- Streaming and batch processing modes
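The VAD → STT → LLM → TTS ordering can be shown as a simple batch-mode composition of four stages. This is a sketch under stated assumptions, not the SDK's voice agent: `VoiceTurn` and its function-typed parameters are hypothetical, and the stages in `main` are trivial fakes so the example runs without any models.

```kotlin
// Hypothetical orchestration of one conversational turn; each stage is a plain function.
class VoiceTurn(
    private val vad: (ShortArray) -> Boolean,   // is there speech in this audio?
    private val stt: (ShortArray) -> String,    // audio -> transcript
    private val llm: (String) -> String,        // transcript -> reply text
    private val tts: (String) -> ShortArray,    // reply text -> synthesized audio
) {
    /** Runs the stages in order; returns null when the VAD stage finds no speech. */
    fun process(audio: ShortArray): ShortArray? {
        if (!vad(audio)) return null
        val transcript = stt(audio)
        val reply = llm(transcript)
        return tts(reply)
    }
}

fun main() {
    // Fake stages for demonstration only.
    val turn = VoiceTurn(
        vad = { audio -> audio.any { it.toInt() != 0 } },
        stt = { "hello" },
        llm = { prompt -> "You said: $prompt" },
        tts = { text -> ShortArray(text.length) },
    )
    println(turn.process(ShortArray(160))?.size)                     // silent input
    println(turn.process(ShortArray(160) { 1000.toShort() })?.size)  // non-silent input
}
```

Gating on the VAD stage first avoids spending transcription and generation time on silence; a streaming variant would pass partial results between stages instead of waiting for each to finish.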
System Requirements
| Platform | Minimum Version |
|---|---|
| Android | API 24 (7.0+) |
ARM64 devices are recommended for best performance. GPU acceleration provides significant speedups
on supported devices.
SDK Modules
| Module | Purpose |
|---|---|
| runanywhere-core | Core SDK (required) |
| runanywhere-llamacpp | LLM text generation with GGUF models |
| runanywhere-onnx | STT/TTS/VAD via ONNX Runtime |