Introduction

Overview

The RunAnywhere Kotlin SDK is a production-grade, on-device AI SDK for Android. It lets developers run AI models directly on Android devices, with no network connection required for inference, which keeps latency low and user data on the device.

The SDK provides a unified interface to multiple AI capabilities:

LLM

Text generation with streaming support via Kotlin Flows

VLM

Vision language models for image understanding

LoRA

Hot-swap fine-tuned adapters at runtime

RAG

On-device retrieval-augmented generation

STT

Speech-to-text transcription with Whisper models

TTS

Neural voice synthesis with Sherpa-ONNX

Tool Calling

Function calling with typed tool definitions

VAD

Real-time voice activity detection

Voice Agent

Full VAD → STT → LLM → TTS pipeline

Key Capabilities

Multi-backend architecture – Choose from LlamaCPP (GGUF models) or ONNX Runtime
GPU acceleration – Hardware-accelerated inference on supported devices
Kotlin-first – Built with Coroutines and Flows for async operations
LoRA fine-tuning – Hot-swap adapters at runtime without reloading models
On-device RAG – Vector search and grounded generation, fully offline
Production-ready – Built-in analytics, logging, and model lifecycle management

Core Philosophy

On-Device First

All AI inference runs locally, ensuring low latency and data privacy. Once models are downloaded, no network connection is required for inference.

Modular Architecture

Backend engines are optional modules, so you include only the ones you need. This keeps your APK size small.

Privacy by Design

Audio and text data never leave the device unless explicitly configured. Only anonymous analytics are collected by default.

Platform Parity

The API mirrors the iOS Swift SDK for consistent cross-platform development.

Features

Language Models (LLM)

On-device text generation with streaming support
Kotlin Flow-based token streaming
System prompts and customizable generation parameters
Support for thinking/reasoning models
LlamaCPP backend for GGUF models
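
The Flow-based token streaming listed above can be pictured with plain Kotlin Flows. The following is a minimal sketch that does not call the SDK: `fakeTokenStream` is a hypothetical stand-in for whatever streaming call the SDK exposes, and the example assumes `kotlinx-coroutines-core` is on the classpath.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for an SDK streaming call: emits one token at a time.
fun fakeTokenStream(prompt: String): Flow<String> = flow {
    for (token in prompt.split(" ")) {
        emit(token) // a real engine would emit tokens as they are decoded
    }
}

// Collects tokens incrementally, as a UI would when appending to a text view.
suspend fun collectResponse(tokens: Flow<String>): String {
    val response = StringBuilder()
    tokens.collect { token ->
        if (response.isNotEmpty()) response.append(' ')
        response.append(token)
    }
    return response.toString()
}

fun main() = runBlocking {
    println(collectResponse(fakeTokenStream("hello from the device")))
}
```

Because a Flow is cold and collected inside a coroutine, cancelling the collecting coroutine stops the stream, which is the usual way to wire a "stop generating" button.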

Speech-to-Text (STT)

Real-time streaming transcription
Batch audio transcription
Multi-language support
Whisper-based models via ONNX Runtime

Text-to-Speech (TTS)

Neural voice synthesis with Sherpa-ONNX
System voices via Android TTS
Streaming audio generation for long text
Customizable voice, pitch, rate, and volume

Voice Activity Detection (VAD)

Energy-based speech detection with Silero VAD
Configurable sensitivity thresholds
Real-time audio stream processing
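
To make the idea of a sensitivity threshold concrete, here is a minimal sketch of thresholded, energy-based detection over fixed-size frames. It illustrates the general technique only; it does not reproduce Silero VAD (a neural model) or the SDK's API, and the function names and frame layout are assumptions made for this example.

```kotlin
import kotlin.math.sqrt

// Root-mean-square energy of one frame of normalized PCM samples (-1.0..1.0).
fun rmsEnergy(frame: FloatArray): Float {
    if (frame.isEmpty()) return 0f
    var sumSquares = 0.0
    for (s in frame) sumSquares += s.toDouble() * s
    return sqrt(sumSquares / frame.size).toFloat()
}

// Flags each frame as speech when its energy meets the threshold.
fun detectSpeech(samples: FloatArray, frameSize: Int, threshold: Float): List<Boolean> {
    require(frameSize > 0) { "frameSize must be positive" }
    return samples.asList()
        .chunked(frameSize)
        .map { rmsEnergy(it.toFloatArray()) >= threshold }
}

fun main() {
    // One silent frame followed by one louder frame.
    val audio = floatArrayOf(0f, 0f, 0f, 0f, 0.5f, 0.5f, 0.5f, 0.5f)
    println(detectSpeech(audio, frameSize = 4, threshold = 0.1f))
}
```

Lowering the threshold makes the detector more sensitive to quiet speech but also more likely to fire on background noise.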

Vision Language Models (VLM)

Multimodal image + text inference with Kotlin Flow streaming
VLMImage from file paths with photo picker integration
Multi-file model registration with GGUF files
Cancellation support and state checking

LoRA Adapters

Hot-swap fine-tuned adapters at runtime without reloading the base model
Stack multiple adapters simultaneously with independent scale factors
Compatibility checking before loading
Adapter catalog for discovery and management
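
Stacking adapters with independent scale factors follows from the standard LoRA formulation, where the effective weight is the base matrix plus a scaled low-rank product per adapter. The sketch below shows that arithmetic on tiny matrices. `LoraAdapter` and `applyAdapters` are hypothetical names for this example, and the SDK's internal mechanism may differ.

```kotlin
// Hypothetical adapter: low-rank factors B (d x r) and A (r x k) plus a scale.
class LoraAdapter(val b: Array<DoubleArray>, val a: Array<DoubleArray>, val scale: Double)

// Plain matrix product; assumes the inner dimensions agree.
fun matmul(x: Array<DoubleArray>, y: Array<DoubleArray>): Array<DoubleArray> =
    Array(x.size) { i ->
        DoubleArray(y[0].size) { j -> x[i].indices.sumOf { k -> x[i][k] * y[k][j] } }
    }

// Effective weights: W + sum over adapters of scale * (B x A).
// The base matrix is copied, never modified.
fun applyAdapters(base: Array<DoubleArray>, adapters: List<LoraAdapter>): Array<DoubleArray> {
    val out = Array(base.size) { base[it].copyOf() }
    for (adapter in adapters) {
        val delta = matmul(adapter.b, adapter.a)
        for (i in out.indices) {
            for (j in out[i].indices) out[i][j] += adapter.scale * delta[i][j]
        }
    }
    return out
}

fun main() {
    val base = arrayOf(doubleArrayOf(1.0, 0.0), doubleArrayOf(0.0, 1.0))
    val adapter = LoraAdapter(
        b = arrayOf(doubleArrayOf(1.0), doubleArrayOf(0.0)),
        a = arrayOf(doubleArrayOf(0.0, 2.0)),
        scale = 0.5,
    )
    println(applyAdapters(base, listOf(adapter)).joinToString { it.toList().toString() })
}
```

Since the base weights stay untouched, dropping or rescaling an adapter only changes the added terms, which is why swapping adapters does not require reloading the base model.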

RAG (Retrieval-Augmented Generation)

Fully on-device document Q&A pipeline
ONNX embedding models with vector similarity search
Automatic text chunking with configurable size and overlap
Grounded LLM generation with retrieved context
Timing metrics for retrieval and generation phases
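
The two retrieval steps named above, chunking with overlap and vector similarity search, can be sketched in plain Kotlin. This is an illustration under assumptions rather than the SDK's implementation: chunking here is character-based, the embeddings are hand-written toy vectors standing in for the output of an ONNX embedding model, and all function names are hypothetical.

```kotlin
import kotlin.math.sqrt

// Splits text into windows of `size` characters; consecutive windows share `overlap` characters.
fun chunkText(text: String, size: Int, overlap: Int): List<String> {
    require(size > 0 && overlap in 0 until size) { "need 0 <= overlap < size" }
    return text.windowed(size = size, step = size - overlap, partialWindows = true)
}

// Cosine similarity between two vectors of equal length; 0 when either vector is all zeros.
fun cosine(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "vectors must have the same length" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    val denom = sqrt(normA) * sqrt(normB)
    return if (denom == 0f) 0f else dot / denom
}

// Returns the k chunk texts whose embeddings are most similar to the query embedding.
fun topK(query: FloatArray, index: List<Pair<String, FloatArray>>, k: Int): List<String> =
    index.sortedByDescending { cosine(query, it.second) }.take(k).map { it.first }

fun main() {
    println(chunkText("abcdefghij", size = 4, overlap = 1))
    val index = listOf(
        "battery tips" to floatArrayOf(0f, 1f),
        "camera setup" to floatArrayOf(1f, 0.1f),
    )
    println(topK(floatArrayOf(1f, 0f), index, k = 1))
}
```

The retrieved chunks would then be placed in the LLM prompt as context, which is what grounds the generated answer in the source documents.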

Tool Calling

Register typed tool definitions via RunAnywhereToolCalling
Automatic tool execution with configurable limits
Multi-tool chaining for complex workflows
ToolValue factory methods with string and number support
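
As a rough picture of typed tool definitions and an execution limit, consider the sketch below. None of these types come from the SDK: `ArgValue`, `Tool`, and `ToolRegistry` are hypothetical names invented for this example, and the real `RunAnywhereToolCalling` and `ToolValue` APIs may look quite different.

```kotlin
// Hypothetical value type for this sketch (not the SDK's ToolValue).
sealed interface ArgValue {
    data class Str(val value: String) : ArgValue
    data class Num(val value: Double) : ArgValue
}

// A tool is a name plus a handler over typed arguments.
class Tool(val name: String, val handler: (Map<String, ArgValue>) -> ArgValue)

class ToolRegistry(private val maxCalls: Int) {
    private val tools = mutableMapOf<String, Tool>()
    private var calls = 0

    fun register(tool: Tool) {
        tools[tool.name] = tool
    }

    // Runs a tool by name, refusing once the call budget is spent.
    fun execute(name: String, args: Map<String, ArgValue>): ArgValue {
        check(calls < maxCalls) { "tool call limit ($maxCalls) reached" }
        val tool = tools[name] ?: throw IllegalArgumentException("unknown tool: $name")
        calls++
        return tool.handler(args)
    }
}

fun main() {
    val registry = ToolRegistry(maxCalls = 3)
    registry.register(Tool("add") { args ->
        val a = (args["a"] as ArgValue.Num).value
        val b = (args["b"] as ArgValue.Num).value
        ArgValue.Num(a + b)
    })
    println(registry.execute("add", mapOf("a" to ArgValue.Num(2.0), "b" to ArgValue.Num(3.0))))
}
```

A call budget of this kind is one way to stop a model from looping on tool calls indefinitely when chaining several tools in one turn.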

Voice Agent Pipeline

Full VAD → STT → LLM → TTS orchestration
Complete voice conversation flow
Streaming and batch processing modes
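
The VAD → STT → LLM → TTS orchestration can be thought of as four stages composed in order, with VAD acting as a gate. The sketch below models each stage as a plain function. The `VoicePipeline` class and its stage signatures are hypothetical, and the stub stages in `main` stand in for real engines.

```kotlin
// Hypothetical stage signatures; real engines would work with models and audio buffers.
class VoicePipeline(
    private val isSpeech: (FloatArray) -> Boolean,   // VAD
    private val transcribe: (FloatArray) -> String,  // STT
    private val generate: (String) -> String,        // LLM
    private val synthesize: (String) -> FloatArray,  // TTS
) {
    // Returns synthesized audio for one turn, or null when no speech is detected.
    fun handleTurn(audio: FloatArray): FloatArray? {
        if (!isSpeech(audio)) return null
        val text = transcribe(audio)
        val reply = generate(text)
        return synthesize(reply)
    }
}

fun main() {
    // Stub stages so the control flow can be run without any models.
    val pipeline = VoicePipeline(
        isSpeech = { frame -> frame.any { it != 0f } },
        transcribe = { "hello" },
        generate = { text -> "you said: $text" },
        synthesize = { reply -> FloatArray(reply.length) },
    )
    println(pipeline.handleTurn(FloatArray(4)))               // silent input
    println(pipeline.handleTurn(floatArrayOf(0.2f))?.size)    // spoken input
}
```

Gating on VAD first means the more expensive STT, LLM, and TTS stages only run when there is speech to process.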

System Requirements

Platform	Minimum Version
Android	API 24 (Android 7.0+)

Kotlin: 2.0+
Gradle: 8.0+

ARM64 devices are recommended for best performance. GPU acceleration provides significant speedups on supported devices.

SDK Modules

Module	Purpose
`runanywhere-core`	Core SDK (required)
`runanywhere-llamacpp`	LLM/VLM text generation + LoRA (GGUF)
`runanywhere-onnx`	STT/TTS/VAD via ONNX Runtime
`runanywhere-rag`	RAG pipeline (embeddings + vector search)
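
Because the backends are separate modules, a Gradle setup would typically pull in the core module plus only the engines an app uses. The fragment below is a sketch of what that could look like in a module-level `build.gradle.kts`. It assumes the module names in the table are also the artifact IDs, and `<group>` and `<version>` are placeholders because this page does not state the real Maven coordinates; consult the Installation page for those.

```kotlin
// build.gradle.kts (module level): a sketch, not copied from the SDK's installation guide.
android {
    defaultConfig {
        minSdk = 24 // matches the minimum listed under System Requirements
    }
}

dependencies {
    // <group> and <version> are placeholders; the real coordinates are not given on this page.
    implementation("<group>:runanywhere-core:<version>")      // required
    implementation("<group>:runanywhere-llamacpp:<version>")  // LLM/VLM + LoRA (GGUF)

    // Add only the optional modules you use:
    // implementation("<group>:runanywhere-onnx:<version>")   // STT/TTS/VAD
    // implementation("<group>:runanywhere-rag:<version>")    // RAG pipeline
}
```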

Architecture

Next Steps

Installation

Quick Start
