LoRA (Low-Rank Adaptation) lets you apply lightweight fine-tuned adapters to a loaded base model at runtime. Swap adapters instantly without reloading the full model — perfect for switching between domain-specific behaviors like medical QA, creative writing, or code generation.
```kotlin
// 1. Load a base model first
RunAnywhere.loadLLMModel("qwen-0.5b")

// 2. Check compatibility
val compat = RunAnywhere.checkLoraCompatibility("/path/to/adapter.gguf")
if (!compat.isCompatible) {
    println("Incompatible: ${compat.error}")
    return
}

// 3. Apply a LoRA adapter
RunAnywhere.loadLoraAdapter(
    LoRAAdapterConfig(
        path = "/path/to/adapter.gguf",
        scale = 1.0f
    )
)

// 4. Generate with the adapter applied
val result = RunAnywhere.generate("What are the symptoms of diabetes?")
println(result.text)
```
A base LLM model must be loaded before applying LoRA adapters. Calling `loadLoraAdapter()` without a loaded model throws an `SDKError`.
```kotlin
@Serializable
data class LoRAAdapterConfig(
    val path: String,        // Path to LoRA adapter GGUF file (required)
    val scale: Float = 1.0f  // Scale factor (0.0 to 1.0+)
)
```
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `path` | `String` | — | Path to the LoRA adapter `.gguf` file. Must not be blank. |
| `scale` | `Float` | `1.0f` | Scale factor applied to the adapter (0.0 to 1.0+). |
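The documented constraints on these parameters can be mirrored as a small standalone validator. This is an illustrative sketch, not part of the SDK: the `validate` helper and the `.gguf` suffix check are assumptions, and the data class is repeated here (without `@Serializable`) only to keep the example self-contained.

```kotlin
data class LoRAAdapterConfig(
    val path: String,
    val scale: Float = 1.0f
)

// Hypothetical pre-flight validator mirroring the documented constraints:
// path must not be blank, should point to a .gguf file, and scale must be >= 0.
fun validate(config: LoRAAdapterConfig): List<String> {
    val problems = mutableListOf<String>()
    if (config.path.isBlank()) problems.add("path must not be blank")
    if (!config.path.endsWith(".gguf")) problems.add("path should point to a .gguf file")
    if (config.scale < 0.0f) problems.add("scale must be >= 0.0")
    return problems
}
```

Running such a check before calling the SDK surfaces configuration mistakes early, instead of waiting for an `SDKError` at load time.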
```kotlin
@Serializable
data class LoRAAdapterInfo(
    val path: String,     // Path used when loading
    val scale: Float,     // Active scale factor
    val applied: Boolean  // Whether applied to current context
)
```
```kotlin
suspend fun RunAnywhere.loadLoraAdapter(config: LoRAAdapterConfig)
```

Loads and applies a LoRA adapter to the currently loaded model. The inference context is recreated internally.

Throws: `SDKError` if no model is loaded or adapter loading fails.
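The lifecycle rules above (a base model must be loaded first, and applying an adapter recreates the context and clears the KV cache) can be modeled as a tiny state machine. The `LlmSession` class below is a toy stand-in for illustration only, not the SDK implementation:

```kotlin
// Toy model of the documented adapter lifecycle. `kvCacheEntries` stands in
// for accumulated conversation state; it is NOT a real SDK field.
class LlmSession {
    var modelLoaded = false
        private set
    var adapterPath: String? = null
        private set
    var kvCacheEntries = 0

    fun loadModel(name: String) { modelLoaded = true }

    // Generation grows the KV cache while a model is loaded.
    fun generate(prompt: String) {
        check(modelLoaded) { "no model loaded" }
        kvCacheEntries++
    }

    // Mirrors loadLoraAdapter(): fails without a base model, and clears
    // the KV cache because the context is recreated.
    fun loadLoraAdapter(path: String, scale: Float = 1.0f) {
        check(modelLoaded) { "SDKError: no model loaded" }
        require(path.isNotBlank()) { "path must not be blank" }
        adapterPath = path
        kvCacheEntries = 0
    }
}
```

The key consequence for callers: any conversation history held in the context is gone after an adapter swap, so re-send whatever prior turns the next prompt depends on.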
- **Check compatibility first**: always call `checkLoraCompatibility()` before loading to avoid errors.
- **Adapter loading recreates context**: expect ~100-300 ms of latency when loading or removing adapters.
- **KV cache is cleared**: conversation history is reset when adapters change.
- **Adapters are lightweight**: typically 10-50 MB vs. multi-GB base models.
- **Scale tuning**: start with 1.0 and adjust down if the adapter overfits.
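To see what scale tuning does mathematically: in standard LoRA, the adapter contributes a low-rank update B·A that is added to the frozen base weight, multiplied by the scale factor. The toy example below illustrates this on small matrices; it is a sketch of the general LoRA formula, not the SDK's internal weight handling:

```kotlin
// W' = W + scale * (B * A), where B is (rows x r) and A is (r x cols).
// scale = 0.0 reproduces the base weights; scale = 1.0 applies the full adapter.
fun applyLora(
    w: Array<FloatArray>,
    b: Array<FloatArray>,
    a: Array<FloatArray>,
    scale: Float
): Array<FloatArray> {
    val rows = w.size
    val cols = w[0].size
    val r = a.size  // adapter rank
    return Array(rows) { i ->
        FloatArray(cols) { j ->
            var delta = 0f
            for (k in 0 until r) delta += b[i][k] * a[k][j]
            w[i][j] + scale * delta
        }
    }
}
```

Dialing `scale` down therefore blends the adapter's influence smoothly back toward the base model's behavior, which is why reducing it helps when an adapter overfits.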