Use generate() for full control over text generation; the returned result includes detailed performance metrics.
val result = RunAnywhere.generate(
    prompt = "Write a haiku about Kotlin programming",
    options = LLMGenerationOptions(
        maxTokens = 50,
        temperature = 1.0f,
        topP = 0.9f,
        stopSequences = listOf("###")
    )
)

println("Response: ${result.text}")
println("Model: ${result.modelUsed}")
println("Tokens: ${result.tokensUsed}")
println("Speed: ${result.tokensPerSecond} tok/s")
println("Latency: ${result.latencyMs}ms")

// For reasoning models (e.g., models with thinking capability)
result.thinkingContent?.let { thinking ->
    println("Reasoning: $thinking")
}

LLMGenerationResult

The result object contains comprehensive metrics:
Property | Type | Description
---------|------|------------
text | String | Generated response text
thinkingContent | String? | Reasoning content (for thinking models)
inputTokens | Int | Number of prompt tokens
tokensUsed | Int | Number of output tokens
modelUsed | String | Model ID used for generation
latencyMs | Double | Total generation time in milliseconds
tokensPerSecond | Double | Generation speed in tokens per second
timeToFirstTokenMs | Double? | Time to first token in milliseconds (streaming only)
framework | String? | Inference framework used
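
As a sketch, these metrics fold naturally into a single log line after each call. The logGenerationMetrics helper below is hypothetical; it uses only the properties listed in the table, skipping the optional fields when they are null.

fun logGenerationMetrics(result: LLMGenerationResult) {
    val summary = buildString {
        append("${result.modelUsed}: ")
        append("${result.inputTokens} in / ${result.tokensUsed} out, ")
        append("%.1f tok/s, %.0f ms".format(result.tokensPerSecond, result.latencyMs))
        // Optional fields are null when not applicable (e.g., non-streaming calls)
        result.timeToFirstTokenMs?.let { append(", TTFT %.0f ms".format(it)) }
        result.framework?.let { append(" [$it]") }
    }
    println(summary)
}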

LLMGenerationOptions

Customize generation behavior:
data class LLMGenerationOptions(
    val maxTokens: Int = 100,                      // Maximum tokens to generate
    val temperature: Float = 0.8f,                 // Sampling temperature (0.0-2.0); higher = more random
    val topP: Float = 1.0f,                        // Nucleus sampling cutoff (0.0-1.0)
    val stopSequences: List<String> = emptyList(), // Strings that end generation when emitted
    val streamingEnabled: Boolean = false,         // Emit tokens incrementally instead of all at once
    val systemPrompt: String? = null               // System prompt that steers model behavior
)
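
systemPrompt is the one option not exercised elsewhere on this page; below is a sketch of using it to steer the model's tone. The prompt strings are illustrative only.

val reply = RunAnywhere.generate(
    prompt = "Explain Kotlin coroutines in one paragraph",
    options = LLMGenerationOptions(
        maxTokens = 150,
        temperature = 0.4f,   // keep the answer focused
        systemPrompt = "You are a concise Kotlin tutor. Answer in plain language."
    )
)
println(reply.text)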

Example: Creative Writing

val story = RunAnywhere.generate(
    prompt = "Write a short story about a robot learning to paint",
    options = LLMGenerationOptions(
        maxTokens = 500,
        temperature = 1.2f,   // Higher = more creative
        topP = 0.95f
    )
)

Example: Factual Response

val facts = RunAnywhere.generate(
    prompt = "List the planets in our solar system",
    options = LLMGenerationOptions(
        maxTokens = 200,
        temperature = 0.1f,   // Lower = more deterministic
        topP = 0.5f
    )
)

Cancel Generation

Cancel an ongoing generation:
// Start generation in a coroutine
val job = lifecycleScope.launch {
    val result = RunAnywhere.generate(longPrompt, options)
}

// Cancel if needed
cancelButton.setOnClickListener {
    RunAnywhere.cancelGeneration()
    job.cancel()
}
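
Cancellation can also be driven by a timeout. The sketch below assumes generate() is a suspend function (as the coroutine usage above implies) and pairs withTimeoutOrNull from kotlinx.coroutines with the explicit cancelGeneration() call:

import kotlinx.coroutines.withTimeoutOrNull

lifecycleScope.launch {
    // withTimeoutOrNull cancels the block and returns null after 10 seconds
    val result = withTimeoutOrNull(10_000L) {
        RunAnywhere.generate(longPrompt, options)
    }
    if (result == null) {
        // The coroutine was cancelled; also stop the underlying generation
        RunAnywhere.cancelGeneration()
    }
}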