Generate text with detailed metrics including latency, token count, and generation speed.
final result = await RunAnywhere.generate(
  'Explain quantum computing in simple terms',
  options: LLMGenerationOptions(
    maxTokens: 200,
    temperature: 0.7,
  ),
);

print('Response: ${result.text}');
print('Tokens: ${result.tokensUsed}');
print('Speed: ${result.tokensPerSecond.toStringAsFixed(1)} tok/s');
print('Latency: ${result.latencyMs.toStringAsFixed(0)}ms');

LLMGenerationOptions

| Parameter     | Type         | Default | Description                        |
|---------------|--------------|---------|------------------------------------|
| maxTokens     | int          | 100     | Maximum tokens to generate         |
| temperature   | double       | 0.8     | Randomness (0.0–2.0)               |
| topP          | double       | 1.0     | Nucleus sampling parameter         |
| stopSequences | List<String> | []      | Stop generation at these sequences |
| systemPrompt  | String?      | null    | System prompt for context          |
const options = LLMGenerationOptions(
  maxTokens: 256,
  temperature: 0.7,
  topP: 0.95,
  stopSequences: ['END', '###'],
  systemPrompt: 'You are a helpful coding assistant.',
);
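The configured options are passed to generate the same way as in the first example; a minimal sketch (the prompt string here is only illustrative):

final result = await RunAnywhere.generate(
  'Write a Dart function that reverses a string',
  options: options,
);

// Generation stops as soon as 'END' or '###' appears in the output.
print(result.text);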

LLMGenerationResult

| Property           | Type    | Description                     |
|--------------------|---------|---------------------------------|
| text               | String  | Generated text                  |
| thinkingContent    | String? | Thinking content (if supported) |
| inputTokens        | int     | Number of input tokens          |
| tokensUsed         | int     | Number of output tokens         |
| modelUsed          | String  | Model ID used                   |
| latencyMs          | double  | Total latency in milliseconds   |
| tokensPerSecond    | double  | Generation speed                |
| timeToFirstTokenMs | double? | Time to first token             |
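Each property is read directly from the returned LLMGenerationResult; a minimal sketch that logs the remaining metrics (the prompt is illustrative, and timeToFirstTokenMs is guarded because it is nullable):

final result = await RunAnywhere.generate('List three uses of on-device LLMs');

print('Model: ${result.modelUsed}');
print('Input tokens: ${result.inputTokens}, output tokens: ${result.tokensUsed}');
if (result.timeToFirstTokenMs != null) {
  print('First token after ${result.timeToFirstTokenMs!.toStringAsFixed(0)}ms');
}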

Thinking Models

Some models support “thinking” tokens for chain-of-thought reasoning:
LlamaCpp.addModel(
  id: 'qwen-cot',
  name: 'Qwen CoT',
  url: '...',
  supportsThinking: true,  // Enable thinking token parsing
);

final result = await RunAnywhere.generate('Solve: 2x + 5 = 15');

if (result.thinkingContent != null) {
  print('Reasoning: ${result.thinkingContent}');
}
print('Answer: ${result.text}');
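If the model was registered without supportsThinking, thinkingContent is null, so keep the null check shown above before reading it.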

See Also