> **Early Beta** — The Web SDK is in early beta. APIs may change between releases.
## Overview
The `generate()` method provides full control over text generation, with configurable sampling options and detailed performance metrics. Use it in production applications that need fine-grained control over output.
## Basic Usage

```typescript
import { TextGeneration } from '@runanywhere/web'

const result = await TextGeneration.generate('Explain quantum computing in simple terms', {
  maxTokens: 200,
  temperature: 0.7,
})

console.log('Response:', result.text)
console.log('Tokens used:', result.tokensUsed)
console.log('Speed:', result.tokensPerSecond.toFixed(1), 'tok/s')
console.log('Latency:', result.latencyMs, 'ms')
```
## API Reference

```typescript
TextGeneration.generate(
  prompt: string,
  options?: LLMGenerationOptions
): Promise<LLMGenerationResult>
```
### Parameters

```typescript
interface LLMGenerationOptions {
  /** Maximum tokens to generate (default: 256) */
  maxTokens?: number
  /** Sampling temperature 0.0-2.0 (default: 0.7) */
  temperature?: number
  /** Top-p nucleus sampling (default: 0.95) */
  topP?: number
  /** Top-k sampling */
  topK?: number
  /** Stop generation at these sequences */
  stopSequences?: string[]
  /** System prompt to define AI behavior */
  systemPrompt?: string
  /** Enable streaming mode */
  streamingEnabled?: boolean
}
```
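As a point of reference, the options above can be grouped into reusable presets. This is a sketch, not part of the SDK: the preset names and values below are illustrative choices, and the interface is re-declared locally so the snippet stands alone (in an app you would import the SDK's own type).

```typescript
// Local mirror of the documented LLMGenerationOptions fields.
interface LLMGenerationOptions {
  maxTokens?: number
  temperature?: number
  topP?: number
  topK?: number
  stopSequences?: string[]
  systemPrompt?: string
  streamingEnabled?: boolean
}

// Low temperature and a small token budget for factual lookups.
const factualOptions: LLMGenerationOptions = {
  maxTokens: 64,
  temperature: 0.2,
  topP: 0.9,
}

// Higher temperature and a larger budget for open-ended writing.
const creativeOptions: LLMGenerationOptions = {
  maxTokens: 300,
  temperature: 1.1,
  systemPrompt: 'You are an imaginative writing assistant.',
}
```

Either preset can then be passed as the second argument to `TextGeneration.generate()`.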
### Returns

```typescript
interface LLMGenerationResult {
  /** Generated text */
  text: string
  /** Extracted thinking/reasoning content (if model supports it) */
  thinkingContent?: string
  /** Number of input tokens */
  inputTokens: number
  /** Total tokens used (prompt + response) */
  tokensUsed: number
  /** Model ID that was used */
  modelUsed: string
  /** Total latency in milliseconds */
  latencyMs: number
  /** Framework used for inference */
  framework: LLMFramework
  /** Hardware acceleration used */
  hardwareUsed: HardwareAcceleration
  /** Tokens generated per second */
  tokensPerSecond: number
  /** Time to first token in ms */
  timeToFirstTokenMs?: number
  /** Thinking tokens count */
  thinkingTokens: number
  /** Response tokens count */
  responseTokens: number
}
```
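The metric fields above can be condensed into a one-line summary for logging. The helper below is our own sketch, relying only on the relationship documented above (`tokensUsed` = prompt + response tokens); the local interface mirrors just the fields it reads.

```typescript
// Minimal shape of the metric fields this helper reads from LLMGenerationResult.
interface GenerationMetrics {
  tokensUsed: number
  inputTokens: number
  latencyMs: number
  tokensPerSecond: number
}

// Formats the performance metrics as a single log-friendly line.
function summarizeMetrics(r: GenerationMetrics): string {
  // Response tokens = total tokens minus prompt tokens, per the docs above.
  const responseTokens = r.tokensUsed - r.inputTokens
  return `${responseTokens} tokens in ${r.latencyMs} ms (${r.tokensPerSecond.toFixed(1)} tok/s)`
}
```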
## Generation Options

### Temperature
Controls randomness in the output. Lower values make output more focused and deterministic; higher values make it more varied and creative.
```typescript
// Creative writing - higher temperature
const creative = await TextGeneration.generate('Write a poem about the ocean', {
  temperature: 1.2,
  maxTokens: 150,
})

// Factual response - lower temperature
const factual = await TextGeneration.generate('What is the boiling point of water?', {
  temperature: 0.1,
  maxTokens: 50,
})

// Balanced (default)
const balanced = await TextGeneration.generate('Explain machine learning', {
  temperature: 0.7,
  maxTokens: 200,
})
```
| Temperature | Use Case |
|---|---|
| 0.0-0.3 | Factual, deterministic responses |
| 0.4-0.7 | Balanced, general-purpose |
| 0.8-1.2 | Creative, varied outputs |
| 1.3-2.0 | Very creative, experimental |
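The table can be captured as a small lookup helper. The function and style names below are our own; the temperature values are representative picks from each range in the table.

```typescript
type GenerationStyle = 'factual' | 'balanced' | 'creative' | 'experimental'

// Representative temperature for each range in the table above.
function temperatureFor(style: GenerationStyle): number {
  switch (style) {
    case 'factual':
      return 0.1 // 0.0-0.3
    case 'balanced':
      return 0.7 // 0.4-0.7 (the SDK default)
    case 'creative':
      return 1.0 // 0.8-1.2
    case 'experimental':
      return 1.5 // 1.3-2.0
  }
}
```

The returned value can then be passed as the `temperature` option to `TextGeneration.generate()`.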
### Max Tokens
Limits the length of the generated response.
```typescript
// Short answer
const short = await TextGeneration.generate('What is 2+2?', { maxTokens: 10 })

// Detailed explanation
const detailed = await TextGeneration.generate('Explain how computers work', { maxTokens: 500 })
```
### Stop Sequences
Stop generation when specific sequences are encountered.
```typescript
const result = await TextGeneration.generate('List 3 fruits:', {
  maxTokens: 100,
  stopSequences: ['4.', '\n\n'],
})
```
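To make the behavior concrete, the pure function below mimics what a stop sequence does to the output: the text is cut at the earliest occurrence of any stop sequence, and the sequence itself is excluded. This is only an illustration of the resulting text; the SDK stops during generation rather than truncating afterwards.

```typescript
// Returns text cut at the earliest occurrence of any stop sequence.
// The matched stop sequence is not included in the result.
function truncateAtStop(text: string, stopSequences: string[]): string {
  let cut = text.length
  for (const stop of stopSequences) {
    const i = text.indexOf(stop)
    if (i !== -1 && i < cut) cut = i
  }
  return text.slice(0, cut)
}

// With stopSequences: ['4.'], a numbered list stops after the third item.
const truncated = truncateAtStop('1. Apple\n2. Banana\n3. Cherry\n4. Date', ['4.'])
// → '1. Apple\n2. Banana\n3. Cherry\n'
```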
### System Prompts
Define the AI’s behavior and persona.
```typescript
const result = await TextGeneration.generate('What is the best programming language?', {
  maxTokens: 200,
  systemPrompt: 'You are a helpful coding assistant. Be concise and practical.',
})
```
See System Prompts for more details.
## Examples

### Full Example with Metrics
```typescript
async function generateWithMetrics(prompt: string) {
  const result = await TextGeneration.generate(prompt, {
    maxTokens: 200,
    temperature: 0.7,
  })

  console.log('=== Generation Results ===')
  console.log('Response:', result.text)
  console.log('')
  console.log('=== Metrics ===')
  console.log('Input tokens:', result.inputTokens)
  console.log('Response tokens:', result.responseTokens)
  console.log('Total latency:', result.latencyMs, 'ms')
  console.log('TTFT:', result.timeToFirstTokenMs, 'ms')
  console.log('Speed:', result.tokensPerSecond.toFixed(1), 'tok/s')
  console.log('Hardware:', result.hardwareUsed)

  return result
}
```
### Thinking Models
Some models support “thinking” or reasoning before responding:
```typescript
const result = await TextGeneration.generate('Solve this step by step: What is 15% of 240?', {
  maxTokens: 500,
})

if (result.thinkingContent) {
  console.log('Thinking:', result.thinkingContent)
  // "Let me calculate 15% of 240. First, I'll convert 15% to a decimal..."
}

console.log('Answer:', result.text)
// "15% of 240 is 36."

console.log('Thinking tokens:', result.thinkingTokens)
console.log('Response tokens:', result.responseTokens)
```
## Cancellation
Cancel an ongoing generation:
```typescript
// Start generation
const promise = TextGeneration.generate('Write a long story...', { maxTokens: 1000 })

// Cancel after 2 seconds
setTimeout(() => {
  TextGeneration.cancel()
}, 2000)

try {
  const result = await promise
} catch (err) {
  if (err instanceof SDKError && err.code === SDKErrorCode.GenerationCancelled) {
    console.log('Generation was cancelled')
  }
}
```
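Cancellation pairs naturally with a timeout. The helper below is a generic sketch of that pattern, not an SDK API: the cancel function is injected as a parameter (in practice you would pass `() => TextGeneration.cancel()`), which also keeps the helper self-contained.

```typescript
// Races a generation promise against a timeout. On timeout, invokes the
// provided cancel function so the underlying generation stops, then rejects.
async function generateWithTimeout<T>(
  promise: Promise<T>,
  cancelFn: () => void,
  timeoutMs: number,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      cancelFn()
      reject(new Error(`generation timed out after ${timeoutMs} ms`))
    }, timeoutMs)
  })
  try {
    return await Promise.race([promise, timeout])
  } finally {
    // Clear the timer so it does not fire after a successful generation.
    clearTimeout(timer)
  }
}
```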