The `generate()` method gives you full control over text generation, with customizable options and detailed performance metrics in the result. Use it in production applications where you need fine-grained control over generation behavior.
```typescript
interface LLMGenerationResult {
  /** Generated text */
  text: string
  /** Extracted thinking/reasoning content (if model supports it) */
  thinkingContent?: string
  /** Number of input tokens */
  inputTokens: number
  /** Total tokens used (prompt + response) */
  tokensUsed: number
  /** Model ID that was used */
  modelUsed: string
  /** Total latency in milliseconds */
  latencyMs: number
  /** Framework used for inference */
  framework: LLMFramework
  /** Hardware acceleration used */
  hardwareUsed: HardwareAcceleration
  /** Tokens generated per second */
  tokensPerSecond: number
  /** Time to first token in ms */
  timeToFirstTokenMs?: number
  /** Thinking tokens count */
  thinkingTokens: number
  /** Response tokens count */
  responseTokens: number
}
```
```typescript
const result = await TextGeneration.generate('What is the best programming language?', {
  maxTokens: 200,
  systemPrompt: 'You are a helpful coding assistant. Be concise and practical.',
})
```
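The returned result carries the performance fields from `LLMGenerationResult`, which you may want to log or report. As a minimal sketch, you could summarize them with a helper like the one below (`summarizeMetrics` and the sample numbers are illustrative, not part of the library):

```typescript
// Hypothetical helper: summarize the performance fields of an
// LLMGenerationResult-shaped object (field names match the interface above).
function summarizeMetrics(result: {
  modelUsed: string
  latencyMs: number
  tokensUsed: number
  tokensPerSecond: number
  timeToFirstTokenMs?: number
}): string {
  // timeToFirstTokenMs is optional, so only include it when present
  const ttft =
    result.timeToFirstTokenMs !== undefined
      ? `${result.timeToFirstTokenMs}ms TTFT, `
      : ''
  return (
    `${result.modelUsed}: ${result.tokensUsed} tokens in ` +
    `${result.latencyMs}ms (${ttft}${result.tokensPerSecond.toFixed(1)} tok/s)`
  )
}

// Example with made-up numbers (not real measurements):
console.log(
  summarizeMetrics({
    modelUsed: 'example-model',
    latencyMs: 1200,
    tokensUsed: 180,
    tokensPerSecond: 42.5,
    timeToFirstTokenMs: 150,
  })
)
// → "example-model: 180 tokens in 1200ms (150ms TTFT, 42.5 tok/s)"
```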
Some models support “thinking” or reasoning before responding:
```typescript
const result = await TextGeneration.generate('Solve this step by step: What is 15% of 240?', {
  maxTokens: 500,
})

if (result.thinkingContent) {
  console.log('Thinking:', result.thinkingContent)
  // "Let me calculate 15% of 240. First, I'll convert 15% to a decimal..."
}

console.log('Answer:', result.text)
// "15% of 240 is 36."

console.log('Thinking tokens:', result.thinkingTokens)
console.log('Response tokens:', result.responseTokens)
```
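Because `thinkingContent` is optional and only populated for models that emit reasoning, downstream code should branch on it rather than assume it exists. A small sketch of that pattern (`formatTrace` is an illustrative helper, not part of the library):

```typescript
// Hypothetical helper: render a generation result as a printable trace,
// handling models that do and don't emit thinking content.
function formatTrace(result: {
  text: string
  thinkingContent?: string
  thinkingTokens: number
  responseTokens: number
}): string {
  const lines: string[] = []
  // Only include the thinking section when the model produced one
  if (result.thinkingContent) {
    lines.push(`Thinking (${result.thinkingTokens} tokens): ${result.thinkingContent}`)
  }
  lines.push(`Answer (${result.responseTokens} tokens): ${result.text}`)
  return lines.join('\n')
}

// Example with a mocked result (values are illustrative):
console.log(
  formatTrace({
    text: '15% of 240 is 36.',
    thinkingContent: '15% as a decimal is 0.15; 0.15 * 240 = 36.',
    thinkingTokens: 24,
    responseTokens: 9,
  })
)
```

The same helper works unchanged for models without reasoning support: the thinking line is simply omitted.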