Early Beta — The Web SDK is in early beta. APIs may change between releases.

Overview

The generate() method gives you full control over text generation, with configurable sampling options and detailed performance metrics. Use it in production applications that need fine-grained control over generation behavior.

Basic Usage

import { TextGeneration } from '@runanywhere/web'

const result = await TextGeneration.generate('Explain quantum computing in simple terms', {
  maxTokens: 200,
  temperature: 0.7,
})

console.log('Response:', result.text)
console.log('Tokens used:', result.tokensUsed)
console.log('Speed:', result.tokensPerSecond.toFixed(1), 'tok/s')
console.log('Latency:', result.latencyMs, 'ms')

API Reference

await TextGeneration.generate(
  prompt: string,
  options?: LLMGenerationOptions
): Promise<LLMGenerationResult>
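
Since options is optional, the simplest call passes only a prompt and relies on the defaults listed below:

const result = await TextGeneration.generate('Hello!')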

Parameters

interface LLMGenerationOptions {
  /** Maximum tokens to generate (default: 256) */
  maxTokens?: number

  /** Sampling temperature 0.0-2.0 (default: 0.7) */
  temperature?: number

  /** Top-p nucleus sampling (default: 0.95) */
  topP?: number

  /** Top-k sampling */
  topK?: number

  /** Stop generation at these sequences */
  stopSequences?: string[]

  /** System prompt to define AI behavior */
  systemPrompt?: string

  /** Enable streaming mode */
  streamingEnabled?: boolean
}
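
For reference, here is a single request that exercises most of these options together. The values are illustrative for a summary task, not recommendations:

const result = await TextGeneration.generate('Summarize the plot of Hamlet', {
  maxTokens: 300,            // cap the response length
  temperature: 0.5,          // moderately focused sampling
  topP: 0.9,                 // nucleus sampling cutoff
  topK: 40,                  // sample from the 40 most likely tokens
  stopSequences: ['\n\n'],   // stop at the first blank line
  systemPrompt: 'You are a concise literary assistant.',
})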

Returns

interface LLMGenerationResult {
  /** Generated text */
  text: string

  /** Extracted thinking/reasoning content (if model supports it) */
  thinkingContent?: string

  /** Number of input tokens */
  inputTokens: number

  /** Total tokens used (prompt + response) */
  tokensUsed: number

  /** Model ID that was used */
  modelUsed: string

  /** Total latency in milliseconds */
  latencyMs: number

  /** Framework used for inference */
  framework: LLMFramework

  /** Hardware acceleration used */
  hardwareUsed: HardwareAcceleration

  /** Tokens generated per second */
  tokensPerSecond: number

  /** Time to first token in ms */
  timeToFirstTokenMs?: number

  /** Thinking tokens count */
  thinkingTokens: number

  /** Response tokens count */
  responseTokens: number
}
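
Beyond the generated text, the result also reports which model, inference framework, and hardware served the request. A quick way to inspect those fields:

const result = await TextGeneration.generate('Hello!')
console.log('Model:', result.modelUsed)
console.log('Framework:', result.framework)
console.log('Hardware:', result.hardwareUsed)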

Generation Options

Temperature

Controls randomness in the output. Lower values make output more focused and deterministic; higher values produce more varied, creative output.

// Creative writing - higher temperature
const creative = await TextGeneration.generate('Write a poem about the ocean', {
  temperature: 1.2,
  maxTokens: 150,
})

// Factual response - lower temperature
const factual = await TextGeneration.generate('What is the boiling point of water?', {
  temperature: 0.1,
  maxTokens: 50,
})

// Balanced (default)
const balanced = await TextGeneration.generate('Explain machine learning', {
  temperature: 0.7,
  maxTokens: 200,
})

Temperature    Use Case
0.0-0.3        Factual, deterministic responses
0.4-0.7        Balanced, general-purpose
0.8-1.2        Creative, varied outputs
1.3-2.0        Very creative, experimental
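
If one call site serves several kinds of tasks, a small preset map keyed to these ranges keeps the temperature choice explicit. A minimal sketch; the preset names and values are illustrative:

const TEMPERATURE_PRESETS = {
  factual: 0.2,
  balanced: 0.7,
  creative: 1.0,
} as const

async function generateFor(task: keyof typeof TEMPERATURE_PRESETS, prompt: string) {
  return TextGeneration.generate(prompt, { temperature: TEMPERATURE_PRESETS[task] })
}

// e.g. await generateFor('factual', 'What is the boiling point of water?')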

Max Tokens

Caps the number of tokens in the generated response. Generation may end sooner if the model finishes its answer or hits a stop sequence.

// Short answer
const short = await TextGeneration.generate('What is 2+2?', { maxTokens: 10 })

// Detailed explanation
const detailed = await TextGeneration.generate('Explain how computers work', { maxTokens: 500 })

Stop Sequences

Stop generation when any of the given sequences is encountered.

const result = await TextGeneration.generate('List 3 fruits:', {
  maxTokens: 100,
  // Halts before a fourth item ('4.') or at the first blank line
  stopSequences: ['4.', '\n\n'],
})

System Prompts

Define the AI’s behavior and persona.

const result = await TextGeneration.generate('What is the best programming language?', {
  maxTokens: 200,
  systemPrompt: 'You are a helpful coding assistant. Be concise and practical.',
})
See System Prompts for more details.

Examples

Full Example with Metrics

async function generateWithMetrics(prompt: string) {
  const result = await TextGeneration.generate(prompt, {
    maxTokens: 200,
    temperature: 0.7,
  })

  console.log('=== Generation Results ===')
  console.log('Response:', result.text)
  console.log('')
  console.log('=== Metrics ===')
  console.log('Input tokens:', result.inputTokens)
  console.log('Response tokens:', result.responseTokens)
  console.log('Total latency:', result.latencyMs, 'ms')
  console.log('TTFT:', result.timeToFirstTokenMs, 'ms')
  console.log('Speed:', result.tokensPerSecond.toFixed(1), 'tok/s')
  console.log('Hardware:', result.hardwareUsed)

  return result
}

Thinking Models

Some models support “thinking” or reasoning before responding:

const result = await TextGeneration.generate('Solve this step by step: What is 15% of 240?', {
  maxTokens: 500,
})

if (result.thinkingContent) {
  console.log('Thinking:', result.thinkingContent)
  // "Let me calculate 15% of 240. First, I'll convert 15% to a decimal..."
}

console.log('Answer:', result.text)
// "15% of 240 is 36."

console.log('Thinking tokens:', result.thinkingTokens)
console.log('Response tokens:', result.responseTokens)

Cancellation

Cancel an ongoing generation:

import { TextGeneration, SDKError, SDKErrorCode } from '@runanywhere/web'
// (assumes SDKError and SDKErrorCode are exported from the package root)

// Start generation
const promise = TextGeneration.generate('Write a long story...', { maxTokens: 1000 })

// Cancel after 2 seconds
setTimeout(() => {
  TextGeneration.cancel()
}, 2000)

try {
  const result = await promise
  console.log('Completed:', result.text)
} catch (err) {
  if (err instanceof SDKError && err.code === SDKErrorCode.GenerationCancelled) {
    console.log('Generation was cancelled')
  }
}
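
A reusable variant of this pattern is a deadline wrapper that cancels generation after a time budget and returns null instead of throwing. A sketch built only on the calls shown above; the function name and null-on-cancel behavior are illustrative choices:

async function generateWithDeadline(prompt: string, ms: number) {
  const timer = setTimeout(() => TextGeneration.cancel(), ms)
  try {
    return await TextGeneration.generate(prompt, { maxTokens: 1000 })
  } catch (err) {
    if (err instanceof SDKError && err.code === SDKErrorCode.GenerationCancelled) {
      return null // deadline hit; generation was cancelled
    }
    throw err
  } finally {
    clearTimeout(timer) // avoid cancelling a later, unrelated generation
  }
}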