
# generateStream()

> Stream tokens in real time as they are generated

Use streaming when the UI should display text as it is generated. For longer responses, users see output immediately instead of waiting for the full completion.

## Basic Streaming

```kotlin theme={null}
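// collect is a suspending call, so run this inside a coroutine
// (for example, within lifecycleScope.launch { ... } on Android).
RunAnywhere.generateStream("Tell me a story about AI")
    .collect { token ->
        // Display each token as it arrives
        print(token)
        textView.append(token)
    }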
```
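To try the collection pattern without the SDK, the following is a minimal sketch that substitutes a hypothetical `fakeTokenStream()` for `RunAnywhere.generateStream()`. It assumes the stream is a `Flow<String>` (consistent with the `stream` field of `LLMStreamingResult`), and `collectToString` is an illustrative helper, not part of the SDK. It forwards each token to a callback and also returns the accumulated text.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for RunAnywhere.generateStream():
// emits a few tokens with a short delay between them.
fun fakeTokenStream(): Flow<String> = flow {
    for (token in listOf("Once", " upon", " a", " time")) {
        delay(10) // simulate generation latency
        emit(token)
    }
}

// Illustrative helper (not an SDK function): passes each token to onToken
// as it arrives and returns the full text once the stream completes.
suspend fun collectToString(
    tokens: Flow<String>,
    onToken: (String) -> Unit = {}
): String {
    val builder = StringBuilder()
    tokens.collect { token ->
        onToken(token)
        builder.append(token)
    }
    return builder.toString()
}

fun main() = runBlocking {
    val fullText = collectToString(fakeTokenStream()) { print(it) }
    println()
    println("Full response length: ${fullText.length}")
}
```

Keeping the accumulated text is useful when the complete response has to be stored, for example in chat history, after the stream ends.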

## Streaming with Metrics

`generateStreamWithMetrics()` returns both the token stream and the final metrics:

```kotlin theme={null}
val streamResult = RunAnywhere.generateStreamWithMetrics(
    prompt = "Explain quantum computing",
    options = LLMGenerationOptions(maxTokens = 500)
)

// Collect tokens as they arrive
streamResult.stream.collect { token ->
    textView.append(token)
}

// Get final metrics after streaming completes
val metrics = streamResult.result.await()
println("\n\nGenerated ${metrics.tokensUsed} tokens")
println("Speed: ${metrics.tokensPerSecond} tok/s")
println("Time to first token: ${metrics.timeToFirstTokenMs}ms")
```
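To make the reported numbers concrete, here is a rough sketch of how time to first token and throughput could be measured by hand around any `Flow<String>`. `StreamTimings` and `measureStream` are hypothetical names, not SDK APIs. The sketch counts flow emissions, which may not correspond one-to-one to model tokens, and the SDK may compute its own metrics differently.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

// Hypothetical container for hand-measured timings (not LLMGenerationResult).
data class StreamTimings(
    val emissionCount: Int,
    val timeToFirstTokenMs: Long,   // -1 if the stream emitted nothing
    val emissionsPerSecond: Double
)

// Collects the stream once, recording when the first emission arrives
// and how many emissions were seen in total.
suspend fun measureStream(
    tokens: Flow<String>,
    onToken: (String) -> Unit = {}
): StreamTimings {
    val startNs = System.nanoTime()
    var firstTokenNs = -1L
    var count = 0
    tokens.collect { token ->
        if (count == 0) firstTokenNs = System.nanoTime()
        count++
        onToken(token)
    }
    val elapsedSec = (System.nanoTime() - startNs) / 1e9
    return StreamTimings(
        emissionCount = count,
        timeToFirstTokenMs = if (count == 0) -1L else (firstTokenNs - startNs) / 1_000_000,
        emissionsPerSecond = if (elapsedSec > 0.0) count / elapsedSec else 0.0
    )
}

fun main() = runBlocking {
    // Simulated stream; replace with a real token flow to measure it.
    val simulated = flow {
        repeat(5) { i ->
            delay(20)
            emit("token$i ")
        }
    }
    val timings = measureStream(simulated) { print(it) }
    println()
    println("Emissions: ${timings.emissionCount}")
    println("Time to first token: ${timings.timeToFirstTokenMs}ms")
    println("Throughput: ${timings.emissionsPerSecond} emissions/s")
}
```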

## LLMStreamingResult

```kotlin theme={null}
data class LLMStreamingResult(
    val stream: Flow<String>,                    // Token stream
    val result: Deferred<LLMGenerationResult>    // Final metrics (awaitable)
)
```
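The pairing of a `Flow` with a `Deferred` can be illustrated with a small self-contained sketch. This is not the SDK's implementation: `FakeResult`, `FakeStreamingResult`, and `fakeStreamWithMetrics` are hypothetical stand-ins, and the wiring through `CompletableDeferred` is only one way such a type could be produced. In this sketch the flow is cold, so the deferred completes only after the stream has been collected.

```kotlin
import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

// Hypothetical stand-ins for the SDK types, for illustration only.
data class FakeResult(val tokensUsed: Int)

data class FakeStreamingResult(
    val stream: Flow<String>,
    val result: Deferred<FakeResult>
)

// Builds a cold token flow plus a Deferred that completes when the flow finishes.
fun fakeStreamWithMetrics(tokens: List<String>): FakeStreamingResult {
    val done = CompletableDeferred<FakeResult>()
    val stream = flow {
        try {
            var count = 0
            for (token in tokens) {
                emit(token)
                count++
            }
            done.complete(FakeResult(count))
        } catch (e: Throwable) {
            // Propagate failure or cancellation to anyone awaiting the result.
            done.completeExceptionally(e)
            throw e
        }
    }
    return FakeStreamingResult(stream, done)
}

fun main() = runBlocking {
    val streaming = fakeStreamWithMetrics(listOf("Hello", ",", " world"))

    // Collect first: in this sketch nothing runs until the flow is collected.
    streaming.stream.collect { print(it) }
    println()

    // The deferred is already complete once collection has finished.
    println("Tokens: ${streaming.result.await().tokensUsed}")
}
```

In this sketch, calling `await()` before collecting the stream would suspend indefinitely. Whether the SDK behaves the same way is not stated here, so collecting first and awaiting second, as in the example above, is the safe order.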

## Example: Chat UI with Streaming

```kotlin theme={null}
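import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.update
import kotlinx.coroutines.launch
// Also import RunAnywhere from the SDK package.

class ChatViewModel : ViewModel() {
    private val _currentResponse = MutableStateFlow("")
    val currentResponse: StateFlow<String> = _currentResponse

    private val _isGenerating = MutableStateFlow(false)
    val isGenerating: StateFlow<Boolean> = _isGenerating

    fun sendMessage(prompt: String) {
        viewModelScope.launch {
            _isGenerating.value = true
            _currentResponse.value = ""

            try {
                RunAnywhere.generateStream(prompt)
                    .collect { token ->
                        // Append atomically to the current response
                        _currentResponse.update { it + token }
                    }
            } catch (e: CancellationException) {
                // Rethrow so coroutine cancellation is not reported as an error
                throw e
            } catch (e: Exception) {
                _currentResponse.value = "Error: ${e.message}"
            } finally {
                _isGenerating.value = false
            }
        }
    }

    fun cancelGeneration() {
        RunAnywhere.cancelGeneration()
    }
}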
```
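Cancellation can also be driven from the coroutine side by keeping the `Job` that collects the stream. The sketch below shows this with a simulated endless stream. `slowTokenStream` and `streamForMs` are hypothetical helpers, not SDK functions. Whether cancelling the collecting coroutine also stops generation inside the SDK is not stated here, so an app may still need to call `RunAnywhere.cancelGeneration()`.

```kotlin
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.update
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical endless stream standing in for a long generation.
fun slowTokenStream(): Flow<String> = flow {
    var i = 0
    while (true) {
        delay(20) // cancellation is checked at this suspension point
        emit("tok${i++} ")
    }
}

// Collects tokens for roughly `millis` milliseconds, then cancels the
// collecting job and returns whatever text was accumulated so far.
suspend fun streamForMs(tokens: Flow<String>, millis: Long): String = coroutineScope {
    val text = MutableStateFlow("")
    val job = launch {
        tokens.collect { token -> text.update { it + token } }
    }
    delay(millis)
    job.cancelAndJoin() // stop collecting and wait for the job to finish
    text.value
}

fun main() = runBlocking {
    val partial = streamForMs(slowTokenStream(), millis = 110)
    println("Partial text after cancel: $partial")
}
```

The partial text stays available after cancellation, which lets a chat UI keep what was generated before the user pressed stop.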

## Example: Jetpack Compose Integration

```kotlin theme={null}
@Composable
fun StreamingTextView(viewModel: ChatViewModel) {
    val response by viewModel.currentResponse.collectAsState()
    val isGenerating by viewModel.isGenerating.collectAsState()

    Column {
        Text(
            text = response,
            modifier = Modifier
                .fillMaxWidth()
                .padding(16.dp)
        )

        if (isGenerating) {
            LinearProgressIndicator(
                modifier = Modifier.fillMaxWidth()
            )
        }
    }
}
```

## Performance Tips

<Tip>
  Streaming provides better perceived performance for long generations:

  - **Time to First Token (TTFT)**: \~50-100ms
  - Users see immediate feedback instead of waiting for the full response
</Tip>

| Scenario                          | Recommended Approach |
| --------------------------------- | -------------------- |
| Short responses (under 50 tokens) | `generate()`         |
| Long responses (over 100 tokens)  | `generateStream()`   |
| Chat interfaces                   | `generateStream()`   |
| Background processing             | `generate()`         |
