Use streaming for responsive UIs that display text as it’s generated, providing a better user experience for longer responses.

Basic Streaming

RunAnywhere.generateStream("Tell me a story about AI")
    .collect { token ->
        // Display each token as it arrives
        print(token)
        textView.append(token)
    }
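
collect is a suspending call, so it has to run inside a coroutine. A minimal sketch, assuming an Activity or Fragment where lifecycleScope is available (from androidx.lifecycle:lifecycle-runtime-ktx):

import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.launch

// Launch collection in the UI's lifecycle-aware scope so it is
// cancelled automatically when the screen goes away.
lifecycleScope.launch {
    RunAnywhere.generateStream("Tell me a story about AI")
        .collect { token -> textView.append(token) }
}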

Streaming with Metrics

Get both the token stream and the final metrics:
val streamResult = RunAnywhere.generateStreamWithMetrics(
    prompt = "Explain quantum computing",
    options = LLMGenerationOptions(maxTokens = 500)
)

// Collect tokens as they arrive
streamResult.stream.collect { token ->
    textView.append(token)
}

// Get final metrics after streaming completes
val metrics = streamResult.result.await()
println("\n\nGenerated ${metrics.tokensUsed} tokens")
println("Speed: ${metrics.tokensPerSecond} tok/s")
println("Time to first token: ${metrics.timeToFirstTokenMs}ms")

LLMStreamingResult

data class LLMStreamingResult(
    val stream: Flow<String>,                    // Token stream
    val result: Deferred<LLMGenerationResult>    // Final metrics (awaitable)
)

Example: Chat UI with Streaming

import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

class ChatViewModel : ViewModel() {
    private val _currentResponse = MutableStateFlow("")
    val currentResponse: StateFlow<String> = _currentResponse

    private val _isGenerating = MutableStateFlow(false)
    val isGenerating: StateFlow<Boolean> = _isGenerating

    fun sendMessage(prompt: String) {
        viewModelScope.launch {
            _isGenerating.value = true
            _currentResponse.value = ""

            try {
                RunAnywhere.generateStream(prompt)
                    .collect { token ->
                        _currentResponse.value += token
                    }
            } catch (e: CancellationException) {
                throw e  // let coroutine cancellation propagate normally
            } catch (e: Exception) {
                _currentResponse.value = "Error: ${e.message}"
            } finally {
                _isGenerating.value = false
            }
        }
    }

    fun cancelGeneration() {
        RunAnywhere.cancelGeneration()
    }
}
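
Writing to the StateFlow on every token triggers one UI update per token. If that becomes a bottleneck, one possible variant (a sketch, not SDK API; it would replace the collect block inside sendMessage above) batches tokens in a StringBuilder and throttles publishing with the standard kotlinx.coroutines sample operator:

// Hedged sketch: throttle UI updates to at most one every 50 ms.
// Requires kotlinx.coroutines.flow.onEach and kotlinx.coroutines.flow.sample.
val builder = StringBuilder()
RunAnywhere.generateStream(prompt)
    .onEach { token -> builder.append(token) }
    .sample(50)
    .collect { _currentResponse.value = builder.toString() }
// sample() may drop the last window, so publish the final text explicitly.
_currentResponse.value = builder.toString()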

Example: Jetpack Compose Integration

import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.fillMaxWidth
import androidx.compose.foundation.layout.padding
import androidx.compose.material3.LinearProgressIndicator
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.collectAsState
import androidx.compose.runtime.getValue
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp

@Composable
fun StreamingTextView(viewModel: ChatViewModel) {
    val response by viewModel.currentResponse.collectAsState()
    val isGenerating by viewModel.isGenerating.collectAsState()

    Column {
        Text(
            text = response,
            modifier = Modifier
                .fillMaxWidth()
                .padding(16.dp)
        )

        if (isGenerating) {
            LinearProgressIndicator(
                modifier = Modifier.fillMaxWidth()
            )
        }
    }
}
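
A natural extension is a stop button shown while generation is running, wired to the cancelGeneration() method from the ViewModel above (a sketch; Button is standard Material 3 Compose):

// Hedged sketch: add inside the Column, next to the progress indicator.
// Requires import androidx.compose.material3.Button.
if (isGenerating) {
    Button(onClick = { viewModel.cancelGeneration() }) {
        Text("Stop generating")
    }
}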

Performance Tips

Streaming provides better perceived performance for long generations:

- Time to First Token (TTFT): ~50-100ms
- Users see immediate feedback instead of waiting for the full response
| Scenario | Recommended Approach |
| --- | --- |
| Short responses (under 50 tokens) | generate() |
| Long responses (over 100 tokens) | generateStream() |
| Chat interfaces | generateStream() |
| Background processing | generate() |
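
You can also measure TTFT from the caller's side to confirm these numbers on your target device. A minimal sketch (run inside a coroutine; only generateStream comes from the SDK, the timing variables are illustrative):

// Hedged sketch: client-side TTFT measurement, independent of the
// SDK's reported timeToFirstTokenMs.
val start = System.currentTimeMillis()
var firstTokenMs = -1L

RunAnywhere.generateStream("Explain quantum computing")
    .collect { token ->
        if (firstTokenMs < 0) firstTokenMs = System.currentTimeMillis() - start
    }

println("Measured TTFT: ${firstTokenMs}ms")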