Use streaming for responsive UIs that display text as it’s generated, providing a better user experience for longer responses.
## Basic Streaming
RunAnywhere.generateStream("Tell me a story about AI")
.collect { token ->
// Display each token as it arrives
print(token)
textView.append(token)
}
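`generateStream()` returns a Kotlin `Flow<String>`, and `collect` is a suspending call, so collection has to happen inside a coroutine. A minimal sketch, assuming an Android component that exposes a `lifecycleScope`:

```kotlin
import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.launch

// Launch a coroutine; collect suspends until the stream completes
lifecycleScope.launch {
    RunAnywhere.generateStream("Tell me a story about AI")
        .collect { token -> textView.append(token) }
}
```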
## Streaming with Metrics
Get both the token stream AND final metrics:
```kotlin
val streamResult = RunAnywhere.generateStreamWithMetrics(
    prompt = "Explain quantum computing",
    options = LLMGenerationOptions(maxTokens = 500)
)

// Collect tokens as they arrive
streamResult.stream.collect { token ->
    textView.append(token)
}

// Get final metrics after streaming completes
val metrics = streamResult.result.await()
println("\n\nGenerated ${metrics.tokensUsed} tokens")
println("Speed: ${metrics.tokensPerSecond} tok/s")
println("Time to first token: ${metrics.timeToFirstTokenMs}ms")
```
## LLMStreamingResult
```kotlin
data class LLMStreamingResult(
    val stream: Flow<String>,                  // Token stream
    val result: Deferred<LLMGenerationResult>  // Final metrics (awaitable)
)
```
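Because `stream` and `result` are separate handles, you can also consume them concurrently: collect tokens in a child coroutine while awaiting the metrics. A sketch, assuming `result` completes once the stream is exhausted:

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch

suspend fun streamAndSummarize(prompt: String, onToken: (String) -> Unit) = coroutineScope {
    val streaming = RunAnywhere.generateStreamWithMetrics(
        prompt = prompt,
        options = LLMGenerationOptions(maxTokens = 500)
    )
    // Update the UI as tokens arrive...
    launch { streaming.stream.collect { token -> onToken(token) } }
    // ...while awaiting the final metrics in parallel
    val metrics = streaming.result.await()
    println("Done: ${metrics.tokensUsed} tokens at ${metrics.tokensPerSecond} tok/s")
}
```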
## Example: Chat UI with Streaming
```kotlin
class ChatViewModel : ViewModel() {
    private val _currentResponse = MutableStateFlow("")
    val currentResponse: StateFlow<String> = _currentResponse

    private val _isGenerating = MutableStateFlow(false)
    val isGenerating: StateFlow<Boolean> = _isGenerating

    fun sendMessage(prompt: String) {
        viewModelScope.launch {
            _isGenerating.value = true
            _currentResponse.value = ""
            try {
                RunAnywhere.generateStream(prompt)
                    .collect { token ->
                        _currentResponse.value += token
                    }
            } catch (e: CancellationException) {
                throw e // let coroutine cancellation propagate
            } catch (e: Exception) {
                _currentResponse.value = "Error: ${e.message}"
            } finally {
                _isGenerating.value = false
            }
        }
    }

    fun cancelGeneration() {
        RunAnywhere.cancelGeneration()
    }
}
```
## Example: Jetpack Compose Integration
```kotlin
@Composable
fun StreamingTextView(viewModel: ChatViewModel) {
    val response by viewModel.currentResponse.collectAsState()
    val isGenerating by viewModel.isGenerating.collectAsState()

    Column {
        Text(
            text = response,
            modifier = Modifier
                .fillMaxWidth()
                .padding(16.dp)
        )
        if (isGenerating) {
            LinearProgressIndicator(
                modifier = Modifier.fillMaxWidth()
            )
        }
    }
}
```
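The ViewModel's `cancelGeneration()` can be surfaced in the same screen. A small sketch of a stop button to place inside the `Column` above (assumes Material's `Button` and `Text` are in scope):

```kotlin
if (isGenerating) {
    Button(onClick = { viewModel.cancelGeneration() }) {
        Text("Stop generating")
    }
}
```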
Streaming provides better perceived performance for long generations:

- Time to First Token (TTFT): ~50-100ms
- Users see immediate feedback instead of waiting for the full response
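These figures vary with model and hardware. The SDK already reports `timeToFirstTokenMs` in its metrics, but if you want a quick client-side check, a rough sketch (note that `first()` cancels the flow after one token, which may also stop generation):

```kotlin
import kotlinx.coroutines.flow.first

// Rough wall-clock TTFT: includes Flow dispatch overhead, not just model latency
suspend fun measureTtftMs(prompt: String): Long {
    val start = System.currentTimeMillis()
    RunAnywhere.generateStream(prompt).first() // suspends until the first token arrives
    return System.currentTimeMillis() - start
}
```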
| Scenario | Recommended Approach |
|---|---|
| Short responses (under 50 tokens) | `generate()` |
| Long responses (over 100 tokens) | `generateStream()` |
| Chat interfaces | `generateStream()` |
| Background processing | `generate()` |
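As an illustrative wrapper over this guidance (the flag and helper name are hypothetical, not SDK constants, and it assumes `generate()` returns the completed text as a `String`):

```kotlin
// Hypothetical helper: stream long responses, return short ones in one shot
suspend fun respond(prompt: String, expectLongResponse: Boolean, onText: (String) -> Unit) {
    if (expectLongResponse) {
        RunAnywhere.generateStream(prompt).collect { token -> onText(token) }
    } else {
        onText(RunAnywhere.generate(prompt)) // assumption: generate() returns String
    }
}
```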