Streaming and Snapshots
I remember back in 2023, I was working for a social music startup, and I created an AI playlist generator feature. In the beginning, I was fetching the entire response at once, and it took ages with the GPT 3.5 API, and the heavy custom prompt that I had to write to get the playlist to sound like the user’s taste.
When I introduced streaming, it completely made the feature feel alive. The user could see the playlist being generated in real-time, and it was a joy to use.
But, oh boy, that was a pain to implement. I had to accumulate the schema myself, and then parse the JSON after every delta to get the playlist to update. It was a mess with the brittle implementation that I had to maintain.
But, Foundation Models framework has a snapshot approach that is different from what you might expect.
You can get a partially-filled object that you can render immediately and update as new fields arrive. With the correct animations, it feels like magic.
Prerequisites and Context
This chapter builds on the session patterns from the previous chapter. You should be comfortable creating sessions, handling basic responses, and understanding the distinction between instructions and prompts. Now you make those responses feel alive with streaming.
What You Will Learn
By the end of this chapter, you will be able to:
- Stream both plain text and structured results using snapshots
- Build responsive UIs that update as fields populate
- Handle cancellation and multiple concurrent streams
- Test streaming behavior effectively
- Choose when to stream vs generate complete responses
Why Snapshots Instead of Token Deltas?
Most AI frameworks stream raw token deltas that are tiny text chunks that you accumulate yourself. I have built chat interfaces this way, and it works fine for plain text, but becomes painful when your response has structure. You end up re-parsing JSON after every delta, hoping the accumulated text is valid.
Foundation Models’ snapshot approach is much more elegant. Instead of raw tokens, you receive complete, partially-populated objects where properties become non-nil progressively. This plays beautifully with SwiftUI: bind to the partially generated type and let the UI update naturally as fields arrive.
Using the session patterns from the previous chapter, streaming becomes a natural extension rather than a completely different approach.
Streaming Plain Text
For plain text responses, streaming works similarly to traditional chat interfaces, but with the session-based approach we established earlier. The key difference is receiving incremental content you can append to your view:
import FoundationModels
let session = LanguageModelSession()
let stream = session.streamResponse(to: prompt)
for try await chunk in stream {
print(chunk.content) // Update UI progressively
}
let final = try await stream.collect()
print("Final: (final.content)") Physiqa Example: Live Chat Streaming Pattern
The Zenther workout assistant demonstrates a clean streaming implementation that updates the UI in real-time. The key insight is letting the session’s transcript handle UI updates automatically:
@MainActor
func sendMessage(_ content: String) async {
isLoading = session.isResponding
do {
// Stream response from current session
let responseStream = session.streamResponse(to: Prompt(content))
for try await _ in responseStream {
// The streaming automatically updates the session transcript
// UI observes session.transcript for real-time updates
}
} catch {
errorMessage = handleFoundationModelsError(error)
showError = true
}
isLoading = session.isResponding
} This pattern eliminates manual content accumulation by using Foundation Models’ built-in transcript management. The UI simply observes session.transcript changes, and SwiftUI automatically updates as streaming content arrives. This approach is particularly effective for chat interfaces where conversation history is important.
Streaming Structured Results
You can also stream structured results with @Generable types, which you will explore in detail in the next chapter on structured generation. You receive PartiallyGenerated instances with optional properties that fill in over time:
import FoundationModels
import SwiftUI
@Generable
struct StoryOutline {
@Guide(description: "Story title")
let title: String
@Guide(description: "Main character")
let protagonist: Character
@Guide(description: "3–5 plot points")
let plotPoints: [String]
}
@Generable
struct Character {
@Guide(description: "Character name")
let name: String
@Guide(description: "Character background")
let background: String
}
struct StreamingStoryView: View {
@State private var partial: StoryOutline.PartiallyGenerated?
@State private var final: StoryOutline?
@State private var isStreaming = false
@State private var error: String?
var body: some View {
VStack(alignment: .leading, spacing: 16) {
if let partial {
// Title
Text(partial.title ?? "…")
.font(.title)
.redacted(reason: partial.title == nil ? .placeholder : [])
// Protagonist
Group {
Text("Protagonist")
.font(.headline)
Text(partial.protagonist?.name ?? "…")
.redacted(reason: partial.protagonist?.name == nil ? .placeholder : [])
Text(partial.protagonist?.background ?? "…")
.foregroundColor(.secondary)
.redacted(reason: partial.protagonist?.background == nil ? .placeholder : [])
}
// Plot points
Group {
Text("Plot Points")
.font(.headline)
if let points = partial.plotPoints {
ForEach(points.indices, id: .self) { i in
Text("• (points[i])")
}
} else {
ForEach(0..<3, id: .self) { _ in
Text("• …").redacted(reason: .placeholder)
}
}
}
}
if let final {
Divider()
Text("Complete!").font(.caption).foregroundColor(.green)
Text("(final.title)")
}
if let error { Text(error).foregroundColor(.red) }
HStack {
Button(isStreaming ? "Streaming…" : "Generate") {
Task { await generate() }
}
.disabled(isStreaming)
}
}
.padding()
}
private func generate() async {
isStreaming = true
error = nil
partial = nil
final = nil
let session = LanguageModelSession()
let stream = session.streamResponse(
to: "Create a story outline about a time-traveling detective",
generating: StoryOutline.self
)
do {
for try await snapshot in stream {
await MainActor.run {
partial = snapshot.content
}
}
let completed = try await stream.collect()
await MainActor.run {
final = completed.content
}
} catch LanguageModelSession.GenerationError.guardrailViolation {
await MainActor.run { error = "Content blocked by safety guardrails" }
} catch {
await MainActor.run { error = error.localizedDescription }
}
isStreaming = false
}
} This approach eliminates the parsing headaches I mentioned earlier:
PartiallyGeneratedmirrors your final type with optional properties- You can render placeholders or redacted views (skeleton UI) until fields arrive
collect()returns the complete final value when streaming ends- No manual JSON parsing or delta accumulation required
Cancellation and Backpressure
Users may change their minds mid-generation, especially with longer streaming responses. Supporting cancellation keeps your UI responsive and avoids wasted computational work - important on mobile devices:
actor StreamController<T> {
private var task: Task<T, Error>?
func start(_ operation: @escaping () async throws -> T) {
cancel()
task = Task(priority: .userInitiated) { try await operation() }
}
func cancel() {
task?.cancel()
task = nil
}
func join() async throws -> T? {
let result = try await task?.value
task = nil
return result
}
} Use this to ensure only one stream runs at a time. Always cancel the previous before starting a new one because the framework only supports one request at a time.
Case Study: Pokémon Snapshot Streaming
Here is a real-world snapshot pattern adapted from a Pokémon analysis example: a rich @Generable model, a service that streams analysis, and a SwiftUI view that renders partials.
The model
import FoundationModels
@Generable
struct PokemonAnalysis: Equatable {
@Guide(description: "An epic title for this Pokemon analysis")
let title: String
@Guide(description: "The Pokemon's name")
let pokemonName: String
@Guide(description: "The Pokemon's Pokedex number")
let pokedexNumber: Int
@Guide(description: "Primary and secondary types")
let types: [PokemonType]
@Guide(description: "Overall competitive tier rating")
let competitiveTier: CompetitiveTier
}
@Generable
struct PokemonType: Equatable {
let name: String
let colorDescription: String
}
@Generable
enum CompetitiveTier: String {
case ubers = "Ubers", overUsed = "OU (OverUsed)", underUsed = "UU (UnderUsed)",
rarelyUsed = "RU (RarelyUsed)", neverUsed = "NU (NeverUsed)", littleCup = "LC (Little Cup)"
} The streaming service
import FoundationModels
import Observation
@Observable @MainActor
final class PokemonAnalyzer {
private(set) var analysis: PokemonAnalysis.PartiallyGenerated?
private let session = LanguageModelSession()
private var currentTask: Task<Void, Error>?
func analyzePokemon(_ identifier: String) async {
currentTask?.cancel()
currentTask = Task {
let stream = session.streamResponse(
generating: PokemonAnalysis.self,
includeSchemaInPrompt: true,
options: GenerationOptions(temperature: 0.2)
) {
"Analyze: (identifier). Provide name, pokedex number, types, and tier."
}
for try await snapshot in stream {
try Task.checkCancellation()
analysis = snapshot.content
}
}
_ = try? await currentTask?.value
}
func stop() { currentTask?.cancel() }
} The SwiftUI view
import SwiftUI
import FoundationModels
struct StreamingPokemonView: View {
let analysis: PokemonAnalysis.PartiallyGenerated
var body: some View {
VStack(spacing: 12) {
Text(analysis.title ?? "…")
.font(.title).bold()
.redacted(reason: analysis.title == nil ? .placeholder : [])
if let name = analysis.pokemonName {
Text("Name: (name)")
} else {
Text("Name: …").redacted(reason: .placeholder)
}
if let number = analysis.pokedexNumber { Text("No. (number)") }
if let types = analysis.types {
Text("Types: (types.map { $0.name }.joined(separator: ", "))")
}
if let tier = analysis.competitiveTier { Text("Tier: (tier.rawValue)") }
}
.frame(maxWidth: .infinity, alignment: .leading)
.padding()
}
} This pattern highlights how PartiallyGenerated keeps your UI responsive: render what is ready (title/name) while deeper fields (types, tier) arrive.
Stable IDs for streaming lists
When streaming arrays of nested items, give each element a stable identity so SwiftUI diffing remains smooth as fields appear. Use GenerationID() in your @Generable types and reference it in your views:
@Generable
struct AbilityAnalysis: Equatable {
var id = GenerationID
@Guide(description: "The ability's name")
let name: String
@Guide(description: "The ability's strategic use")
let strategicUse: String
}
// In your view
if let abilities = analysis.abilities {
ForEach(abilities, id: .id) { ability in
Text(ability.name ?? "…")
.redacted(reason: ability.name == nil ? .placeholder : [])
}
} The same approach works for other nested arrays (types, evolutions, matchups, etc.).
Error handling patterns
Surface friendly messages and recover where possible:
do {
for try await snapshot in stream { /* update UI */ }
} catch LanguageModelSession.GenerationError.concurrentRequests(_) {
// A previous request is still running
showMessage("Please wait for the current request to finish.")
} catch LanguageModelSession.GenerationError.rateLimited(_) {
// Background or system rate limit reached
showMessage("Too many requests right now. Please try again shortly.")
} catch LanguageModelSession.GenerationError.guardrailViolation {
showMessage("I can’t help with that request.")
} catch LanguageModelSession.GenerationError.refusal(_, _) {
showMessage("I can’t produce the requested type of answer.")
} catch {
showMessage(error.localizedDescription)
} Prewarming for responsiveness
You can nudge the system to get ready sooner:
let session = LanguageModelSession()
session.prewarm() // warming hint; not a guarantee Prewarm doesn’t guarantee immediate readiness and may be less effective in the background or under load.
When to Stream vs Generate Once
Choosing between streaming and complete generation depends on your use case:
Stream when you want responsiveness and progressive disclosure (chat, forms, outlines, previews), when users benefit from seeing partial results (long summaries, structured data), or when you need to populate multiple fields incrementally.
Generate once when the response is small and streaming adds no perceived value, or when you need all-or-nothing output for further processing.
What’s Next
You now understand Foundation Models’ unique snapshot streaming approach, which builds naturally on the session patterns from earlier chapters.
The next chapter explores generation options and sampling control - learning how to fine-tune the model’s behavior with temperature, token limits, and sampling strategies. These controls apply to both regular and streaming responses, giving you precise control over the content you just learned to stream.