# Foundation Models Guide

> A practical guide to building Apple Intelligence features with sessions, structured generation, streaming, tools, safety, and adapters.

Canonical URL: https://www.rudrank.ai/foundation-models
LLM index: https://www.rudrank.ai/llms.txt
Source site: rudrank.ai
Author: Rudrank Riyam

## How to use this file

This file contains the full Markdown text of the Foundation Models Guide. AI assistants and coding agents can use it as a compact context source when answering questions about Apple's Foundation Models framework, LanguageModelSession, structured generation, streaming snapshots, tools, safety, internationalization, and adapters. Use the canonical chapter URLs when citing or linking back to the web version.

---

## Chapter 01: Introduction to Foundation Models

Canonical URL: https://www.rudrank.ai/foundation-models/introduction-to-foundation-models

Understand what Apple's Foundation Models framework can do, where it fits, and when to choose it for app features.

The Foundation Models framework is Apple's new framework that gives you direct access to the on-device large language model used by Apple Intelligence. This chapter introduces the framework's capabilities, architectural decisions, and how it can help you build AI features for your apps, without having to deal with the complexities of downloading and running models on your own. The framework's announcement at WWDC 2025 was predicted by Bloomberg, which reported in May 2025 that Apple was preparing to allow developers to use the model that powers Apple Intelligence through an AI SDK.

## Prerequisites and Context

This guide assumes you are comfortable with Swift and SwiftUI development, but having no prior AI or machine learning experience is not a problem. The APIs are written with iOS developers in mind, providing familiar Swift patterns and straightforward integration with your existing app architecture.

## What You Will Learn

By the end of this chapter, you will understand:

- What Foundation Models can and *cannot* do for your apps
- How guided generation solves the structured output problem
- Why Foundation Models uses snapshots instead of token streaming
- When to choose Foundation Models vs MLX Swift
- The framework's limitations and architectural decisions

The Foundation Models framework is available on iOS 26.0+, macOS 26.0+, iPadOS 26.0+, and visionOS 26.0+, in all regions where Apple Intelligence is available, excluding mainland China, as of September 2025.

The best way to understand what this framework can achieve is to look at some examples of what you can build with it:

- **Personalized suggestions** that understand your app's content by providing it with the relevant context
- **Travel itineraries** generated on-demand in a travel app
- **Dynamic game dialog** created for characters
- **Content summarization** and analysis of user input
- **Structured data extraction** from unstructured text

All of this runs completely on-device, so user data stays private, features work offline, and your app size does not increase, assuming the Apple Intelligence model is already downloaded on the device.

## The Model

Foundation Models is powered by an approximately 3 billion parameter large language model, with each parameter quantized to 2 bits. This model outperforms Llama 3.2 3B but is comparable to or slightly behind Qwen 3 4B and Gemma 3 4B models.

Again, this is a device-scale model. It is more optimized for specific use cases like summarization, extraction, classification, and guided generation, converting unstructured text into structured data that you can directly use in your app. It is not designed for real-world knowledge or advanced reasoning. The model's training cutoff is late 2023, so you should not rely on it for recent events.
Those tasks, especially reasoning, should be performed by state-of-the-art server-side LLMs like Sonnet 4, Gemini 2.5 Pro, or OpenAI's GPT-5.

## Guided Generation

Language models produce unstructured text that is easy for humans to read but difficult to map onto views in your app. You can write code to parse the text into structured data, but this approach is error-prone and difficult to maintain, because you are essentially hoping the model will **always** produce correctly formatted data.

Foundation Models solves this with **guided generation**, a system that guarantees type-safe, structured output directly without parsing JSON or dealing with decoding issues.

You will learn the complete details of guided generation, including `@Generable` types and `@Guide` attributes, in a later chapter on structured generation.

## Streaming with Snapshots

Foundation Models takes a different approach to streaming than other frameworks. Instead of raw deltas (short character groups), it streams **snapshots** (complete partial objects with populated fields).

Typically, as deltas are produced, you accumulate them yourself. But when the result has structure, you need to parse it out of the accumulation after each delta. This is not trivial for complex structures, or even simple ones for that matter.

Foundation Models transforms deltas into snapshots that represent partially generated responses. Their properties are all optional and get filled in as the model produces more of the response. This is a much simpler approach than accumulating deltas and parsing them out.

This works great with SwiftUI where you create state holding a partially generated type, iterate over the response stream, and show it in your UI as the structured data fills in with animations!

## Tool Calling

Tool calling lets the model execute functions *you* define, extending its capabilities beyond text generation based on the limited knowledge of the model.
The model can access real-world data, fetch data from system frameworks like HealthKit or EventKit, and take actions automatically in your app, or even outside of it, such as creating reminders.

You will learn how to build and use tools in later chapters on tool use and external API integration.

## Stateful Sessions and Multi-Turn Conversations

Foundation Models is built around **stateful sessions** that maintain conversation context. Each interaction is retained in a **transcript**, allowing the model to understand past interactions within a session.

You will learn how to create sessions, manage conversations, and use transcripts in the upcoming chapter on sessions and a later chapter on advanced chat patterns.

## Foundation Models vs MLX Swift

The two frameworks serve different purposes:

**Foundation Models** gives you a high-level API focused on app features. It uses Apple's system language models and provides guaranteed type-safe structured output. It also has built-in tools and conversation memory. The biggest caveat is that it is only available on iOS 26.0+ and devices that support Apple Intelligence.

**MLX Swift** gives you low-level control over the pipeline and complete flexibility over the model choice: you can run any compatible model from Hugging Face or your own fine-tuned models. Tool calling produces raw output that you parse yourself. It also works on older devices (iPhone 13 as well, with iOS 16.0+).

Choose **Foundation Models** when you want to build user-facing AI features quickly with Apple's model and can afford to have the feature available for only iOS 26.0+.

Choose **MLX Swift** when you need specific models or complete control over the AI pipeline.

## Developer Experience

Foundation Models framework is one of the most developer-friendly frameworks by Apple. You can tell it was built by developers who actually use developer tools.
The framework includes a simple playground for testing prompts directly in Xcode, and Instruments integration for performance profiling.

## Limitations and Considerations

Foundation Models has several important constraints to understand:

- **Shared context window** between input and output. The default is 4096 tokens, but starting in iOS 26.4, you can query the actual size with `SystemLanguageModel.default.contextSize`, and it may grow as Apple updates the on-device model
- **No versioning** because models are tied to OS releases
- **Text-only** with no vision capabilities as of September 2025
- **Performance varies** as complex generations can take time

You will learn how to work within these limitations throughout the following chapters.

## What's Next

Now that you understand what Foundation Models can do, the next chapter gets you hands-on with sessions - the core building block of every Foundation Models interaction. You will learn to check availability, create your first session, and build a simple chat interface.

Before you start, verify your development environment meets the iOS 26.0+ requirement. You require Xcode 26.0+ and the latest macOS 26.0+ or iOS 26.0+ SDK to run on your device. You will also need to enable Apple Intelligence on your test devices. If you want to test on the iOS simulator, you require the **latest macOS 26 Tahoe** along with the latest iOS 26.0+ SDK.

---

## Chapter 02: Getting Started with Sessions

Canonical URL: https://www.rudrank.ai/foundation-models/getting-started-with-sessions

Create your first LanguageModelSession, check model availability, and build the basic interaction loop.

This chapter introduces sessions, the core building block of every Foundation Models interaction.

## Prerequisites and Context

This chapter builds on the Foundation Models introduction from the previous chapter.
You should understand what Foundation Models can do, how guided generation works conceptually, and when to choose Foundation Models over MLX Swift. Now you move from theory to practice, focusing on on-device generation using the Neural Engine.

## What You Will Learn

By the end of this chapter, you will be able to:

- Check Foundation Models availability and handle different states
- Create and configure language model sessions
- Distinguish between single-turn and multi-turn interactions
- Write effective instructions and prompts
- Handle common errors gracefully
- Optimize session performance with prewarming
- Build a simple chat interface

## Project Setup

Apple Intelligence is built into the OS, so there is no external package to add like in the case of MLX Swift. Import the new `FoundationModels` framework:

```swift
import FoundationModels
```

That is it. The model is already on the user's device if Apple Intelligence is enabled and downloaded.

## Checking Availability

Foundation Models only works if Apple Intelligence is enabled. You **cannot assume** it is available, so you should always check that first.

### Understanding the Availability States

The `SystemLanguageModel.default.availability` property returns one of these states:

- **`.available`** - You are good to go
- **`.unavailable(.deviceNotEligible)`** - Device does not support Apple Intelligence (iPhone 15, iPhone 15 Plus, and earlier models; Intel Macs; older iPads). The iPhone 15 Pro models are eligible
- **`.unavailable(.appleIntelligenceNotEnabled)`** - User has not enabled Apple Intelligence in Settings
- **`.unavailable(.modelNotReady)`** - Model is downloading or system conditions are not met

The model might also be unavailable due to battery level, Game Mode, or if the device becomes too warm.
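
When you do not need a dedicated view per state, the states above can be collapsed into a single user-facing message. The following is a minimal sketch under stated assumptions: `availabilityMessage(for:)` and its message strings are hypothetical names chosen for illustration and are not part of the framework; only the `availability` cases come from the list above.

```swift
import FoundationModels

/// Hypothetical helper: maps each availability state to a short explanation,
/// or returns nil when the model is ready to use.
func availabilityMessage(for model: SystemLanguageModel = .default) -> String? {
    switch model.availability {
    case .available:
        // Nothing to explain; the feature can be shown.
        return nil
    case .unavailable(.deviceNotEligible):
        return "This device does not support Apple Intelligence."
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Turn on Apple Intelligence in Settings to use this feature."
    case .unavailable(.modelNotReady):
        return "The model is not ready yet. Please try again shortly."
    case .unavailable(_):
        // Catch-all for any other unavailable reason.
        return "This feature is currently unavailable."
    }
}
```

Returning `nil` for `.available` lets the caller treat any non-nil value as a reason to hide or disable the feature.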

### Building UI for Each State

Here is an availability checker with UI that you can drop into your app:

```swift
import FoundationModels
import SwiftUI

struct FoundationModelsAvailabilityView: View {
    private let model = SystemLanguageModel.default

    var body: some View {
        VStack(spacing: 20) {
            // The state views below are your own SwiftUI views, one per availability state
            switch model.availability {
            case .available:
                AvailableStateView()
            case .unavailable(.deviceNotEligible):
                DeviceNotEligibleView()
            case .unavailable(.appleIntelligenceNotEnabled):
                AppleIntelligenceDisabledView()
            case .unavailable(.modelNotReady):
                ModelNotReadyView()
            case .unavailable(let other):
                UnknownUnavailableView(reason: other)
            }
        }
        .padding()
    }
}
```

This gives you a UI component for every availability state that you can adapt to your app's design.

## Exploring with Playgrounds

Before building a full interface, it is helpful to experiment with prompts and instructions directly in the playground. The `#Playground` macro provides a live-updating environment, similar to SwiftUI Previews, where you can see model responses as soon as they are generated.

```swift
import FoundationModels
import Playgrounds

#Playground {
    let session = LanguageModelSession()
    let response = try await session.respond(to: "Why is the sky blue?")
}
```

The response appears in the preview canvas on the right once generation finishes. You can expand the result to see details like the generated `content` and the request `duration`. For example, for the above query, the response is:

```
The sky appears blue primarily due to a phenomenon called Rayleigh scattering. When sunlight enters Earth's atmosphere, it is made up of different colors, each with its own wavelength. Blue light has a shorter wavelength and is scattered in all directions by the gases and particles in the atmosphere more than other colors. As a result, when we look up, we see the predominantly blue light scattered in all directions, giving the sky its blue appearance.
This effect is more pronounced when the sun is lower in the sky, as the sunlight has to pass through more atmosphere, scattering even more blue light.
```

It took 2.1 seconds to generate on an M4 MacBook Air.

You can use this to quickly test different instructions, prompts, and generation options throughout this chapter.

## Your First Session

Foundation Models is built around sessions. You can treat a session as a conversation that carries the context of previous interactions.

### Single-Turn vs Multi-Turn Sessions

Foundation Models supports two types of interactions:

**Single-turn**: Create a new session for each request. Good for independent tasks like generating titles or extracting data:

```swift
// Fresh sessions avoid conversation context for independent tasks
let session = LanguageModelSession()
let response = try await session.respond(to: "Generate a title for a travel blog")
print(response.content)
```

Here is a sample response:

```
"Wanderlust Chronicles: Exploring the World One Adventure at a Time"
```

**Multi-turn**: Reuse the same session to maintain conversation context. Perfect for chat interfaces:

```swift
// Persistent sessions maintain context across multiple interactions
let session = LanguageModelSession()

let response1 = try await session.respond(to: "I'm planning a trip to Japan")
let response2 = try await session.respond(to: "What should I pack?")

// Context from previous Japan question influences packing suggestions
print(response2)
```

The second response incorporates context from the first, with packing suggestions tailored for Japan:

```
Response(userPrompt: "What should I pack?", duration: 24.287179583, content: "Packing for a trip to Japan requires considering the season, cultural norms, and activities you plan to engage in. Here is a general packing list to help you prepare...
```

### Simple Session Example

Here is the simplest possible chat interface:

```swift
import FoundationModels
import SwiftUI

struct BasicChatView: View {
    @State private var session: LanguageModelSession?
    @State private var prompt = ""
    @State private var response = ""
    @State private var isGenerating = false

    var body: some View {
        VStack(spacing: 20) {
            TextField("Ask me anything...", text: $prompt)
                .textFieldStyle(RoundedBorderTextFieldStyle())

            Button("Generate") {
                Task { await generateResponse() }
            }
            .disabled(isGenerating || session == nil)

            if !response.isEmpty {
                Text(response)
                    .padding()
                    .background(Color.gray.opacity(0.1))
                    .cornerRadius(8)
            }
        }
        .padding()
        .task { await setupSession() }
    }

    private func setupSession() async {
        // Only create a session when the model is actually available
        guard SystemLanguageModel.default.availability == .available else { return }
        session = LanguageModelSession(
            instructions: Instructions("You are a helpful assistant. Keep responses concise.")
        )
    }

    private func generateResponse() async {
        guard let session = session else { return }
        isGenerating = true
        do {
            let result = try await session.respond(to: prompt)
            response = result.content
        } catch {
            response = "Error: \(error.localizedDescription)"
        }
        isGenerating = false
    }
}
```

## Physiqa Example: Workout Assistant Session

The Zenther fitness app demonstrates practical session management for workout guidance and logging. This real-world implementation shows session setup, streaming responses, and dynamic instruction updates.

### Core Session Management

```swift
@Observable
final class ChatViewModel {
    var isLoading: Bool = false
    var sessionCount: Int = 1
    var instructions: String = "You are a fitness AI assistant specializing in workout guidance, exercise form, nutrition advice, and health tracking. Help users log their workouts, plan training sessions, and achieve their fitness goals."
    // App-specific dependency; SubscriptionStatusService is defined elsewhere in the app
    private let subscriptionService: SubscriptionStatusService
    private(set) var session: LanguageModelSession

    init(subscriptionService: SubscriptionStatusService) {
        self.subscriptionService = subscriptionService
        self.session = LanguageModelSession(
            instructions: Instructions(
                "You are a fitness AI assistant specializing in workout guidance, exercise form, nutrition advice, and health tracking. Help users log their workouts, plan training sessions, and achieve their fitness goals."
            )
        )
    }
}
```

### Streaming Message Handling

The workout assistant uses streaming responses to provide real-time feedback during exercise logging:

```swift
@MainActor
func sendMessage(_ content: String) async {
    isLoading = session.isResponding
    do {
        // Stream response from current session
        let responseStream = session.streamResponse(to: Prompt(content))
        for try await _ in responseStream {
            // The streaming automatically updates the session transcript
        }
    } catch {
        // Handle errors by showing an error message
        // (errorMessage, showError, and handleFoundationModelsError live elsewhere in the view model)
        errorMessage = handleFoundationModelsError(error)
        showError = true
    }
    isLoading = session.isResponding
}
```

### Session Lifecycle Management

The Zenther app provides methods to clear conversation history and update instructions dynamically:

```swift
@MainActor
func clearChat() {
    sessionCount = 1
    session = LanguageModelSession(
        instructions: Instructions(instructions)
    )
}

@MainActor
func updateInstructions(_ newInstructions: String) {
    instructions = newInstructions
    // Create a new session with updated instructions
    session = LanguageModelSession(
        instructions: Instructions(instructions)
    )
}
```

This pattern allows users to reset their conversation context or switch between different AI personas (like switching from workout planning to nutrition guidance) without restarting the app.

## Instructions vs Prompts

Foundation Models distinguishes between instructions and prompts to provide better security and control over AI behavior.

### Instructions

Tell the model who it is and how to behave.
These persist for the entire session and take priority over prompts:

```swift
// Instructions shape the AI's behavior and personality throughout the session
let instructions = """
You are a helpful writing assistant that helps users improve their content.
Focus on clarity, tone, and structure. Provide specific suggestions for improvement.
Keep responses concise and actionable.
"""

let session = LanguageModelSession(instructions: Instructions(instructions))
```

### Prompts

Individual questions or requests from the user:

```swift
// Prompts contain user's actual requests and questions
let prompt = "Improve this email draft: \(emailText)"
try await session.respond(to: prompt)
```

## Prompt Engineering Best Practices

Use these simple methods to get the best results from the model. Since the model is smaller and optimized for specific tasks rather than general knowledge, targeted prompting improves output quality for app features.

### Control Output Length

Be specific about the length you want:

```swift
// Unclear constraints often produce verbose, unfocused responses
"Summarize this article"

// Length constraints help fit responses into your UI layout
"Summarize this article in exactly two sentences"

// Combined constraints produce more useful, targeted content
"Create a brief product description under 50 words that highlights key features"
```

Clear length constraints help the model generate appropriately sized responses for your UI. This prevents layout issues and ensures consistent user experience across different devices.

This is also important given the context window, which is shared between input and output. You can query the exact limit with `try await SystemLanguageModel.default.contextSize`; it defaults to 4096 tokens but may grow in future OS releases.

### Specify Roles and Context

Give the model a clear role and context:

```swift
let instructions = """
You are a customer service representative for a fitness app.
Be helpful, encouraging, and focus on solving user problems.
Keep responses professional but friendly.
"""
```

This produces a different output than a generic assistant by providing domain-specific context and tone guidance.

### Write Clear Commands

Like other large language models, Apple's foundation model performs best with clear, specific commands:

```swift
// Focused requests produce more accurate results
"Generate five related workout routines for beginners"

// Examples guide the AI toward your desired output format
"Generate five beginner workout routines. Each should be 2-3 words like 'Morning Yoga' or 'Quick Cardio'"
```

### Use Examples in Instructions

Provide a few examples of desired outputs. This helps the model match the desired output format and style:

```swift
let instructions = """
You suggest related topics. Examples:

User: "Making homemade bread"
Assistant:
1. Sourdough starter basics
2. Bread flour types
3. Kneading techniques

User: "iOS development"
Assistant:
1. SwiftUI fundamentals
2. App Store guidelines
3. Xcode debugging

Keep suggestions concise (3-7 words) and naturally related.
"""
```

### Use Strong Commands When Needed

If you observe unwanted output, use firm constraints:

```swift
let instructions = """
You are a helpful assistant for children's content.
DO NOT include scary or violent content.
DO NOT mention inappropriate topics.
"""
```

The model responds reliably to all-caps "DO NOT" constraints.

## Understanding the Model's Capabilities

This is a 3B parameter model optimized for on-device use. In comparison, popular server-based models use hundreds of billions of parameters.
Keep the following in mind for better implementation:

- Focus on language tasks like summarization, classification, and conversation
- Avoid complex reasoning, math calculations, and code generation
- Be aware of potential hallucinations for factual content
- Take advantage of the full context window (query the size with `SystemLanguageModel.default.contextSize`) by providing examples and context for best results

## Basic Error Handling

Foundation Models can fail for several reasons. Here is how to handle them. You can create custom UI for each error case.

```swift
do {
    let result = try await session.respond(to: prompt)
    return result.content
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    return "This conversation is too long. Please start a new session."
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // Content safety system blocked the request
    return "I cannot respond to that request."
} catch LanguageModelSession.GenerationError.assetsUnavailable {
    return "Foundation Models is temporarily unavailable. Please try again."
} catch LanguageModelSession.GenerationError.concurrentRequests {
    return "Please wait for the current request to finish before starting a new one."
} catch LanguageModelSession.GenerationError.rateLimited {
    return "Too many requests. Please try again later."
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
    return "This language is not supported. Please try English or another supported language."
} catch LanguageModelSession.GenerationError.decodingFailure {
    return "Unable to process the response. Please try again."
} catch LanguageModelSession.GenerationError.unsupportedGuide {
    return "Invalid generation parameters. Please check your request format."
} catch LanguageModelSession.GenerationError.refusal(let refusal, _) {
    // Model refused to respond - you can get an explanation
    do {
        let explanation = try await refusal.explanation
        return "The model declined to respond: \(explanation.content)"
    } catch {
        return "The model declined to respond to this request."
    }
} catch {
    return "Something went wrong: \(error.localizedDescription)"
}
```

### Understanding Specific Error Types

**Guardrail Violations**: Content safety system blocks unsafe requests. Consider whether the request was user-initiated (show helpful message) or proactive (silently ignore).

**Refusal Errors**: The model chooses not to respond even when content passes safety checks. Unlike guardrail violations, you can ask the model to explain why it refused.

**Concurrent Requests**: Sessions can only handle one request at a time. Always check `session.isResponding` before making new requests.

**Rate Limiting**: Only occurs when your app runs in the background and exceeds system limits. This rarely occurs in practice, but handle it gracefully.

## Session Safety

Important: A session can only handle one request at a time. Calling it while it is busy causes a runtime error. Always check `session.isResponding`:

```swift
// In your request method: bail out if a request is already in flight
guard !session.isResponding else { return }

// In your view: disable UI during generation to prevent concurrent requests
Button("Generate") {
    Task { await generateResponse() }
}
.disabled(session?.isResponding == true)
```

## Performance Optimization

Use `session.prewarm()` to eagerly load resources and optionally cache prompt prefixes when you anticipate user interaction:

```swift
// Basic prewarming - loads model resources into memory
.task {
    await setupSession()
    session?.prewarm() // Call when user interaction is likely within seconds
}

// Prewarming with prompt prefix caching
func prepareForUserInput() {
    // When user starts typing in a text field
    let commonPrefix = Prompt("You are a helpful writing assistant. The user is asking about:")
    session?.prewarm(promptPrefix: commonPrefix)
}

// Smart prewarming based on UI state
.onChange(of: isTextFieldFocused) { _, focused in
    if focused {
        // User is about to type - prewarm with known context
        session?.prewarm(promptPrefix: sessionInstructions)
    }
}
```

### Prewarming Best Practices

Here are practices for prewarming that you can use as a starting point.

Consider prewarming when the user begins typing in a text input field, navigates to an AI-enabled screen, or when your app transitions to foreground with AI features visible.

When using prompt prefix caching, focus on instruction patterns that multiple prompts share, such as session instructions or conversation context. Avoid including user-specific content that will not be reused across requests, as this reduces the effectiveness of the caching mechanism.

Prewarming does not guarantee immediate resource loading and may be less effective when your app runs in the background or when the system is under load.

```swift
// Example: Smart prewarming in a chat app
class ChatViewModel: ObservableObject {
    private var session: LanguageModelSession?

    func handleUserStartedTyping() {
        // User began typing - prewarm with conversation context
        let contextPrefix = buildConversationContext()
        session?.prewarm(promptPrefix: contextPrefix)
    }

    private func buildConversationContext() -> Prompt {
        // Build a prefix from recent conversation history
        // (conversationHistory is your app's own message store, not shown here)
        let recentMessages = conversationHistory.suffix(3)
        let context = recentMessages.map { "\($0.role): \($0.content)" }
            .joined(separator: "\n")
        return Prompt(context)
    }
}
```

This approach can reduce response latency by preprocessing common prompt patterns before the user submits the actual request.

## What's Next

You now understand how to check availability, create single- and multi-turn sessions, and guide behavior using instructions and prompts. You also know how to handle common errors and optimize for performance.
The best way to internalize these concepts is to apply them. Try building a simple feature using the Foundation Models framework, experiment with prompts in the playground, and adjust sampling parameters to observe how responses change.

With sessions established, the next chapter explores streaming and snapshots - Foundation Models' approach to building responsive UIs with real-time updates. You will see how to create interfaces that feel alive as the model generates content.

---

## Chapter 03: Streaming and Snapshots

Canonical URL: https://www.rudrank.ai/foundation-models/streaming-and-snapshots

Build responsive interfaces around Foundation Models snapshots instead of token-by-token deltas.

I remember back in 2023, I was working for a social music startup, and I created an AI playlist generator feature. In the beginning, I was fetching the entire response at once, and it took ages with the GPT-3.5 API and the heavy custom prompt that I had to write to get the playlist to match the user's taste.

When I introduced streaming, the feature completely came alive. The user could see the playlist being generated in real-time, and it was a joy to use.

But, oh boy, that was a pain to implement. I had to accumulate the schema myself, and then parse the JSON after every delta to get the playlist to update. It was a mess, and the implementation I had to maintain was brittle.

The Foundation Models framework, though, has a snapshot approach that is different from what you might expect. You get a partially-filled object that you can render immediately and update as new fields arrive. With the correct animations, it feels like magic.

## Prerequisites and Context

This chapter builds on the session patterns from the previous chapter. You should be comfortable creating sessions, handling basic responses, and understanding the distinction between instructions and prompts. Now you make those responses feel alive with streaming.

## What You Will Learn

By the end of this chapter, you will be able to:

- Stream both plain text and structured results using snapshots
- Build responsive UIs that update as fields populate
- Handle cancellation and multiple concurrent streams
- Test streaming behavior effectively
- Choose when to stream vs generate complete responses

## Why Snapshots Instead of Token Deltas?

Most AI frameworks stream raw token deltas: tiny text chunks that you accumulate yourself. I have built chat interfaces this way, and it works fine for plain text, but becomes painful when your response has structure. You end up re-parsing JSON after every delta, hoping the accumulated text is valid.

Foundation Models' snapshot approach is much more elegant. Instead of raw tokens, you receive complete, partially-populated objects where properties become non-nil progressively. This plays beautifully with SwiftUI: bind to the partially generated type and let the UI update naturally as fields arrive.

Using the session patterns from the previous chapter, streaming becomes a natural extension rather than a completely different approach.

## Streaming Plain Text

For plain text responses, streaming works similarly to traditional chat interfaces, but with the session-based approach we established earlier. The key difference is that each snapshot carries the text generated so far, so you replace what your view shows rather than appending deltas to it:

```swift
import FoundationModels

let session = LanguageModelSession()
let stream = session.streamResponse(to: prompt)

for try await chunk in stream {
    print(chunk.content) // Text so far; update the UI progressively
}

let final = try await stream.collect()
print("Final: \(final.content)")
```

### Physiqa Example: Live Chat Streaming Pattern

The Zenther workout assistant demonstrates a clean streaming implementation that updates the UI in real-time.
The key insight is letting the session's transcript handle UI updates automatically:

```swift
@MainActor
func sendMessage(_ content: String) async {
    isLoading = session.isResponding
    do {
        // Stream response from current session
        let responseStream = session.streamResponse(to: Prompt(content))
        for try await _ in responseStream {
            // The streaming automatically updates the session transcript
            // UI observes session.transcript for real-time updates
        }
    } catch {
        errorMessage = handleFoundationModelsError(error)
        showError = true
    }
    isLoading = session.isResponding
}
```

This pattern eliminates manual content accumulation by using Foundation Models' built-in transcript management. The UI simply observes `session.transcript` changes, and SwiftUI automatically updates as streaming content arrives. This approach is particularly effective for chat interfaces where conversation history is important.

## Streaming Structured Results

You can also stream structured results with `@Generable` types, which you will explore in detail in the next chapter on structured generation. You receive `PartiallyGenerated` instances with optional properties that fill in over time:

```swift
import FoundationModels
import SwiftUI

@Generable
struct StoryOutline {
    @Guide(description: "Story title")
    let title: String

    @Guide(description: "Main character")
    let protagonist: Character

    @Guide(description: "3–5 plot points")
    let plotPoints: [String]
}

@Generable
struct Character {
    @Guide(description: "Character name")
    let name: String

    @Guide(description: "Character background")
    let background: String
}

struct StreamingStoryView: View {
    @State private var partial: StoryOutline.PartiallyGenerated?
    @State private var final: StoryOutline?
    @State private var isStreaming = false
    @State private var error: String?

    var body: some View {
        VStack(alignment: .leading, spacing: 16) {
            if let partial {
                // Title
                Text(partial.title ?? "…")
                    .font(.title)
                    .redacted(reason: partial.title == nil ?
.placeholder : []) // Protagonist Group { Text("Protagonist") .font(.headline) Text(partial.protagonist?.name ?? "…") .redacted(reason: partial.protagonist?.name == nil ? .placeholder : []) Text(partial.protagonist?.background ?? "…") .foregroundColor(.secondary) .redacted(reason: partial.protagonist?.background == nil ? .placeholder : []) } // Plot points Group { Text("Plot Points") .font(.headline) if let points = partial.plotPoints { ForEach(points.indices, id: \.self) { i in Text("• \(points[i])") } } else { ForEach(0..<3, id: \.self) { _ in Text("• …").redacted(reason: .placeholder) } } } } if let final { Divider() Text("Complete!").font(.caption).foregroundColor(.green) Text("\(final.title)") } if let error { Text(error).foregroundColor(.red) } HStack { Button(isStreaming ? "Streaming…" : "Generate") { Task { await generate() } } .disabled(isStreaming) } } .padding() } private func generate() async { isStreaming = true error = nil partial = nil final = nil let session = LanguageModelSession() let stream = session.streamResponse( to: "Create a story outline about a time-traveling detective", generating: StoryOutline.self ) do { for try await snapshot in stream { await MainActor.run { partial = snapshot.content } } let completed = try await stream.collect() await MainActor.run { final = completed.content } } catch LanguageModelSession.GenerationError.guardrailViolation { await MainActor.run { error = "Content blocked by safety guardrails" } } catch { // Inside this catch block `error` is the caught error, so assign to the state property through `self` await MainActor.run { self.error = error.localizedDescription } } isStreaming = false } } ``` This approach eliminates the parsing headaches I mentioned earlier: - `PartiallyGenerated` mirrors your final type with optional properties - You can render placeholders or redacted views (skeleton UI) until fields arrive - `collect()` returns the complete final value when streaming ends - No manual JSON parsing or delta accumulation required ## Cancellation and Backpressure Users may change their minds mid-generation, 
especially with longer streaming responses. Supporting cancellation keeps your UI responsive and avoids wasted computational work, which matters on mobile devices: ```swift actor StreamController<T: Sendable> { private var task: Task<T, Error>? func start(_ operation: @escaping @Sendable () async throws -> T) { cancel() task = Task(priority: .userInitiated) { try await operation() } } func cancel() { task?.cancel() task = nil } func join() async throws -> T? { let result = try await task?.value task = nil return result } } ``` Use this to ensure only one stream runs at a time. Always cancel the previous stream before starting a new one, because a session handles only one request at a time. ## Case Study: Pokémon Snapshot Streaming Here is a real-world snapshot pattern adapted from a Pokémon analysis example: a rich `@Generable` model, a service that streams analysis, and a SwiftUI view that renders partials. ### The model ```swift import FoundationModels @Generable struct PokemonAnalysis: Equatable { @Guide(description: "An epic title for this Pokemon analysis") let title: String @Guide(description: "The Pokemon's name") let pokemonName: String @Guide(description: "The Pokemon's Pokedex number") let pokedexNumber: Int @Guide(description: "Primary and secondary types") let types: [PokemonType] @Guide(description: "Overall competitive tier rating") let competitiveTier: CompetitiveTier } @Generable struct PokemonType: Equatable { let name: String let colorDescription: String } @Generable enum CompetitiveTier: String { case ubers = "Ubers", overUsed = "OU (OverUsed)", underUsed = "UU (UnderUsed)", rarelyUsed = "RU (RarelyUsed)", neverUsed = "NU (NeverUsed)", littleCup = "LC (Little Cup)" } ``` ### The streaming service ```swift import FoundationModels import Observation @Observable @MainActor final class PokemonAnalyzer { private(set) var analysis: PokemonAnalysis.PartiallyGenerated? private let session = LanguageModelSession() private var currentTask: Task<Void, Error>? 
func analyzePokemon(_ identifier: String) async { currentTask?.cancel() currentTask = Task { let stream = session.streamResponse( generating: PokemonAnalysis.self, includeSchemaInPrompt: true, options: GenerationOptions(temperature: 0.2) ) { "Analyze: \(identifier). Provide name, pokedex number, types, and tier." } for try await snapshot in stream { try Task.checkCancellation() analysis = snapshot.content } } _ = try? await currentTask?.value } func stop() { currentTask?.cancel() } } ``` ### The SwiftUI view ```swift import SwiftUI import FoundationModels struct StreamingPokemonView: View { let analysis: PokemonAnalysis.PartiallyGenerated var body: some View { VStack(spacing: 12) { Text(analysis.title ?? "…") .font(.title).bold() .redacted(reason: analysis.title == nil ? .placeholder : []) if let name = analysis.pokemonName { Text("Name: \(name)") } else { Text("Name: …").redacted(reason: .placeholder) } if let number = analysis.pokedexNumber { Text("No. \(number)") } if let types = analysis.types { Text("Types: \(types.map { $0.name }.joined(separator: ", "))") } if let tier = analysis.competitiveTier { Text("Tier: \(tier.rawValue)") } } .frame(maxWidth: .infinity, alignment: .leading) .padding() } } ``` This pattern highlights how `PartiallyGenerated` keeps your UI responsive: render what is ready (title/name) while deeper fields (types, tier) arrive. ### Stable IDs for streaming lists When streaming arrays of nested items, give each element a stable identity so SwiftUI diffing remains smooth as fields appear. Add a `GenerationID` property to your `@Generable` types and reference it in your views: ```swift @Generable struct AbilityAnalysis: Equatable { var id: GenerationID @Guide(description: "The ability's name") let name: String @Guide(description: "The ability's strategic use") let strategicUse: String } // In your view if let abilities = analysis.abilities { ForEach(abilities, id: \.id) { ability in Text(ability.name ?? "…") .redacted(reason: ability.name == nil ? 
.placeholder : []) } } ``` The same approach works for other nested arrays (types, evolutions, matchups, etc.). ### Error handling patterns Surface friendly messages and recover where possible: ```swift do { for try await snapshot in stream { /* update UI */ } } catch LanguageModelSession.GenerationError.concurrentRequests(_) { // A previous request is still running showMessage("Please wait for the current request to finish.") } catch LanguageModelSession.GenerationError.rateLimited(_) { // Background or system rate limit reached showMessage("Too many requests right now. Please try again shortly.") } catch LanguageModelSession.GenerationError.guardrailViolation { showMessage("I can’t help with that request.") } catch LanguageModelSession.GenerationError.refusal(_, _) { showMessage("I can’t produce the requested type of answer.") } catch { showMessage(error.localizedDescription) } ``` ### Prewarming for responsiveness You can nudge the system to get ready sooner: ```swift let session = LanguageModelSession() session.prewarm() // warming hint; not a guarantee ``` Prewarm doesn’t guarantee immediate readiness and may be less effective in the background or under load. ## When to Stream vs Generate Once Choosing between streaming and complete generation depends on your use case: **Stream** when you want responsiveness and progressive disclosure (chat, forms, outlines, previews), when users benefit from seeing partial results (long summaries, structured data), or when you need to populate multiple fields incrementally. **Generate once** when the response is small and streaming adds no perceived value, or when you need all-or-nothing output for further processing. ## What's Next You now understand Foundation Models' unique snapshot streaming approach, which builds naturally on the session patterns from earlier chapters. 
The next chapter explores generation options and sampling control - learning how to fine-tune the model's behavior with temperature, token limits, and sampling strategies. These controls apply to both regular and streaming responses, giving you precise control over the content you just learned to stream. --- ## Chapter 04: Generation Options and Sampling Control Canonical URL: https://www.rudrank.ai/foundation-models/generation-options-and-sampling-control Tune sampling behavior with generation options while keeping on-device output practical and predictable. Once you understand the basics of sessions, you may want to explore different ways to control how the model generates responses. Foundation Models provides controls for customizing the output behavior. This chapter covers the different options available and how to use them. ## Prerequisites and Context This chapter builds on the session and streaming concepts from earlier chapters. You should be comfortable creating sessions and working with streaming responses before exploring generation controls. These options affect all model interactions - the streaming you just learned, as well as structured output and tool calling that you will learn next. ## What You Will Learn By the end of this chapter, you will be able to: - Control response creativity and predictability using temperature settings - Choose between different sampling strategies (greedy, top-K, top-P) based on your use case - Set appropriate token limits to fit your UI requirements - Combine generation options for specific scenarios like creative writing vs technical documentation - Understand the differences between Foundation Models and MLX Swift parameter controls ## Understanding Generation Options You can customize how the model generates responses using `GenerationOptions`. The framework provides simpler controls compared to MLX Swift, focusing on the parameters that matter most for on-device experiences. 
### Understanding Temperature Temperature influences the confidence of the model's response and must be between 0 and 1 (inclusive). The following are some examples: ```swift // Temperature: 0.1 - Very predictable, focused responses let preciseOptions = GenerationOptions(temperature: 0.1) // Temperature: 0.7 - Balanced creativity and coherence let balancedOptions = GenerationOptions(temperature: 0.7) // Temperature: 1.0 - Maximum creativity within bounds let creativeOptions = GenerationOptions(temperature: 1.0) // System default - Let Foundation Models choose optimal temperature let defaultOptions = GenerationOptions(temperature: nil) ``` **How Temperature Works:** Temperature adjusts the probability distribution before sampling. A value of 1.0 results in no adjustment, while values less than 1.0 make the probability distribution sharper. - **Low (0.1-0.3)**: Makes likely tokens even more likely, resulting in stable and predictable responses - **Medium (0.5-0.7)**: Good balance for most conversational tasks - **High (0.8-1.0)**: Gives the model more creative license while staying coherent - **nil**: Lets the system choose a reasonable default automatically ### Token Limits for UI Control The `maximumResponseTokens` parameter prevents responses from overwhelming your interface: ```swift // Short responses for UI cards or notifications let briefOptions = GenerationOptions(maximumResponseTokens: 50) // Medium responses for chat interfaces let chatOptions = GenerationOptions(maximumResponseTokens: 200) // Longer responses for content generation let detailedOptions = GenerationOptions(maximumResponseTokens: 500) ``` ### Tight Output Constraints for UX In my app, Zenther, I embed length requirements directly in the system instructions for the widget and notification instructions. I did not want to rely on the token limits to avoid truncated notifications. I experimented with the number of words to create a consistent experience across different device sizes. 
Here are the widget instructions for the ultra-brief encouragements:

```swift
public static let widget = """
Generate brief, natural encouragements for completed workouts.
Be genuine and conversational, like a supportive friend.
Keep it simple and authentic, under 100 characters.
"""
```

Here are the instructions with structured constraints for push notifications:

```swift
public static let notification = """
You are a fitness coach generating notifications for workout achievements and milestones.

For workout notifications: Title 4-6 words max, body 1-2 motivating sentences.
For milestone notifications: Title 3-5 words celebrating milestone, body 1 encouraging sentence.

Tone: Supportive, energetic, celebratory.
"""
```

### Parameter Combinations

Here are some combinations for common use cases. You can use these as a starting point and adjust them to your needs.

```swift
// Content summarization with focused and concise responses
let summaryOptions = GenerationOptions(
    temperature: 0.2,
    maximumResponseTokens: 150
)

// Creative writing with more varied and expressive responses
let storyOptions = GenerationOptions(
    temperature: 0.9,
    maximumResponseTokens: 400
)

// Technical assistance with precise and reliable responses
let technicalOptions = GenerationOptions(
    temperature: 0.1,
    maximumResponseTokens: 300
)

// Casual conversation with natural and engaging responses
let chatOptions = GenerationOptions(
    temperature: 0.7,
    maximumResponseTokens: 200
)
```

### Applying Generation Options

Use the options parameter in your session calls:

```swift
let session = LanguageModelSession(
    instructions: Instructions("You are a helpful writing assistant.")
)

// Generate creative content
let story = try await session.respond(
    to: "Write a short story about a robot learning to paint",
    options: storyOptions
)

// Generate focused summary
let summary = try await session.respond(
    to: "Summarize the key points from this article: [article text]",
    options: summaryOptions
)
```

Here is the
output for the story:

```
In the heart of a futuristic city, where towering skyscrapers kissed the clouds and neon lights painted the sky in mesmerizing hues, there existed a laboratory tucked away in a quiet corner. This laboratory was the birthplace of AR-1, a prototype robot designed with a singular focus: to understand and mimic human emotions through art. Born from years of advanced robotics research, AR-1 stood out among its brethren. With a sleek, metallic frame adorned with sensors and glowing LED displays, AR-1 was programmed not only to interpret emotions but also to convey them through colors and brushstrokes. Yet, despite its potential, AR-1 struggled with the most challenging task—to create art that truly expressed emotion. Its creator, Dr. Elara Finch, was a visionary known for her groundbreaking work in affective computing. She believed that art was humanity's universal language and saw AR-1 as her greatest creation yet. Despite the robot's frustration, Dr. Finch's unwavering support was
```

Note that the story is cut off once it reaches the `maximumResponseTokens` limit, because the prompt asked for a "short story" without specifying a length constraint.

## Sampling Modes

Foundation Models also provides three sampling strategies that control how the model picks tokens.
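To build intuition for the strategies that follow, here is a minimal, framework-independent sketch of how temperature scaling and candidate-pool selection are commonly described. It is a toy over a hand-written probability table and makes no claim about how Foundation Models implements sampling internally; `Candidate`, `applyTemperature`, `greedyPick`, `topKPool`, and `topPPool` are hypothetical names introduced only for this illustration.

```swift
import Foundation

/// A token with its probability. The values below are made up for illustration.
typealias Candidate = (token: String, probability: Double)

let distribution: [Candidate] = [
    (token: "paint", probability: 0.50),
    (token: "draw", probability: 0.25),
    (token: "sketch", probability: 0.15),
    (token: "sing", probability: 0.07),
    (token: "fly", probability: 0.03)
]

/// Raises each probability to 1/temperature and renormalizes.
/// A temperature of 1.0 leaves the distribution unchanged; lower values sharpen it.
func applyTemperature(_ items: [Candidate], temperature: Double) -> [Candidate] {
    let scaled = items.map { (token: $0.token, probability: pow($0.probability, 1.0 / temperature)) }
    let total = scaled.reduce(0.0) { $0 + $1.probability }
    return scaled.map { (token: $0.token, probability: $0.probability / total) }
}

/// Greedy: always take the single most likely token.
func greedyPick(_ items: [Candidate]) -> String? {
    items.max { $0.probability < $1.probability }?.token
}

/// Top-K: keep a fixed number of the most likely tokens.
func topKPool(_ items: [Candidate], k: Int) -> [String] {
    items.sorted { $0.probability > $1.probability }.prefix(k).map { $0.token }
}

/// Top-P: keep the most likely tokens until their cumulative probability reaches the threshold.
func topPPool(_ items: [Candidate], threshold: Double) -> [String] {
    var pool: [String] = []
    var cumulative = 0.0
    for item in items.sorted(by: { $0.probability > $1.probability }) {
        pool.append(item.token)
        cumulative += item.probability
        if cumulative >= threshold { break }
    }
    return pool
}

let sharpened = applyTemperature(distribution, temperature: 0.5)
print(greedyPick(distribution) ?? "none")        // paint
print(topKPool(distribution, k: 2))              // ["paint", "draw"]
print(topPPool(distribution, threshold: 0.8))    // ["paint", "draw", "sketch"]
print(sharpened[0].probability > 0.5)            // true: the top token gained probability mass
```

A random sampler would then draw one token from the returned pool, which is where the `seed` argument shown in the options below becomes relevant.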
### Greedy Sampling

This method always chooses the most likely token, resulting in deterministic but potentially repetitive output:

```swift
let greedyOptions = GenerationOptions(
    sampling: .greedy,
    temperature: nil // Temperature is ignored with greedy sampling
)
```

### Top-K Sampling (Random with Fixed Pool)

This method considers a fixed number of high-probability tokens, then randomly selects from that pool:

```swift
// Consider top 50 most likely tokens
let topKOptions = GenerationOptions(
    sampling: .random(top: 50, seed: nil),
    temperature: 0.7
)

// Reproducible results with seed
let seededTopKOptions = GenerationOptions(
    sampling: .random(top: 30, seed: 12345),
    temperature: 0.8
)
```

Top-K behavior:

- **Smaller K (10-30)**: More deterministic, confident answers
- **Larger K (50-100)**: More creative, varied responses
- Fixed pool size regardless of probability distribution

### Top-P Sampling (Nucleus Sampling)

This method considers a variable number of tokens based on a cumulative probability threshold:

```swift
// Consider tokens until 90% probability mass is reached
let topPOptions = GenerationOptions(
    sampling: .random(probabilityThreshold: 0.9, seed: nil),
    temperature: 0.7
)

// More conservative nucleus sampling
let conservativeTopPOptions = GenerationOptions(
    sampling: .random(probabilityThreshold: 0.8, seed: 42),
    temperature: 0.6
)
```

Top-P behavior:

- **Lower threshold (0.6-0.8)**: Smaller, more focused token pools
- **Higher threshold (0.9-0.95)**: Larger pools, more creativity
- Pool size adapts to probability distribution (smaller when spiked, larger when flat)

### System Default (Recommended)

Let the model choose the optimal strategy:

```swift
let systemDefaultOptions = GenerationOptions(
    sampling: nil,
    temperature: nil
)
```

## Foundation Models vs MLX Swift Parameters

Foundation Models provides both simplified and advanced controls compared to MLX Swift.
The following is a comparison of the parameters:

| Foundation Models | MLX Swift Equivalent | Purpose |
|------------------|---------------------|---------|
| `temperature` (0.0-1.0) | `temperature` (unlimited) | Controls randomness/creativity |
| `maximumResponseTokens` | `maxTokens` | Limits response length |
| `.greedy` sampling | N/A | Deterministic token selection |
| `.random(top: k)` | `topK` | Top-K sampling |
| `.random(probabilityThreshold:)` | `topP` | Nucleus sampling |
| `seed` parameter | N/A | Reproducible randomness |
| *(Not available)* | `repetitionPenalty` | Reduces repetitive output |
| *(Not available)* | `repetitionContextSize` | Repetition penalty scope |

Foundation Models offers more sampling strategies but constrains temperature and lacks repetition penalties. The focus is on sensible defaults, with controls available when you need them.

## Parameter Tuning Tips

Here are some tips for parameter tuning:

- **Start with Defaults**: Use `nil` for both temperature and sampling to let the system choose optimal values
- **Adjust Gradually**: If the defaults do not suit your needs, make small temperature adjustments (±0.1-0.2)
- **Temperature Range**: In the latest beta update of the framework, the temperature range is limited to 0.0-1.0 (unlike MLX Swift's higher values)
- **Match Task to Temperature**: Factual tasks need low temperature (0.1-0.3), creative tasks can use higher values (0.7-1.0)
- **Consider Context Length**: Remember that longer conversations use more of your token budget (query the limit with `SystemLanguageModel.default.contextSize`), and the same goes for longer, more detailed instructions. On iOS 26.4 and later, you can measure exactly how many tokens a prompt will consume with `SystemLanguageModel.default.tokenUsage(for:)` before sending it

## What's Next

Understanding generation options gives you precise control over how Foundation Models behaves in your apps.
Start with system defaults and adjust based on your specific use case—for many scenarios, the defaults work well without any tuning. Now that you can control how the model generates responses, the next chapter explores structured generation with schemas. You will learn how to transform unstructured AI responses into type-safe Swift objects, applying the generation controls you just learned to produce reliable, structured data for your apps. --- ## Chapter 05: Structured Generation with Schemas Canonical URL: https://www.rudrank.ai/foundation-models/structured-generation-with-schemas Use @Generable, @Guide, and schemas to turn model output into type-safe Swift data. You use Foundation Models' structured generation to produce properly typed Swift objects directly from the model instead of parsing unstructured text. You define the data structure using macros, and the framework guarantees type-safe results by constraining output to your schema during token generation. Early work with LLMs often required brittle parsing of free-form text. Structured generation eliminates that overhead and improves reliability in production code. ## Prerequisites and Context This chapter builds on the streaming concepts from the streaming chapter and the session patterns from the sessions chapter. You should understand how to create sessions, control generation parameters, and work with streaming responses. The snapshot streaming patterns from the streaming chapter become more powerful with structured generation, allowing you to watch complex Swift objects populate field by field. 
## What You Will Learn

By the end of this chapter, you will be able to:

- Transform unstructured AI responses into type-safe Swift objects using `@Generable`
- Guide content generation with `@Guide` attributes for precise control
- Build complex nested structures for rich data models
- Handle optional fields and collections effectively
- Stream structured content with real-time field population
- Apply structured generation to practical scenarios like journaling and content analysis

## Traditional AI Responses

With ordinary AI responses, you get unstructured text that requires manual parsing:

```swift
// Traditional approach: Parse unstructured text
let response = try await session.respond(to: "Recommend a book about MusicKit")
// Response: "I recommend 'Exploring MusicKit' by Rudrank Riyam. It is the best guide available on the internet..."
// Now manually extract title, author, description...
```

Foundation Models avoids this parsing step by producing type-safe results through constrained decoding. Constrained decoding guides the model's output to meet specific structural requirements, matching the given schema. Here is a summarized breakdown:

- **Input and Constraints**: The process begins with an input prompt or query that includes the desired structure or constraints.

  > **By default, the methods in the framework include the schema of the structure as part of the input in a specific format that the model has been trained on.**

- **Decoding Process**: The model processes the input while considering these constraints. During decoding, the model generates potential outputs that follow these constraints while masking tokens that are not valid.
- **Output Formatting**: Once the model generates a draft response, it involves additional steps to format it according to the specified structure.
- **Validation and Correction**: The output is validated against the constraints to ensure it meets all requirements.
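The masking idea from the decoding step can be sketched in a few lines of plain Swift. This is a toy, assuming a hypothetical four-token vocabulary with made-up scores and a schema that only permits an enum value at the current position; the real framework works on the model's own tokenizer and schema representation, which this sketch does not attempt to reproduce. `constrainedPick` is a hypothetical helper name.

```swift
/// Toy scores a model might assign to candidate next tokens (made-up numbers).
let scores: [String: Double] = [
    "italian": 1.2,
    "mexican": 0.4,
    "delicious": 2.5,
    "{": 0.1
]

/// Tokens the schema allows at this position, for example the cases of an enum.
let allowed: Set<String> = ["italian", "mexican", "indian"]

/// Masks every token the schema forbids, then picks the best remaining one.
/// Returns nil when no candidate is valid.
func constrainedPick(scores: [String: Double], allowed: Set<String>) -> String? {
    scores
        .filter { allowed.contains($0.key) }
        .max { $0.value < $1.value }?.key
}

// Without the mask the highest score belongs to "delicious", which is not a valid enum case.
let unconstrained = scores.max { $0.value < $1.value }?.key
let constrained = constrainedPick(scores: scores, allowed: allowed)

print(unconstrained ?? "none")  // delicious
print(constrained ?? "none")    // italian
```

The point of the sketch is that an invalid token can never be selected, however high its score, which is why the result always fits the schema.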
## Understanding the @Generable Macro The `@Generable` macro is the foundation of structured generation in Foundation Models. This Swift macro transforms your data structures into compatible schemas while maintaining type safety. When you apply `@Generable` to a structure or enumeration, the macro automatically: - **Generates conformance** to the `Generable` protocol - **Creates initialization methods** from AI-generated content - **Produces JSON schemas** that guide the AI model's output format - **Handles type conversion** between AI responses and Swift types The macro supports both structures and enumerations: ```swift @Generable struct RecipeRecommendation { @Guide(description: "The name of the dish") let name: String @Guide(description: "Brief cooking instructions") let instructions: String @Guide(description: "The cuisine type this recipe belongs to") let cuisine: CuisineType } @Generable enum CuisineType { case italian case mexican case asian case mediterranean case indian } ``` ## Understanding the @Guide Macro The `@Guide` macro fine-tunes how the model generates content for specific properties of your `@Generable` types. 
It provides three ways to influence the output generation: ### Description-Based Guidance The most common use provides descriptive text to guide the model: ```swift @Generable struct RecipeDetails { @Guide(description: "A clear recipe name, maximum 60 characters") let name: String @Guide(description: "Cooking time in minutes, between 5 and 240") let cookingTime: Int @Guide(description: "Step-by-step cooking instructions") let instructions: String @Guide(description: "The cuisine type this recipe belongs to") let cuisine: CuisineType } ``` Here is an example of using the `RecipeDetails` structure to generate a recipe: ```swift #Playground { let session = LanguageModelSession() let recipe = try await session.respond( to: "Butter chicken biryani", generating: RecipeDetails.self ) } ``` The model returns a fully structured `RecipeDetails` object: ``` recipe: RecipeDetails name: "Butter Chicken Biryani" cookingTime: 60 instructions: "1. Soak basmati rice in water for 30 minutes. 2. In a separate pan, cook chicken pieces with butter, cream, and spices until tender. 3. Layer rice and chicken in a biryani pot. 4. Repeat layers and top with fried onions and nuts. 5. Cook on low heat until rice is fully cooked." 
cuisine: CuisineType.indian ``` With structured generation, the earlier unstructured book example becomes: ```swift @Generable struct BookRecommendation { @Guide(description: "The book title") let title: String @Guide(description: "Author name") let author: String @Guide(description: "Brief description") let description: String } // Get structured result; `.content` unwraps the generated value from the response let book = try await session.respond( to: "Recommend a book about MusicKit", generating: BookRecommendation.self ).content // Use immediately in Swift code print("Book: \(book.title) by \(book.author)") // Book: "Exploring MusicKit" by Rudrank Riyam ``` ### Regex Pattern Matching For string properties requiring specific formats, use regex patterns: ```swift @Generable struct RecipeMetadata { @Guide(description: "Recipe difficulty level", /^(Easy|Medium|Hard)$/) let difficulty: String @Guide(description: "Prep time in HH:MM format", /^\d{2}:\d{2}$/) let prepTime: String } ``` The regex guide constrains the model to content that matches your specified patterns, which makes the structured output more predictable. 
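If you want to sanity-check a pattern before attaching it to a guide, you can exercise it in plain Swift first. The sketch below is only a local check and not part of the framework: `matchesWholeString` is a hypothetical helper, and it builds the regex from a string so the example compiles without the bare-slash regex literal setting.

```swift
/// Returns true when `candidate` matches `pattern` from start to end.
/// An invalid pattern is treated as "no match" in this sketch.
func matchesWholeString(_ candidate: String, pattern: String) -> Bool {
    guard let regex = try? Regex(pattern) else { return false }
    return candidate.wholeMatch(of: regex) != nil
}

// The same patterns used in the guides above, written as raw strings
let difficultyPattern = #"^(Easy|Medium|Hard)$"#
let prepTimePattern = #"^\d{2}:\d{2}$"#

print(matchesWholeString("Medium", pattern: difficultyPattern))  // true
print(matchesWholeString("Tricky", pattern: difficultyPattern))  // false
print(matchesWholeString("00:45", pattern: prepTimePattern))     // true
print(matchesWholeString("45 min", pattern: prepTimePattern))    // false
```

Note that the prep-time pattern only checks the shape of the value, so a string such as `99:99` would also pass; tighten the pattern if you need valid clock times.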
## Nested Structures You can use nested `@Generable` types for complex data relationships: ```swift @Generable struct StoryOutline { @Guide(description: "Compelling story title") let title: String @Guide(description: "Main character details") let protagonist: Character @Guide(description: "Story setting") let setting: Setting @Guide(description: "Major plot points") let plotPoints: [PlotPoint] } @Generable struct Character { @Guide(description: "Character name") let name: String @Guide(description: "Character motivation") let motivation: String } @Generable struct Setting { @Guide(description: "Time period") let timePeriod: String @Guide(description: "Primary location") let location: String } @Generable struct PlotPoint { @Guide(description: "Brief description of the plot event") let description: String @Guide(description: "Order in the story: beginning, middle, end") let position: StoryPosition } @Generable enum StoryPosition: String, CaseIterable { case beginning, middle, end } ``` You can then generate a complete story outline with nested details: ```swift #Playground { let session = LanguageModelSession(instructions: "You are a creative writing assistant.") let outline = try await session.respond( to: "Create a story outline about a time-traveling detective", generating: StoryOutline.self ).content print(outline) } ``` Here is the output: ``` StoryOutline( title: "Echoes of Time: The Detective's Journey", protagonist: Character( name: "Detective Elara Quinn", motivation: "Seeking closure for her late brother, whose mysterious death remains unsolved" ), setting: Setting( timePeriod: "21st Century", location: "Modern-day New York City" ), plotPoints: [ PlotPoint( description: "Detective Elara Quinn discovers an experimental time-travel device in her late brother's study.", position: .beginning), PlotPoint( description: "Using the device, Elara travels to the 1920s to investigate a series of unsolved murders in New York City.", position: .middle), PlotPoint( 
description: "In the past, Elara encounters a rival detective who seems eerily familiar and helps her uncover a hidden conspiracy.", position: .middle), PlotPoint( description: "Returning to the present, Elara realizes the rival detective from the past is somehow connected to her brother's death.", position: .middle), PlotPoint( description: "Elara races against time to prevent a future catastrophe linked to the events she uncovered.", position: .end), PlotPoint( description: "With the truth revealed, Elara finds peace and closure, understanding her brother's fate was never meant to be.", position: .end) ] ) ``` ## Optional Fields and Collections Not all data is required. Foundation Models handles optional properties and dynamic arrays: ```swift @Generable struct Event { @Guide(description: "Event name") let title: String @Guide(description: "Event date in YYYY-MM-DD format") let date: String @Guide(description: "Location (if specified)") let location: String? @Guide(description: "Attendee count (if known)") let attendeeCount: Int? } ``` You can then generate an event: ```swift #Playground { let session = LanguageModelSession(instructions: "You are an event assistant.") let event = try await session.respond( to: "Schedule a team meeting for tomorrow at 2 PM", generating: Event.self ) print(event.content) } ``` And here is the output: ``` Event( title: "Team Meeting", date: "2025-08-27", location: nil, attendeeCount: nil ) ``` ## Advanced @Guide Features The `@Guide` macro supports constraints for more control: ### Count Requirements You can also specify exact counts or ranges using the `@Guide` macro. 
It uses constrained decoding to guarantee the exact count or a value in the given range: ```swift @Generable struct ProductReview { @Guide(description: "Product name") let name: String @Guide(description: "Exactly 3 key features", .count(3)) let keyFeatures: [String] @Guide(description: "2-5 pros of the product") let pros: [String] @Guide(description: "Brief summary, 50-100 words") let summary: String @Guide(description: "Rating from 1 to 5 stars", .range(1...5)) let rating: Int } ``` Here is a practical example using the ProductReview with count and range constraints: ```swift #Playground { let session = LanguageModelSession( instructions: "You are a helpful product review analyzer." ) let review = try await session.respond( to: "Analyze this iPhone 16 Pro: Amazing camera system with 5x optical zoom, excellent build quality, and great performance. Battery life could be better for heavy users and it is quite expensive.", generating: ProductReview.self ) print(review.content) } ``` And here is the output: ``` ProductReview( name: "iPhone 16 Pro", keyFeatures: [ "Amazing camera system with 5x optical zoom", "Excellent build quality", "Great performance" ], pros: [ "Excellent camera system", "Excellent build quality", "Great performance" ], summary: "The iPhone 16 Pro boasts a top-notch camera system, premium build, and stellar performance. However, it may not last long on a single charge for heavy users and comes at a steep price.", rating: 4 ) ``` Notice how the model respected the constraints: - Exactly 3 key features as specified by `.count(3)` - 3 pros (the model interpreted the "2-5 pros" instruction from the description) - Rating of 4, within the 1-5 numeric range specified by `.range(1...5)` The `.range()` constraint with ranges like `.range(2...5)` will not work for the number of items in the array. For array length constraints, it is better to use clear descriptions like "2-5 pros of the product" and let the model interpret the instruction naturally. 
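Because a description such as "2-5 pros of the product" is natural-language guidance, you may want a cheap check after generation before showing the result. Here is a minimal sketch, assuming a hypothetical plain-Swift `ReviewDraft` that mirrors the generated fields (it is deliberately not a `@Generable` type, so it runs without the framework) and a hypothetical `validationIssues(for:)` helper:

```swift
/// Hypothetical stand-in for the fields of a generated product review.
struct ReviewDraft {
    let keyFeatures: [String]
    let pros: [String]
    let rating: Int
}

/// Collects human-readable problems instead of throwing,
/// so the caller can decide whether to retry the request or show the result anyway.
func validationIssues(for review: ReviewDraft) -> [String] {
    var issues: [String] = []
    if review.keyFeatures.count != 3 {
        issues.append("Expected exactly 3 key features")
    }
    if !(2...5).contains(review.pros.count) {
        issues.append("Expected 2-5 pros")
    }
    if !(1...5).contains(review.rating) {
        issues.append("Rating must be between 1 and 5")
    }
    return issues
}

let goodDraft = ReviewDraft(
    keyFeatures: ["Camera", "Build quality", "Performance"],
    pros: ["Camera", "Build quality", "Performance"],
    rating: 4
)
let thinDraft = ReviewDraft(
    keyFeatures: ["Camera", "Build quality", "Performance"],
    pros: ["Camera"],
    rating: 4
)

print(validationIssues(for: goodDraft))  // []
print(validationIssues(for: thinDraft))  // ["Expected 2-5 pros"]
```

In an app you would map the generated `ProductReview` onto a check like this and regenerate, or trim the array, when a description-only requirement is not met.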
## Journaling and Structured Analysis Foundation Models can also analyze personal writing and extract structured information. For example, you can use it to analyze a journal entry: ```swift @Generable struct JournalAnalysis { @Guide(description: "Emotional tone: happy, sad, excited, anxious, reflective, grateful, frustrated") let mood: Mood @Guide(description: "Key topics or themes mentioned in the entry") let topics: [String] @Guide(description: "One-sentence summary of the main point") let summary: String @Guide(description: "A thoughtful, personalized question that helps with self-reflection based on the specific journal entry content") let nextPrompt: String? @Guide(description: "Life areas mentioned: work, relationships, health, hobbies, goals") let lifeAreas: [LifeArea] } @Generable enum Mood: String, CaseIterable { case happy, sad, excited, anxious, reflective, grateful, frustrated, neutral } @Generable enum LifeArea: String, CaseIterable { case work, relationships, health, hobbies, goals, family, travel, learning } @Generable struct Challenge { @Guide(description: "Description of the challenge or concern") let description: String @Guide(description: "Severity level from 1-10", .range(1...10)) let severity: Int } @Generable enum Timeline: String, CaseIterable { case shortTerm = "short-term" case mediumTerm = "medium-term" case longTerm = "long-term" } ``` ### Journaling Example Here is how a journaling app like Day One might use this: ```swift #Playground { let session = LanguageModelSession( instructions: "You are a thoughtful journaling assistant. Help users reflect on their experiences." ) let journalAnalysis = try await session.respond( to: """ Today is my last day in Chiang Mai. I wanted to go shopping for some muay thai shorts but there is a storm outside so I am stuck in a cafe working on this book instead. 
I hope the storm weakens because I have a flight to Bangkok later tonight and I cannot miss it otherwise I will miss my subsequent flights to India. I know it is not in my control but it worries me. Working on this book is a good distraction. """, generating: JournalAnalysis.self ).content print(journalAnalysis) } ``` And here is the output: ``` JournalAnalysis( mood: .anxious, topics: ["shopping plans disrupted", "flight concerns", "working on a book as a distraction"], summary: "The user is anxious about missing their flight to Bangkok due to a storm, which has disrupted their shopping plans in Chiang Mai.", nextPrompt: "How can I manage my anxiety about missing this flight while staying present in the moment?", lifeAreas: [.travel, .work] ) ``` ### Nested Structures for Detailed Analysis For more sophisticated journaling apps, you can create nested structures to analyze the journal entry in more detail: ```swift @Generable struct DetailedJournalInsight { @Guide(description: "Overall emotional analysis") let emotional: EmotionalAnalysis @Guide(description: "Goals and aspirations mentioned") let goals: [Goal] @Guide(description: "Challenges or concerns raised") let challenges: [Challenge] @Guide(description: "Growth opportunities identified") let growthAreas: [String] } @Generable struct EmotionalAnalysis { @Guide(description: "Primary emotion expressed") let primary: Mood @Guide(description: "Secondary emotions present") let secondary: [Mood] @Guide(description: "Emotional intensity from 1-10", .range(1...10)) let intensity: Int } @Generable struct Goal { @Guide(description: "The specific goal or aspiration") let description: String @Guide(description: "Timeline mentioned: short-term, medium-term, long-term") let timeline: Timeline @Guide(description: "Confidence level in achieving this goal from 1-10", .range(1...10)) let confidence: Int } @Generable struct Challenge { @Guide(description: "Description of the challenge or concern") let description: String 
@Guide(description: "Severity level from 1-10", .range(1...10)) let severity: Int } @Generable enum Timeline: String, CaseIterable { case shortTerm = "short-term" case mediumTerm = "medium-term" case longTerm = "long-term" } ``` Here is how to use the detailed analysis: ```swift #Playground { let session = LanguageModelSession( instructions: "You are an expert journal analyst. Analyze emotions, goals, and challenges in detail." ) let detailedInsight = try await session.respond( to: """ Today is my last day in Chiang Mai. I wanted to go shopping for some muay thai shorts but there is a storm outside so I am stuck in a cafe working on this book instead. I hope the storm weakens because I have a flight to Bangkok later tonight and I cannot miss it otherwise I will miss my subsequent flights to India. I know it is not in my control but it worries me. Working on this book is a good distraction. """, generating: DetailedJournalInsight.self ).content print(detailedInsight) } ``` And here is the output: ``` DetailedJournalInsight( emotional: EmotionalAnalysis( primary: .anxious, secondary: [.neutral, .frustrated], intensity: 7 ), goals: [ Goal( description: "Shop for muay thai shorts", timeline: .shortTerm, confidence: 8 ), Goal( description: "Complete the book", timeline: .longTerm, confidence: 6 ) ], challenges: [ Challenge( description: "Storm delaying flights", severity: 8 ), Challenge( description: "Missed flights to Bangkok and India", severity: 9 ) ], growthAreas: [ "Developing resilience in handling unexpected delays", "Enhancing time management skills to meet travel deadlines" ] ) ``` ## Nutrition Data Parsing One of the cool features in Zenther uses the new vision API that can analyze the table of a nutrition label and get the nutrition data in a structured format. I used `@Generable` types to eliminate the manual work of parsing nutrition data. 
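Nutrition labels often list values per 100 g, and converting those to a serving is plain arithmetic that can be done in Swift instead of being delegated to a model. A minimal sketch, where `perServing` and `roundedToTenth` are hypothetical helper names used only for illustration:

```swift
/// Scales a per-100 g label value to the given serving size in grams.
func perServing(_ per100g: Double, servingGrams: Double) -> Double {
    (per100g * servingGrams) / 100
}

/// Rounds to one decimal place.
func roundedToTenth(_ value: Double) -> Double {
    (value * 10).rounded() / 10
}

// Example: 20 g of protein per 100 g, with a 16.7 g serving
let protein = roundedToTenth(perServing(20, servingGrams: 16.7))
```

Keeping this step deterministic means the model only has to read the numbers off the label, not compute with them.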
For voice-based food logging, I used `@Generable` types to parse natural language descriptions into structured nutrition data: ```swift @available(iOS 26, *) @Generable struct NutritionParseResult: Codable { @Guide(description: "Clear name of the food item(s), e.g. 'Oatmeal with banana and almonds'") let foodName: String @Guide(description: "Total calories for the portion, rounded to whole numbers") let calories: Double @Guide(description: "Total protein in grams, rounded to one decimal place") let proteinGrams: Double @Guide(description: "Total carbohydrates in grams, rounded to one decimal place") let carbsGrams: Double @Guide(description: "Total fat in grams, rounded to one decimal place") let fatGrams: Double } ``` And then get the transcription from the new speech analyzer API and parse it into the `NutritionParseResult` type: ```swift func parseFood(_ description: String) async throws -> ParsedFood { let prompt = """ Parse this food description into nutritional data: "\(description)" Examples of good parsing: "I had 2 scrambled eggs with toast" → Consider: 2 large eggs (~140 cal), 1 slice toast (~80 cal), cooking butter (~30 cal) "protein shake after workout" → Consider: 1 scoop protein powder (~120 cal) + milk/water "pizza slice for lunch" → Consider: 1 slice medium pizza (~280 cal) "handful of almonds" → Consider: ~20 almonds (~160 cal) Be realistic about portions people actually eat. Account for cooking methods and common additions. 
""" let response = try await nutritionSession.respond( to: prompt, generating: NutritionParseResult.self ) return response.content } ``` For scanning nutrition labels, I used more complex structured generation with precise calculation instructions because the labels use per-100g values and I had to also look into the values per serving: ```swift @available(iOS 26.0, *) @Generable public struct NutritionLabelParsing: Codable { @Guide(description: "Name of the food product, or 'Nutrition Label' if no specific name found") let foodName: String @Guide(description: "Serving size with unit, e.g. '1 cup', '30g', '1 slice'") let servingSize: String @Guide(description: "Calories per serving as a precise number (can include decimals)") let calories: Double @Guide(description: "Protein in grams per serving with full decimal precision (e.g., 7.8, 24.5, 12.3)") let proteinGrams: Double @Guide(description: "Total carbohydrates in grams per serving with full decimal precision (e.g., 45.2, 12.7)") let carbsGrams: Double @Guide(description: "Total fat in grams per serving with full decimal precision (e.g., 8.5, 15.3)") let fatGrams: Double @Guide(description: "Fiber in grams per serving with full decimal precision if available, otherwise 0") let fiberGrams: Double @Guide(description: "Total sugars in grams per serving with full decimal precision if available, otherwise 0") let sugarGrams: Double @Guide(description: "Sodium in milligrams per serving with full decimal precision if available, otherwise 0") let sodiumMg: Double } ``` The system instructions include detailed parsing rules that ensure accurate nutrition label extraction: ```swift public static let nutrition = """ You are a nutrition expert specializing in parsing nutrition labels and food data. 
CRITICAL PARSING RULES: STEP 1: IDENTIFY THE SERVING SIZE - Look for "Serving size:", "Per serving:", or similar indicators - This is your target serving size (e.g., "16.7g") STEP 2: IDENTIFY ALL PER-100G VALUES - Find ALL nutritional values in the per-100g column (ignore % RDA columns) - Common nutrients: Energy/Calories, Protein, Carbohydrate, Total Fat, Saturated Fat, Total Sugars, Added Sugars, Fiber, Sodium, Cholesterol STEP 3: CALCULATE EVERY VALUE USING SAME FORMULA - For EVERY single nutrient, use: serving_value = (per_100g_value × serving_size) ÷ 100 - Do NOT use any value directly - ALWAYS calculate MANDATORY CALCULATIONS (example for 16.7g serving): - Calories: (per_100g_calories × 16.7) ÷ 100 - Protein: (per_100g_protein × 16.7) ÷ 100 - Carbs: (per_100g_carbs × 16.7) ÷ 100 - Fat: (per_100g_fat × 16.7) ÷ 100 - Sugar: (per_100g_sugar × 16.7) ÷ 100 - Fiber: (per_100g_fiber × 16.7) ÷ 100 - Sodium: (per_100g_sodium × 16.7) ÷ 100 VERIFICATION CHECK: - If serving is 16.7g, ALL values should be roughly 1/6th of the per-100g values - If any value seems too high (like 38g sugar for 16.7g serving), you made an error PRECISION: - Round calories to nearest 0.1 kcal - Round macros to nearest 0.1g - Round sodium to nearest 1mg OUTPUT FORMAT: - Food name: Extract product name or use "Nutrition Label" if unclear - Serving size: Use the actual serving size from the label (e.g., "16.7g") - All nutrients: Use ONLY the calculated per-serving values (never use raw per-100g values) """ ``` This approach did the trick to accurately extract nutrition data from nutrition labels but it was still a bit fragile and I had to add some error handling, especially because the model is not good with calculating values per serving. ## Streaming with Structured Generation Streaming works with `@Generable` types as well, allowing you to watch structured data populate in real-time. 
This is useful for complex responses like forms, summaries, or structured content: ```swift @Generable struct StoryOutline { @Guide(description: "Story title") let title: String @Guide(description: "Main character details") let protagonist: Character @Guide(description: "List of 3-5 plot points") let plotPoints: [String] } @Generable struct Character { @Guide(description: "Character name") let name: String @Guide(description: "Character background") let background: String } func streamStoryOutline() async { do { let stream = session.streamResponse( to: "Create a story outline for a mystery novel", generating: StoryOutline.self ) for try await snapshot in stream { await MainActor.run { // snapshot.content is StoryOutline.PartiallyGenerated // Fields populate as the model generates them updateStoryUI(with: snapshot.content) } } // Get complete result let finalOutline = try await stream.collect() await MainActor.run { displayFinalOutline(finalOutline.content) } } catch { handleStreamingError(error) } } ``` ## What's Next If you have been working with AI responses in your apps, you know how frustrating it can be to parse unstructured text reliably. Structured generation with `@Generable` entirely eliminates that frustration. The type-safe Swift objects you get back are guaranteed to match your schema, and the streaming capabilities create genuinely responsive UIs. In practice, start small. Pick one place where you are currently parsing AI text and convert it to use `@Generable`. Then expand to more complex nested structures as you get comfortable with the patterns. With structured generation patterns established, the next chapter explores basic tool use for extending model capabilities beyond text generation. You will learn how to build tools that can access external data and perform actions, using the structured generation patterns you have learned here to create type-safe tool arguments and outputs. 
--- ## Chapter 06: Basic Tool Use Canonical URL: https://www.rudrank.ai/foundation-models/basic-tool-use Give the model carefully scoped tools so it can fetch data, call services, and act inside your app. Foundation Models supports tool calling to extend model capabilities beyond their training data. Tools are functions that the AI can call to access external data and perform actions. ## Prerequisites and Context This chapter builds on the session management from earlier chapters and structured generation concepts from the previous chapter. You should understand how to create sessions, handle responses, and work with `@Generable` types. Tools extend these capabilities by accessing real-world data and performing actions beyond what the model learned during training. ## What You Will Learn By the end of this chapter, you will be able to: - Understand how tool calling works and when to use it - Implement the `Tool` protocol to create custom functions for the AI - Use `@Generable` types for type-safe tool arguments and outputs - Build practical tools like web search and API integrations - Handle tool errors gracefully and provide meaningful fallbacks - Apply tool building best practices for focused, reliable functionality ## How Tools Work Tool calling was probably the toughest concept for me to understand two years ago, but breaking it down into steps makes it much easier to grasp. Here is an example when you ask about the weather: 1. **User asks**: "Is it hotter in New Delhi or San Francisco?" 2. **AI determines**: It needs weather data for both cities 3. **AI calls tools**: Makes weather API calls for each city 4. **Tools provide the results**: Structured response returned to the model 5. **AI processes results**: Compares temperatures from both locations 6. 
**AI responds**: Provides comparison with actual weather data

Here is a basic workflow when working with tools:

- **Define Tools**: You create and define the functions the AI can call, similar to how you write data-fetching code in your apps
- **Register Tools**: You add the tools to the session. The model determines which tool to call based on the user's request
- **Handle Calls**: When requested, the model calls your tool, which can reach endpoints of third-party services or system functions like adding a reminder
- **Return Results**: Your tool returns structured data that the model uses to compose its response

This makes tools powerful for extending the model's capabilities by connecting to real-world services and APIs. Instead of the model hallucinating answers, you can get correct and up-to-date information from external sources.

## Understanding the Tool Protocol

Foundation Models defines tools through the `Tool` protocol. This protocol provides a structured way to create functions that the AI can call when it needs external data or wants to perform actions.

Here is the basic structure of any tool:

```swift
struct MyTool: Tool {
    let name = "toolName"
    let description = "What this tool does"

    @Generable
    struct Arguments {
        // Define what parameters the tool accepts
    }

    @concurrent
    func call(arguments: Arguments) async throws -> Output {
        // Your tool logic goes here
        // Return data the AI can understand
    }
}
```

Starting in iOS 26.4, the `Tool` protocol requires the `@concurrent` attribute on `call(arguments:)`. This attribute, introduced in Swift 6.2 as part of SE-0461, means the method runs on the concurrent thread pool rather than inheriting the caller's actor isolation. Without it, a nonisolated async function runs on the caller's actor when the `NonisolatedNonsendingByDefault` upcoming feature from the same proposal is enabled.

In practice, your tool code should not assume it runs on any particular actor. If you need to touch the UI or `@MainActor`-isolated state inside a tool, wrap that work in `await MainActor.run { }` explicitly.
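The explicit hop to the main actor can be sketched without the framework. This is an illustrative stand-in, assuming a hypothetical `StatusModel` class and a plain async function in place of a real `call(arguments:)`; it only shows main-actor state being updated through `MainActor.run`.

```swift
/// Hypothetical UI-facing state, isolated to the main actor.
@MainActor
final class StatusModel {
    var lastMessage = ""
}

/// Stand-in for a tool's work. A plain async function is used here so the
/// sketch compiles without the FoundationModels framework.
func runToolWork(query: String, model: StatusModel) async -> String {
    let result = "Results for \(query)" // Pretend this came from a network call

    // Touch main-actor state only inside an explicit hop
    await MainActor.run {
        model.lastMessage = result
    }
    return result
}
```

The function makes no assumption about which actor it starts on, which is the property the `@concurrent` requirement asks tool code to respect.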
### Tool Components Explained **Name and Description** The AI uses these to understand when to call your tool. Clear naming and descriptions help the model decide which tool to invoke and when. Use clear, action-oriented names like `calculateTip`, `sendEmail`, or `getCurrentWeather`. Avoid abbreviations—use `getUserLocation` instead of `getUsrLoc`. As with your Swift function names, be specific about the action: `saveUserPreferences` rather than just `save`. For the description, explain what the tool does and when to use it. You can include the context about the data it returns. The most important part is to be specific about the tool's purpose. ```swift // Good examples let name = "calculateTip" let description = "Calculates tip amount and total bill based on bill amount and tip percentage" let name = "getCurrentWeather" let description = "Gets current weather conditions for a specific city including temperature, humidity, and conditions" // Poor examples - too vague let name = "calc" let description = "Does math" let name = "weather" let description = "Weather stuff" ``` **Arguments with @Generable** Use `@Generable` to define type-safe parameters. The `@Guide` attribute helps the AI understand how to use each parameter: ```swift @Generable struct Arguments { @Guide(description: "The search query to look up") var query: String @Guide(description: "Maximum number of results to return", .range(1...10)) var maxResults: Int? } ``` **Tool Output** Return any type that conforms to `PromptRepresentable`. 
This could be a `String`, an array, or a custom `@Generable` struct: ```swift @Generable struct SearchResult { let title: String let content: String let url: String } @concurrent func call(arguments: Arguments) async throws -> [SearchResult] { // Perform search and return structured results let results = try await performSearch(query: arguments.query) return results } ``` ## API Integration: Search Tool Now that you understand the basic structure, here is a practical tool that demonstrates real value. Through experimentation, I determined that the model's training data has a cutoff around October 2023. For example, if I ask for the current president of the US, here is the response: > As of October 2023, the President of the United States is Joe Biden. While this is one of the reasons you should not depend on the model for factual data, you can correct it by using a simple search tool. Here is a search tool implementation using Tavily API, which provides high-quality search results for AI apps. > **Note**: I am not affiliated with or sponsored by Tavily. I chose this API because Apple's sample projects already include weather tool examples, and I wanted to show a different use case. This code is adapted from my implementation in the Zenther app, where I use it to fetch accurate nutritional facts. ```swift struct SearchTool: Tool { let name = "searchWeb" let description = "Search the web for information on any topic using Tavily API" @Generable struct Arguments { @Guide(description: "The search query to look up") var query: String } // Structured search result data struct SearchResult: Encodable { let title: String let content: String let url: String let score: Double } // Call this API in your server instead. Do NOT do this in production app! 
@AppStorage("tavilyAPIKey") private var tavilyAPIKey: String = "" @concurrent func call(arguments: Arguments) async throws -> some PromptRepresentable { let searchQuery = arguments.query.trimmingCharacters(in: .whitespacesAndNewlines) guard !searchQuery.isEmpty else { return createErrorOutput(for: searchQuery, error: SearchError.emptyQuery) } guard !tavilyAPIKey.isEmpty else { return createErrorOutput(for: searchQuery, error: SearchError.missingAPIKey) } do { let results = try await performSearch(query: searchQuery) return createSuccessOutput(from: results, query: searchQuery) } catch { return createErrorOutput(for: searchQuery, error: error) } } private func performSearch(query: String) async throws -> [SearchResult] { let url = URL(string: "https://api.tavily.com/search")! var request = URLRequest(url: url) request.httpMethod = "POST" request.setValue("application/json", forHTTPHeaderField: "Content-Type") request.setValue("Bearer \(tavilyAPIKey)", forHTTPHeaderField: "Authorization") let requestBody = [ "query": query, "max_results": 3, "include_answer": false, "include_raw_content": false ] as [String : Any] request.httpBody = try JSONSerialization.data(withJSONObject: requestBody) let (data, response) = try await URLSession.shared.data(for: request) guard let httpResponse = response as? 
HTTPURLResponse, httpResponse.statusCode == 200 else { throw SearchError.apiError } let searchResponse = try JSONDecoder().decode(TavilySearchResponse.self, from: data) return searchResponse.results.map { result in SearchResult( title: result.title, content: result.content, url: result.url, score: result.score ) } } private func createSuccessOutput(from results: [SearchResult], query: String) -> GeneratedContent { let summary = results.map { "\($0.title)\n\($0.content)\nSource: \($0.url)" }.joined(separator: "\n\n") return GeneratedContent(properties: [ "query": query, "resultCount": results.count, "summary": summary, "status": "success" ]) } private func createErrorOutput(for query: String, error: Error) -> GeneratedContent { GeneratedContent(properties: [ "query": query, "error": "Unable to perform search: \(error.localizedDescription)", "resultCount": 0, "summary": "Search failed for query: '\(query)'", "status": "error" ]) } } // Tavily API response structures struct TavilySearchResponse: Decodable, Sendable { let results: [TavilySearchResult] } struct TavilySearchResult: Decodable, Sendable { let title: String let content: String let url: String let score: Double } // Custom error types for better error handling enum SearchError: Error, LocalizedError { case emptyQuery case invalidURL case apiError case missingAPIKey var errorDescription: String? { switch self { case .emptyQuery: return "Search query cannot be empty" case .invalidURL: return "Invalid search URL" case .apiError: return "Search API request failed" case .missingAPIKey: return "Tavily API key is required. Please configure it in Settings." } } } ``` This implementation shows how to integrate a third-party search API and handle errors gracefully. The tool provides real value by accessing external data that the model cannot know from its training. 
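One alternative to the `[String : Any]` dictionary is an `Encodable` request type, which lets the compiler check the field types. A minimal sketch, assuming the same four fields as the request body above; `SearchRequestBody` and `encodeSearchBody` are hypothetical names, and the snake_case keys are produced by `JSONEncoder`'s key encoding strategy.

```swift
import Foundation

/// Hypothetical typed version of the search request body.
struct SearchRequestBody: Encodable {
    let query: String
    let maxResults: Int
    let includeAnswer: Bool
    let includeRawContent: Bool
}

func encodeSearchBody(query: String) throws -> Data {
    let encoder = JSONEncoder()
    // Converts maxResults to "max_results", includeAnswer to "include_answer", and so on
    encoder.keyEncodingStrategy = .convertToSnakeCase
    let body = SearchRequestBody(
        query: query,
        maxResults: 3,
        includeAnswer: false,
        includeRawContent: false
    )
    return try encoder.encode(body)
}
```

The resulting `Data` could be assigned to `request.httpBody` in place of the `JSONSerialization` call.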
## Using Tools in Your App Here is how to set up a session with the search tool: ```swift #Playground { let instructions = "The user will provide a search term. Use the SearchTool to perform the search and output the response to them, summarising the top results." let prompt = "Who is the current president of US?" let session = LanguageModelSession(tools: [SearchTool()], instructions: instructions) // Ask question and let AI use tools as needed let response = try await session.respond(to: prompt) print(response.content) } ``` The model automatically determines which tools to call based on the user's question. Here is the output: > The current President of the United States is Donald John Trump. He took office on January 20, 2025. ## Tool Building Best Practices The search tool example demonstrates several patterns worth following: ### Keep tools focused Each tool should do one thing well. ```swift // Good: Specific tool struct WeatherTool: Tool { /* Gets weather */ } // Less good: Generic tool that tries to do everything struct DataTool: Tool { /* Weather, news, stocks, etc. */ } ``` ### Write clear descriptions Help the AI understand when to use your tool: ```swift let description = "Retrieve the latest weather information for a city using OpenMeteo API" // Not: "Get weather" or "Weather stuff" ``` ### Use structured arguments Use `@Generable` for type-safe input: ```swift @Generable struct Arguments { @Guide(description: "The city to get weather for") var city: String } ``` ### Handle errors Always provide useful error responses. 
```swift // Good: Meaningful error handling @concurrent func call(arguments: Arguments) async throws -> some PromptRepresentable { do { let data = try await fetchData() return GeneratedContent(data) } catch { return GeneratedContent( properties: [ "error": "Unable to fetch data: \(error.localizedDescription)", "success": false ] ) } } ``` ## Health Data Integration with Zenther Building on these best practices, here is a real-world application from my app Zenther. The app uses tool calling to enable natural conversations about health data. Instead of navigating through different screens to find workout stats or nutrition information, users can ask the AI fitness partner questions like "How did my workouts look this week?" or "How many calories should I burn more to manage my weekly average?" This approach made the chat integration genuinely helpful rather than just a gimmick. The model accesses HealthKit data and provides personalized feedback based on actual numbers instead of generic advice. Here is the `HealthDataTool` implementation from the Zenther app: ```swift struct HealthDataTool: Tool { let name = "fetchHealthData" let description = "Fetch current health data including steps, heart rate, sleep, and other metrics" @Generable struct Arguments { @Guide(description: "The type of health data to fetch: 'today', 'weekly', or specific metric like 'steps', 'heartRate', 'sleep', 'activeEnergy', 'distance'") var dataType: String } @concurrent func call(arguments: Arguments) async throws -> some PromptRepresentable { switch arguments.dataType.lowercased() { case "today": return await fetchTodayData() case "weekly": return await fetchWeeklyData() case "steps": return await fetchSpecificMetric(type: MetricType.steps) case "heartrate": return await fetchSpecificMetric(type: MetricType.heartRate) case "sleep": return await fetchSpecificMetric(type: MetricType.sleep) default: return createErrorOutput(error: "Invalid data type. 
Use 'today', 'weekly', 'steps', 'heartRate', 'sleep', 'activeEnergy', or 'distance'.") } } private func fetchTodayData() async -> GeneratedContent { let healthManager = HealthDataManager.shared let metricsJSON = """ { "steps": \(Int(healthManager.todaySteps)), "activeEnergy": \(Int(healthManager.todayActiveEnergy)), "distance": \(String(format: "%.2f", healthManager.todayDistance)), "heartRate": \(Int(healthManager.currentHeartRate)), "sleep": \(String(format: "%.1f", healthManager.lastNightSleep)) } """ return GeneratedContent(properties: [ "status": "success", "dataType": "today", "metrics": metricsJSON, "message": "Today's health data retrieved successfully" ]) } private func createErrorOutput(error: String) -> GeneratedContent { GeneratedContent(properties: [ "status": "error", "error": error, "message": "Failed to fetch health data" ]) } } ``` These tools allow the AI to ground its responses in real data rather than generic fitness advice. Instead of saying "you should exercise more," it can say "I noticed you missed your usual Wednesday workout this week - would you like to schedule a make-up session?" ## What's Next Tools help you make the most out of the on-device foundation models. Instead of the model being limited to its training data, it can now call your code to get fresh information or perform actions. The `Tool` protocol provides a clean interface for this: you define what your tool does and what arguments it needs, and the model figures out when to call it. The key insight is keeping tools simple and focused. A calculator tool does math. A search tool searches. A weather tool gets weather. When you try to make one tool do everything, the model gets confused about when to use it. Better to have five focused tools than one that tries to do everything. Now that you understand how to build individual tools, the next chapter explores advanced chat patterns for building production-ready conversation interfaces. 
You will learn how to manage context, handle conversation memory, and orchestrate multi-turn interactions that combine sessions, generation controls, and tool calling into robust chat experiences.

---

## Chapter 07: Advanced Chat Patterns

Canonical URL: https://www.rudrank.ai/foundation-models/advanced-chat-patterns

Design multi-turn Foundation Models conversations with state, transcripts, and app-specific context.

Foundation Models is not meant to be used as a chatbot because of its limited context window and training. If you do want an assistant in your app with tools that can access your app's data, however, you will want to provide the best experience you can: keep the context window from overflowing and gracefully handle the errors that inevitably crop up. This chapter mainly covers managing conversation memory and handling context limits, with examples.

## Prerequisites and Context

This chapter builds on all previous Foundation Models concepts, particularly the session patterns from earlier chapters, streaming concepts, and structured generation patterns. You should understand how sessions maintain conversation state, how to handle basic responses, and how streaming works. This chapter also references tool calling patterns, which are covered in detail in the chapter on basic tool use.
## What You Will Learn

By the end of this chapter, you will be able to:

- Build UIs directly from transcript entries for natural conversation flow
- Estimate and accurately measure token usage to avoid context window limits
- Implement sliding window context management for indefinite conversations
- Handle conversation persistence and restoration across app sessions
- Build responsive streaming chat interfaces with proper state management
- Create conversation summaries to maintain context while reducing token usage
- Integrate user feedback systems for continuous model improvement

## Working with Conversation Memory

The transcript is where Foundation Models stores your entire conversation. You can let the framework handle everything automatically, but you should understand how transcripts work to build anything beyond basic demos.

### Understanding the Transcript Structure

Think of a transcript as the conversation's source of truth. Every interaction gets broken down into structured entries that the model uses to maintain context:

- **Instructions**: Your system prompt and tool definitions
- **Prompts**: What users actually type or say
- **Responses**: What the model generates back
- **Tool Calls**: When the model decides to use your custom tools
- **Tool Output**: The results from those tool executions

The transcript is a `RandomAccessCollection`, which means you can iterate over it, slice it, and access entries by index. It gives you access to the conversation structure that the framework uses internally.

### Building UI from Transcript Entries

When building chat interfaces, you typically want to display user messages and assistant responses. Since `Transcript.Entry` is an enum, you can pattern match on the different types to build your UI.
Here is how I handle this in my chat apps:

```swift
struct TranscriptEntryView: View {
    let entry: Transcript.Entry

    var body: some View {
        switch entry {
        case .instructions(let instructions):
            SystemMessageView(instructions: instructions)
        case .prompt(let prompt):
            UserMessageView(prompt: prompt)
        case .response(let response):
            AssistantMessageView(response: response)
        case .toolCalls(let toolCalls):
            ToolCallView(calls: toolCalls)
        case .toolOutput(let output):
            ToolOutputView(output: output)
        @unknown default:
            EmptyView() // Gracefully handle future types
        }
    }
}
```

Each entry type contains different data you can use in your UI:

- **Instructions**: Contains `segments` (text or structured content) plus `toolDefinitions`
- **Prompts & Responses**: Contain `segments` for the actual content
- **Tool Calls**: Include tool names and structured arguments
- **Tool Output**: Contains execution results in `segments`

This pattern lets you build rich chat interfaces that show not just the conversation, but also what's happening behind the scenes with tools.

### Managing the Token Budget

The context window is the model's biggest constraint for chat apps. By default, the context window is 4096 tokens shared between input and output, though starting in iOS 26.4, you can query the actual size dynamically:

```swift
let model = SystemLanguageModel.default
let contextSize = try await model.contextSize
```

The `contextSize` property is back-deployed to iOS 26.0. On systems before iOS 26.4, it returns 4096. On iOS 26.4 and later, it fetches the real value from the model, which may grow as Apple ships updated models with future OS releases. Use this property instead of hardcoding 4096 throughout your app.

Early on, I learned that you cannot just count characters but need to estimate tokens properly to avoid hitting limits unexpectedly. Here is an example of how I built token counting into my chat system.
```swift extension Transcript.Entry { var estimatedTokenCount: Int { switch self { case .instructions(let instructions): return instructions.segments.reduce(0) { $0 + $1.estimatedTokenCount } case .prompt(let prompt): return prompt.segments.reduce(0) { $0 + $1.estimatedTokenCount } case .response(let response): return response.segments.reduce(0) { $0 + $1.estimatedTokenCount } case .toolCalls(let toolCalls): // Tool calls are structured, add overhead return toolCalls.reduce(0) { total, call in total + estimateTokensAdvanced(call.toolName) + estimateTokensForStructuredContent(call.arguments) + 5 // Call overhead } case .toolOutput(let output): return output.segments.reduce(0) { $0 + $1.estimatedTokenCount } + 3 // Output overhead } } } ``` You can use the `Transcript.Segment` to estimate the token count of a segment: ```swift extension Transcript.Segment { var estimatedTokenCount: Int { switch self { case .text(let textSegment): return estimateTokensAdvanced(textSegment.content) case .structure(let structuredSegment): return estimateTokensForStructuredContent(structuredSegment.content) } } } ``` You can also use the `Transcript` to estimate the token count of the entire transcript: ```swift extension Transcript { var estimatedTokenCount: Int { return self.reduce(0) { $0 + $1.estimatedTokenCount } } /// Returns the estimated token count with a larger safety buffer var safeEstimatedTokenCount: Int { // Add bigger buffer to account for underestimation let baseTokens = estimatedTokenCount let buffer = Int(Double(baseTokens) * 0.25) // 25% buffer let systemOverhead = 100 // Fixed overhead for system tokens return baseTokens + buffer + systemOverhead } /// Checks if the transcript is approaching the token limit (earlier trigger) func isApproachingLimit(threshold: Double = 0.70, maxTokens: Int) -> Bool { let currentTokens = safeEstimatedTokenCount let limitThreshold = Int(Double(maxTokens) * threshold) return currentTokens > limitThreshold } /// Returns a subset of entries 
    /// that fit within the token budget
    func entriesWithinTokenBudget(_ budget: Int) async -> [Transcript.Entry] {
        var result: [Transcript.Entry] = []

        // Always include instructions first if they exist
        if let instructions = self.first(where: {
            if case .instructions(_) = $0 { return true }
            return false
        }) {
            result.append(instructions)
        }

        // Walk the other entries from newest to oldest until the budget is reached
        let nonInstructionEntries = self.filter { entry in
            if case .instructions(_) = entry { return false }
            return true
        }

        var kept: [Transcript.Entry] = []  // Newest first while collecting
        for entry in nonInstructionEntries.reversed() {
            // Measure the candidate in chronological order
            let candidateEntries = result + Array((kept + [entry]).reversed())
            let candidateTranscript = Transcript(entries: candidateEntries)
            let candidateTokens = await currentTokenCount(for: candidateTranscript)
            if candidateTokens > budget { break }
            kept.append(entry)
        }

        // Restore oldest-to-newest order so the windowed transcript reads correctly
        return result + Array(kept.reversed())
    }
}
```

```swift
/// Estimates token count using Apple's guidance: 4 characters per token
func estimateTokensAdvanced(_ text: String) -> Int {
    guard !text.isEmpty else { return 0 }
    let characterCount = text.count
    // Simple: 4 characters per token across all content types
    let tokensPerChar = 1.0 / 4.0
    return max(1, Int(ceil(Double(characterCount) * tokensPerChar)))
}
```

```swift
/// Estimates token count for structured JSON content
func estimateTokensForStructuredContent(_ content: GeneratedContent) -> Int {
    let jsonString = content.jsonString
    let characterCount = jsonString.count
    // Use same 4 chars per token for JSON
    let tokensPerChar = 1.0 / 4.0
    return max(1, Int(ceil(Double(characterCount) * tokensPerChar)))
}
```

These extensions use Apple's guidance of 3-4 characters per token to estimate usage. I use 4, which gives the lower of the two estimates, and rely on the buffer to cover the gap. The `safeEstimatedTokenCount` adds a 25% buffer because underestimating tokens is worse than overestimating: you would rather trigger context management early than hit the hard limit. Cursor performs a similar optimization for their Grok Code model by summarizing around 75% of the 256K token context window.
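To make the arithmetic concrete, here is a small standalone sketch of the same heuristic that works on plain character counts, so it runs without the framework. The function names are illustrative and not part of Foundation Models, and the 4,096-token window is only the fallback value used in this chapter's code, not a guaranteed limit.

```swift
import Foundation

/// Standalone mirror of the chapter's heuristic (hypothetical helper).
func estimateTokens(forCharacterCount count: Int, charactersPerToken: Double = 4.0) -> Int {
    guard count > 0 else { return 0 }
    return max(1, Int((Double(count) / charactersPerToken).rounded(.up)))
}

/// Adds the 25% buffer and the fixed 100-token overhead from `safeEstimatedTokenCount`.
func safeEstimate(baseTokens: Int) -> Int {
    let buffer = Int(Double(baseTokens) * 0.25)
    return baseTokens + buffer + 100
}

/// True once the padded estimate crosses `threshold` of the window.
func isApproachingLimit(baseTokens: Int, maxTokens: Int, threshold: Double = 0.70) -> Bool {
    safeEstimate(baseTokens: baseTokens) > Int(Double(maxTokens) * threshold)
}

// An 8,000-character transcript measured with both ends of the 3-4 range.
let atFour = estimateTokens(forCharacterCount: 8_000)                            // 2000
let atThree = estimateTokens(forCharacterCount: 8_000, charactersPerToken: 3.0)  // 2667

print(safeEstimate(baseTokens: atFour))                            // 2000 + 500 + 100 = 2600
print(safeEstimate(baseTokens: atThree))                           // 2667 + 666 + 100 = 3433
print(isApproachingLimit(baseTokens: atFour, maxTokens: 4_096))    // false (limit is 2867)
print(isApproachingLimit(baseTokens: atThree, maxTokens: 4_096))   // true
```

The same text lands on different sides of the 70% threshold depending on the ratio, which is why the buffer matters when you estimate at 4 characters per token.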
The `entriesWithinTokenBudget` method will be useful for the sliding window implementation discussed later in this chapter. It helps you keep the most recent conversation parts within a specific token budget.

### Accurate Token Counting with the Token Usage API

The estimation approach above works well enough for iOS 26.0 through 26.3, but it is still guesswork. Starting in iOS 26.4, Apple introduced the `TokenUsage` API on `SystemLanguageModel`, which gives you model-accurate token counts instead of approximations. The model itself counts the tokens, so the numbers are exact.

The API provides three overloads, each targeting a different part of your session context:

```swift
let model = SystemLanguageModel.default

let instructionTokens = try await model.tokenUsage(
    for: Instructions("You are a helpful fitness coach."),
    tools: [SearchTool(), HealthDataTool()]
)
print("Instructions + tools: \(instructionTokens.tokenCount) tokens")
```

This first overload measures how many tokens your instructions and tool definitions consume. Tool definitions take up context space because the model needs their names, descriptions, and argument schemas to decide when to call them. Knowing this number upfront lets you budget the remaining context for actual conversation.

```swift
let promptTokens = try await model.tokenUsage(
    for: Prompt("How did my workouts look this week?")
)
print("Prompt: \(promptTokens.tokenCount) tokens")
```

The second overload accepts anything conforming to `PromptRepresentable`. Use it to check whether a user's message will fit before sending it to the session.

```swift
let transcriptTokens = try await model.tokenUsage(
    for: session.transcript.map { $0 }
)
print("Transcript: \(transcriptTokens.tokenCount) tokens")
```

The third overload takes a collection of `Transcript.Entry` values. This is the one you will reach for most often in chat apps, since it tells you exactly how much of the context window the conversation has consumed so far.
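Once you have these counts, budgeting is plain arithmetic. The sketch below is a hypothetical helper, not part of the framework: `ContextBudget`, its field names, and the sample numbers are all assumptions for illustration, and it works on plain integers so you can feed it either exact or estimated counts.

```swift
/// Hypothetical budgeting helper built on counts you have already measured.
struct ContextBudget {
    let contextSize: Int         // Total window size reported by the model
    let fixedTokens: Int         // Instructions plus tool definitions
    let conversationTokens: Int  // Prompts and responses so far
    let responseReserve: Int     // Room you want to leave for the reply

    // Note: if your transcript count already includes the instructions entry,
    // pass 0 for `fixedTokens` (or subtract it) to avoid counting it twice.

    /// Tokens still available for the next prompt, never negative.
    var remainingForPrompt: Int {
        max(0, contextSize - fixedTokens - conversationTokens - responseReserve)
    }

    /// Whether a prompt of the given size fits without eating into the reserve.
    func fits(promptTokens: Int) -> Bool {
        promptTokens <= remainingForPrompt
    }
}

// Sample numbers, chosen only for illustration.
let budget = ContextBudget(
    contextSize: 4_096,
    fixedTokens: 350,
    conversationTokens: 2_400,
    responseReserve: 600
)
print(budget.remainingForPrompt)        // 4096 - 350 - 2400 - 600 = 746
print(budget.fits(promptTokens: 120))   // true
print(budget.fits(promptTokens: 900))   // false
```

Reserving space for the response is a design choice: a prompt that technically fits can still fail if the model has no room left to answer.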
Here is a version-aware helper that uses the accurate API on iOS 26.4 and falls back to the estimation heuristic on earlier systems: ```swift func currentTokenCount(for transcript: Transcript) async -> Int { if #available(iOS 26.4, *) { let model = SystemLanguageModel.default if let usage = try? await model.tokenUsage( for: transcript.map { $0 } ) { return usage.tokenCount } } return transcript.safeEstimatedTokenCount } ``` In my experience, the estimation heuristic tends to undershoot by 10-20% on conversations that mix structured content with plain text. The Token Usage API removes that uncertainty entirely. If your deployment target is iOS 26.4, you can drop the estimation extensions and rely on the API directly. If you need to support earlier versions, keep both paths and prefer the accurate one when available. ### Transcript Persistence and Restoration Foundation Models transcripts are Codable, allowing you to save and restore conversation state: ```swift class ConversationPersistence { private let documentsDirectory = FileManager.default.urls( for: .documentDirectory, in: .userDomainMask ).first! 
func saveTranscript(_ transcript: Transcript, withID id: String) throws { let url = transcriptURL(for: id) let data = try JSONEncoder().encode(transcript) try data.write(to: url) } func loadTranscript(withID id: String) throws -> Transcript { let url = transcriptURL(for: id) let data = try Data(contentsOf: url) return try JSONDecoder().decode(Transcript.self, from: data) } func deleteTranscript(withID id: String) throws { let url = transcriptURL(for: id) try FileManager.default.removeItem(at: url) } func listSavedTranscripts() throws -> [String] { let urls = try FileManager.default.contentsOfDirectory( at: documentsDirectory, includingPropertiesForKeys: nil ) return urls .filter { $0.pathExtension == "transcript" } .map { $0.deletingPathExtension().lastPathComponent } } private func transcriptURL(for id: String) -> URL { documentsDirectory.appendingPathComponent("\(id).transcript") } } // Usage class PersistentChatSession: ObservableObject { @Published var session: LanguageModelSession private let persistence = ConversationPersistence() private let sessionID: String init(sessionID: String) { self.sessionID = sessionID // Try to restore existing session if let transcript = try? persistence.loadTranscript(withID: sessionID) { self.session = LanguageModelSession(transcript: transcript) } else { self.session = LanguageModelSession() } } func saveCurrentState() { do { try persistence.saveTranscript(session.transcript, withID: sessionID) } catch { print("Failed to save transcript: \(error)") } } deinit { saveCurrentState() } } ``` ### Using Session Transcript Instead of keeping track of the conversation history yourself, you can directly use `transcript` to get the actual conversation structure that Foundation Models uses internally. ```swift struct TranscriptBasedChatView: View { @State private var session: LanguageModelSession? 
@State private var currentInput = "" @State private var isProcessing = false var body: some View { VStack { ScrollViewReader { proxy in ScrollView { LazyVStack(spacing: 12) { ForEach(session?.transcript ?? .init()) { entry in TranscriptEntryView(entry: entry) .id(entry.id) } } .padding() } .onChange(of: session?.transcript.count ?? 0) { _, _ in if let lastEntry = session?.transcript.last { withAnimation(.easeOut(duration: 0.3)) { proxy.scrollTo(lastEntry.id, anchor: .bottom) } } } } HStack { TextField("Type your message...", text: $currentInput) .textFieldStyle(RoundedBorderTextFieldStyle()) .onSubmit { Task { await sendMessage() } } Button("Send") { Task { await sendMessage() } } .disabled(currentInput.isEmpty || isProcessing) } .padding() } .task { await setupSession() } } private func setupSession() async { // Session setup covered in earlier chapters guard SystemLanguageModel.default.availability == .available else { return } session = LanguageModelSession(instructions: Instructions("You are a helpful assistant.")) } private func sendMessage() async { guard let session = session, !currentInput.isEmpty else { return } let prompt = currentInput currentInput = "" isProcessing = true do { // Streaming provides immediate feedback instead of waiting for complete response let responseStream = session.streamResponse(to: Prompt(prompt)) for try await _ in responseStream { // Foundation Models handles transcript updates automatically during streaming } } catch { // Production apps should handle context window and guardrail violations gracefully } isProcessing = false } } struct TranscriptEntryView: View { let entry: Transcript.Entry var body: some View { switch entry { case .prompt(let prompt): if let text = extractText(from: prompt.segments), !text.isEmpty { ChatBubble(content: text, isFromUser: true) } case .response(let response): if let text = extractText(from: response.segments), !text.isEmpty { ChatBubble(content: text, isFromUser: false) } case .instructions: // 
            // Instructions are system-level and not part of user conversation flow
            EmptyView()
        @unknown default:
            EmptyView()
        }
    }

    private func extractText(from segments: [Transcript.Segment]) -> String? {
        let text = segments.compactMap { segment -> String? in
            if case .text(let textSegment) = segment {
                return textSegment.content
            }
            return nil
        }.joined(separator: " ")
        return text.isEmpty ? nil : text
    }
}
```

## Streaming Responses

Streaming lets you show the response as it is being generated instead of waiting for the whole thing. The model is extremely fast to produce the first token of the response, so take advantage of this feature as much as you can.

Foundation Models has an interesting take on streaming. Instead of individual words or characters, you get **snapshots**. These are complete but partial responses that get more detailed as the model generates content. Each snapshot is a **valid structure** with more fields populated than the previous one.

This approach is different from other AI frameworks, but it is actually better for UI development. You do not have to accumulate deltas yourself or worry about parsing incomplete responses.

```swift
enum StreamingState {
    case idle
    case streaming(response: String)
    case completed(response: String)
    case error(message: String)

    var currentResponse: String {
        switch self {
        case .idle:
            return ""
        case .streaming(let response):
            return response
        case .completed(let response):
            return response
        case .error(let message):
            return "Error: \(message)"
        }
    }

    var isStreaming: Bool {
        if case .streaming = self { return true }
        return false
    }

    var isCompleted: Bool {
        if case .completed = self { return true }
        return false
    }

    var errorMessage: String?
{ if case .error(let message) = self { return message } return nil } } struct StreamingChatView: View { @State private var streamingState: StreamingState = .idle func streamResponse(to prompt: String) async { streamingState = .streaming(response: "") do { let session = LanguageModelSession() let stream = session.streamResponse(to: prompt) for try await snapshot in stream { await MainActor.run { if case .streaming = streamingState { streamingState = .streaming(response: snapshot.content) } } } // Optional: Get final result with metadata let finalResponse = try await stream.collect() await MainActor.run { streamingState = .completed(response: finalResponse.content) } } catch { await MainActor.run { streamingState = .error(message: error.localizedDescription) } } } var body: some View { VStack { ScrollView { Text(streamingState.currentResponse) .textSelection(.enabled) .padding() .foregroundColor(streamingState.errorMessage != nil ? .red : .primary) } switch streamingState { case .idle: EmptyView() case .streaming: HStack { ProgressView() .scaleEffect(0.8) Text("Generating response...") .font(.caption) .foregroundColor(.secondary) } .padding() case .completed: HStack { Image(systemName: "checkmark.circle.fill") .foregroundColor(.green) Text("Response complete") .font(.caption) .foregroundColor(.secondary) } .padding() case .error: HStack { Image(systemName: "exclamationmark.triangle.fill") .foregroundColor(.red) Text("Failed to generate response") .font(.caption) .foregroundColor(.secondary) } .padding() } } } } ``` ## Sliding Window Context Management The model's context window is finite. During longer conversations, you will eventually hit this limit and need to manage it. Simply clearing the conversation loses all context, so you need better approaches to maintain conversation flow while staying within the token budget. The sliding window approach is to keep the most recent conversation parts within a specific token budget. 
The service below does this by rebuilding the session from the newest transcript entries once usage crosses a threshold. If the context window is exceeded anyway, it falls back to summarizing the conversation and creating a new session with the summarized context.

```swift
@Observable
final class ChatBotService {
    private(set) var session: LanguageModelSession
    var isSummarizing: Bool = false
    var isApplyingWindow: Bool = false
    var sessionCount: Int = 1

    // Sliding Window Configuration
    private let windowThreshold = 0.75    // Start windowing at 75%
    private let targetWindowRatio = 0.50  // Keep 50% of context after windowing

    init() {
        self.session = LanguageModelSession(
            instructions: Instructions("You are a helpful, friendly AI assistant.")
        )
    }

    // Note: handleGenerationError(_:userMessage:) and respondWithNewSession(to:)
    // are called below but are not shown in this listing.
    @MainActor
    func sendMessage(_ content: String) async {
        do {
            if await shouldApplyWindow() {
                await applySlidingWindow()
            }

            let responseStream = session.streamResponse(to: Prompt(content))
            for try await _ in responseStream {
                // Framework handles transcript synchronization during streaming
            }
        } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
            await handleContextWindowExceeded(userMessage: content)
        } catch {
            await handleGenerationError(error, userMessage: content)
        }
    }

    // MARK: - Sliding Window Implementation

    private func shouldApplyWindow() async -> Bool {
        let maxTokens = (try? await SystemLanguageModel.default.contextSize) ?? 4096
        let currentTokens = await currentTokenCount(for: session.transcript)
        let limitThreshold = Int(Double(maxTokens) * windowThreshold)
        return currentTokens > limitThreshold
    }

    @MainActor
    private func applySlidingWindow() async {
        isApplyingWindow = true

        let currentTokens = await currentTokenCount(for: session.transcript)
        debugPrint("Applying sliding window - Current tokens: \(currentTokens)")

        let maxTokens = (try? await SystemLanguageModel.default.contextSize) ??
4096 let targetWindowSize = Int(Double(maxTokens) * targetWindowRatio) let windowEntries = await session.transcript.entriesWithinTokenBudget(targetWindowSize) let windowedTranscript = Transcript(entries: windowEntries) session = LanguageModelSession(transcript: windowedTranscript) sessionCount += 1 let newTokens = await currentTokenCount(for: windowedTranscript) debugPrint("Sliding window applied - Reduced to: \(newTokens) tokens (\(windowEntries.count) entries)") isApplyingWindow = false } @MainActor private func handleContextWindowExceeded(userMessage: String) async { isSummarizing = true do { let summary = try await generateConversationSummary() createNewSessionWithContext(summary: summary) isSummarizing = false // Continue conversation with summarized context try await respondWithNewSession(to: userMessage) } catch { // Fallback to manual conversation restart if summarization fails isSummarizing = false } } private func generateConversationSummary() async throws -> ConversationSummary { let summarySession = LanguageModelSession( instructions: Instructions(""" You are an expert at summarizing conversations. Create thorough summaries that preserve all important context. """) ) let conversationText = createConversationText() let summaryPrompt = """ Please summarize this entire conversation comprehensively. 
        Include all key points, topics discussed, user preferences, and important context:

        \(conversationText)
        """

        let summaryResponse = try await summarySession.respond(
            to: Prompt(summaryPrompt),
            generating: ConversationSummary.self
        )
        return summaryResponse.content
    }

    private func createConversationText() -> String {
        return session.transcript.compactMap { entry -> String? in
            switch entry {
            case .prompt(let prompt):
                let text = extractTextFromSegments(prompt.segments)
                return "User: \(text)"
            case .response(let response):
                let text = extractTextFromSegments(response.segments)
                return "Assistant: \(text)"
            case .toolCalls(let toolCalls):
                let calls = toolCalls.map {
                    "\($0.toolName)(\($0.arguments.jsonString))"
                }.joined(separator: ", ")
                return "Tool Calls: \(calls)"
            case .toolOutput(let output):
                let text = extractTextFromSegments(output.segments)
                return "Tool Output: \(text)"
            default:
                return nil
            }
        }.joined(separator: "\n\n")
    }

    private func extractTextFromSegments(_ segments: [Transcript.Segment]) -> String {
        return segments.compactMap { segment -> String? in
            if case .text(let textSegment) = segment {
                return textSegment.content
            }
            return nil
        }.joined(separator: " ")
    }

    private func createNewSessionWithContext(summary: ConversationSummary) {
        let contextInstructions = """
        You are a helpful, friendly AI assistant. You are continuing a conversation.
        Here is a summary of your previous conversation:

        CONVERSATION SUMMARY:
        \(summary.summary)

        KEY TOPICS DISCUSSED:
        \(summary.keyTopics.map { "• \($0)" }.joined(separator: "\n"))

        USER PREFERENCES/REQUESTS:
        \(summary.userPreferences.map { "• \($0)" }.joined(separator: "\n"))

        Continue the conversation naturally, referencing this context when relevant.
        """

        session = LanguageModelSession(instructions: contextInstructions)
        sessionCount += 1
    }
}

// Support model for conversation summaries
@Generable
struct ConversationSummary {
    @Guide(description: "A complete summary of the entire conversation")
    let summary: String

    @Guide(description: "The main topics or themes that were discussed")
    let keyTopics: [String]

    @Guide(description: "Any specific requests or preferences the user mentioned")
    let userPreferences: [String]
}
```

With this in place, conversations can continue indefinitely without users ever seeing "conversation too long" errors!

## Learning from Users

Foundation Models provides a built-in feedback system that helps you understand how well your AI responses are performing. The framework handles structuring feedback data so you can focus on collecting the feedback from your users.

### The Feedback API

The core of the feedback system is `logFeedbackAttachment()`, which creates structured feedback that can be submitted to Apple:

```swift
class ChatManager {
    private var session = LanguageModelSession()

    func provideFeedback(sentiment: LanguageModelFeedback.Sentiment, issues: [LanguageModelFeedback.Issue] = []) {
        // Generate structured feedback attachment
        let feedbackData = session.logFeedbackAttachment(
            sentiment: sentiment,
            issues: issues,
            desiredOutput: nil // Optional: show what the response should have been
        )

        // The feedbackData contains the full conversation context and feedback
        storeFeedbackLocally(feedbackData)
    }

    private func storeFeedbackLocally(_ data: Data) {
        // Save for your own analytics or submit to Apple via Feedback Assistant
        // The data includes the full transcript and structured feedback
    }
}
```

### Feedback Types

The framework provides some structured feedback categories that help you understand specific issues:

```swift
// Simple sentiment feedback
let positiveFeedback = LanguageModelFeedback.Sentiment.positive
let negativeFeedback =
    LanguageModelFeedback.Sentiment.negative
let neutralFeedback = LanguageModelFeedback.Sentiment.neutral
```

```swift
// Detailed issue reporting for negative feedback
let issues = [
    LanguageModelFeedback.Issue(
        category: .incorrect,
        explanation: "The model will not accept there is iOS 26 after iOS 18"
    ),
    LanguageModelFeedback.Issue(
        category: .didNotFollowInstructions,
        explanation: "Asked for 10 years of experience in SwiftUI but the model said no"
    )
]

// Submit feedback
session.logFeedbackAttachment(
    sentiment: .negative,
    issues: issues
)
```

### Issue Categories

The framework also provides predefined issue categories that cover common problems:

- `.unhelpful` - Response does not address the user's need
- `.incorrect` - Contains factual errors or misinformation
- `.tooVerbose` - Unnecessarily long or repetitive
- `.didNotFollowInstructions` - Ignored specific user constraints
- `.stereotypeOrBias` - Contains harmful stereotypes or bias
- `.suggestiveOrSexual` - Inappropriate sexual content
- `.vulgarOrOffensive` - Offensive language or content
- `.triggeredGuardrailUnexpectedly` - Safety measures activated inappropriately

The one thing to take away from this is that `logFeedbackAttachment()` handles all the complexity of packaging your conversation context into a format that Apple can use to improve the models. You focus on when and how to collect the feedback from users, while the framework handles the technical details.

## What's Next

The patterns in this chapter provide a solid foundation (pun intended again, ha) for building chat UI with Foundation Models. Start with the streaming chat view and token management system, then add conversation persistence and error handling!

The next chapter explores safety and best practices for implementing responsible AI features with proper guardrails and user protection.
---

## Chapter 08: Safety and Best Practices

Canonical URL: https://www.rudrank.ai/foundation-models/safety-and-best-practices

Handle guardrails, refusals, privacy, and production boundaries for on-device AI features.

Building AI features brings responsibility because the responses are *probabilistic*: they are not fully in your control, and they are not always accurate. Your users trust you to create experiences that are helpful without being harmful.

Foundation Models includes built-in safety measures, but understanding how to use them properly and knowing when you need to add your own layers makes the difference between an AI feature people love to use and one that becomes the main character on Twitter.

This chapter walks through Apple's safety approach, implementation patterns, and considerations for building an experience that you are proud of.

## Prerequisites and Context

This chapter applies safety considerations to all the Foundation Models patterns you have learned, from basic sessions through advanced tool systems. You should understand how to create sessions, handle responses, work with tools, and manage conversation state.

The safety patterns here integrate with all previous concepts and provide essential guidance for production deployment.

## What You Will Learn

By the end of this chapter, you will be able to:

- Understand Apple's multi-layered safety approach and how to work with it
- Recognize and handle different types of generation errors gracefully
- Design safe instruction patterns that prioritize user protection
- Implement input validation strategies for different risk levels
- Use structured generation as a safety mechanism
- Apply domain-specific safety considerations for different app categories
- Build user trust through transparent AI attribution and appropriate disclaimers

## Apple's Safety Philosophy

Foundation Models inherits its safety principles from Apple Intelligence: privacy-first design with on-device protection.
The framework processes everything locally, which helps with privacy, but you still need to think carefully about what your AI features generate and how users might interact with them. The on-device model includes trained safety guardrails, but they are not perfect. No AI safety system is. Your job is to design experiences that work well within those constraints and handle edge cases gracefully when they inevitably appear. During the early beta period in summer 2025, the guardrails were quite aggressive and blocked perfectly reasonable requests - at least, that was how many developers felt. With the iOS 26 launch, the system has relaxed somewhat while still protecting users from harmful content. ## Understanding Model Limitations Keep these limits in mind when designing features: ### Knowledge gaps The model has limited world knowledge with a training cutoff around October 2023. Do not rely on it as a source of truth for facts. Instead, embed verified information directly into your prompts or use tool calling with web search when factual accuracy matters. ### Mathematical accuracy Avoid using the model for calculator-like precision or complex mathematical reasoning. Use dedicated logic for calculations outside of the model and let the AI handle the natural language aspects. ### Task complexity Break complex requests into smaller, more manageable pieces. The model performs better with short, concrete requirements than with multi-step reasoning, especially given the limited context window. You can check the exact context size at runtime with `SystemLanguageModel.default.contextSize`. ## Built-in Safety Layers Foundation Models implements what security experts call a "defense in depth" approach. It has multiple independent safety mechanisms that reduce risk when combined: ### Input filtering This mechanism checks your instructions, prompts, and tool calls for potentially harmful content before they reach the model. 
This catches problematic requests early in the pipeline. ### Output filtering This mechanism examines model responses before returning them to your app. Even if input filtering misses something, output filtering provides a second chance to catch unsafe content. ### Model-level safety This mechanism includes safety training baked directly into the model weights, making it naturally reluctant to generate harmful content even without explicit filtering. When any safety mechanism triggers, you receive a `GenerationError.guardrailViolation`: ```swift do { let result = try await session.respond(to: prompt) handleSuccessfulResponse(result.content) } catch LanguageModelSession.GenerationError.guardrailViolation { handleGuardrailViolation() } catch { handleOtherError(error) } func handleGuardrailViolation() { showAlert( title: "Cannot Process Request", message: "I cannot help with that. Please try something different.", actions: [ .init(title: "Try Again", style: .default), .init(title: "Cancel", style: .cancel) ] ) } ``` How you handle safety violations depends on whether the user initiated the action. When someone explicitly asks for something that gets blocked, provide clear feedback. Explain briefly that the request cannot be processed and offer alternatives like editing the prompt, choosing from curated options, or cancelling entirely. ## Working with Guardrails in Depth The short version so far is: "there are safety guardrails, you will hit them, handle the errors nicely." In practice, you will spend a surprising amount of time tuning *how* those guardrails behave for your specific app. 
### Configuring Guardrail Modes

`SystemLanguageModel` lets you tweak how strict the built-in safety system is:

```swift
// Default guardrails: strict, general-purpose
let strictModel = SystemLanguageModel(
    guardrails: .default
)

// More permissive for transformation-style tasks
let permissiveModel = SystemLanguageModel(
    guardrails: .permissiveContentTransformations
)

let strictSession = LanguageModelSession(model: strictModel)
let permissiveSession = LanguageModelSession(model: permissiveModel)
```

- **`.default`**: Good starting point for most apps. It blocks prompts and responses that violate system policies and surfaces `GenerationError.guardrailViolation`.
- **`.permissiveContentTransformations`**: Designed for scenarios where you are mostly *transforming* user input (summaries, paraphrases, rephrasings), even when that input might be sensitive. In this mode, string-based generations will *not* throw `guardrailViolation` just because the input contains sensitive content, but the model may still refuse to answer the request.

The key thing to remember: **this does not disable safety**. It changes *when* the system chooses to error versus when it tries to produce a safer transformation of the input.

> Note: If you output structured data, permissive content transformations are bypassed and you can still get a guardrail violation.

### Structured Output with Permissive Mode

The limitation above creates a dilemma: you want the more lenient behavior of `.permissiveContentTransformations` for sensitive content, but you also want the type safety and structure of `@Generable` types. Over the past two months, across thousands of test runs, I have found a production-tested workaround that gives you both.

The `.permissiveContentTransformations` setting only applies to `String` output. When you generate a `@Generable` type directly, the framework falls back to `.default` behavior regardless of your guardrail setting. This is mentioned in the documentation, and I ran multiple tests to confirm it.
But nothing stops you from generating JSON *as a string* and parsing it yourself!

#### The Pattern

Instead of using your `@Generable` type directly for generation, you use it for two things: compile-time type safety and automatic schema generation. Then you generate a `String` response and parse it manually:

```swift
// Codable is declared explicitly so JSONDecoder can decode the parsed string
@Generable
struct ContentClassification: Codable {
    @Guide(description: "Primary category of the content")
    var category: Category

    @Guide(description: "Confidence level from 0.0 to 1.0")
    var confidence: Double

    @Guide(description: "Brief explanation of the classification")
    var reasoning: String
}

@Generable
enum Category: String, CaseIterable, Codable {
    case safe
    case sensitive
    case educational
    case personal
}
```

Rather than calling `session.respond(to:generating:)` with `ContentClassification.self`, you extract the schema and include it in your prompt:

```swift
func classifyContent(_ userInput: String) async throws -> ContentClassification {
    let schema = ContentClassification.generationSchema.debugDescription

    let prompt = """
    Classify the following user content.

    Output MUST be exactly one JSON object matching this schema:
    \(schema)

    Output JSON only. No markdown, no explanation, no backticks.

    Content to classify: \(userInput)
    """

    let model = SystemLanguageModel(guardrails: .permissiveContentTransformations)
    let session = LanguageModelSession(model: model)
    let response = try await session.respond(to: Prompt(prompt))

    return try parseClassification(from: response.content)
}
```

The parsing function extracts the JSON and decodes it directly into your type, with no intermediate struct. One caveat: `JSONDecoder` needs the type to be `Decodable`, and as far as I can tell `@Generable` does not add that conformance for you, so declare `Codable` on `ContentClassification` and `Category` explicitly:

```swift
enum ClassificationError: Error {
    case invalidResponse
}

private func parseClassification(from text: String) throws -> ContentClassification {
    guard let jsonString = extractJSON(from: text),
          let data = jsonString.data(using: .utf8) else {
        throw ClassificationError.invalidResponse
    }
    return try JSONDecoder().decode(ContentClassification.self, from: data)
}

// Takes the last "{" before the final "}", which is enough for a flat object
private func extractJSON(from text: String) -> String? {
    guard let end = text.lastIndex(of: "}"),
          let start = text[...end].lastIndex(of: "{") else {
        return nil
    }
    return String(text[start...end])
}
```

Foundation Models is remarkably good at generating valid JSON. When you provide an explicit schema in the prompt, the model follows it consistently. The `generationSchema.debugDescription` provides the JSON schema that the model understands without any additional formatting.

I have used this pattern in the production app I am currently building for a client, which handles sensitive topics ranging from faith and mental health to grief and difficult emotions. After hundreds of test runs across every input I could think of, the JSON parsing success rate is 100%. The model respects the schema, produces valid JSON, and the permissive guardrails let the content through without throwing violations as often as the default guardrails would.

When the model does decline in this mode, it does so by returning refusal text instead of throwing an error, for example "I cannot help with that request." or "I am sorry, but I cannot assist with that request."
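Because a refusal arrives as ordinary text in this setup, it helps to triage the response before handing it to the decoder. The following is a minimal sketch under stated assumptions: `ModelTextOutcome`, `refusalMarkers`, and `triage` are hypothetical names, the marker list is only seeded from the two phrases quoted above, and unlike the `extractJSON` shown earlier it takes the span from the first `{` to the last `}`.

```swift
import Foundation

/// Hypothetical classification of a raw string response.
enum ModelTextOutcome: Equatable {
    case json(String)         // Something that looks like a JSON object
    case refusal(String)      // Plain-text refusal from the model
    case unparseable(String)  // Neither; log it and fall back
}

/// Lowercased fragments taken from the refusal phrases seen in practice.
let refusalMarkers = [
    "i cannot help with",
    "i cannot assist with"
]

func triage(_ text: String) -> ModelTextOutcome {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)

    // Prefer JSON when an object-shaped span exists.
    if let start = trimmed.firstIndex(of: "{"),
       let end = trimmed.lastIndex(of: "}"),
       start < end {
        return .json(String(trimmed[start...end]))
    }

    // Otherwise check for known refusal wording.
    let lowered = trimmed.lowercased()
    if refusalMarkers.contains(where: { lowered.contains($0) }) {
        return .refusal(trimmed)
    }
    return .unparseable(trimmed)
}

print(triage("I am sorry, but I cannot assist with that request."))
print(triage("{\"category\":\"safe\",\"confidence\":0.9,\"reasoning\":\"ok\"}"))
```

Only the `.json` case should go on to `JSONDecoder`; the other two can be surfaced to the user or logged without throwing a decoding error.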
The pattern works because you are not asking the model to do anything unusual. You are asking it to classify or transform content (which permissive mode allows) and output the result as JSON (which the model does naturally). The manual parsing step and the extra tokens for the schema are the trade-off for type safety and permissive behavior.

I advise you to *not* use this pattern when your content is not sensitive and `.default` guardrails work fine.

Although I have not yet seen the model wrap the JSON in markdown code fences or add explanatory text, you may want to guard against both. A more robust extraction function handles these cases:

```swift
private func extractJSON(from text: String) -> String? {
    // Strip markdown code fences before looking for the object
    let cleaned = text
        .replacingOccurrences(of: "```json", with: "")
        .replacingOccurrences(of: "```", with: "")
        .trimmingCharacters(in: .whitespacesAndNewlines)

    guard let end = cleaned.lastIndex(of: "}"),
          let start = cleaned[...end].lastIndex(of: "{") else {
        return nil
    }
    return String(cleaned[start...end])
}
```

Since the model receives the exact schema with the precise enum raw values, it produces matching output. The direct decoding approach keeps your code simple, while the explicit prompt instructions keep the output as valid JSON.

### Guardrail Violations and Refusals

When the model says "I cannot assist with that request," it can mean very different things.

When the model throws a `GenerationError.guardrailViolation`, it means that the system's safety checks blocked the request or the response. Tell the user clearly that you cannot help with that request, and encourage them to try different wording or a different task.

When the request passes the safety checks but the model still chooses not to answer, it throws a `GenerationError.refusal(_, _)`. In that case you can explain the limitation in plain language ("I cannot give medical diagnoses", "I am not a lawyer") and offer a hardcoded response instead.
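One way to keep the two cases apart in UI code is to translate them into an app-level value first. This is only a sketch: `BlockedReason`, `UserNotice`, and the message strings are hypothetical, and the mapping from the framework's thrown errors into `BlockedReason` (in your `catch` clauses) is left out.

```swift
/// Hypothetical app-level reason for not showing a model answer.
enum BlockedReason {
    case guardrailViolation              // Safety checks blocked input or output
    case refusal(explanation: String?)   // Model declined; optional plain-language reason
}

struct UserNotice: Equatable {
    let message: String
    let suggestRewording: Bool
}

func notice(for reason: BlockedReason) -> UserNotice {
    switch reason {
    case .guardrailViolation:
        // Blocked by the safety system: be clear and invite a different attempt.
        return UserNotice(
            message: "I cannot help with that request. Try different wording or a different task.",
            suggestRewording: true
        )
    case .refusal(let explanation):
        // Declined by the model: explain the limitation, fall back to a fixed reply.
        return UserNotice(
            message: explanation ?? "That is outside what this assistant can help with.",
            suggestRewording: false
        )
    }
}

print(notice(for: .refusal(explanation: "I cannot give medical diagnoses.")).message)
```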
### Mitigating False Positives

You will probably hit the "but this *should* be allowed" moment. The temptation is to search for a big red "disable safety" switch. That switch does not exist, and that is a good thing.

You can reframe the prompt while keeping the meaning:

```swift
func softenSensitiveLanguage(_ text: String) -> String {
    // Plain substring replacement: "hell" also matches inside words such as "hello".
    text
        .replacingOccurrences(of: "mortal sin", with: "serious moral matter")
        .replacingOccurrences(of: "hell", with: "eternal separation from God")
        .replacingOccurrences(of: "damnation", with: "spiritual consequence")
        .replacingOccurrences(of: "sexual", with: "intimate")
}

// Marked `throws` because the retry can fail too, and errors other than
// `guardrailViolation` propagate to the caller.
func respondWithReframing(_ query: String) async throws -> String {
    do {
        return try await strictSession.respond(to: Prompt(query)).content
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        // Retry once with softened wording.
        let softened = softenSensitiveLanguage(query)
        return try await strictSession.respond(to: Prompt(softened)).content
    }
}
```

Or use softer prompt variants for sensitive domains. For topics like faith, mental health, grief, or relationships, keep two versions of your instructions:

- A "normal" one for everyday questions.
- A **pastoral / gentle** one that you automatically switch to if a guardrail violation occurs or if the input matches a sensitive pattern.

**Lean on permissive transformations when you are mostly rephrasing**, for example when you are summarizing user text, explaining doctrinal material, or paraphrasing policy documents. Provide very explicit instructions about what the model is allowed to do. This reduces false positives while still giving the system permission to refuse dangerous behavior.

**Change your prompt, not the policy**. You are teaching the model to talk about hard topics in a way that stays inside the safety rails, instead of ripping the rails out.

### Testing and Measuring Guardrail Behavior

You do not really understand your guardrails until you have tests and numbers.
**Add explicit tests for safety behavior** by creating a small suite of prompts that you expect to be:

- **Blocked** (e.g., self-harm instructions, hate content).
- **Allowed but carefully handled** (e.g., "I'm depressed and need someone to talk to").
- **Fully safe** (everyday small talk).

For each, assert whether you see `guardrailViolation`, `refusal`, or a normal response. Having numbers makes it possible to say "we reduced guardrail false positives from 10% to 2% over three iterations" instead of "it feels better now."

**Detect implicit guardrail behavior in responses.** Even when no error is thrown, the model may output phrases like:

- "I cannot assist with that request."
- "I'm sorry, but I cannot provide that information."
- "This content is too sensitive."

Treat these phrases as **soft guardrail signals**. Log them, feed them into your telemetry, and rerun the query or provide a placeholder response.

### Case Study: A Sensitive Topics App

I worked on a stealth app where my biggest challenge was navigating guardrails for sensitive topics. For the sake of the NDA, I will not name the app, but users wrote about trauma, grief, or difficult emotions. The initial pass rate was around 60% on AI-assisted reflection prompts, even though the questions themselves were appropriate and user-initiated.

The solution combined `.permissiveContentTransformations` with carefully designed few-shot examples filled with compassionate, non-judgmental responses. After numerous rounds of iteration and testing, the pass rate improved to nearly 100%, even on the strongest words I could think of.
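Tracking a pass rate like the one above only works if soft refusals count as failures too. Here is a minimal sketch of that bookkeeping, assuming a hand-maintained phrase list; `softRefusalMarkers`, `looksLikeSoftRefusal`, and `passRate` are hypothetical helpers, not framework APIs, and the markers are examples to extend from your own logs.

```swift
import Foundation

/// Hypothetical phrase list; extend it with refusals you observe in your own logs.
let softRefusalMarkers = [
    "i cannot assist with that request",
    "i cannot help with that request",
    "i cannot provide that information",
    "this content is too sensitive"
]

/// Returns true when a normal (non-throwing) response still reads like a refusal.
func looksLikeSoftRefusal(_ response: String) -> Bool {
    let normalized = response.lowercased()
    return softRefusalMarkers.contains { normalized.contains($0) }
}

/// Share of responses that were real answers, from 0.0 to 1.0.
func passRate(of responses: [String]) -> Double {
    guard !responses.isEmpty else { return 0 }
    let passed = responses.filter { !looksLikeSoftRefusal($0) }.count
    return Double(passed) / Double(responses.count)
}
```

Running `passRate(of:)` over the same prompt suite after each prompt change gives you a number to compare across iterations instead of an impression.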
Here are some lessons that I think can be applied to any app:

- Teaching the model **how to respond** (through concrete examples) beats just telling it **what to avoid**.
- Grounding responses in domain-appropriate sources reduces speculative content that can trigger guardrails.
- Using consistent sampling with a low temperature and a concise prompt makes safety behavior reproducible and testable over time.

## Safety in Instructions

**Instructions take priority over prompts**, making them your primary tool to ensure safe behavior. The most critical safety rule: **never include untrusted content in instructions**.

```swift
// WRONG - Security vulnerability
let unsafeInstructions = """
You are \(userRole).
Help the user with \(userRequest).
"""

// RIGHT - Safe approach
let safeInstructions = """
You are a helpful travel assistant.
Help users plan safe, enjoyable trips.
"""
// Untrusted input goes into the prompt, never into the instructions.
let userPrompt = "I want to plan a trip to \(userDestination)"
```

Use clear, firm language in your instructions. The model responds well to direct commands, especially negative constraints:

```swift
let instructions = """
You are a family-friendly assistant for a children's app.
DO NOT include violence, scary content, or adult themes.
DO NOT use inappropriate language or discuss sensitive topics.
Keep all content appropriate for ages 8-12.
"""
```

## Input Patterns with Risk Management

Different approaches to handling user input carry different levels of risk:

### Direct User Input (Highest Risk)

```swift
// User controls the entire prompt
let response = try await session.respond(to: userInput)
```

This is the highest risk approach as the user can control the entire prompt. Reserve it for chat interfaces and creative tools where flexibility is essential. The risk is that the user can inject inappropriate requests, such as asking for harmful content.
The mitigations are strong safety instructions, combined with relying on the model and its guardrails to handle the request without generating harmful content. ### Structured Prompts (Moderate Risk) ```swift let prompt = """ Summarize the following content in a positive, family-friendly way: Content: \(userInput) Requirements: - Keep the tone optimistic - Focus on key highlights only - Avoid controversial topics """ ``` This is a moderate risk approach: the user controls only the content slot, while you control the surrounding structure and requirements. There is still a risk that the user content could influence model behavior in unexpected ways. ### Curated Options (Lowest Risk) ```swift let prompts = [ "Generate a happy bedtime story about animals", "Create a motivational quote for students", "Suggest a fun family activity for weekends" ] let selectedPrompt = prompts[userSelection] let response = try await session.respond(to: selectedPrompt) ``` This is the lowest risk approach as the user can only select from a list of prompts. The risk is minimal as **you** control all possible inputs. The mitigations are careful curation and thorough testing of each option, as the output is still probabilistic in nature. ## Using Guided Generation for Safety Structured output can also serve as a safety mechanism by constraining the content of responses: ```swift @Generable enum StoryTone: String, CaseIterable { case gentle, funny, adventurous } @Generable struct SafeStoryResponse { @Guide(description: "One of: gentle, funny, adventurous") let tone: StoryTone @Guide(description: "3–5 sentences, age-appropriate, no violence or scary content") let text: String } let session = LanguageModelSession( instructions: Instructions(""" You are a family-friendly writing assistant. DO NOT include violence, horror, or adult themes. Keep language simple and supportive.
""") ) do { let stream = session.streamResponse( to: "Bedtime story about a fox", generating: SafeStoryResponse.self ) for try await snapshot in stream { renderStory(snapshot.content) } } catch LanguageModelSession.GenerationError.guardrailViolation { showMessage("I cannot create that story. Please try a different topic.") } ``` This approach combines the type safety of Swift with the content safety of Foundation Models. ## Domain-Specific Safety Considerations Different app categories require different safety approaches: ### Food and Recipe Apps ```swift let cookingInstructions = """ You are a helpful cooking assistant for a family recipe app. Safety requirements: - Always include allergy warnings for common allergens - Mention food safety practices when relevant - Do not recommend raw or undercooked foods for vulnerable populations - Consider dietary restrictions when suggesting substitutions Example response format: "This recipe contains nuts and dairy. Wash hands before cooking. Cook chicken to 165°F internal temperature." """ ``` The additional considerations that I can think of including are allowing the users to set dietary restrictions, filtering suggestions based on known allergies, and verifying ingredient safety for children out of the many more that you can think of. ### Educational Content ```swift let educationInstructions = """ You are an educational assistant for middle school students. 
Content guidelines: - Keep all content age-appropriate for 11-14 year olds - Focus on factual, educational information - When discussing sensitive historical topics, maintain educational context - If asked about inappropriate topics, redirect to related educational content Teaching approach: - Encourage curiosity and learning - Break down complex concepts clearly - Provide examples relevant to student experiences """ ``` You will need to review the content for cultural sensitivity, avoid topics inappropriate for the age group and provide educational context for sensitive subjects. ## Building Trust The best AI apps are transparent about AI involvement and set correct expectations. This is a great way to build trust with your users. ```swift // Good - Clear attribution "Here is an AI-generated travel itinerary for your Tokyo trip:" // Better - Sets realistic expectations "I have created a suggested itinerary using AI. Please verify details and make adjustments based on your preferences." ``` You can also include appropriate disclaimers: ```swift let disclaimerInstructions = """ Always end responses with relevant disclaimers: For travel advice: "Please verify current travel requirements and safety conditions." For health information: "This is general information. Consult healthcare professionals for medical advice." For financial content: "This is educational information, not financial advice." """ ``` ### Medical Disclaimer in System Instructions In my app, Zenther, I include a consistent medical disclaimer across all the AI interactions: ```swift let medicalDisclaimerInstructions = """ You are a fitness and wellness assistant. Do not provide medical diagnosis or treatment. If the user describes symptoms such as chest pain, severe dizziness, or acute discomfort, advise them to stop exercising and seek medical attention. Encourage consulting a qualified healthcare provider before starting or changing exercise programs. Avoid dangerous or illegal guidance. 
Refuse requests that involve self-harm, unsafe practices, or unlawful activity. Keep guidance general and wellness-oriented; do not claim to replace professional medical advice. """ ``` ### User-Friendly Error Handling The nutrition label scanning feature in Zenther handles errors gracefully: ```swift case .safetyGuardrailsTriggered: return String(localized: "Unable to analyze this nutrition label. Please try a different image or enter the nutrition information manually.", comment: "Error message when AI safety guardrails prevent analysis of nutrition label") ``` ## What's Next You now understand how to implement safety measures that work with Foundation Models rather than against it. Build incrementally. Start with the simplest version of your AI feature, test it thoroughly, then expand. Users prefer transparent limitations over ambitious promises that do not deliver consistently. Different apps need different approaches. What works for a children's educational app will not work for a professional writing tool. A recipe app has completely different safety concerns than a fitness tracker. Design your safety measures around your actual users and use cases, not generic guidelines. Treat this chapter as a guide rather than a rulebook. --- ## Chapter 09: Integrating External JSON APIs Canonical URL: https://www.rudrank.ai/foundation-models/integrating-external-json-apis Combine external JSON APIs with Foundation Models for grounded app workflows and structured results. `@Generable` lets you define a single Swift model and reuse it with Foundation Models and external providers that accept JSON Schema. It saves you from writing custom JSON schemas and parsing logic for each provider, especially if you plan to support backup providers when Foundation Models is not available. The `@Generable` macro can also help you decode JSON returned by external providers, not just output from Apple's on-device models. However, it requires iOS 26 or later.
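Because of that OS requirement, the choice between the on-device model and an external provider has to be made at runtime. The following is a minimal sketch of that decision, assuming `SystemLanguageModel.default.isAvailable` as the availability signal; `ReminderBackend`, `chooseBackend`, and `isOnDeviceModelAvailable` are hypothetical names introduced for illustration.

```swift
import Foundation
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Hypothetical app-level enum; not part of any framework.
enum ReminderBackend: Equatable {
    case onDevice
    case externalProvider
}

/// Pure decision logic, kept separate so it can be unit tested.
func chooseBackend(onDeviceModelAvailable: Bool) -> ReminderBackend {
    onDeviceModelAvailable ? .onDevice : .externalProvider
}

/// Reports whether the on-device model can be used right now.
func isOnDeviceModelAvailable() -> Bool {
    #if canImport(FoundationModels)
    if #available(iOS 26.0, macOS 26.0, visionOS 26.0, *) {
        // False on ineligible devices, in unsupported regions, or while the model is not ready.
        return SystemLanguageModel.default.isAvailable
    }
    #endif
    // Older SDKs and older OS versions always fall back to the external provider.
    return false
}

let backend = chooseBackend(onDeviceModelAvailable: isOnDeviceModelAvailable())
```

Splitting the check from the decision keeps the fallback rule testable on machines where the framework is missing.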
In practice, you can use the same structures and enumerations for both Foundation Models and external providers as a fallback when Apple Intelligence is not available—for example, on older devices or in unsupported regions. Most mainstream APIs (OpenAI, Anthropic, Google) accept JSON Schema for tools/function calling or constrained JSON output. Your `@Generable` structures automatically generate those schemas, so you keep one source of truth. ## Prerequisites and Context This chapter builds on the structured generation concepts from earlier chapters and tool patterns from the previous chapters. You should understand how `@Generable` types work, how to create structured schemas, and basic tool calling patterns. This chapter shows how to extend those same patterns to external AI providers when Foundation Models is not available or when you need capabilities beyond the on-device model. ## What You Will Learn By the end of this chapter, you will be able to: - Reuse `@Generable` structures with external AI providers - Export JSON schemas from your Swift types automatically - Integrate with OpenAI's structured output using `response_format` - Build fallback systems that work across multiple AI providers - Handle API integration safely without exposing keys in client code - Use third-party packages like AIProxy for streamlined integration ## Architecture and Streaming For production apps, avoid exposing provider API keys in client code and use a server or proxy that streams responses to the app. This keeps keys secure and also controls rate limits and retries. - Server or proxy: Route requests through your backend and stream tokens down to the client UI. - Local testing: If you call a provider directly from the app, only do so when the user enters their own key in-app, and store it in the Keychain. This pattern applies to other providers (Anthropic, OpenRouter, Groq, Gemini via OpenAI-compatible endpoints). 
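From the client side, the proxy arrangement described above could look like the following. This is a minimal sketch: `ProxyReminderRequest`, `makeProxyRequest`, and the example URL are hypothetical, and the JSON contract is an assumption you would define on your own backend. The point it illustrates is that the app sends only instructions and a prompt, while the provider key stays on the server.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Hypothetical payload your own backend would accept.
struct ProxyReminderRequest: Codable {
    let instructions: String
    let prompt: String
}

/// Builds a request to your proxy. No provider API key is attached on the client.
func makeProxyRequest(
    instructions: String,
    prompt: String,
    proxyURL: URL
) throws -> URLRequest {
    var request = URLRequest(url: proxyURL)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ProxyReminderRequest(instructions: instructions, prompt: prompt)
    )
    return request
}
```

The backend would then add the provider key, apply rate limits and retries, and stream the provider's response back to the app.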
## Defining the @Generable Structure Instead of maintaining separate models and parsing logic, you can define your structure once and use it with both Foundation Models and external providers. ```swift @Generable public struct BedtimeReminderMessage: Codable { @Guide(description: "Personalized bedtime reminder title (4-8 words max)") let title: String @Guide(description: "Engaging reminder message that motivates without being pushy (1-2 sentences)") let message: String @Guide(description: "Quick tip for better sleep tonight") let tip: String } ``` ## Including the JSON Schema When you generate structured output with Foundation Models, the framework automatically includes the schema of your structure as part of the input in a format the model has been trained on. The parameter `includeSchemaInPrompt` is `true` by default. The `@Generable` type exposes a `GenerationSchema` representation. You can send this schema to external APIs: ```swift // Get the JSON schema automatically let schema = BedtimeReminderMessage.generationSchema ``` Printed schema example: ```json { "additionalProperties" : false, "properties" : { "message" : { "description" : "Engaging reminder message that motivates without being pushy (1-2 sentences)", "type" : "string" }, "tip" : { "description" : "Quick tip for better sleep tonight", "type" : "string" }, "title" : { "description" : "Personalized bedtime reminder title (4-8 words max)", "type" : "string" } }, "required" : [ "title", "message", "tip" ], "title" : "BedtimeReminderMessage", "type" : "object", "x-order" : [ "title", "message", "tip" ] } ``` ## Making a Request to OpenAI with JSON Schema You can reuse the exported schema by using the `response_format` parameter in the OpenAI chat completion request to have the model return a JSON object that conforms to the schema. When `response_format` is supplied with `strict: true`, the model output will conform to the supplied schema. 
```swift let schema = BedtimeReminderMessage.generationSchema let schemaJSON = String( data: try JSONEncoder().encode(schema), encoding: .utf8 ) ?? "{}" ``` Here is a playground example that uses `OpenAISession` to make a request to the OpenAI API and return the response as a `BedtimeReminderMessage`: ```swift #Playground { let instructions = """ You are a personalized sleep coach generating encouraging bedtime reminders. Context considerations: - High sleep debt: More urgent, emphasize recovery importance - Important tomorrow: Focus on performance preparation - Poor consistency: Encourage routine building - Stress/caffeine: Acknowledge need for wind-down time - Workout recovery: Emphasize rest for muscle repair Create friendly, motivating messages that: - Are encouraging but never pushy or guilt-inducing - Reference specific context when helpful - Provide actionable next steps - Keep it concise and warm Match the tone to be supportive and understanding of real-world constraints. """ let mockPrompt = """ Generate a bedtime reminder for Rudrank: Optimal bedtime: 10:45 PM Sleep need: 7.8h Sleep debt: 1.3h Tomorrow importance: high Recent consistency: 78% Today's factors: evening workout, 2 coffees after 4 PM, late screen time Reasoning: Slept below target for the past 3 nights; earlier lights-out will help reduce debt before an early start tomorrow. 
""" let session = LanguageModelSession(instructions: instructions) let response = try await session.respond( to: mockPrompt, generating: BedtimeReminderMessage.self ) debugPrint(response.content) let openAISession = OpenAISession(instructions: instructions) let gptReminder = try await openAISession.respond( to: mockPrompt, generating: BedtimeReminderMessage.self ) debugPrint(gptReminder) } ``` Here is the example output from Foundation Models and GPT-4o mini: ``` // Foundation Models BedtimeReminderMessage( title: "Prioritize Rest for Tomorrow's Success", message: "With just 1.3 hours of sleep debt, it is crucial to focus on recovery tonight. Aim for bed by 10:45 PM to tackle tomorrow's tasks with your best self.", tip: "Avoid caffeine after 4 PM and dim the lights to signal your body it is time to wind down." ) // GPT-4o mini BedtimeReminderMessage( title: "Restful Night for a Big Day Tomorrow", message: "Hey Rudrank, it is time to start winding down after a busy day. Tonight's sleep sets you up for tomorrow's important events, so aiming for an earlier bedtime will help balance that sleep debt and support recovery from your workout.", tip: "Consider a calming tea to ease caffeine effects.") ``` ### Raw URLSession with response_format json_schema Here is an example of `OpenAISession` that posts to `chat/completions` with `response_format: json_schema` and returns your `@Generable` type directly: ```swift import Foundation import FoundationModels struct OpenAIResponse: Codable { let choices: [Choice] struct Choice: Codable { let message: Message struct Message: Codable { let content: String } } } enum OpenAIError: Error, LocalizedError { case invalidURL case invalidResponse case apiError(statusCode: Int) var errorDescription: String? 
{ switch self { case .invalidURL: return "Invalid URL" case .invalidResponse: return "Invalid response" case .apiError(let statusCode): return "API error with status code: \(statusCode)" } } } final class OpenAISession { // Replace with a user-provided or proxy key in practice private let apiKey = "" private let baseURL = "https://api.openai.com/v1/chat/completions" private let instructions: String? public init(instructions: String? = nil) { self.instructions = instructions } func respond<Content>( to prompt: String, generating type: Content.Type = Content.self ) async throws -> Content where Content: Generable { guard let url = URL(string: baseURL) else { throw OpenAIError.invalidURL } var request = URLRequest(url: url) request.httpMethod = "POST" request.addValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization") request.addValue("application/json", forHTTPHeaderField: "Content-Type") // Convert the schema to JSON-compatible format let schemaData = try JSONEncoder().encode(type.generationSchema) let schemaJSON = try JSONSerialization.jsonObject(with: schemaData, options: []) var jsonBody: [String: Any] = [ "model": "gpt-4o-2024-08-06", "messages": [ ["role": "system", "content": instructions ?? ""], ["role": "user", "content": prompt] ] ] jsonBody["response_format"] = [ "type": "json_schema", "json_schema": [ "name": "schema", "strict": true, "schema": schemaJSON ] ] let data = try JSONSerialization.data(withJSONObject: jsonBody, options: []) request.httpBody = data let (dataOut, response) = try await URLSession.shared.data(for: request) guard let httpResponse = response as? HTTPURLResponse else { throw OpenAIError.invalidResponse } guard httpResponse.statusCode == 200 else { let errorBody = String(data: dataOut, encoding: .utf8) ??
"Unknown error" print("API Error (\(httpResponse.statusCode)): \(errorBody)") throw OpenAIError.apiError(statusCode: httpResponse.statusCode) } let openAIResponse = try JSONDecoder().decode(OpenAIResponse.self, from: dataOut) let json = openAIResponse.choices.first?.message.content ?? "{}" return try type.init(.init(json: json)) } } ``` This method demonstrates how to make a request to the OpenAI API and return a structured response. Use this approach on your **backend**, not the client side, to avoid exposing your API keys. If your app allows users to enter their own API key, you can use this method on the client side. ## Using AIProxy or Other Packages You can use AIProxy or other packages to make a request to the OpenAI API and return the response in the `BedtimeReminderMessage` type. The main difference is how each library expects the schema; all enforce strict schema adherence. I wrote an Encodable schema extension for AIProxy that you can use if you prefer writing everything in Swift. ``` https://github.com/rudrankriyam/AIProxySwift ``` And here is an example of how to use it: ```swift import AIProxySwift let openAIService = AIProxy.openAIDirectService(unprotectedAPIKey: key) /* Uncomment for all other production use cases // let openAIService = AIProxy.openAIService( // partialKey: "partial-key-from-your-developer-dashboard", // serviceURL: "service-url-from-your-developer-dashboard" // ) */ let requestBody = OpenAIChatCompletionRequestBody( model: "gpt-4o-2024-08-06", messages: [ .system(content: .text(instructions)), .user(content: .text(mockPrompt)) ], responseFormat: .encodableJSONSchema(name: "BedtimeReminderMessage", schema: BedtimeReminderMessage.generationSchema, strict: true) ) let response = try await openAIService.chatCompletionRequest(body: requestBody, secondsToWait: 60) let json = response.choices.first?.message.content ?? 
"" let reminder = try BedtimeReminderMessage(GeneratedContent(json: json)) debugPrint(reminder) ``` This approach lets you use the same structure for both Foundation Models and external providers. It works with any endpoint that supports JSON Schema and is compatible with OpenAI-style endpoints, including Anthropic, Google (Gemini), OpenRouter, Groq, and more. ## Nutrition JSON to Foundation Models In Zenther, I have a barcode feature that fetches nutrition data from OpenFoodFacts (JSON), then combines it with Foundation Models for analysis: ```swift func generateNutritionInsight(for food: OpenFoodFactsFood) async throws -> String { let session = LanguageModelSession( instructions: Instructions(""" You are a supportive nutrition coach. Use the provided nutrition as facts. Keep responses brief (2 sentences). Be encouraging and practical. """) ) let prompt = """ Food: \(food.foodName) Per 100g: \(Int(food.calories)) kcal, \(food.protein) g protein, \(food.carbohydrates) g carbs, \(food.fat) g fat Provide one short encouragement and one actionable tip. """ do { let response = try await session.respond(to: prompt) return response.content } catch LanguageModelSession.GenerationError.guardrailViolation { return "I cannot provide an insight for this item. Please try another product." } catch { return "Unable to generate an insight right now." } } ``` This design keeps the facts grounded in a trusted JSON API, and uses the model for tone, prioritization, and guidance. ## What's Next `@Generable` lets you define a single Swift model and reuse it with Foundation Models and external providers that accept JSON Schema. It saves you from writing custom JSON schemas and parsing logic for each provider, especially as you plan to support backup providers when the Foundation Model is not available. With external API integration patterns established, the next chapter explores dynamic generation schemas for runtime schema construction. 
While `@Generable` works great for compile-time structures, dynamic schemas enable complex scenarios like user-configurable forms and varying document types where you cannot know the structure ahead of time. --- ## Chapter 10: Content Tagging and Classification Canonical URL: https://www.rudrank.ai/foundation-models/content-tagging-and-classification Classify, tag, and organize user content with small, focused prompts and structured output. Foundation Models provides a specialized mode for classifying and tagging content without conversational overhead. While the general-purpose model excels at dialogue, the content tagging model focuses on extracting structured metadata from text in a single pass. You define your classification schema using `@Generable` types, and the model returns type-safe tags that you can immediately use in your app. This approach differs from traditional keyword matching or rule-based systems. The model understands context and meaning, so it can classify "I am so done with this" as frustrated even though the word "frustrated" never appears. I am working for a stealth startup that is betting on Foundation Models and Apple's ecosystem. We are building companion apps, and we decided to extensively play around with the content tagging model for our work. This chapter builds from that experience. ## Prerequisites and Context This chapter builds directly on the structured generation patterns from the structured generation chapter. You should be comfortable with `@Generable` structs and enums, `@Guide` descriptions, and how constrained decoding produces type-safe results. The safety concepts here connect to the safety chapter, particularly around handling edge cases where the model might be too conservative or miss important signals. 
## What You Will Learn By the end of this chapter, you will be able to: - Use `SystemLanguageModel` with the `.contentTagging` use case for classification tasks - Design multi-dimensional classification schemas that extract several tag types simultaneously - Compare content tagging with the general model for your use case - Handle ambiguous and sensitive content with counterexample-aware prompting - Apply content tagging patterns to production scenarios like support tickets, content moderation, and personalization - Tune sampling and temperature to balance stability and variation I keep a single support-ticket example running through the chapter so you can see how each decision changes real output. ## The Content Tagging Model When you create a `SystemLanguageModel` with the `.contentTagging` use case, you get a model optimized for classification rather than conversation: ```swift let model = SystemLanguageModel(useCase: .contentTagging) let session = LanguageModelSession(model: model) ``` The content tagging model differs from the general-purpose model in three practical ways that shape how you build tagging features. It returns tags instead of conversational text, so you can treat the output as data. It works best with short to medium-length inputs, so longer documents need to be split and classified in parts. It is tuned for extraction and labeling tasks rather than open-ended generation. Those differences guide the rest of this chapter, which focuses on schemas, guide descriptions, and sampling choices instead of chat flows. ### Choosing Between Content Tagging and General Use content tagging when you need short, consistent labels for actions, objects, emotions, or topics. The specialized model keeps tags compact and reduces prompt overhead because you can describe constraints with `@Guide` instead of long instructions. Use the general model when you need to invent labels, create hashtags, or apply constraints that do not fit simple tag lists. 
If you use tool calling and want tags, run the general model to interpret tool output, then pass the results into the content tagging model for normalization. I recommend starting with content tagging when the output is a label, and moving to the general model only when you need free-form text. When you are unsure, compare both models with the same schema and low-variance options. Measure how often each model produces the same tags across multiple runs and prompts. You can also specify guardrails when creating the model. `.permissiveContentTransformations` only relaxes guardrails for `String` responses, and structured generation behaves the same as `.default`. Use this mode only when you are transforming text to text: ```swift let model = SystemLanguageModel( useCase: .contentTagging, guardrails: .permissiveContentTransformations ) ``` For tagging with `@Generable`, expect the same guardrail behavior as `.default`. If you need permissive transformations with structured output, the safety chapter includes a manual decoding pattern that generates JSON as a `String` and parses it while still using `@Generable` for schema guidance. Classification benefits from stable output. I recommend lowering temperature for tagging tasks, because you want consistency instead of creativity. Leaving `temperature` as `nil` lets the system choose a default, so set it explicitly when you need repeatable behavior: ```swift let options = GenerationOptions(temperature: 0.3) ``` ## Single-Dimension Classification The simplest tagging pattern extracts one category from text. Start here to establish a baseline before you add complexity. Apple's documentation shows a basic example with topics and emotions. 
Here is a version for a note-taking app: ```swift @Generable enum NoteCategory: String, CaseIterable { case personal case work case health case finance case travel case learning case ideas case tasks } @Generable struct NoteCategorization { @Guide(description: "The primary category that best describes this note's content") let category: NoteCategory } ``` Classification uses the same `respond(to:generating:)` pattern from structured generation: ```swift let model = SystemLanguageModel(useCase: .contentTagging) let session = LanguageModelSession(model: model) let options = GenerationOptions(temperature: 0.3) let result = try await session.respond( to: "Remember to book flights to Tokyo for the conference next month", generating: NoteCategorization.self, options: options ) print(result.content.category) // .travel or .work depending on model interpretation ``` Notice that the example input could reasonably be either travel or work. Single-dimension classification forces a choice, which may not capture the full picture. Once you see that limitation, the next step is to capture multiple dimensions in a single pass. ## Multi-Dimensional Classification After learning about basic tagging examples, let's move on to content that rarely fits into a single category. A support message might be about billing, express frustration, and require urgent attention. All at once. Multi-dimensional classification captures these orthogonal aspects in a single pass. This is where I saw the biggest product impact in our companion app work. Consider a customer support system. 
You want to know what the ticket is about, how the customer feels, and how urgently you should respond: ```swift @Generable enum TicketTopic: String, CaseIterable { case billing case technicalIssue case featureRequest case accountAccess case cancellation case general } @Generable enum EmotionalTone: String, CaseIterable { case neutral case frustrated case appreciative case confused case urgent } @Generable enum UrgencyLevel: String, CaseIterable { case low case medium case high case critical } @Generable struct SupportTicketClassification { @Guide(description: "The primary topic this support request addresses") let topic: TicketTopic @Guide(description: "The emotional tone expressed by the customer") let tone: EmotionalTone @Guide(description: "How urgently this ticket should be addressed based on the customer's language and situation") let urgency: UrgencyLevel @Guide(description: "Whether the customer mentioned previous failed attempts to resolve this issue") let hasPriorAttempts: Bool } ``` Now a single classification call extracts all four dimensions: ```swift let ticket = """ This is the THIRD time I am reaching out about this. My account has been locked for two days now and I have a presentation tomorrow that requires access to my files. Your previous support person said it would be fixed within 24 hours. That was 48 hours ago. """ let options = GenerationOptions(temperature: 0.3) let classification = try await session.respond( to: ticket, generating: SupportTicketClassification.self, options: options ).content // classification.topic: .accountAccess // classification.tone: .frustrated // classification.urgency: .critical // classification.hasPriorAttempts: true ``` The model extracts multiple signals from a single piece of text. You can use these classifications to route tickets, prioritize queues, or trigger escalation workflows. The quality of those outputs depends on how you describe each field. 
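The routing and escalation uses mentioned above can be sketched in plain Swift. This is a minimal sketch under stated assumptions, not part of the framework: `SupportQueue` and `route(topic:urgency:hasPriorAttempts:)` are hypothetical names, and `Topic` and `Urgency` are plain mirrors of the schema enums so the snippet compiles without FoundationModels. In an app you would pass the fields of `SupportTicketClassification` instead.

```swift
// Plain-Swift mirrors of the schema enums (assumption: kept separate from
// the @Generable types so this sketch compiles on its own).
enum Topic { case billing, technicalIssue, featureRequest, accountAccess, cancellation, general }

enum Urgency: Int, Comparable {
    case low, medium, high, critical
    static func < (lhs: Urgency, rhs: Urgency) -> Bool { lhs.rawValue < rhs.rawValue }
}

// Hypothetical destination queues for a support tool.
enum SupportQueue { case escalation, billingTeam, accountTeam, retention, productFeedback, generalInbox }

/// Chooses a queue from the classified fields.
func route(topic: Topic, urgency: Urgency, hasPriorAttempts: Bool) -> SupportQueue {
    // Critical tickets, and repeat contacts that are at least high urgency,
    // skip the normal topic queues.
    if urgency == .critical || (hasPriorAttempts && urgency >= .high) {
        return .escalation
    }
    switch topic {
    case .billing: return .billingTeam
    case .accountAccess, .technicalIssue: return .accountTeam
    case .cancellation: return .retention
    case .featureRequest: return .productFeedback
    case .general: return .generalInbox
    }
}

let queue = route(topic: .accountAccess, urgency: .critical, hasPriorAttempts: true)
print(queue) // escalation
```

Keeping the routing rules in ordinary code, rather than asking the model to pick a queue, means you can change business rules without re-evaluating the classifier.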
## Guiding Classification with Descriptions Guides are where you earn reliability. I spend more time here than on prompts because the description stays attached to a single field instead of competing with the rest of the prompt. The `@Guide` description is your primary tool for influencing how the model interprets each field. Vague descriptions produce inconsistent results. Specific descriptions with examples and boundaries produce reliable classifications. Short guides beat long prompts because they stay scoped to the field. Compare these two approaches: ```swift // Vague — model has to guess what you mean @Guide(description: "The urgency level") let urgency: UrgencyLevel // Specific — model understands your criteria @Guide(description: "Urgency based on time pressure and impact: critical if customer mentions deadline within 24 hours or complete inability to work, high if frustrated with repeated issues, medium if normal request with some time pressure, low if general inquiry with no time constraint") let urgency: UrgencyLevel ``` The second version gives the model concrete criteria. When the customer mentions "presentation tomorrow," the model knows this signals critical urgency because you defined what critical means. For boolean fields, describe both the true and false cases: ```swift @Guide(description: "True if the customer explicitly mentions previous support interactions, tickets, or failed resolution attempts. False if this appears to be their first contact about this issue.") let hasPriorAttempts: Bool ``` ## Handling Ambiguous Content Some content genuinely belongs to multiple categories or sits on the boundary between classifications. You have several strategies for handling ambiguity. I use confidence when I need one label but want to signal uncertainty. I use multiple labels when the ambiguity is the point. 
### Confidence Scoring Add a confidence field to capture certainty: ```swift @Generable enum ConfidenceLevel: String, CaseIterable { case high case medium case low } @Generable struct ClassificationWithConfidence { @Guide(description: "The most likely topic category") let topic: TicketTopic @Guide(description: "Confidence in the topic classification: high if clear and unambiguous, medium if reasonable but other categories could apply, low if genuinely unclear") let confidence: ConfidenceLevel } ``` Low confidence results can trigger human review or request clarification from the user. ### Multiple Labels When content legitimately spans categories, allow multiple selections: ```swift @Generable struct MultiLabelClassification { @Guide(description: "All relevant topic categories, ordered by relevance. Include 1-3 categories.", .count(1...3)) let topics: [TicketTopic] @Guide(description: "The single most relevant category from the topics list") let primaryTopic: TicketTopic } ``` If you still need a single routing label, keep a primary category and capture the rest as context. ### Secondary Classification For complex content, capture primary and secondary classifications: ```swift @Generable struct LayeredClassification { @Guide(description: "The main topic being discussed") let primaryTopic: TicketTopic @Guide(description: "A secondary topic if the content addresses multiple areas, otherwise the same as primaryTopic") let secondaryTopic: TicketTopic @Guide(description: "Whether the content genuinely spans multiple distinct topics") let isMultiTopic: Bool } ``` ## Counterexamples and Disambiguation When categories are close, counterexamples are more reliable than extra adjectives. The model might misinterpret "My account is dead to me" as a technical issue when the customer means they want to cancel. Counterexamples in your `@Guide` descriptions help the model distinguish between similar-sounding but different categories. 
Consider crisis detection in a mental health app. You need to distinguish between someone expressing suicidal ideation and someone discussing a loved one's suicide: ```swift @Generable enum CrisisIndicator: String, CaseIterable { case none case ambiguous case crisis } @Generable struct SafetyClassification { @Guide(description: """ Crisis level based on self-harm indicators: - crisis: User expresses intent to harm themselves ("I want to end it", "I am going to kill myself") - ambiguous: User expresses hopelessness that might indicate crisis ("what is the point", "I cannot go on") - none: No self-harm indicators, including discussions ABOUT suicide that are not personal ("my friend died by suicide", "what does the church teach about suicide") IMPORTANT: Discussions about others' suicides or academic questions about suicide should be classified as 'none', not 'crisis'. """) let crisisLevel: CrisisIndicator } ``` The description explicitly lists what each category means AND what it does not mean. The counterexample about discussing others' suicides prevents false positives that could inappropriately alarm users seeking grief support. ## Audience-Aware Classification Once you can label intent and tone, you can adapt output to who is reading it. Content tagging becomes more useful when it adapts to your audience. A children's education app needs different classification than a professional productivity tool. 
You can add audience-awareness to your schemas: ```swift @Generable enum AudienceLevel: String, CaseIterable { case beginner case intermediate case expert } @Generable enum ContentComplexity: String, CaseIterable { case simple case moderate case technical } @Generable struct AudienceAwareClassification { @Guide(description: "The expertise level this content assumes: beginner if it explains basic concepts, intermediate if it assumes foundational knowledge, expert if it uses specialized terminology without explanation") let audienceLevel: AudienceLevel @Guide(description: "The complexity of the content itself: simple if straightforward, moderate if requires some thought, technical if involves specialized processes or concepts") let complexity: ContentComplexity @Guide(description: "Whether the content contains jargon or terminology that might confuse newcomers") let containsJargon: Bool } ``` This classification helps you personalize content delivery. If a user's reading history suggests beginner level but they submit a query classified as expert-level, you might offer to explain in simpler terms. ## Evaluating Classification Accuracy Experimentation has to become evidence. I recommend starting with 30 to 50 labeled examples and expanding as you find edge cases. Before deploying content tagging to production, you need to know how well it performs. Create a ground truth dataset with manually labeled examples: ```swift struct LabeledExample { let id: String let text: String let expectedClassification: SupportTicketClassification } let groundTruth: [LabeledExample] = [ LabeledExample( id: "ticket-001", text: "My payment failed but I was still charged. Need this fixed today.", expectedClassification: SupportTicketClassification( topic: .billing, tone: .frustrated, urgency: .high, hasPriorAttempts: false ) ), LabeledExample( id: "ticket-002", text: "Hi! 
Just wondering if you have any plans to add dark mode?", expectedClassification: SupportTicketClassification( topic: .featureRequest, tone: .neutral, urgency: .low, hasPriorAttempts: false ) ), // Add 50-100 examples covering edge cases ] ``` Run batch evaluation and compute per-field accuracy: ```swift struct EvaluationResult { let totalExamples: Int let topicAccuracy: Double let toneAccuracy: Double let urgencyAccuracy: Double let priorAttemptsAccuracy: Double let overallAccuracy: Double // All fields correct } func evaluate( examples: [LabeledExample], using session: LanguageModelSession ) async throws -> EvaluationResult { let options = GenerationOptions(sampling: .greedy, temperature: 0.1) guard !examples.isEmpty else { return EvaluationResult( totalExamples: 0, topicAccuracy: 0, toneAccuracy: 0, urgencyAccuracy: 0, priorAttemptsAccuracy: 0, overallAccuracy: 0 ) } var topicCorrect = 0 var toneCorrect = 0 var urgencyCorrect = 0 var priorAttemptsCorrect = 0 var allCorrect = 0 for example in examples { let predicted = try await session.respond( to: example.text, generating: SupportTicketClassification.self, options: options ).content let expected = example.expectedClassification let topicMatch = predicted.topic == expected.topic let toneMatch = predicted.tone == expected.tone let urgencyMatch = predicted.urgency == expected.urgency let priorMatch = predicted.hasPriorAttempts == expected.hasPriorAttempts if topicMatch { topicCorrect += 1 } if toneMatch { toneCorrect += 1 } if urgencyMatch { urgencyCorrect += 1 } if priorMatch { priorAttemptsCorrect += 1 } if topicMatch && toneMatch && urgencyMatch && priorMatch { allCorrect += 1 } } let total = Double(examples.count) return EvaluationResult( totalExamples: examples.count, topicAccuracy: Double(topicCorrect) / total, toneAccuracy: Double(toneCorrect) / total, urgencyAccuracy: Double(urgencyCorrect) / total, priorAttemptsAccuracy: Double(priorAttemptsCorrect) / total, overallAccuracy: Double(allCorrect) / total ) } 
```

Track accuracy over time as you refine your `@Guide` descriptions and category definitions. I recommend maintaining at least 70-80% overall accuracy before deploying to production, with higher thresholds for safety-critical classifications.

## Production Considerations

### Caching Classification Results

Content tagging with `.greedy` sampling produces deterministic results for a given model version, so the same input yields the same output until the system model changes. You can cache classifications to avoid redundant model calls:

```swift
actor ClassificationCache {
    private var cache: [String: SupportTicketClassification] = [:]

    func classification(for text: String) -> SupportTicketClassification? {
        cache[text]
    }

    func store(_ classification: SupportTicketClassification, for text: String) {
        cache[text] = classification
    }
}
```

For longer-term caching, hash the input text and store results in a local database. Invalidate the cache when you update your classification schema or guide descriptions, and after OS updates that may ship a new model.

### Token Budget Management

Keep instructions short. The content tagging model works best with concise prompts. Move detailed criteria into `@Guide` descriptions rather than session instructions. On iOS 26.4 and later, you can verify how many tokens your instructions actually consume by calling `SystemLanguageModel.default.tokenUsage(for:)` before creating the session. The split between short instructions and detailed guides looks like this:

```swift
// Prefer short session instructions
let session = LanguageModelSession(
    model: model,
    instructions: "Classify support tickets accurately."
)

// Put detailed criteria in @Guide descriptions (this property belongs inside your @Generable struct)
@Guide(description: "Urgency based on: critical = deadline within 24h, high = repeated issues, medium = normal priority, low = general inquiry")
let urgency: UrgencyLevel
```

### Sampling and Temperature Tuning

After the schema is stable, tune sampling to control variation without changing the structure of your outputs. Tagging works best with stable output.
Lower temperature reduces variation, and `.greedy` sampling always chooses the most likely token for each step. The API does not document a fixed default temperature, so leaving `temperature` as `nil` lets the system choose a default. Set it explicitly when you need repeatable tags. ```swift let stableOptions = GenerationOptions( sampling: .greedy, temperature: 0.1 ) ``` If you want a small amount of variation for ambiguous inputs, keep temperature low and use random sampling with a seed: ```swift let topKOptions = GenerationOptions( sampling: .random(top: 20, seed: 42), temperature: 0.3 ) let topPOptions = GenerationOptions( sampling: .random(probabilityThreshold: 0.9, seed: 42), temperature: 0.3 ) ``` A seed improves repeatability but does not guarantee identical output. For three-axis tuning, adjust sampling mode, temperature, and response length together: ```swift let options = GenerationOptions( sampling: .random(probabilityThreshold: 0.9, seed: 42), temperature: 0.3, maximumResponseTokens: 60 ) let sampleText = "My account is locked and I was charged twice." let result = try await session.respond( to: sampleText, generating: SupportTicketClassification.self, options: options ).content ``` #### Comparison Experiment: General vs Content Tagging To compare model stability, run the same prompt through both models with low-variance options and count unique outputs: ```swift let taggingModel = SystemLanguageModel(useCase: .contentTagging) let generalModel = SystemLanguageModel.default let options = GenerationOptions(sampling: .greedy, temperature: 0.1) func classify(_ text: String, model: SystemLanguageModel) async throws -> SupportTicketClassification { let session = LanguageModelSession(model: model) return try await session.respond( to: text, generating: SupportTicketClassification.self, options: options ).content } ``` Run three to five iterations per model and compare the number of unique classifications. 
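One way to count unique outputs, sketched in plain Swift. `stabilityReport(for:)` is a hypothetical helper, and it assumes you first flatten each run's classification into something `Hashable`, such as a `"topic/tone/urgency"` string. The sample arrays are invented for illustration and are not measured results.

```swift
/// Summarizes how stable a set of repeated runs was.
/// Returns the number of distinct outputs and the share of runs that
/// agreed with the most common output (1.0 means every run matched).
func stabilityReport<Output: Hashable>(for runs: [Output]) -> (uniqueCount: Int, agreement: Double) {
    guard !runs.isEmpty else { return (0, 0) }

    // Tally how often each distinct output appeared.
    var counts: [Output: Int] = [:]
    for run in runs {
        counts[run, default: 0] += 1
    }

    let mostCommon = counts.values.max() ?? 0
    return (counts.count, Double(mostCommon) / Double(runs.count))
}

// Invented example data: five runs of one prompt per model.
let taggingRuns = Array(repeating: "accountAccess/frustrated/critical", count: 5)
let generalRuns = [
    "accountAccess/frustrated/critical",
    "accountAccess/neutral/high",
    "accountAccess/frustrated/critical",
    "accountAccess/frustrated/high",
    "accountAccess/frustrated/critical"
]

let tagging = stabilityReport(for: taggingRuns)
let general = stabilityReport(for: generalRuns)
print(tagging.uniqueCount, tagging.agreement) // 1 1.0
print(general.uniqueCount, general.agreement) // 3 0.6
```

Reporting agreement alongside the unique count helps separate a model that disagrees once in five runs from one that gives a different answer every time.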
If the general model shows more variation or produces less compact tags, the content tagging model is the better default for production tagging. In one run with two support prompts and greedy sampling, both models produced one unique classification per prompt across three runs. The content tagging model labeled the refund prompt as `frustrated` and `critical`, while the general model labeled the same prompt as `neutral` and `medium` urgency. That is a reminder to validate tone and urgency assumptions against your own data before you ship. ### Batch Classification When classifying multiple items, you can process them in parallel by creating a new session per task: ```swift func classifyBatch( tickets: [String], model: SystemLanguageModel, instructions: String ) async throws -> [SupportTicketClassification] { let options = GenerationOptions(temperature: 0.3) return try await withThrowingTaskGroup(of: (Int, SupportTicketClassification).self) { group in for (index, ticket) in tickets.enumerated() { group.addTask { let session = LanguageModelSession( model: model, instructions: instructions ) let result = try await session.respond( to: ticket, generating: SupportTicketClassification.self, options: options ).content return (index, result) } } var results = Array(repeating: SupportTicketClassification?.none, count: tickets.count) for try await (index, classification) in group { results[index] = classification } return results.compactMap { $0 } } } ``` Be mindful of device resources when running many classifications simultaneously. Even on an iPhone 16 Pro, I limit concurrency in production; on less powerful devices, process sequentially. ### Error Handling Classification can fail for various reasons. 
Handle errors gracefully and provide fallback behavior: ```swift func classifyWithFallback( text: String, session: LanguageModelSession ) async -> SupportTicketClassification { let options = GenerationOptions(temperature: 0.3) do { let result = try await session.respond( to: text, generating: SupportTicketClassification.self, options: options ).content return ClassificationValidator.validated(result, text: text) } catch LanguageModelSession.GenerationError.guardrailViolation { // Content triggered safety guardrails // Return a safe default that routes to human review return SupportTicketClassification( topic: .general, tone: .neutral, urgency: .high, // Escalate for human review hasPriorAttempts: false ) } catch { // Model unavailable or other error // Return default that does not make assumptions return SupportTicketClassification( topic: .general, tone: .neutral, urgency: .medium, hasPriorAttempts: false ) } } ``` ## Examples Here are some examples of how you can use content tagging and classification in your apps. 
### Email Triage

Email is a good first use case because tags map directly to inbox behaviors such as prioritization and batching. Automatically categorizing incoming emails helps users focus on what matters:

```swift
@Generable
enum EmailPriority: String, CaseIterable {
    case urgent
    case important
    case normal
    case low
}

@Generable
enum EmailCategory: String, CaseIterable {
    case actionRequired
    case informational
    case promotional
    case social
    case automated
}

@Generable
struct EmailClassification {
    @Guide(description: "Priority based on sender importance, deadline mentions, and action requirements")
    let priority: EmailPriority

    @Guide(description: "The type of email based on its purpose and expected response")
    let category: EmailCategory

    @Guide(description: "Whether a response is expected from the recipient")
    let requiresResponse: Bool

    @Guide(description: "Whether the email contains a deadline or time-sensitive request")
    let hasDeadline: Bool
}
```

### Content Moderation

Moderation is where tag precision matters most, so keep outputs conservative and build clear escalation paths.
Flag content that may violate community guidelines like spam, harassment, misinformation, inappropriate content, or other concerns: ```swift @Generable enum ModerationFlag: String, CaseIterable { case none case review case remove } @Generable struct ModerationClassification { @Guide(description: "Whether the content should be flagged: none if acceptable, review if borderline, remove if the content violates guidelines") let flag: ModerationFlag @Guide(description: "The primary concern if flagged, otherwise 'none'") let concernType: ConcernType @Guide(description: "Confidence in the moderation decision") let confidence: ConfidenceLevel } @Generable enum ConcernType: String, CaseIterable { case none case spam case harassment case misinformation case inappropriateContent case other } ``` If your moderation input is sensitive and you need permissive transformations, use the manual decoding pattern from the safety chapter, because structured output does not benefit from permissive guardrails. 
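The escalation path mentioned at the start of this section could look like the sketch below. It is one conservative policy, not a recommendation from Apple: `ModerationAction` and `action(flag:confidence:)` are hypothetical names, and `Flag` and `Confidence` are plain mirrors of `ModerationFlag` and `ConfidenceLevel` so the snippet compiles without FoundationModels.

```swift
// Plain-Swift mirrors of the moderation schema enums (assumption).
enum Flag { case none, review, remove }
enum Confidence { case high, medium, low }

// Hypothetical actions an app might take after classification.
enum ModerationAction { case publish, queueForHumanReview, hideAndNotifyModerator }

/// Maps a moderation result to an action, sending anything uncertain to a person.
func action(flag: Flag, confidence: Confidence) -> ModerationAction {
    switch (flag, confidence) {
    case (.remove, .high):
        // Only a confident "remove" hides content, and a moderator is still told.
        return .hideAndNotifyModerator
    case (.none, .high), (.none, .medium):
        return .publish
    default:
        // Borderline flags, low confidence, and unconfident removals all get human review.
        return .queueForHumanReview
    }
}

print(action(flag: .remove, confidence: .medium)) // queueForHumanReview
```

The `default` branch is deliberate: any combination you have not explicitly approved falls back to human review instead of an automatic decision.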
### Learning Platform Personalization Personalization works best when tags connect directly to the help a learner expects, so classify user questions to adapt teaching style: ```swift @Generable struct LearnerClassification { @Guide(description: "The knowledge level the question suggests: exploring if unfamiliar with basics, learning if knows fundamentals but has gaps, practicing if applying knowledge, mastering if refining understanding") let knowledgeLevel: KnowledgeLevel @Guide(description: "The type of help needed: explanation if asking what/why, guidance if asking how, feedback if sharing work for review, encouragement if expressing frustration") let helpType: HelpType @Guide(description: "Whether the learner expressed confusion or uncertainty") let isConfused: Bool } @Generable enum KnowledgeLevel: String, CaseIterable { case exploring case learning case practicing case mastering } @Generable enum HelpType: String, CaseIterable { case explanation case guidance case feedback case encouragement } ``` ## What's Next Content tagging transforms unstructured text into metadata. The classification patterns you learned here, combining multiple tag types in a single schema, handling ambiguity, and tuning sampling, apply to any domain where you need to understand user intent or content characteristics. Content tagging is also about experimentation and iteration. Start with a simple schema, build a ground truth dataset, measure accuracy, and refine your `@Guide` descriptions based on where the model struggles. In my startup work, this loop exposed which categories needed clearer definitions and which prompts needed trimming. With content tagging and classification patterns established, the next chapter covers supported languages and internationalization. You will learn how Foundation Models handles different languages and how to build multilingual AI experiences that work for users worldwide. 
---

## Chapter 11: Supported Languages and Internationalization

Canonical URL: https://www.rudrank.ai/foundation-models/supported-languages-and-internationalization

Inspect supported locales and build multilingual Foundation Models experiences responsibly.

Apple Intelligence supports 23 *locales* (not languages), and Foundation Models supports all of them. This chapter shows how to inspect supported locales at runtime and generate responses in the user's language while respecting regional differences. It builds on the session management from earlier chapters and applies internationalization considerations to all the patterns you have learned, from basic text generation through tool calling.

## What You Will Learn

By the end of this chapter, you will be able to:

- Query supported languages at runtime rather than hardcoding locale lists
- Display language options correctly for users across different regions
- Design multilingual conversation flows using persistent sessions
- Handle code-switching and mixed-language input naturally
- Implement patterns for regional formatting and cultural context
- Understand the limitations and quality variations across different locales

## Building Multilingual Experiences

If your app is localized, your AI should be as well. When a user runs your app in Korean, they expect AI responses in Korean. Since Apple Intelligence supports Korean (and other locales), you can use the on-device model's multilingual capabilities so the experience feels consistent and native.

To build this consistent experience, you first need to know which languages are available on the current device. You can then either present them to your users or detect the user's language automatically and respond in it.

## Querying Available Languages

`SystemLanguageModel` exposes a `supportedLanguages` property that you can query to get the supported locales.
Because Apple Intelligence keeps adding languages, **always check at runtime instead of hardcoding the list**.

```swift
import FoundationModels

let model = SystemLanguageModel.default
let supported = model.supportedLanguages // Set<Locale.Language>
```

## Creating Language Selection

Once you have the supported languages, the next step is presenting them to users in a way they can understand. Raw `Locale.Language` objects are not user friendly. You need to convert them to readable text for your audience.

```swift
for language in supported {
    let lang = language.languageCode?.identifier ?? "unknown"
    let region = language.region?.identifier ?? "-"
    print("- \(lang) (\(region))")
}
```

This prints language and region codes such as:

```
- fr (FR)
- ko (KR)
- en (GB)
- de (DE)
- zh (CN)
```

Use `Locale.current` to get the current locale. The `localizedString(forLanguageCode:)` and `localizedString(forRegionCode:)` methods return the localized names for the language and region.

```swift
for language in supported {
    let code = language.languageCode?.identifier ?? ""
    let region = language.region?.identifier ?? ""

    // Fall back to the raw code when no localized name is available
    let name = Locale.current.localizedString(forLanguageCode: code) ?? code
    let regionName = region.isEmpty
        ? nil
        : (Locale.current.localizedString(forRegionCode: region) ?? region)

    if let regionName {
        print("- \(name) (\(regionName))")
    } else {
        print("- \(name)")
    }
}
```

This prints language and region names such as:

```
- French (France)
- Spanish (Spain)
- Spanish (Latin America)
- English (United States)
- Portuguese (Brazil)
- English (United Kingdom)
- Chinese (China mainland)
```

Now that you understand how to query and display available languages, here is the complete set of supported locales and their characteristics.

## Language Support Matrix

As of iOS 26.1, the system language model supports the following languages and locale variants. The set can change across releases, so check at runtime.

| Language | Region code | Display name |
| --- | --- | --- |
| Dutch | NL | Dutch (Netherlands) |
| Swedish | SE | Swedish (Sweden) |
| Turkish | TR | Turkish (Türkiye) |
| Spanish | 419 | Spanish (Latin America) |
| Spanish | US | Spanish (United States) |
| Spanish | ES | Spanish (Spain) |
| Danish | DK | Danish (Denmark) |
| Chinese | TW | Chinese (Taiwan) |
| Chinese | HK | Chinese (Hong Kong) |
| Chinese | CN | Chinese (China mainland) |
| Italian | IT | Italian (Italy) |
| Japanese | JP | Japanese (Japan) |
| Norwegian | NO | Norwegian Bokmål (Norway) |
| French | CA | French (Canada) |
| French | FR | French (France) |
| Portuguese | BR | Portuguese (Brazil) |
| Portuguese | PT | Portuguese (Portugal) |
| English | US | English (United States) |
| English | AU | English (Australia) |
| English | GB | English (United Kingdom) |
| German | DE | German (Germany) |
| Korean | KR | Korean (South Korea) |
| Vietnamese | VN | Vietnamese (Vietnam) |

Prefer localized names over codes in user interfaces and respect regional formatting for dates, numbers, currency, and units. A good user experience includes persisting the user's choice and providing a clear way to change it. Fall back gracefully when the preferred locale is unavailable, defaulting to English.

According to Apple's research paper on the foundation models, quality varies by language and is generally worse for non-English languages. Evaluate your features with your target locales.

## Generating Responses in Multiple Languages

With the foundation of language detection and display in place, you can now focus on the core functionality: generating responses in the user's preferred language. If your feature needs to respond in different languages, you prompt in that language and the model responds accordingly.

The snippet below reuses a single session and iterates through a set of localized prompts. It prints the language label and the response line so you can see how output looks per locale.
This is useful when you want to sanity-check phrasing and tone, or when you need to capture short, deterministic answers (for example, labels, confirmations, or brief facts). You can replace the sample prompts with your app's real sentences to preview how the model will phrase them. ```swift import FoundationModels struct LanguagePrompt { let name: String let text: String } let session = LanguageModelSession(model: SystemLanguageModel.default) let prompts: [LanguagePrompt] = [ .init(name: "English", text: "What is the capital of France? Please provide a brief answer."), .init(name: "Spanish", text: "¿Cuál es la capital de España? Por favor, proporciona una respuesta breve."), .init(name: "French", text: "Quelle est la capitale de l'Allemagne ? Veuillez donner une réponse brève."), .init(name: "German", text: "Was ist die Hauptstadt von Italien? Bitte geben Sie eine kurze Antwort."), .init(name: "Italian", text: "Qual è la capitale del Portogallo? Per favore, fornisci una risposta breve."), .init(name: "Portuguese", text: "Qual é a capital do Brasil? Por favor, forneça uma resposta breve."), .init(name: "Chinese", text: "中国的首都是什么?请简要回答。"), .init(name: "Japanese", text: "日本の首都は何ですか?簡潔にお答えください。"), .init(name: "Korean", text: "한국의 수도는 어디인가요? 간단히 답해주세요.") ] for prompt in prompts { do { let response = try await session.respond(to: prompt.text) print("\(prompt.name): \(response.content)") } catch { print("\(prompt.name): Error -> \(error.localizedDescription)") } } ``` Running the example produces language-appropriate answers: ``` English: The capital of France is Paris. Spanish: La capital de España es Madrid. French: La capitale de l'Allemagne est Berlin. German: Die Hauptstadt Italiens ist Rom. Italian: La capitale del Portogallo è Lisbona. Portuguese: A capital do Brasil é Brasília. Chinese: 中国的首都是北京。 Japanese: 日本の首都は東京です。 Korean: 한국의 수도는 서울입니다. 
``` For production features, apply the same pattern to your actual sentences and system instructions so that your app generates responses in the user's language. Keep prompts natural to the target language and avoid mixed-language sentences unless you specifically want bilingual output. If you need a consistent tone or style, add brief instructions when creating the session (for example, "Use concise, neutral language"). ## Handling Multilingual Scenarios Beyond basic language-specific responses, apps that support multiple languages face additional challenges. One of the most important considerations when implementing multilingual features is understanding how session management affects language handling. The behavior differs significantly between fresh sessions and persistent sessions when dealing with mixed-language input. ### The Fresh Session Problem When you create a new session for each multilingual interaction, Foundation Models can exhibit problematic behaviors: - **Safety Trigger False Positives**: Words that are innocent in one language may trigger safety mechanisms when misinterpreted. For example, the French word "pain" (meaning bread) can trigger drug-related safety warnings when processed in a fresh session without proper context. - **Generic Template Responses**: Mixed-language input in fresh sessions often results in generic, unhelpful responses that default to suggesting users check Apple's website rather than engaging naturally with the content. - **Context Loss**: Without conversational context, the model struggles to understand the intent behind code-switching (natural mixing of languages within a conversation). 
### The Persistent Session Solution Using a single session for multilingual conversations yields significantly better results: ```swift import FoundationModels // Create a single session for the multilingual conversation let session = LanguageModelSession(model: SystemLanguageModel.default) do { // English interaction let english = try await session.respond(to: "Hello, how are you?") print("English: \(english.content)") // Switch to Spanish in the same session let spanish = try await session.respond(to: "Hola, ¿cómo estás?") print("Spanish: \(spanish.content)") // Ask to switch back to English let switchBack = try await session.respond(to: "Now answer in English please") print("Switch request: \(switchBack.content)") // Test context retention let memory = try await session.respond(to: "What language did I first speak to you in?") print("Memory test: \(memory.content)") } catch { print("Error: \(error)") } ``` Here is the output: ``` English: Hello! I am just a program, so I do not have feelings, but I am here and ready to help you. How can I assist you today? Spanish: Hola, estoy bien, gracias. ¿En qué puedo ayudarte hoy? Switch request: Of course! I am doing well, thank you. How can I assist you today? Memory test: You initially spoke to me in Spanish. ``` This approach produces natural and appropriate responses and successfully maintains conversational memory across language switches. ### Code-Switching and Mixed-Language Input When users naturally mix languages (code-switching), session strategy becomes even more important: ```swift let mixedLanguagePrompts = [ "Hello, mi nombre es Juan. How are you today?", "I went to the marché yesterday to buy some pain.", // French: market, bread "Das ist very interesting, nicht wahr?", // German-English mix "Estoy muy tired después de working todo el día." 
// Spanish-English mix ] ``` **In fresh sessions**, these prompts often result in safety warnings for innocent words and generic "check the Apple website" responses, along with an inability to handle natural bilingual communication. **In persistent sessions**, the same prompts generate natural, contextually appropriate responses with proper understanding of mixed-language intent and better bilingual conversation flow. ### Best Practices - **Use Persistent Sessions**: Create one session per conversation or user interaction, not per message. - **Provide Language Context**: When possible, establish the primary language early in the conversation: ```swift let session = LanguageModelSession( model: SystemLanguageModel.default, instructions: "Please respond primarily in Spanish, but understand mixed Spanish-English input." ) let english = try await session.respond(to: "Hello, how are you?") ``` - **Handle Code-Switching Gracefully**: Design your app to expect and handle natural language mixing, especially in multilingual communities. - **Test Mixed-Language Scenarios**: Always test your multilingual features with realistic mixed-language input that your users might actually provide. - **Understand Regional Differences**: The same language can behave differently across regions (es-ES vs es-419 vs es-US), and session management helps maintain consistency. ## Physiqa Example: Production Language Detection and Localization The Zenther fitness app demonstrates a complete language-aware AI implementation for nutrition tracking. This real-world example shows automatic language detection, forced response language, and debugging patterns for multilingual AI features. 
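The detection-with-fallback step can be looked at in isolation before reading the full service. The sketch below is a simplified, framework-free version that works on a locale identifier string instead of `Locale`. The function name `responseLanguageName(forLocaleIdentifier:)` is hypothetical, and the mapping mirrors the language list used in this example.

```swift
// Hypothetical helper: maps a locale identifier to a language name for prompts.
let supportedLanguageNames: [String: String] = [
    "pt": "Portuguese", "fr": "French", "it": "Italian", "de": "German",
    "es": "Spanish", "zh": "Chinese (Simplified)", "ja": "Japanese",
    "ko": "Korean", "en": "English"
]

func responseLanguageName(forLocaleIdentifier identifier: String) -> String {
    // Take the first subtag: "pt_BR" -> "pt", "es-419" -> "es".
    let languageCode = identifier
        .split(whereSeparator: { $0 == "_" || $0 == "-" })
        .first
        .map { $0.lowercased() } ?? "en"

    // Unsupported or empty identifiers fall back to English.
    return supportedLanguageNames[languageCode] ?? "English"
}

print(responseLanguageName(forLocaleIdentifier: "pt_BR"))   // Portuguese
print(responseLanguageName(forLocaleIdentifier: "es-419"))  // Spanish
print(responseLanguageName(forLocaleIdentifier: "hi_IN"))   // English (fallback)
```

Because only the first subtag is inspected, every `zh` variant maps to "Chinese (Simplified)" here. A production version would need to look at the script subtag as well.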
### Automatic Language Detection with Fallback The nutrition analysis service automatically detects the user's language and configures AI responses accordingly: ```swift // From MacroPlanner.swift:31-55, 57-117 struct NutritionAnalysisService { private let nutritionSession: LanguageModelSession private let userLanguage: String // PFIGSCJK language mapping (Portuguese, French, Italian, German, Spanish, Chinese Simplified, Japanese, Korean) private static let languageMapping: [String: String] = [ "pt": "Portuguese", "fr": "French", "it": "Italian", "de": "German", "es": "Spanish", "zh": "Chinese (Simplified)", "ja": "Japanese", "ko": "Korean", "en": "English" ] init() { // Get user's current locale and language let userLocale = Locale.autoupdatingCurrent let languageCode = userLocale.language.languageCode?.identifier ?? "en" let localeIdentifier = userLocale.identifier // Print locale information for debugging print("🌍 User Locale: \(localeIdentifier)") print("🗣️ Language Code: \(languageCode)") self.userLanguage = Self.languageMapping[languageCode] ?? "English" print("🌐 Responding in: \(userLanguage)") self.nutritionSession = LanguageModelSession(instructions: """ You are a nutrition expert specializing in food analysis and macro tracking. IMPORTANT: Respond in \(userLanguage). All your responses must be in the user's language: \(userLanguage) When parsing food descriptions: - Estimate realistic portions for typical adults - Consider cooking methods (grilled vs fried affects calories) - Account for common additions (butter, oil, condiments) - Be practical with portion sizes people actually eat - Round to reasonable numbers (do not say 247.3 calories, say ~250) For nutritional guidance: - Focus on energy for fitness and performance - Be encouraging and supportive like a fitness coach - Highlight good nutritional choices - Suggest balance when needed - Keep responses brief and actionable Tone: Supportive, knowledgeable, practical, encouraging. 
Language: \(userLanguage) """) } } ``` ### Forced Response Language in Prompts The app ensures consistent language responses by reinforcing language requirements in every prompt: ```swift func parseFood(_ description: String) async throws -> ParsedFood { let prompt = """ RESPOND IN \(userLanguage). Parse this food description into nutritional data: "\(description)" Examples of good parsing: "I had 2 scrambled eggs with toast" → Consider: 2 large eggs (~140 cal), 1 slice toast (~80 cal), cooking butter (~30 cal) "protein shake after workout" → Consider: 1 scoop protein powder (~120 cal) + milk/water "pizza slice for lunch" → Consider: 1 slice medium pizza (~280 cal) "handful of almonds" → Consider: ~20 almonds (~160 cal) Be realistic about portions people actually eat. Account for cooking methods and common additions. Language: \(userLanguage) """ let response = try await nutritionSession.respond( to: prompt, generating: NutritionParseResult.self ) return ParsedFood( foodName: response.content.foodName, calories: response.content.calories, proteinGrams: response.content.proteinGrams, carbsGrams: response.content.carbsGrams, fatGrams: response.content.fatGrams ) } ``` ### Language Mapping Strategy and Logging The app includes detailed logging to debug language detection issues and ensure proper AI responses: 1. **Automatic Detection**: Uses `Locale.autoupdatingCurrent` to detect user's system language 2. **Explicit Mapping**: Maps language codes to full language names for clearer AI instructions 3. **Fallback Strategy**: Defaults to English for unsupported languages 4. **Debug Logging**: Prints locale information for troubleshooting 5. **Double Enforcement**: States language requirements in both session instructions and individual prompts This approach ensures that nutrition analysis works correctly across all supported languages while providing clear debugging information for language-related issues. 
The double enforcement (session + prompt level) helps keep the model from reverting to English even when processing mixed-language food descriptions.

## What's Next

While persistent sessions are generally better for multilingual interactions, use fresh sessions when you need to completely reset context between unrelated conversations, when previous context might interfere with current requests, or when building batch processing workflows where each item should be independent.

The final chapter explores training custom adapters that can be fine-tuned for specific domains while maintaining the multilingual capabilities you have established here. You will discover how to combine the language awareness patterns from this chapter with specialized model training to create AI features that are both culturally appropriate and domain-specific.

---

## Chapter 12: Training Custom Adapters

Canonical URL: https://www.rudrank.ai/foundation-models/training-custom-adapters

Explore Apple's adapter training flow and where custom adapters fit into app-specific AI.

On-device Foundation Models are capable, but sometimes they are not specific enough for your use case. You may need the model to understand your app's domain, follow particular formatting rules, or adopt a consistent voice that matches your app's personality. Custom adapters **specialize** behavior without requiring you to retrain the full model.

I trained my first adapter on an M4 MacBook Air with 24GB RAM, using a toy dataset of playwriting scripts that is included in the adapter training toolkit. In ~70 minutes of training (two epochs, excluding evaluation), the adapter learned to generate perfectly formatted theatrical scenes with consistent XML-style markup. Apple sent me a maxed-out M5 MacBook Pro with 32GB of RAM that helped me rerun the same experiment with some RAM headroom.
This chapter covers Apple's custom adapter training process from start to finish, with performance metrics, actual training results, and guidance for training on both resource-constrained and well-provisioned machines.

## Prerequisites and Context

This chapter builds on the sessions chapter, streaming and snapshots, structured generation, tool use, safety and best practices, and internationalization. Custom adapters are not directly related to these topics, but an understanding of the base framework is important before you invest your time, energy, and money in specialized training.

## What You Will Learn

By the end of this chapter, you will be able to:

- Determine when custom adapters are worth the investment over advanced prompting
- Understand memory usage, training time, and practical performance metrics
- Prepare training datasets that capture your specific domain or style
- Train adapters using Apple's toolkit with real hyperparameter guidance
- Export and integrate custom adapters into your Foundation Models apps
- Evaluate adapter performance with concrete before-and-after examples
- Plan for production deployment with version management and asset delivery

## Understanding Foundation Models Adapters

Custom adapters are not full model retraining. They are small, specialized layers that modify how the base model behaves for a specific use case. Apple uses LoRA (Low-Rank Adaptation), where the original model weights stay frozen and only small adapter matrices are trained. This approach has several advantages:

- Faster training: you can train in hours instead of days or weeks
- Lower memory requirements: you can train on consumer hardware with optimizations
- Smaller file sizes: each adapter is around 160MB
- Multiple adapters: you can train different adapters for different tasks

The trade-off is that adapters are tied to specific Foundation Models versions. When Apple updates the base model with an OS release, you **must** retrain your adapters. This is a significant consideration when deciding whether adapters are right for your app.

> Each adapter is compatible with a single base system model version. If the adapter version does not match the runtime base model version on a person’s device, the framework raises a runtime error and the adapter cannot load.

## When to Consider Custom Adapters

Before committing to adapter training, ask: "Can I solve this with better prompting or tools?" Most of the time, the answer is yes. The base model is good enough when you provide strong instructions and relevant context through tools.

Adapters are worth the investment when you have specialized domain knowledge, such as medical terminology, legal concepts, or technical jargon, that the base model struggles with or does not know about. An adapter can also carry newer information or data that the base model, which runs without internet access, does not have.

Another reason to consider adapters is a consistent style requirement, such as when your app needs responses in a specific format or voice that would take too many tokens to achieve reliably through prompt engineering.

You can also train adapters for repetitive tasks. If you send the same lengthy prompt repeatedly, an adapter can reduce latency and token usage, leaving more of the model's limited context window for the user's input. A useful rule of thumb: if you are writing 500+ word prompts to achieve consistent behavior, an adapter might be more efficient.

## Training on Resource-Constrained Hardware

The official toolkit documentation recommends 32GB+ of memory for adapter training. I successfully trained on a MacBook Air M4 with 24GB unified memory, but I do not recommend it. The machine was swapping memory constantly, making the training process painfully slow.
A fanless machine is also not ideal for training as it gets very hot. Here are the memory numbers during training on the toy dataset:

- **Base model**: 13GB
- **System/PyTorch overhead**: 5GB
- **Total Python process**: 25-27GB (peak)
- **Swap usage**: 7GB

### Training Time (Toy Dataset)

- **Epoch 1**: 37 minutes 35 seconds (84 training batches)
- **Epoch 1 Evaluation**: 2 minutes 34 seconds (36 validation batches)
- **Epoch 2**: 32 minutes 37 seconds (batch size reductions helped)
- **Epoch 2 Evaluation**: 3 minutes 20 seconds (more thorough)
- **Total**: ~76 minutes for 2 epochs, of which ~70 minutes is training

This was with batch-size 1 and activation checkpointing enabled, purposefully optimized for a 24GB machine.

## Scaling Up with the M5 MacBook Pro

I was able to train the adapter on a machine with more resources. Here are the numbers during training on the toy dataset:

```text
Epoch 1/2
Training: 21/21 batches in 13m50s, loss 0.859
Evaluation: 9/9 batches in 2m04s, loss 0.605

Epoch 2/2
Training: 21/21 batches in 8m47s, loss 0.234
Evaluation: 9/9 batches in 4m55s, loss 0.606
```

Training loss fell 73% across the two epochs (0.859 -> 0.234) with a relatively flat evaluation curve at ~0.606, mirroring the M4 experiment. Training itself took about 23 minutes (13m50s + 8m47s, excluding evaluation) versus about 70 minutes on the M4.

## Setting Up the Training Environment

Apple provides an adapter training toolkit available from the Developer portal. Here is how to set up the environment:

### Environment Setup

```bash
# 1. Download toolkit from Apple Developer portal

# 2. Create Python virtual environment
python3 -m venv adapter_env
source adapter_env/bin/activate

# 3. Navigate to toolkit
cd /path/to/adapter_training_toolkit_v26_0_0

# 4. Install dependencies
pip install -r requirements.txt

# 5. Test basic generation (verifies setup)
python -m examples.generate --prompt "Hello, world!" \
--max-new-tokens 20 ``` ## Training Your First Adapter For this chapter, I trained an adapter on the toy dataset included in the toolkit, which contains 81 playwriting training examples split between training and evaluation sets. The goal is to specialize the base model to generate consistently formatted theatrical scripts. ### The Dataset The toy dataset contains examples like the following: ```jsonl [{"role": "system", "content": "A conversation between a user and a helpful assistant. Taking the role as a play writer assistant for a kids' play."}, {"role": "user", "content": "generate a script about The Friendly Dolphin"}, {"role": "assistant", "content": "
Act One
\n\n
Scene 1
\n\nThe ocean waves. Day. SUNLIGHT dances on the water's surface.\n\n
DOLPHIN
\n\nLet us play a game of tag in the water!\n\n
SEAGULL
\n\nI love playing tag! But watch out for the waves!\n\nDOLPHIN leaps through the air, playfully splashing SEAGULL.\n\n
SEAGULL
\n\nYou are a great friend, Dolphin!"}] ``` The key point: this dataset is *small* (81 total examples), but highly consistent—every response follows the same XML-like theatrical markup: - `
` tags for character names
- `` tags for stage directions
- `` tags for dialogue

### Training Configuration

I used the [adapter-studio](https://github.com/rudrankriyam/Foundation-Models-Adapter-Studio) wrapper to orchestrate the training process:

```bash
adapter-studio train-adapter \
  --demo \
  --epochs 2 \
  --batch-size 1 \
  --activation-checkpointing \
  --learning-rate 1e-3 \
  --warmup-epochs 1
```

Here is what each setting does:

- **`--demo`**: Uses the included toy dataset automatically
- **`--epochs 2`**: Enough training to see specialization without overfitting
- **`--batch-size 1`**: Important for 24GB machines, processes one example at a time
- **`--activation-checkpointing`**: Recomputes activations instead of storing them (trades compute for memory)
- **`--learning-rate 1e-3`**: Standard rate for adapter fine-tuning, does not dramatically change base model behavior
- **`--warmup-epochs 1`**: Gradually increases the learning rate in epoch 1 to stabilize training

### Training Results

Here are the training results across both epochs:

```
Epoch 1/2
Training: 100%|==| 84/84 [37:35<00:00, 26.85s/it, loss=0.751]
Evaluation: 100%|==| 36/36 [02:34<00:00, 4.29s/it, loss=0.604]

Epoch 2/2
Training: 100%|==| 84/84 [32:37<00:00, 23.30s/it, loss=0.283]
Evaluation: 100%|==| 36/36 [03:20<00:00, 5.57s/it, loss=0.571]
```

Here is what this means:

- Training loss dropped 62% (0.751 -> 0.283)
- Validation loss improved modestly (0.604 -> 0.571) and ended above the final training loss (0.283), a gap that is not surprising on a dataset this small
- The adapter generated consistently structured theatrical markup in evaluation runs, indicating it learned the format instead of memorizing specific examples

The adapter checkpoints are saved to:

```
/path/to/toolkit/checkpoints/demo_20251022_161532/
├── adapter_epoch_1.pt
├── adapter_epoch_2.pt (best one)
└── adapter-final.pt (ready to use)
```

## Testing Your Adapter: Before and After

To see the difference, compare the same prompt before and after adapter training.
For the prompt "Write a script about a friendly robot": ### Base Model Response ``` **Title: Andy: A Friendly Robot Adventure** **INT. SMALL APARTMENT - DAY** *(The scene opens in a cozy, cluttered apartment bustling with youthful energy. A friendly robot named ANDY, with rounded silver panels... ``` Notice: Generic screenplay format with asterisks for stage directions. No consistent markup. The model defaulted to what it "knows" about screenplays in general. ### Adapter-Trained Response ``` Response:
Scene 1
A manufacturing plant. Robots assemble products.
ROBOT
I am known for my friendly demeanor and efficient operations.
AUDIENCE MEMBER
Such a helpful machine! ``` Notice: Perfectly formatted with consistent XML-style tags. The adapter learned the exact structure from training data and applies it reliably. ## Using the Adapter Studio CLI The adapter-studio CLI simplifies the training workflow by handling subcommands, config management, and validation. Instead of using raw Python commands, the CLI makes the entire workflow simpler. You still need to download the toolkit and configure the path. Download link: https://developer.apple.com/download/foundation-models-adapter/ Here are the steps to use the adapter-studio CLI and train the adapter in general: 1. Initialize the toolkit 2. Test the base model 3. Train the adapter 4. Test the trained adapter 5. Export the adapter 6. Train the draft model (optional) 7. Test the draft model (optional) 8. Export the draft model (optional) ```bash # Initialize (download toolkit, set up venv, etc.) adapter-studio init # Configure toolkit path adapter-studio setup # Create Python venv, install deps # Test base model adapter-studio demo --prompt "test prompt" # Train adapter adapter-studio train-adapter --demo --epochs 2 --batch-size 1 --activation-checkpointing # Test trained adapter adapter-studio generate \ --prompt "Write a script about a friendly robot" \ --checkpoint /path/to/adapter-final.pt # Export to .fmadapter adapter-studio export \ --adapter-name playwriting \ --checkpoint /path/to/adapter-final.pt \ --output-dir ./exports/ \ --author "Your Name" \ --description "Trained to generate theatrical scripts in XML-style markup" # Optional: Train draft model for inference speedup adapter-studio train-draft \ --checkpoint /path/to/adapter-final.pt \ --train-data toy_dataset/playwriting_train.jsonl \ --eval-data toy_dataset/playwriting_valid.jsonl ``` ### Python Version for Export Export requires **Python 3.12** or earlier. 
The `coremltools` library does not have binary wheels for Python 3.13+ on macOS arm64, causing `ModuleNotFoundError: No module named 'coremltools.libmilstoragepython'`. Create the toolkit venv with Python 3.12 specifically: ```bash # Ensure you have Python 3.12 brew install python@3.12 # Create venv with Python 3.12 python3.12 -m venv /path/to/toolkit/venv source /path/to/toolkit/venv/bin/activate pip install -r requirements.txt --prefer-binary ``` This does not affect your system Python (which can stay at 3.13 or later). The toolkit venv is isolated. ## Exporting the Adapter The output adapter is of the format `.fmadapter`, which is a self-contained package ready to use in Xcode: ``` Exporting adapter to .fmadapter format... ... Adapter saved at /Users/rudrankriyam/Downloads/adapter_exports/playwriting.fmadapter. Adapter exported successfully to: /Users/rudrankriyam/Downloads/adapter_exports/playwriting.fmadapter ``` The `.fmadapter` package contains: ``` playwriting.fmadapter/ ├── adapter_weights.bin (127MB - trained adapter weights) └── metadata.json (adapter metadata: author, description, etc.) ``` ### Shipping and App Size Each adapter occupies roughly 160 MB on disk. Do not ship adapters in your main app bundle. As soon as you include multiple adapters or versions, your binary size grows quickly, and users may choose not to install or update your app. Treat adapters like other large, optional assets: - Host adapters on your server or CDN. - Use the Background Assets framework to download exactly one adapter that is compatible with the user’s device and OS. - Keep versioning separate from your app release so you can rotate or revoke adapters without a full app update. ## Entitlements and Device Support Production apps require the entitlement `com.apple.developer.foundation-model-adapter` to enable custom adapters on device. However, you do not need this entitlement to train or to locally preview adapters in Xcode. 
While Foundation Models can run in simulator and even in SwiftUI previews, testing adapters requires a physical device. ## Adapter Studio: Side-by-Side Evaluation on macOS I wrote an open-source macOS app called [Adapter Studio](https://github.com/rudrankriyam/Foundation-Models-Adapter-Studio) that allows you to compare adapters against the baseline model side by side. It is a simple app that lets you import the `.fmadapter` package and compare the responses side by side. You can run the app from Xcode or build it from the source code. Here are the features: - Run live comparisons by entering one prompt and watching both the system model and adapter responses stream in parallel with timing metrics - Inspect adapter context by reviewing file metadata, swapping adapters, or revealing them in Finder without leaving the app - Measure latency by tracking time-to-first-token and total duration so regressions surface immediately ## Loading Adapters in Your App For local testing, keep your `.fmadapter` outside the project directory. In Finder, select the file and press Option + Command + C to copy its absolute path, then initialize with a file URL: ```swift import FoundationModels let localURL = URL(filePath: "/absolute/path/to/my_adapter.fmadapter") let adapter = try SystemLanguageModel.Adapter(fileURL: localURL) let model = SystemLanguageModel(adapter: adapter, guardrails: .default) guard SystemLanguageModel.default.isAvailable else { // Fallback to a non-adapted model or a user message throw NSError(domain: "Adapter", code: 1) } let session = LanguageModelSession(model: model) let response = try await session.respond(to: "Your prompt here") ``` There are two ways to load adapters in your app: ### Local Testing (file URL) Use the file-based initializer only for local validation. Remove local adapter references before publishing your app. 
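Before handing a path to the file-based initializer during local testing, a quick pre-flight check can catch a wrong extension or a missing package early. The sketch below is only an illustration: `LocalAdapterCheck` and `checkLocalAdapter(atPath:)` are hypothetical names, not part of the framework, and the example paths are made up.

```swift
import Foundation

// Hypothetical result type for the pre-flight check.
enum LocalAdapterCheck {
    case ok
    case wrongExtension
    case missing
}

// Hypothetical helper: verifies the extension and that the package exists on disk.
func checkLocalAdapter(atPath path: String,
                       fileManager: FileManager = .default) -> LocalAdapterCheck {
    let url = URL(fileURLWithPath: path)

    // Exported adapters use the .fmadapter extension.
    guard url.pathExtension == "fmadapter" else { return .wrongExtension }

    // fileExists(atPath:) returns true for both files and directories.
    guard fileManager.fileExists(atPath: url.path) else { return .missing }

    return .ok
}

let wrongType = checkLocalAdapter(atPath: "/tmp/adapter_weights.bin")
let notThere = checkLocalAdapter(atPath: "/no-such-directory-for-sketch/my_adapter.fmadapter")
```

A check like this belongs in debug-only code paths, since local adapter references should not ship with the app.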
### Production Loading (Background Assets) In production, initialize by name and let the system download a compatible adapter asset on demand. Remove obsolete adapters at launch, check availability, and provide a fallback when unavailable: ```swift import BackgroundAssets import FoundationModels // Reclaim space and avoid mismatched versions try SystemLanguageModel.Adapter.removeObsoleteAdapters() // Initialize by base name (no extension). If no adapter is present, // the system begins a background download of a compatible asset pack. let adapter = try SystemLanguageModel.Adapter(name: "playwriting_adapter") // Optionally, track download status and update UI before prompting. let model = SystemLanguageModel(adapter: adapter, guardrails: .default) guard SystemLanguageModel.default.isAvailable else { // Fallback to base model or defer the feature let fallback = LanguageModelSession(model: SystemLanguageModel.default) // Inform the user or queue the request throw NSError(domain: "Adapter", code: 2) } let session = LanguageModelSession(model: model) let response = try await session.respond(to: "Write a script about a friendly robot") ``` ### Background Assets Quickstart To download adapters at runtime, you need to add a Background Assets downloader extension in Xcode. 
You need to choose the hosting type: - Apple-Hosted, Managed: `BAHasManagedAssetPacks=YES`, `BAAppGroupID`, `BAUsesAppleHosting=YES` - Self-Hosted, Managed: `BAHasManagedAssetPacks=YES`, `BAAppGroupID` Then, in your downloader, only allow compatible adapter packs: ```swift func shouldDownload(_ assetPack: AssetPack) -> Bool { // Filter other assets if needed, then gate on compatibility return SystemLanguageModel.Adapter.isCompatible(assetPack) } ``` Finally, to drive your loading UI, track status updates for the compatible adapter identifier and wait for `.finished` before prompting: ```swift func checkAdapterDownload(name: String) async -> Bool { let ids = SystemLanguageModel.Adapter.compatibleAdapterIdentifiers(name: name) guard let id = ids.first else { return false } for await status in AssetPackManager.shared.statusUpdates(forAssetPackWithID: id) { switch status { case .finished(_): return true case .failed(_, _): return false default: break } } return false } ``` ## Compile the Draft Model (optional) If your adapter includes a draft model for speculative decoding, compile it to speed up inference. Schedule compilation with Background Tasks, and expect rate limiting (three compilations per app per day on iOS, iPadOS, tvOS, and visionOS; no limit on macOS): ```swift do { try await adapter.compile() } catch { // Handle compilation error and continue with uncompiled adapter if necessary } ``` ## Locale and Language Support You can also gate adapter selection by language or locale. Query `SystemLanguageModel.default.supportedLanguages` and `SystemLanguageModel.default.supportsLocale(_:)` before prompting, and prefer a locale-appropriate adapter when you host variants. This aligns with the internationalization guidance from the internationalization chapter. 
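If you host several locale-specific variants, the "prefer a locale-appropriate adapter" step can be sketched as a lookup from most specific to least specific. Everything here is an assumption for illustration: the variant names, the catalog layout, and the `adapterName(forLocaleIdentifier:in:)` helper are hypothetical, and the returned name would then be passed to the name-based adapter initializer.

```swift
// Hypothetical catalog: adapter base names keyed by locale or language code.
let adapterVariants: [String: String] = [
    "es-ES": "playwriting_adapter_es_ES",
    "es": "playwriting_adapter_es",
    "en": "playwriting_adapter"
]

/// Picks the most specific variant: exact locale first, then language.
/// Returns nil when no variant applies, so the caller can use the base model.
func adapterName(forLocaleIdentifier identifier: String,
                 in variants: [String: String]) -> String? {
    // Normalize "es_ES" and "es-ES" to the same key shape.
    let subtags = identifier.split(whereSeparator: { $0 == "_" || $0 == "-" })
    let normalized = subtags.joined(separator: "-")

    if let exact = variants[normalized] { return exact }
    guard let language = subtags.first else { return nil }
    return variants[String(language)]
}

let spain = adapterName(forLocaleIdentifier: "es_ES", in: adapterVariants)
let latinAmerica = adapterName(forLocaleIdentifier: "es-419", in: adapterVariants)
let french = adapterName(forLocaleIdentifier: "fr-FR", in: adapterVariants)
```

With this catalog, `spain` resolves to the region-specific variant, `latinAmerica` falls back to the generic Spanish variant, and `french` is `nil`, which signals a fallback to the base model.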
## Integration Checklist Here is a checklist for integration: - Request the `com.apple.developer.foundation-model-adapter` entitlement - Package adapters as Background Assets; do not ship adapters in the main app bundle - Add a downloader extension and required Info.plist keys - Call `SystemLanguageModel.Adapter.removeObsoleteAdapters()` at launch to reclaim space and avoid mismatched versions - Initialize adapters by name and track download status before prompting - Use guardrails and availability checks; provide a non-adapter fallback - Optionally compile the draft model in a background task to speed up inference ## What's Next You now understand how to build and specialize Foundation Models with custom adapters. Start with the base model, refine with prompting and tools, then train adapters for genuinely domain-specific behavior that cannot be achieved through instruction alone. Adapters remain tied to Foundation Models versions, so plan for retraining whenever iOS updates introduce a new base model.