Foundation Models Guide / Chapter 2

Getting Started with Sessions

This chapter introduces sessions, the core building block of every Foundation Models interaction.

Prerequisites and Context

This chapter builds on the Foundation Models introduction from the previous chapter. You should understand what Foundation Models can do, how guided generation works conceptually, and when to choose Foundation Models over MLX Swift. Now you move from theory to practice, focusing on on-device generation using the Neural Engine.

What You Will Learn

By the end of this chapter, you will be able to:

Check Foundation Models availability and handle different states
Create and configure language model sessions
Distinguish between single-turn and multi-turn interactions
Write effective instructions and prompts
Handle common errors gracefully
Optimize session performance with prewarming
Build a simple chat interface

Project Setup

Apple Intelligence is built into the OS, so unlike MLX Swift there is no external package to add. Import the FoundationModels framework:

import FoundationModels

That is it. The model is already on the user’s device, provided Apple Intelligence is enabled and the model has finished downloading.

Checking Availability

Foundation Models only works if Apple Intelligence is enabled. You cannot assume it is available, so you should always check that first.

Understanding the Availability States

The SystemLanguageModel.default.availability property returns one of these states:

.available - You are good to go
.unavailable(.deviceNotEligible) - Device does not support Apple Intelligence (iPhone 15 and earlier, except iPhone 15 Pro models; Intel Macs; older iPads)
.unavailable(.appleIntelligenceNotEnabled) - User has not enabled Apple Intelligence in Settings
.unavailable(.modelNotReady) - Model is downloading or system conditions are not met

The model can also be unavailable because of low battery, Game Mode, or the device becoming too warm.

Building UI for Each State

Here is an availability checker with UI that you can drop into your app:

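import SwiftUI
import FoundationModels

// The state views used below (AvailableStateView and so on) are placeholders for your own views.
struct FoundationModelsAvailabilityView: View {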
    private let model = SystemLanguageModel.default

    var body: some View {
        VStack(spacing: 20) {
            switch model.availability {
            case .available:
                AvailableStateView()

            case .unavailable(.deviceNotEligible):
                DeviceNotEligibleView()

            case .unavailable(.appleIntelligenceNotEnabled):
                AppleIntelligenceDisabledView()

            case .unavailable(.modelNotReady):
                ModelNotReadyView()

            case .unavailable(let other):
                UnknownUnavailableView(reason: other)
            }
        }
        .padding()
    }
}

This gives you UI components for every availability state that you can adapt according to your app’s design.
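
The placeholder views in the listing above (DeviceNotEligibleView and the others) are not defined in this chapter. One way to keep their copy in a single place is a small display model, sketched below. AvailabilityNotice, its fields, and all of the wording are hypothetical and not part of the FoundationModels framework; the initializer that maps the framework's availability value is compiled only where the framework can be imported.

```swift
import Foundation
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Hypothetical display model for the unavailable states (not a framework type).
struct AvailabilityNotice: Equatable {
    let title: String
    let detail: String
    /// True when waiting and checking again may resolve the state.
    let canRetry: Bool

    static let deviceNotEligible = AvailabilityNotice(
        title: "Not supported on this device",
        detail: "This feature needs a device that supports Apple Intelligence.",
        canRetry: false
    )
    static let appleIntelligenceNotEnabled = AvailabilityNotice(
        title: "Apple Intelligence is turned off",
        detail: "Turn on Apple Intelligence in Settings to use this feature.",
        canRetry: false
    )
    static let modelNotReady = AvailabilityNotice(
        title: "Model is not ready yet",
        detail: "The model may still be downloading. Try again in a few minutes.",
        canRetry: true
    )
    static let unknown = AvailabilityNotice(
        title: "Temporarily unavailable",
        detail: "The model cannot be used right now.",
        canRetry: true
    )
}

#if canImport(FoundationModels)
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
extension AvailabilityNotice {
    /// Maps a framework availability value to a notice; nil means "available".
    init?(_ availability: SystemLanguageModel.Availability) {
        switch availability {
        case .available: return nil
        case .unavailable(.deviceNotEligible): self = .deviceNotEligible
        case .unavailable(.appleIntelligenceNotEnabled): self = .appleIntelligenceNotEnabled
        case .unavailable(.modelNotReady): self = .modelNotReady
        case .unavailable: self = .unknown // any reason not listed above
        }
    }
}
#endif
```

Each state view can then render a notice's title and detail, and show a retry button only when canRetry is true.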

Exploring with Playgrounds

Before building a full interface, it is helpful to experiment with prompts and instructions directly in the playground. The #Playground macro provides a live-updating environment, similar to SwiftUI Previews, where you can see model responses in real time.

import FoundationModels
import Playgrounds

#Playground {
    let session = LanguageModelSession()
    let response = try await session.respond(to: "Why is the sky blue?")
}

The response appears in the preview canvas on the right once generation finishes. You can expand the result to see details like the generated content and the request duration. For example, for the above query, the response is:

The sky appears blue primarily due to a phenomenon called Rayleigh scattering. When sunlight enters Earth's atmosphere, it is made up of different colors, each with its own wavelength. Blue light has a shorter wavelength and is scattered in all directions by the gases and particles in the atmosphere more than other colors.

As a result, when we look up, we see the predominantly blue light scattered in all directions, giving the sky its blue appearance. This effect is more pronounced when the sun is lower in the sky, as the sunlight has to pass through more atmosphere, scattering even more blue light.

It took 2.1 seconds to generate on an M4 MacBook Air.

You can use this to quickly test different instructions, prompts, and generation options throughout this chapter.
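
As a sketch of that kind of experiment, the playground below sends the same prompt with two temperature settings. It assumes the GenerationOptions(temperature:) initializer and the respond(to:options:) method of the FoundationModels framework; the instructions text, the prompt, and the temperature values are illustrative only.

```swift
import FoundationModels
import Playgrounds

#Playground {
    // Illustrative instructions; replace them with the ones for your feature.
    let instructions = "You are a concise science tutor. Answer in two sentences."

    // Lower temperature: more predictable wording.
    let focusedSession = LanguageModelSession(instructions: instructions)
    let focused = try await focusedSession.respond(
        to: "Why is the sky blue?",
        options: GenerationOptions(temperature: 0.2)
    )

    // Higher temperature: more varied wording for the same prompt.
    // A fresh session keeps the first answer out of the context.
    let variedSession = LanguageModelSession(instructions: instructions)
    let varied = try await variedSession.respond(
        to: "Why is the sky blue?",
        options: GenerationOptions(temperature: 1.0)
    )
}
```

Using a separate session per variant keeps the comparison fair, because neither response can influence the other through shared context.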

Your First Session

Foundation Models is built around sessions. You can treat a session as a conversation that carries the context of previous interactions.

Single-Turn vs Multi-Turn Sessions

Foundation Models supports two types of interactions:

Single-turn: Create a new session for each request. Good for independent tasks like generating titles or extracting data:

// Fresh sessions avoid conversation context for independent tasks
let session = LanguageModelSession()
let response = try await session.respond(to: "Generate a title for a travel blog")
print(response.content)

Here is a sample response:

"Wanderlust Chronicles: Exploring the World One Adventure at a Time"

Multi-turn: Reuse the same session to maintain conversation context. Perfect for chat interfaces:

// Persistent sessions maintain context across multiple interactions
let session = LanguageModelSession()

let response1 = try await session.respond(to: "I'm planning a trip to Japan")
// Context from the earlier Japan message influences the packing suggestions
let response2 = try await session.respond(to: "What should I pack?")

print(response2)

The second response incorporates context from the first, with packing suggestions tailored for Japan:

Response<String>(userPrompt: "What should I pack?", duration: 24.287179583, content: "Packing for a trip to Japan requires considering the season, cultural norms, and activities you plan to engage in. Here is a general packing list to help you prepare...

Simple Session Example

Here is the simplest possible chat interface:

struct BasicChatView: View {
    @State private var session: LanguageModelSession?
    @State private var prompt = ""
    @State private var response = ""
    @State private var isGenerating = false
    
    var body: some View {
        VStack(spacing: 20) {
            TextField("Ask me anything...", text: $prompt)
                .textFieldStyle(RoundedBorderTextFieldStyle())
            
            Button("Generate") {
                Task { await generateResponse() }
            }
            .disabled(isGenerating || session == nil)
            
            if !response.isEmpty {
                Text(response)
                    .padding()
                    .background(Color.gray.opacity(0.1))
                    .cornerRadius(8)
            }
        }
        .padding()
        .task {
            await setupSession()
        }
    }
    
    private func setupSession() async {
        guard SystemLanguageModel.default.availability == .available else { return }
        
        session = LanguageModelSession(
            instructions: Instructions("You are a helpful assistant. Keep responses concise.")
        )
    }
    
    private func generateResponse() async {
        guard let session = session else { return }
        
        isGenerating = true
        
        do {
            let result = try await session.respond(to: prompt)
            response = result.content
        } catch {
            response = "Error: \(error.localizedDescription)"
        }
        
        isGenerating = false
    }
}

Zenther Example: Workout Assistant Session

The Zenther fitness app demonstrates practical session management for workout guidance and logging. This real-world implementation shows session setup, streaming responses, and dynamic instruction updates.

Core Session Management

import FoundationModels
import Observation

@Observable
final class ChatViewModel {
    var isLoading: Bool = false
    var sessionCount: Int = 1
    var instructions: String = "You are a fitness AI assistant specializing in workout guidance, exercise form, nutrition advice, and health tracking. Help users log their workouts, plan training sessions, and achieve their fitness goals."

    // Error state set by sendMessage(_:); the message is assumed to be a String
    var errorMessage: String?
    var showError: Bool = false

    private(set) var session: LanguageModelSession
    private let subscriptionService: SubscriptionStatusService

    init(subscriptionService: SubscriptionStatusService) {
        self.subscriptionService = subscriptionService
        // Same text as the default value of `instructions` above
        self.session = LanguageModelSession(
            instructions: Instructions(
                "You are a fitness AI assistant specializing in workout guidance, exercise form, nutrition advice, and health tracking. Help users log their workouts, plan training sessions, and achieve their fitness goals."
            )
        )
    }
}

Streaming Message Handling

The workout assistant uses streaming responses to provide real-time feedback during exercise logging:

@MainActor
func sendMessage(_ content: String) async {
    // Mark loading up front; session.isResponding is still false before the request starts
    isLoading = true
    defer { isLoading = false }

    do {
        // Stream response from current session
        let responseStream = session.streamResponse(to: Prompt(content))

        for try await _ in responseStream {
            // The session transcript updates automatically as the stream progresses
        }
    } catch {
        // Surface the failure to the UI
        errorMessage = handleFoundationModelsError(error)
        showError = true
    }
}

Session Lifecycle Management

The Zenther app provides methods to clear conversation history and update instructions dynamically:

@MainActor
func clearChat() {
    sessionCount = 1
    session = LanguageModelSession(
        instructions: Instructions(instructions)
    )
}

@MainActor
func updateInstructions(_ newInstructions: String) {
    instructions = newInstructions
    // Create a new session with updated instructions
    session = LanguageModelSession(
        instructions: Instructions(instructions)
    )
}

This pattern allows users to reset their conversation context or switch between different AI personas (like switching from workout planning to nutrition guidance) without restarting the app.
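
Persona switching of this kind can be modeled with a small enum whose cases carry their own instructions text. The following is a minimal sketch: AssistantPersona, its cases, and the wording are hypothetical and do not come from the Zenther code.

```swift
/// Hypothetical personas for the assistant; names and wording are illustrative.
enum AssistantPersona: String, CaseIterable {
    case workoutPlanning
    case nutrition

    /// Instructions text to hand to a new session for this persona.
    var instructions: String {
        switch self {
        case .workoutPlanning:
            return "You are a fitness AI assistant. Help users plan training sessions and log workouts."
        case .nutrition:
            return "You are a nutrition assistant. Give practical, general advice about meals and hydration."
        }
    }
}

// Usage with the view model above (shown as a comment because the view model
// is not part of this block):
// viewModel.updateInstructions(AssistantPersona.nutrition.instructions)
```

Because updateInstructions(_:) creates a new session, switching personas this way also discards the previous conversation context.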

Instructions vs Prompts

Foundation Models distinguishes between instructions and prompts to provide better security and control over AI behavior.

Instructions

Tell the model who it is and how to behave. These persist for the entire session and take priority over prompts:

// Instructions shape the AI's behavior and personality throughout the session
let instructions = """
You are a helpful writing assistant that helps users improve their content.
Focus on clarity, tone, and structure.
Provide specific suggestions for improvement.
Keep responses concise and actionable.
"""

let session = LanguageModelSession(instructions: Instructions(instructions))

Prompts

Individual questions or requests from the user:

// Prompts contain user's actual requests and questions
let prompt = "Improve this email draft: \(emailText)"

try await session.respond(to: prompt)

Prompt Engineering Best Practices

Use these techniques to get the best results from the model. Because the on-device model is much smaller than server-based models and is optimized for specific tasks rather than general knowledge, targeted prompting improves output quality for app features.

Control Output Length

Be specific about the length you want:

// Unclear constraints often produce verbose, unfocused responses
"Summarize this article"

// Length constraints help fit responses into your UI layout
"Summarize this article in exactly two sentences"

// Combined constraints produce more useful, targeted content
"Create a brief product description under 50 words that highlights key features"

Clear length constraints help the model generate responses that fit your UI, which prevents layout issues and keeps the experience consistent across devices. Length also matters because the context window is shared between input and output. You can query the exact limit with try await SystemLanguageModel.default.contextSize; it is currently 4096 tokens but may grow in future OS releases.
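
Because input and output share one window, it can help to estimate before sending whether a request leaves room for the response. The sketch below is a rough heuristic only: it assumes about four characters per token, a common rule of thumb for English text, and the real tokenizer will differ. The function names and the reservedForResponse parameter are hypothetical.

```swift
/// Rough token estimate. ASSUMPTION: about 4 characters per token, a common
/// rule of thumb for English text. The real tokenizer will give other counts.
func estimatedTokenCount(_ text: String, charactersPerToken: Int = 4) -> Int {
    precondition(charactersPerToken > 0)
    // Round up so that a partial token still counts as one.
    return (text.count + charactersPerToken - 1) / charactersPerToken
}

/// Returns true when the instructions, the prompt, and the space reserved for
/// the response are estimated to fit in the shared context window.
func fitsContextWindow(
    instructions: String,
    prompt: String,
    reservedForResponse: Int,
    contextSize: Int = 4096 // value quoted in this chapter; query it at run time if possible
) -> Bool {
    let input = estimatedTokenCount(instructions) + estimatedTokenCount(prompt)
    return input + reservedForResponse <= contextSize
}

// Example: a short request with 200 tokens reserved for the answer.
let fits = fitsContextWindow(
    instructions: "Be brief.",
    prompt: "Summarize this article in exactly two sentences",
    reservedForResponse: 200
)
```

Treat a false result as a hint to shorten the input or start a new session, not as an exact measurement.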

Specify Roles and Context

Give the model a clear role and context:

let instructions = """
You are a customer service representative for a fitness app.
Be helpful, encouraging, and focus on solving user problems.
Keep responses professional but friendly.
"""

This produces a different output than a generic assistant by providing domain-specific context and tone guidance.

Write Clear Commands

Like other large language models, Apple’s foundation model performs best with clear, specific commands:

// Focused requests produce more accurate results
"Generate five related workout routines for beginners"

// Examples guide the AI toward your desired output format
"Generate five beginner workout routines. Each should be 2-3 words like 'Morning Yoga' or 'Quick Cardio'"

Use Examples in Instructions

Provide a few examples of desired outputs. This helps the model match the desired output format and style:

let instructions = """
You suggest related topics. Examples:

User: "Making homemade bread"
Assistant: 1. Sourdough starter basics 2. Bread flour types 3. Kneading techniques

User: "iOS development"  
Assistant: 1. SwiftUI fundamentals 2. App Store guidelines 3. Xcode debugging

Keep suggestions concise (3-7 words) and naturally related.
"""

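When the examples come from app data instead of a hard-coded string, the same instructions can be assembled in code. A minimal sketch, assuming the layout of the listing above; FewShotExample and makeInstructions are hypothetical helpers, not framework APIs.

```swift
/// One example pair for the instructions. Hypothetical helper type.
struct FewShotExample {
    let user: String
    let assistant: [String]
}

/// Builds instructions text in the same layout as the hand-written listing.
func makeInstructions(role: String, examples: [FewShotExample], closing: String) -> String {
    var lines = ["\(role) Examples:", ""]
    for example in examples {
        // Number the suggestions: "1. First 2. Second 3. Third"
        let numbered = example.assistant.enumerated()
            .map { "\($0.offset + 1). \($0.element)" }
            .joined(separator: " ")
        lines.append("User: \"\(example.user)\"")
        lines.append("Assistant: \(numbered)")
        lines.append("")
    }
    lines.append(closing)
    return lines.joined(separator: "\n")
}

let fewShotInstructions = makeInstructions(
    role: "You suggest related topics.",
    examples: [
        FewShotExample(
            user: "Making homemade bread",
            assistant: ["Sourdough starter basics", "Bread flour types", "Kneading techniques"]
        ),
        FewShotExample(
            user: "iOS development",
            assistant: ["SwiftUI fundamentals", "App Store guidelines", "Xcode debugging"]
        )
    ],
    closing: "Keep suggestions concise (3-7 words) and naturally related."
)
```

The resulting string can be passed to Instructions(...) when creating the session, exactly like the literal version.
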
Use Strong Commands When Needed

If you observe unwanted output, use firm constraints:

let instructions = """
You are a helpful assistant for children's content.
DO NOT include scary or violent content.
DO NOT mention inappropriate topics.
"""

The model generally responds well to all-caps “DO NOT” constraints.

Understanding the Model’s Capabilities

This is a 3-billion-parameter model optimized for on-device use; by comparison, popular server-based models use hundreds of billions of parameters. Keep the following in mind when designing features:

Focus on language tasks like summarization, classification, and conversation
Avoid complex reasoning, math calculations, and code generation
Be aware of potential hallucinations for factual content
Take advantage of the full context window (query the size with SystemLanguageModel.default.contextSize) by providing examples and context for best results

Basic Error Handling

Foundation Models can fail for several reasons. Here is how to handle them. You can create custom UI for each error case.

do {
    let result = try await session.respond(to: prompt)
    return result.content
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    return "This conversation is too long. Please start a new session."
} catch LanguageModelSession.GenerationError.guardrailViolation {
    return "I cannot respond to that request." // Content safety system blocked the request
} catch LanguageModelSession.GenerationError.assetsUnavailable {
    return "Foundation Models is temporarily unavailable. Please try again."
} catch LanguageModelSession.GenerationError.concurrentRequests {
    return "Please wait for the current request to finish before starting a new one."
} catch LanguageModelSession.GenerationError.rateLimited {
    return "Too many requests. Please try again later."
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
    return "This language is not supported. Please try English or another supported language."
} catch LanguageModelSession.GenerationError.decodingFailure {
    return "Unable to process the response. Please try again."
} catch LanguageModelSession.GenerationError.unsupportedGuide {
    return "Invalid generation parameters. Please check your request format."
} catch LanguageModelSession.GenerationError.refusal(let refusal, _) {
    // Model refused to respond - you can get an explanation
    do {
        let explanation = try await refusal.explanation
        return "The model declined to respond: \(explanation.content)"
    } catch {
        return "The model declined to respond to this request."
    }
} catch {
    return "Something went wrong: \(error.localizedDescription)"
}

Understanding Specific Error Types

Guardrail Violations: The content safety system blocks unsafe requests. Consider whether the request was user-initiated (show a helpful message) or proactive (silently ignore the failure).

Refusal Errors: The model chooses not to respond even when content passes safety checks. Unlike guardrail violations, you can ask the model to explain why it refused.

Concurrent Requests: Sessions can only handle one request at a time. Always check session.isResponding before making new requests.

Rate Limiting: Only occurs when your app runs in the background and exceeds system limits. This rarely occurs in practice, but handle it gracefully.
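
The user-initiated versus proactive distinction for guardrail violations can be captured in a small policy function. A minimal sketch, in which RequestOrigin, guardrailMessage(for:), and the message text are all hypothetical:

```swift
/// Where a request came from. Hypothetical type for this sketch.
enum RequestOrigin {
    case userInitiated // the user typed or tapped something
    case proactive     // the app generated content on its own
}

/// Message to show after a guardrail violation, or nil to stay silent.
func guardrailMessage(for origin: RequestOrigin) -> String? {
    switch origin {
    case .userInitiated:
        return "I cannot help with that request. Try rephrasing it."
    case .proactive:
        // The user did not ask for this content, so skip it quietly.
        return nil
    }
}
```

Call it from the guardrailViolation catch branch and present the result only when it is non-nil.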

Session Safety

Important: A session can only handle one request at a time. Calling it while it is busy causes a runtime error. Always check session.isResponding:

guard !session.isResponding else { return }

// Disable UI during generation to prevent concurrent requests
Button("Generate") {
    Task { await generateResponse() }
}
.disabled(session?.isResponding == true)

Performance Optimization

Use session.prewarm() to eagerly load resources and optionally cache prompt prefixes when you anticipate user interaction:

// Basic prewarming - loads model resources into memory
.task {
    await setupSession()
    session?.prewarm() // Call when user interaction is likely within seconds
}

// Prewarming with prompt prefix caching
func prepareForUserInput() {
    // When user starts typing in a text field
    let commonPrefix = Prompt("You are a helpful writing assistant. The user is asking about:")
    session?.prewarm(promptPrefix: commonPrefix)
}

// Smart prewarming based on UI state
.onChange(of: isTextFieldFocused) { _, focused in
    if focused {
        // User is about to type - prewarm with known context
        // (sessionInstructions is assumed to be a Prompt holding the shared prefix)
        session?.prewarm(promptPrefix: sessionInstructions)
    }
}

Prewarming Best Practices

Here are practices for prewarming that you can use as a starting point. Consider prewarming when the user begins typing in a text input field, navigates to an AI-enabled screen, or when your app transitions to foreground with AI features visible.

When using prompt prefix caching, focus on instruction patterns that multiple prompts share, such as session instructions or conversation context. Avoid including user-specific content that will not be reused across requests, as this reduces the effectiveness of the caching mechanism.

Prewarming does not guarantee immediate resource loading and may be less effective when your app runs in the background or when the system is under load.

// Example: Smart prewarming in a chat app
class ChatViewModel: ObservableObject {
    private var session: LanguageModelSession?
    
    func handleUserStartedTyping() {
        // User began typing - prewarm with conversation context
        let contextPrefix = buildConversationContext()
        session?.prewarm(promptPrefix: contextPrefix)
    }
    
    private func buildConversationContext() -> Prompt {
        // Build a prefix from recent conversation history
        let recentMessages = conversationHistory.suffix(3)
        let context = recentMessages.map { "\($0.role): \($0.content)" }
            .joined(separator: "\n")
        return Prompt(context)
    }
}

This approach can reduce response latency by preprocessing common prompt patterns before the user submits the actual request.

What’s Next

You now understand how to check availability, create single- and multi-turn sessions, and guide behavior using instructions and prompts. You also know how to handle common errors and optimize for performance. The best way to internalize these concepts is to apply them.

Try building a simple feature using the Foundation Models framework, experiment with prompts in the playground, and adjust sampling parameters to observe how responses change.

With sessions established, the next chapter explores streaming and snapshots - Foundation Models’ approach to building responsive UIs with real-time updates. You will see how to create interfaces that feel alive as the model generates content.