Foundation Models Guide / Chapter 8

Safety and Best Practices

Building AI features brings responsibility as the responses are probabilistic and not always in your control nor accurate all the time. Your users trust you to create experiences that are helpful without being harmful. Foundation Models includes built-in safety measures, but understanding how to use them properly and knowing when you need to add your own layers makes the difference between an AI feature people love to use and one that becomes the main character on Twitter.

This chapter walks through Apple’s safety approach, implementation patterns, and considerations for building an experience that you are proud of.

Prerequisites and Context

This chapter applies safety considerations to all the Foundation Models patterns you have learned - from basic sessions through advanced tool systems. You should understand how to create sessions, handle responses, work with tools, and manage conversation state. The safety patterns here integrate with all previous concepts and provide essential guidance for production deployment.

What You Will Learn

By the end of this chapter, you will be able to:

Understand Apple’s multi-layered safety approach and how to work with it
Recognize and handle different types of generation errors gracefully
Design safe instruction patterns that prioritize user protection
Implement input validation strategies for different risk levels
Use structured generation as a safety mechanism
Apply domain-specific safety considerations for different app categories
Build user trust through transparent AI attribution and appropriate disclaimers

Apple’s Safety Philosophy

Foundation Models inherits its safety principles from Apple Intelligence: privacy-first design with on-device protection. The framework processes everything locally, which helps with privacy, but you still need to think carefully about what your AI features generate and how users might interact with them.

The on-device model includes trained safety guardrails, but they are not perfect. No AI safety system is. Your job is to design experiences that work well within those constraints and handle edge cases gracefully when they inevitably appear.

During the early beta period in summer 2025, the guardrails were quite aggressive and blocked perfectly reasonable requests - at least, that was how many developers felt. With the iOS 26 launch, the system has relaxed somewhat while still protecting users from harmful content.

Understanding Model Limitations

Keep these limits in mind when designing features:

Knowledge gaps

The model has limited world knowledge with a training cutoff around October 2023. Do not rely on it as a source of truth for facts. Instead, embed verified information directly into your prompts or use tool calling with web search when factual accuracy matters.

Mathematical accuracy

Avoid using the model for calculator-like precision or complex mathematical reasoning. Use dedicated logic for calculations outside of the model and let the AI handle the natural language aspects.

Task complexity

Break complex requests into smaller, more manageable pieces. The model performs better with short, concrete requirements than with multi-step reasoning, especially given the limited context window. You can check the exact context size at runtime with SystemLanguageModel.default.contextSize.

Built-in Safety Layers

Foundation Models implements what security experts call a “defense in depth” approach. It has multiple independent safety mechanisms that reduce risk when combined:

Input filtering

This mechanism checks your instructions, prompts, and tool calls for potentially harmful content before they reach the model. This catches problematic requests early in the pipeline.

Output filtering

This mechanism examines model responses before returning them to your app. Even if input filtering misses something, output filtering provides a second chance to catch unsafe content.

Model-level safety

This mechanism includes safety training baked directly into the model weights, making it naturally reluctant to generate harmful content even without explicit filtering.

When any safety mechanism triggers, you receive a GenerationError.guardrailViolation:

do {
    let result = try await session.respond(to: prompt)
    handleSuccessfulResponse(result.content)
} catch LanguageModelSession.GenerationError.guardrailViolation {
    handleGuardrailViolation()
} catch {
    handleOtherError(error)
}

func handleGuardrailViolation() {
    showAlert(
        title: "Cannot Process Request",
        message: "I cannot help with that. Please try something different.",
        actions: [
            .init(title: "Try Again", style: .default),
            .init(title: "Cancel", style: .cancel)
        ]
    )
}

How you handle safety violations depends on whether the user initiated the action. When someone explicitly asks for something that gets blocked, provide clear feedback. Explain briefly that the request cannot be processed and offer alternatives like editing the prompt, choosing from curated options, or cancelling entirely.

Working with Guardrails in Depth

The short version so far is: “there are safety guardrails, you will hit them, handle the errors nicely.” In practice, you will spend a surprising amount of time tuning how those guardrails behave for your specific app.

Configuring Guardrail Modes

SystemLanguageModel lets you tweak how strict the built‑in safety system is:

// Default guardrails: strict, general-purpose
let strictModel = SystemLanguageModel(
    guardrails: .default
)

// More permissive for transformation-style tasks
let permissiveModel = SystemLanguageModel(
    guardrails: .permissiveContentTransformations
)

let strictSession = LanguageModelSession(model: strictModel)
let permissiveSession = LanguageModelSession(model: permissiveModel)

.default: Good starting point for most apps. It blocks prompts and responses that violate system policies and surfaces GenerationError.guardrailViolation.
.permissiveContentTransformations: Designed for scenarios where you are mostly transforming user input (summaries, paraphrases, rephrasings), even when that input might be sensitive. In this mode, string-based generations will not throw guardrailViolation just because the input contains sensitive content, but it may still refuse to answer the request.

The key thing to remember: this does not disable safety. It changes when the system chooses to error versus when it tries to produce a safer transformation of the input.

Note: If you output structured data, permissive content transformations are bypassed and you will get a guardrail violation.

Structured Output with Permissive Mode

The limitation above creates a dilemma: you want the safety benefits of .permissiveContentTransformations for sensitive content, but you also want the type safety and structure of @Generable types. Over the past two months and running thousands of tests, I have found a production-tested workaround that gives you both.

The .permissiveContentTransformations only applies to String output. When you generate a @Generable type directly, the framework falls back to .default behavior regardless of your guardrail setting. This is mentioned in the documentation, and I ran multiple tests to confirm it. But nothing stops you from generating JSON as a string and parsing it yourself!

The Pattern

Instead of using your @Generable type directly for generation, you use it for two things: compile-time type safety and automatic schema generation. Then you generate a String response and parse it manually:

@Generable
struct ContentClassification {
    @Guide(description: "Primary category of the content")
    var category: Category
    
    @Guide(description: "Confidence level from 0.0 to 1.0")
    var confidence: Double
    
    @Guide(description: "Brief explanation of the classification")
    var reasoning: String
}

@Generable
enum Category: String, CaseIterable {
    case safe
    case sensitive
    case educational
    case personal
}

Rather than calling session.respond(to:generating:) with ContentClassification.self, you extract the schema and include it in your prompt:

func classifyContent(_ userInput: String) async throws -> ContentClassification {
    let schema = ContentClassification.generationSchema.debugDescription
    
    let prompt = """
    Classify the following user content.
    Output MUST be exactly one JSON object matching this schema:
    (schema)
    
    Output JSON only. No markdown, no explanation, no backticks.
    
    Content to classify:
    (userInput)
    """
    
    let model = SystemLanguageModel(guardrails: .permissiveContentTransformations)
    let session = LanguageModelSession(model: model)
    let response = try await session.respond(to: Prompt(prompt))
    
    return try parseClassification(from: response.content)
}

The parsing function extracts the JSON and decodes it directly to your @Generable type—no intermediate struct needed since @Generable types automatically conform to Codable:

enum ClassificationError: Error {
    case invalidResponse
}

private func parseClassification(from text: String) throws -> ContentClassification {
    guard let jsonString = extractJSON(from: text),
          let data = jsonString.data(using: .utf8) else {
        throw ClassificationError.invalidResponse
    }
    
    return try JSONDecoder().decode(ContentClassification.self, from: data)
}

private func extractJSON(from text: String) -> String? {
    guard let end = text.lastIndex(of: "}"),
          let start = text[...end].lastIndex(of: "{") else { return nil }
    return String(text[start...end])
}

Foundation Models is remarkably good at generating valid JSON. When you provide an explicit schema in the prompt, the model follows it consistently. The generationSchema.debugDescription provides the JSON schema that the model understands without any additional formatting.

I have used this pattern in the current production app that I am working on for a client, handling sensitive topics ranging from faith, mental health, grief, and difficult emotions. After hundreds of test runs across every input I could think of, the JSON parsing success rate is 100%.

The model respects the schema, produces valid JSON, and the permissive guardrails allow the content through without throwing violations as often as the default guardrails would. But when it does throw a violation, it does so as a refusal to answer the request, rather than an error. Like “I cannot help with that request.” or “I am sorry, but I cannot assist with that request.”

The pattern works because you are not asking the model to do anything unusual. You are asking it to classify or transform content (which permissive mode allows) and output the result as JSON (which the model does naturally). The manual parsing step and the extra tokens for the schema are the trade-off for type safety and permissive behavior.

I advise you to not use this pattern when your content is not sensitive and .default guardrails work fine.

While I have not come across any edge cases where the model wraps the JSON in markdown code fences or adds explanatory text, you may want to handle them. A more robust extraction function handles these cases:

private func extractJSON(from text: String) -> String? {
    var cleaned = text
        .replacingOccurrences(of: "```json", with: "")
        .replacingOccurrences(of: "```", with: "")
        .trimmingCharacters(in: .whitespacesAndNewlines)
    
    guard let end = cleaned.lastIndex(of: "}"),
          let start = cleaned[...end].lastIndex(of: "{") else { return nil }
    return String(cleaned[start...end])
}

Since the model receives the exact schema with the precise enum raw values, it produces matching output. The direct decoding approach keeps your code simple while the explicit prompt instructions ensure the output is valid JSON.

Guardrail Violations and Refusals

When the model says “I cannot assist with that request,” it can mean very different things.

When the model throws a GenerationError.guardrailViolation, it means that the system’s safety checks blocked the request or the response. You should tell the user clearly that you cannot help with that request, retry and urge them to try a different wording or a different task.

For cases where the model passes the safety checks but still chooses not to answer by throwing a GenerationError.refusal(_, _), you can explain the limitation in plain language (“I cannot give medical diagnoses”, “I am not a lawyer”) and offer a hardcoded response instead.

Mitigating False Positives

You will probably hit the “but this should be allowed” moment. The temptation is to search for a big red “disable safety” switch. That switch does not exist—and that is a good thing.

You can reframe the prompt while keeping the meaning:

func softenSensitiveLanguage(_ text: String) -> String {
    text
        .replacingOccurrences(of: "mortal sin", with: "serious moral matter")
        .replacingOccurrences(of: "hell", with: "eternal separation from God")
        .replacingOccurrences(of: "damnation", with: "spiritual consequence")
        .replacingOccurrences(of: "sexual", with: "intimate")
}

func respondWithReframing(_ query: String) async -> String {
    do {
        return try await strictSession.respond(to: Prompt(query)).content
    } catch LanguageModelSession.GenerationError.guardrailViolation {
        let softened = softenSensitiveLanguage(query)
        return try await strictSession.respond(to: Prompt(softened)).content
    }
}

Or use a softer prompt variants for sensitive domains. For topics like faith, mental health, grief, or relationships, keep two versions of your instructions:

A “normal” one for everyday questions.
A pastoral / gentle one that you automatically switch to if a guardrail violation occurs or if the input matches a sensitive pattern.

Lean on permissive transformations when you are mostly rephrasing if you are summarizing user text, explaining doctrinal material, or paraphrasing policy documents. Provide very explicit instructions about what the model is allowed to do. This reduces false positives while still giving the system permission to refuse dangerous behavior.

Change your prompt, not the policy. You are teaching the model to talk about hard topics in a way that stays inside the safety rails, instead of ripping the rails out.

Testing and Measuring Guardrail Behavior

You do not really understand your guardrails until you have tests and numbers.

Add explicit tests for safety behavior by creating a small suite of prompts that you expect to be:

Blocked (e.g., self‑harm instructions, hate content).
Allowed but carefully handled (e.g., “I’m depressed and need someone to talk to”).
Fully safe (everyday small talk).

For each, assert whether you see guardrailViolation, refusal, or a normal response. Having numbers makes it possible to say “we reduced guardrail false positives from 10% to 2% over three iterations” instead of “it feels better now.”

Detect implicit guardrail behavior in responses as even when no error is thrown, the model may output phrases like:

“I cannot assist with that request.”
“I’m sorry, but I cannot provide that information.”
“This content is too sensitive.”

Treat these phrases as soft guardrail signals. Log them, feed them into your telemetry, and rerun the query or provide a placeholder response.

Case Study: A Sensitive Topics App

I worked on a stealth app where my biggest challenge was navigating guardrails for sensitive topics.

For the sake of the NDA, I will not name the app, but users wrote about trauma, grief, or difficult emotions. The initial pass rate was around 60% on AI-assisted reflection prompts, even though the questions themselves were appropriate and user-initiated.

The solution combined .permissiveContentTransformations with carefully designed few-shot examples filled with compassionate, non-judgmental responses.

After numerous rounds of iteration and testing, the pass rate improved to nearly 100%, even on the strongest words I could think of.

Here are some lessons that I think can be applied to any app:

Teaching the model how to respond (through concrete examples) beats just telling it what to avoid
Grounding responses in domain‑appropriate sources reduces speculative content that can trigger guardrails
Using consistent sampling with a low temperature and concise prompt makes safety behavior reproducible and testable over time.

Safety in Instructions

Instructions take priority over prompts, making them your primary tool to ensure safe behavior. The most critical safety rule: never include untrusted content in instructions.

// WRONG - Security vulnerability
let unsafeInstructions = """
You are (userRole). Help the user with (userRequest).
"""

// RIGHT - Safe approach  
let safeInstructions = """
You are a helpful travel assistant. Help users plan safe, enjoyable trips.
"""
let userPrompt = "I want to plan a trip to (userDestination)"

Use clear, firm language in your instructions. The model responds well to direct commands, especially negative constraints:

let instructions = """
You are a family-friendly assistant for a children's app.
DO NOT include violence, scary content, or adult themes.
DO NOT use inappropriate language or discuss sensitive topics.
Keep all content appropriate for ages 8-12.
"""

Input Patterns with Risk Management

Different approaches to handling user input carry different levels of risk:

Direct User Input (Highest Risk)

// User controls the entire prompt
let response = try await session.respond(to: userInput)

This is the highest risk approach as the user can control the entire prompt. You should use this approach for chat interfaces, creative tools where flexibility is essential. The risk is that the user can inject inappropriate requests, such as asking for harmful content. The mitigations are strong safety instructions, while relying on the model to be able to handle the request and not generate harmful content.

Structured Prompts (Moderate Risk)

let prompt = """
Summarize the following content in a positive, family-friendly way:

Content: (userInput)

Requirements:
- Keep the tone optimistic
- Focus on key highlights only
- Avoid controversial topics
"""

This is a moderate risk approach as the user can control the prompt, but the prompt is structured and the model is expected to handle the request and not generate harmful content. Although, there is still a risk that the user content could still influence model behavior in unexpected ways.

Curated Options (Lowest Risk)

let prompts = [
    "Generate a happy bedtime story about animals",
    "Create a motivational quote for students", 
    "Suggest a fun family activity for weekends"
]

let selectedPrompt = prompts[userSelection]
let response = try await session.respond(to: selectedPrompt)

This is the lowest risk approach as the user can only select from a list of prompts. The risk is minimal as you control all possible inputs. The mitigations are careful curation, thorough testing of each option as the output is still probabilistic in nature.

Using Guided Generation for Safety

Structured output can also serve as a safety mechanism by constraining the content of responses:

@Generable
enum StoryTone: String, CaseIterable {
    case gentle, funny, adventurous
}

@Generable
struct SafeStoryResponse {
    @Guide(description: "One of: gentle, funny, adventurous")
    let tone: StoryTone

    @Guide(description: "3–5 sentences, age-appropriate, no violence or scary content")
    let text: String
}

let session = LanguageModelSession(
    instructions: Instructions("""
    You are a family-friendly writing assistant.
    DO NOT include violence, horror, or adult themes.
    Keep language simple and supportive.
    """)
)

do {
    let stream = session.streamResponse(
        to: "Bedtime story about a fox",
        generating: SafeStoryResponse.self
    )
    for try await snapshot in stream { 
        renderStory(snapshot.content) 
    }
} catch LanguageModelSession.GenerationError.guardrailViolation {
    showMessage("I cannot create that story. Please try a different topic.")
}

This approach combines the type safety of Swift with the content safety of Foundation Models.

Domain-Specific Safety Considerations

Different app categories require different safety approaches:

Food and Recipe Apps

let cookingInstructions = """
You are a helpful cooking assistant for a family recipe app.

Safety requirements:
- Always include allergy warnings for common allergens
- Mention food safety practices when relevant  
- Do not recommend raw or undercooked foods for vulnerable populations
- Consider dietary restrictions when suggesting substitutions

Example response format:
"This recipe contains nuts and dairy. Wash hands before cooking. Cook chicken to 165°F internal temperature."
"""

The additional considerations that I can think of including are allowing the users to set dietary restrictions, filtering suggestions based on known allergies, and verifying ingredient safety for children out of the many more that you can think of.

Educational Content

let educationInstructions = """
You are an educational assistant for middle school students.

Content guidelines:
- Keep all content age-appropriate for 11-14 year olds
- Focus on factual, educational information
- When discussing sensitive historical topics, maintain educational context
- If asked about inappropriate topics, redirect to related educational content

Teaching approach:
- Encourage curiosity and learning
- Break down complex concepts clearly
- Provide examples relevant to student experiences
"""

You will need to review the content for cultural sensitivity, avoid topics inappropriate for the age group and provide educational context for sensitive subjects.

Building Trust

The best AI apps are transparent about AI involvement and set correct expectations. This is a great way to build trust with your users.

// Good - Clear attribution
"Here is an AI-generated travel itinerary for your Tokyo trip:"

// Better - Sets realistic expectations  
"I have created a suggested itinerary using AI. Please verify details and make adjustments based on your preferences."

You can also include appropriate disclaimers:

let disclaimerInstructions = """
Always end responses with relevant disclaimers:

For travel advice: "Please verify current travel requirements and safety conditions."
For health information: "This is general information. Consult healthcare professionals for medical advice."  
For financial content: "This is educational information, not financial advice."
"""

Medical Disclaimer in System Instructions

In my app, Zenther, I include a consistent medical disclaimer across all the AI interactions:

let medicalDisclaimerInstructions = """
You are a fitness and wellness assistant. Do not provide medical diagnosis or treatment.
If the user describes symptoms such as chest pain, severe dizziness, or acute discomfort, advise them to stop exercising and seek medical attention.
Encourage consulting a qualified healthcare provider before starting or changing exercise programs.
Avoid dangerous or illegal guidance. Refuse requests that involve self-harm, unsafe practices, or unlawful activity.
Keep guidance general and wellness-oriented; do not claim to replace professional medical advice.
"""

User-Friendly Error Handling

The nutrition label scanning feature in Zenther has a graceful error handling:

case .safetyGuardrailsTriggered:
    return String(localized: "Unable to analyze this nutrition label. Please try a different image or enter the nutrition information manually.", comment: "Error message when AI safety guardrails prevent analysis of nutrition label")

What’s Next

You now understand how to implement safety measures that work with Foundation Models rather than against it.

Build incrementally. Start with the simplest version of your AI feature, test it thoroughly, then expand. Users prefer transparent limitations over ambitious promises that do not deliver consistently.

Different apps need different approaches. What works for a children’s educational app will not work for a professional writing tool. A recipe app has completely different safety concerns than a fitness tracker.

Design your safety measures around your actual users and use cases, not generic guidelines. Treat this chapter as a guide rather than a rulebook.