InferenceError with Apple Foundation Model – Context Length Exceeded on macOS 26.0 Beta

Hello Team,

I'm currently working on a proof of concept using Apple's Foundation Model for a RAG-based chat system on my MacBook Pro with the M1 Max chip.

Environment details:

macOS: 26.0 Beta

Xcode: 26.0 beta 2 (17A5241o)

Target platform: iPad (as the iPhone simulator does not support Foundation Models)

While testing, even with very small input prompts to the LLM, I intermittently encounter the following error:

InferenceError::inference-Failed::Failed to run inference: Context length of 4096 was exceeded during singleExtend.

Has anyone else experienced this issue? Are there known limitations or workarounds for context length handling in this setup?

Any insights would be appreciated.

Thank you!

When you saw the error, did the session (LanguageModelSession) contain a long conversation? The context size applies to the whole session, so you may hit the limit when the tokens of the entire conversation exceed 4096, even if the current prompt is short.

Did you try creating a new session when hitting the error? If a new session reproduces the error, I'd be super curious to look into your configuration and code, if you don't mind providing them.
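
In case it's useful, here's a minimal sketch of how a new session could be created when the limit is hit. It assumes the LanguageModelSession.GenerationError.exceededContextWindowSize error case, the transcript-based LanguageModelSession(transcript:) initializer, and that Transcript iterates as a collection of Transcript.Entry values; the condensing strategy (keeping only the first and last entries) is just an illustration, not a recommendation:

import FoundationModels

// Sketch: respond within a session and recover by starting a fresh session
// when the 4096-token context window is exceeded.
func respondWithRecovery(session: LanguageModelSession,
                         prompt: String) async throws -> (reply: String, session: LanguageModelSession) {
    do {
        let output = try await session.respond(to: prompt)
        return (output.content, session)
    } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
        // The whole conversation no longer fits in the context window.
        // Carry over a condensed transcript into a new session
        // (here: only the first and last entries, purely for illustration).
        let entries = Array(session.transcript)
        var condensed: [Transcript.Entry] = []
        if let first = entries.first { condensed.append(first) }
        if entries.count > 1, let last = entries.last { condensed.append(last) }
        let newSession = LanguageModelSession(transcript: Transcript(entries: condensed))
        let output = try await newSession.respond(to: prompt)
        return (output.content, newSession)
    }
}

The caller would replace its stored session with the returned one so that subsequent prompts use the fresh context.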

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.

Hello Team,

Thanks for your answer. Please find the code and a screenshot of the issue below. The issue is that within the same session, some queries hit the context length error and others don't.

import SwiftUI
import FoundationModels

struct ChatMessage: Identifiable {
    let id = UUID()
    let isUser: Bool
    let content: String
    let timestamp: Date = Date()
}

struct ContentView: View {
    @State private var prompt: String = ""
    @State private var messages: [ChatMessage] = []
    @State private var isLoading: Bool = false
    @State private var session: LanguageModelSession?

    var body: some View {
        VStack(spacing: 0) {
            Text("💬 Chat Assistant")
                .font(.title2)
                .bold()
                .padding()

            Divider()

            ScrollViewReader { scrollProxy in
                ScrollView {
                    LazyVStack(spacing: 12) {
                        ForEach(messages) { message in
                            HStack(alignment: .bottom, spacing: 10) {
                                if message.isUser {
                                    Spacer()
                                    chatBubble(message.content, isUser: true)
                                    userAvatar
                                } else {
                                    botAvatar
                                    chatBubble(message.content, isUser: false)
                                    Spacer()
                                }
                            }
                            .padding(.horizontal)
                            .id(message.id)
                        }
                    }
                    .padding(.top, 10)
                }
                .onChange(of: messages.count) { _, _ in
                    if let last = messages.last {
                        scrollProxy.scrollTo(last.id, anchor: .bottom)
                    }
                }
            }

            Divider()

            HStack {
                TextField("Type a message...", text: $prompt)
                    .textFieldStyle(RoundedBorderTextFieldStyle())
                    .disabled(isLoading)

                if isLoading {
                    ProgressView()
                        .padding(.leading, 5)
                }

                Button("Send") {
                    Task { await sendMessage() }
                }
                .disabled(prompt.isEmpty || isLoading)
            }
            .padding()
        }
        .task {
            do {
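                // One session is created here and reused for the entire chat, so its transcript accumulates every prompt and reply.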
                session = try await LanguageModelSession()
            } catch {
                messages.append(.init(isUser: false, content: "❌ Failed to start session: \(error.localizedDescription)"))
            }
        }
    }

    func sendMessage() async {
        let userInput = prompt.trimmingCharacters(in: .whitespacesAndNewlines)
        prompt = ""
        messages.append(ChatMessage(isUser: true, content: userInput))
        isLoading = true

        do {
            if let session {
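                // respond(to:) appends this prompt and the reply to the session transcript, so the context grows with each exchange.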
                let output = try await session.respond(to: userInput)
                messages.append(ChatMessage(isUser: false, content: output.content))
            } else {
                messages.append(ChatMessage(isUser: false, content: "❌ No valid session."))
            }
        } catch {
            messages.append(ChatMessage(isUser: false, content: "❌ Error: \(error.localizedDescription)"))
        }

        isLoading = false
    }

    func chatBubble(_ text: String, isUser: Bool) -> some View {
        Text(text)
            .padding(12)
            .foregroundColor(.primary)
            .background(isUser ? Color.blue.opacity(0.2) : Color.gray.opacity(0.15))
            .cornerRadius(16)
            .frame(maxWidth: 250, alignment: isUser ? .trailing : .leading)
    }

    var userAvatar: some View {
        Image(systemName: "person.crop.circle.fill")
            .resizable()
            .frame(width: 32, height: 32)
            .foregroundColor(.blue)
    }

    var botAvatar: some View {
        Image(systemName: "sparkles")
            .resizable()
            .frame(width: 32, height: 32)
            .foregroundColor(.purple)
    }
}

Thanks for providing the details. That indeed seems to be a bug, so I'd suggest that you file a feedback report and share your report ID here.

When you file the feedback report, it's super important to add the language model feedback attachment, which contains the session transcript that helps us reason about the model's output and analyze the error.

Additionally, as part of the debugging process, would you mind running your app with Instruments.app and the Foundation Models instrument, and checking whether the token counts make sense? To do so:

  1. In Xcode, open your project and click Product > Profile to launch Instruments.app.
  2. Click Blank, and then click the Choose button to open the main UI of Instruments.app.
  3. Click + Instrument to add the Foundation Models instrument.
  4. Start recording your app, reproduce the error, and then stop recording.

Instruments.app should show the token counts of your requests in the Detail area, and from there you can check whether the numbers make sense.

Please attach the Instruments trace you capture to your feedback report as well.

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.
