InferenceError with Apple Foundation Model – Context Length Exceeded on macOS 26.0 Beta

Hello Team,

I'm currently working on a proof of concept using Apple's Foundation Model for a RAG-based chat system on my MacBook Pro with the M1 Max chip.

Environment details:

macOS: 26.0 Beta

Xcode: 26.0 beta 2 (17A5241o)

Target platform: iPad (as the iPhone simulator does not support Foundation Models)

While testing, even with very small input prompts to the LLM, I intermittently encounter the following error:

InferenceError::inference-Failed::Failed to run inference: Context length of 4096 was exceeded during singleExtend.

Has anyone else experienced this issue? Are there known limitations or workarounds for context length handling in this setup?

Any insights would be appreciated.

Thank you!

When you saw the error, did the session (LanguageModelSession) contain a long conversation? The context size applies to the whole session, so you may hit the limit when the tokens of the entire conversation exceed 4096, even if the current prompt is short.

Did you try creating a new session when hitting the error? If a new session reproduces the error, I'd be super curious to look into your configuration and code, if you don't mind providing them.
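
In case it's useful, here's a minimal sketch of how a new session could be created when the limit is hit. It assumes the LanguageModelSession.GenerationError.exceededContextWindowSize error case, the transcript-based LanguageModelSession(transcript:) initializer, and that Transcript iterates as a collection of Transcript.Entry values; the condensing strategy (keeping only the first and last entries) is just an illustration, not a recommendation:

import FoundationModels

// Sketch: respond within a session and recover by starting a fresh session
// when the 4096-token context window is exceeded.
func respondWithRecovery(session: LanguageModelSession,
                         prompt: String) async throws -> (reply: String, session: LanguageModelSession) {
    do {
        let output = try await session.respond(to: prompt)
        return (output.content, session)
    } catch LanguageModelSession.GenerationError.exceededContextWindowSize {
        // The whole conversation no longer fits in the context window.
        // Carry over a condensed transcript into a new session
        // (here: only the first and last entries, purely for illustration).
        let entries = Array(session.transcript)
        var condensed: [Transcript.Entry] = []
        if let first = entries.first { condensed.append(first) }
        if entries.count > 1, let last = entries.last { condensed.append(last) }
        let newSession = LanguageModelSession(transcript: Transcript(entries: condensed))
        let output = try await newSession.respond(to: prompt)
        return (output.content, newSession)
    }
}

The caller would replace its stored session with the returned one so that subsequent prompts use the fresh context.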

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.

Hello Team,

Thanks for your answer. Please find the code and a screenshot of the issue below. The issue is that within the same session, some queries hit the context length error and others don't.

import SwiftUI
import FoundationModels

struct ChatMessage: Identifiable {
    let id = UUID()
    let isUser: Bool
    let content: String
    let timestamp: Date = Date()
}

struct ContentView: View {
    @State private var prompt: String = ""
    @State private var messages: [ChatMessage] = []
    @State private var isLoading: Bool = false
    @State private var session: LanguageModelSession?

    var body: some View {
        VStack(spacing: 0) {
            Text("💬 Chat Assistant")
                .font(.title2)
                .bold()
                .padding()

            Divider()

            ScrollViewReader { scrollProxy in
                ScrollView {
                    LazyVStack(spacing: 12) {
                        ForEach(messages) { message in
                            HStack(alignment: .bottom, spacing: 10) {
                                if message.isUser {
                                    Spacer()
                                    chatBubble(message.content, isUser: true)
                                    userAvatar
                                } else {
                                    botAvatar
                                    chatBubble(message.content, isUser: false)
                                    Spacer()
                                }
                            }
                            .padding(.horizontal)
                            .id(message.id)
                        }
                    }
                    .padding(.top, 10)
                }
                .onChange(of: messages.count) { _, _ in
                    if let last = messages.last {
                        scrollProxy.scrollTo(last.id, anchor: .bottom)
                    }
                }
            }

            Divider()

            HStack {
                TextField("Type a message...", text: $prompt)
                    .textFieldStyle(RoundedBorderTextFieldStyle())
                    .disabled(isLoading)

                if isLoading {
                    ProgressView()
                        .padding(.leading, 5)
                }

                Button("Send") {
                    Task { await sendMessage() }
                }
                .disabled(prompt.isEmpty || isLoading)
            }
            .padding()
        }
        .task {
            do {
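                // One session is created here and reused for the entire chat, so its transcript accumulates every prompt and reply.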
                session = try await LanguageModelSession()
            } catch {
                messages.append(.init(isUser: false, content: "❌ Failed to start session: \(error.localizedDescription)"))
            }
        }
    }

    func sendMessage() async {
        let userInput = prompt.trimmingCharacters(in: .whitespacesAndNewlines)
        prompt = ""
        messages.append(ChatMessage(isUser: true, content: userInput))
        isLoading = true

        do {
            if let session {
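                // respond(to:) appends this prompt and the reply to the session transcript, so the context grows with each exchange.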
                let output = try await session.respond(to: userInput)
                messages.append(ChatMessage(isUser: false, content: output.content))
            } else {
                messages.append(ChatMessage(isUser: false, content: "❌ No valid session."))
            }
        } catch {
            messages.append(ChatMessage(isUser: false, content: "❌ Error: \(error.localizedDescription)"))
        }

        isLoading = false
    }

    func chatBubble(_ text: String, isUser: Bool) -> some View {
        Text(text)
            .padding(12)
            .foregroundColor(.primary)
            .background(isUser ? Color.blue.opacity(0.2) : Color.gray.opacity(0.15))
            .cornerRadius(16)
            .frame(maxWidth: 250, alignment: isUser ? .trailing : .leading)
    }

    var userAvatar: some View {
        Image(systemName: "person.crop.circle.fill")
            .resizable()
            .frame(width: 32, height: 32)
            .foregroundColor(.blue)
    }

    var botAvatar: some View {
        Image(systemName: "sparkles")
            .resizable()
            .frame(width: 32, height: 32)
            .foregroundColor(.purple)
    }
}

Thanks for providing the details. That indeed seems to be a bug, so I'd suggest that you file a feedback report and share your report ID here.

When you file the feedback report, it's super important to add the language model feedback attachment, which contains the session transcript that helps us reason about the model's output and analyze the error.

Additionally, as part of the debugging process, would you mind running your app with Instruments.app and the Foundation Models instrument, and checking whether the token counts make sense? To do so:

  1. In Xcode, open your project and click Product > Profile to launch Instruments.app.
  2. Click Blank, and then click the Choose button to open the main UI of Instruments.app.
  3. Click + Instrument to add the Foundation Models instrument.
  4. Start recording your app, reproduce the error, and then stop recording.

Instruments.app should show the token counts of your requests in the Detail area, and from there you can check whether the numbers make sense.

Please attach the Instruments trace you capture to your feedback report as well.

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.
