InferenceError referencing context length in FoundationModels framework

I'm experimenting with downloading an audio file of spoken content, using the Speech framework to transcribe it, then using FoundationModels to clean up the formatting to add paragraph breaks and such. I have this code to do that cleanup:

import FoundationModels

private func cleanupText(_ text: String) async throws -> String? {
    print("Cleaning up text of length \(text.count)...")
    // The instructions tell the model to insert paragraph breaks only.
    let session = LanguageModelSession(instructions: "The content you read is a transcription of a speech. Separate it into paragraphs by adding newlines. Do not modify the content - only add newlines.")

    // The entire transcript is sent as a single prompt.
    let response = try await session.respond(to: .init(text), generating: String.self)
    return response.content
}

The content length is about 29,000 characters. And I get this error:

InferenceError::inferenceFailed::Failed to run inference: Context length of 4096 was exceeded during singleExtend..

Is 4096 a reference to a max input length? Or is this a bug?

This is running on an M1 iPad Air, with iPadOS 26 Seed 1.

I have the same problem. Note that 4096 is the maximum number of tokens, not the number of characters in the text, but I don't know how the token count is calculated.
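
If 4096 is indeed a token limit, one possible workaround is to split the transcript into smaller pieces and clean up each piece separately. The following is a minimal sketch, not a confirmed fix. The helper `chunkText`, the ratio of 3 characters per token, and the 100-token reserve for the instructions are all assumptions made for illustration; the framework's actual tokenization is not known here. It also assumes that the response counts against the same 4096-token window, so only about half of the window is budgeted for input, because this task echoes the input back.

```swift
// Assumption: 4096 (from the error message) is a token budget shared by
// the instructions, the prompt, and the generated response.
let contextTokens = 4096
// Assumption: a rough, conservative guess for English text.
// The real tokenizer may behave differently.
let assumedCharsPerToken = 3
// Assumption: tokens set aside for the session instructions.
let instructionsTokenReserve = 100

// The cleaned-up output is about as long as the input, so only half of
// the remaining window is budgeted for the input text.
let inputTokenBudget = (contextTokens - instructionsTokenReserve) / 2
let maxCharacters = inputTokenBudget * assumedCharsPerToken

/// Hypothetical helper: splits `text` on whitespace and packs the words into
/// chunks of at most `maxCharacters` characters each.
/// Whitespace between words is normalized to a single space, and a single
/// word longer than `maxCharacters` becomes its own oversized chunk.
func chunkText(_ text: String, maxCharacters: Int) -> [String] {
    precondition(maxCharacters > 0, "maxCharacters must be positive")
    var chunks: [String] = []
    var current = ""
    var currentCount = 0

    for word in text.split(whereSeparator: { $0.isWhitespace }) {
        let wordCount = word.count
        // The extra 1 accounts for the space that joins the word to the chunk.
        if !current.isEmpty && currentCount + 1 + wordCount > maxCharacters {
            chunks.append(current)
            current = ""
            currentCount = 0
        }
        if !current.isEmpty {
            current += " "
            currentCount += 1
        }
        current += word
        currentCount += wordCount
    }
    if !current.isEmpty {
        chunks.append(current)
    }
    return chunks
}

// Possible usage with the cleanupText function from the question
// (it creates a new session on each call, so earlier chunks should not
// accumulate in the session's context):
//
//     var cleaned: [String] = []
//     for chunk in chunkText(transcript, maxCharacters: maxCharacters) {
//         if let result = try await cleanupText(chunk) {
//             cleaned.append(result)
//         }
//     }
```

Because this sketch breaks on word boundaries rather than sentence boundaries, a chunk can end mid-sentence, which may affect where the model places paragraph breaks near the seams. Splitting on sentence boundaries would likely give better results.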
