At WWDC25 we launched a new type of Lab event for the developer community - Group Labs. A Group Lab is a panel Q&A designed for a large audience of developers. Group Labs are a unique opportunity for the community to submit questions directly to a panel of Apple engineers and designers. Here are the highlights from the WWDC25 Group Lab for Machine Learning and AI Frameworks.
What are you most excited about in the Foundation Models framework?
The Foundation Models framework provides access to an on-device Large Language Model (LLM), enabling entirely on-device processing for intelligent features. This allows you to build features such as personalized search suggestions and dynamic NPC generation in games. The combination of guided generation and streaming capabilities is particularly exciting for creating delightful animations and features with reliable output. The seamless integration with SwiftUI and the new design material Liquid Glass is also a major advantage.
When should I still bring my own LLM via CoreML?
It's generally recommended to first explore Apple's built-in system models and APIs, including the Foundation Models framework, as they are highly optimized for Apple devices and cover a wide range of use cases. However, Core ML is still valuable if you need more control or choice over the specific model being deployed, such as customizing existing system models or augmenting prompts. Core ML provides the tools to get these models on-device, but you are responsible for model distribution and updates.
Should I migrate PyTorch code to MLX?
MLX is an open-source, general-purpose machine learning framework designed for Apple Silicon from the ground up. It offers a familiar API, similar to PyTorch, and supports C, C++, Python, and Swift. MLX emphasizes unified memory, a key feature of Apple Silicon hardware, which can improve performance. It's recommended to try MLX and see if its programming model and features better suit your application's needs. MLX shines when working with state-of-the-art, larger models.
Can I test Foundation Models in Xcode simulator or device?
Yes, you can use the Xcode simulator to test Foundation Models use cases. However, your Mac must be running macOS Tahoe. You can test on a physical iPhone running iOS 18 by connecting it to your Mac and running Playgrounds or live previews directly on the device.
Which on-device models will be supported? any open source models?
The Foundation Models framework currently supports Apple's first-party models only. This allows for platform-wide optimizations, improving battery life and reducing latency. While Core ML can be used to integrate open-source models, it's generally recommended to first explore the built-in system models and APIs provided by Apple, including those in the Vision, Natural Language, and Speech frameworks, as they are highly optimized for Apple devices. For frontier models, MLX can run very large models.
How often will the Foundational Model be updated? How do we test for stability when the model is updated?
The Foundation Model will be updated in sync with operating system updates. You can test your app against new model versions during the beta period by downloading the beta OS and running your app. It is highly recommended to create an "eval set" of golden prompts and responses to evaluate the performance of your features as the model changes or as you tweak your prompts. Report any unsatisfactory or satisfactory cases using Feedback Assistant.
Which on-device model/API can I use to extract text data from images such as: nutrition labels, ingredient lists, cashier receipts, etc? Thank you.
The Vision framework offers the RecognizeDocumentRequest which is specifically designed for these use cases. It not only recognizes text in images but also provides the structure of the document, such as rows in a receipt or the layout of a nutrition label. It can also identify data like phone numbers, addresses, and prices.
What is the context window for the model? What are max tokens in and max tokens out?
The context window for the Foundation Model is 4,096 tokens. The split between input and output tokens is flexible. For example, if you input 4,000 tokens, you'll have 96 tokens remaining for the output. The API takes in text, converting it to tokens under the hood. When estimating token count, a good rule of thumb is 3-4 characters per token for languages like English, and 1 character per token for languages like Japanese or Chinese. Handle potential errors gracefully by asking for shorter prompts or starting a new session if the token limit is exceeded.
Is there a rate limit for Foundation Models API that is limited by power or temperature condition on the iPhone?
Yes, there are rate limits, particularly when your app is in the background. A budget is allocated for background app usage, but exceeding it will result in rate-limiting errors. In the foreground, there is no rate limit unless the device is under heavy load (e.g., camera open, game mode). The system dynamically balances performance, battery life, and thermal conditions, which can affect the token throughput. Use appropriate quality of service settings for your tasks (e.g., background priority for background work) to help the system manage resources effectively.
Do the foundation models support languages other than English?
Yes, the on-device Foundation Model is multilingual and supports all languages supported by Apple Intelligence. To get the model to output in a specific language, prompt it with instructions indicating the user's preferred language using the locale API (e.g., "The user's preferred language is en-US"). Putting the instructions in English, but then putting the user prompt in the desired output language is a recommended practice.
Are larger server-based models available through Foundation Models?
No, the Foundation Models API currently only provides access to the on-device Large Language Model at the core of Apple Intelligence. It does not support server-side models. On-device models are preferred for privacy and for performance reasons.
Is it possible to run Retrieval-Augmented Generation (RAG) using the Foundation Models framework?
Yes, it is possible to run RAG on-device, but the Foundation Models framework does not include a built-in embedding model. You'll need to use a separate database to store vectors and implement nearest neighbor or cosine distance searches. The Natural Language framework offers simple word and sentence embeddings that can be used. Consider using a combination of Foundation Models and Core ML, using Core ML for your embedding model.