Model Rate Limits?

I'm trying out the Foundation Models framework, and when I run several sessions in a loop, an error is thrown saying I'm hitting a rate limit.

Are these rate limits documented? What's the best practice here?

I'm trying to run the model against new content downloaded from a web service, where a single download might contain ~200 items. Each item is relatively small, but all of them need to be processed in a loop.

Answered by DTS Engineer in 843168022

I don't think we have documented the rate limit as of today, but as far as I know, an app that has UI and runs in the foreground doesn't have a rate limit when using the models; a macOS command line tool, which doesn't have UI, does.

Would you mind sharing how you use the models? In general, if you hit the rate limit and can't work around it by switching to an app with UI, I’d suggest that you file a feedback report with your concrete use case for the Foundation Models folks to evaluate. If you do so, please share your report ID here for folks to track.

Best,
——
Ziqiao Chen
 Worldwide Developer Relations.

In this case, it was a foregrounded visionOS app. This app is a feed reader / RSS app and this happened when I downloaded new feed items (~150 of them) and ran the model in a loop over each entry.

The actual model request is to summarize the content of the feed item.

FB17984127

The error is:

Error Domain=com.apple.SensitiveContentAnalysisML Code=15 "Failed model manager query for model com.apple.fm.language.instruct_300m.safety: Client rate limit exceeded, try again later"

I am having the same issue with a macOS command line tool and filed a feedback report (17965726). My issue is that the rate limit seems to block forever once you hit it. For example, it will loop 10 times and then throw a rate limit error. I can add tasks that delay/sleep for up to 30 minutes after the error, and it will still fail with a rate limit error. What are the rate limits? Is it X per second/minute/hour? Can we get some documentation or a preferred implementation for tasks that loop through many small prompt calls?
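For reference, here is a minimal sketch of the kind of loop being discussed: process the items one at a time and, on a rate-limit error, back off and retry a bounded number of times before giving up. This is not a documented or recommended pattern from Apple, and as noted above, waiting may not clear the limit at all in a command line tool. The names `processSequentially` and `looksLikeRateLimit` are hypothetical, and the domain/code check simply mirrors the error text quoted in this thread; the error your code actually catches may be wrapped in another type.

```swift
import Foundation

/// Hypothetical check based only on the error text quoted in this thread.
/// Assumption: the thrown error bridges to NSError with this domain and code.
func looksLikeRateLimit(_ error: Error) -> Bool {
    let nsError = error as NSError
    return nsError.domain == "com.apple.SensitiveContentAnalysisML" && nsError.code == 15
}

/// Runs `operation` on each item in order (never concurrently).
/// When `isRateLimited` matches a thrown error, sleeps and retries with a
/// doubling delay, up to `maxAttempts` tries per item; otherwise rethrows.
func processSequentially<Item, Output>(
    _ items: [Item],
    maxAttempts: Int = 4,
    initialDelay: Duration = .seconds(1),
    isRateLimited: (Error) -> Bool = looksLikeRateLimit,
    operation: (Item) async throws -> Output
) async throws -> [Output] {
    var results: [Output] = []
    results.reserveCapacity(items.count)

    for item in items {
        var delay = initialDelay
        var attempt = 1
        while true {
            do {
                results.append(try await operation(item))
                break // this item succeeded, move on to the next one
            } catch {
                // Give up on non-rate-limit errors or when retries are exhausted.
                guard isRateLimited(error), attempt < maxAttempts else { throw error }
                try await Task.sleep(for: delay)
                delay = delay * 2
                attempt += 1
            }
        }
    }
    return results
}
```

The `operation` closure is where the actual model request would go, for example a call that summarizes one feed item. Keeping the work sequential and the retry count bounded means a persistent limit surfaces as a thrown error instead of an endless loop.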

Point of comparison: I ran my iOS 26 app all night, running Foundation Models in a Task { }, with no rate limits. The phone got warm, but there were no persistent errors.

(I haven't tried macOS or visionOS, though.)
