I am calling into an app extension from a Safari Web Extension (sendNativeMessage, which in turn results in a call to NSExtensionRequestHandling’s beginRequest). My Safari extension aims to make use of the new foundation models for some of the features it provides.
In my testing, I hit the rate limit by sending 4 requests, waiting 30 seconds between each. This makes the FoundationModels framework (which would otherwise serve my use case perfectly well) unusable in this context, because the model is called in response to user input, and this rate of user input is perfectly plausible in a real world scenario.
The error thrown as a result of the rate limit is “Safety guardrail was triggered after consecutive failures during streaming.", but looking at the system logs in Console.app shows the rate limit as the real culprit.
My suggestions:
- Please introduce sensible rate limits for app extensions, through an entitlement if need be. If it is rate limited to 1 request per every couple of seconds, that would already fix the issue for me.
- Please document the rate limit.
- Please make the thrown error reflect that it is the result of a rate limit and not a generic guardrail violation. IMPORTANT: please indicate in the thrown error when it is safe to try again.
Filed a feedback here: FB18332004
Thank you for the feedback!
First, rate limiting is not expected when your device is connected to power. This is a known issue. (153216632)
Rate limiting applies when you device is on battery AND when your process is running in the background. Safari extensions run in the background. When using Foundation Models in the background, we recommend not streaming the responses as it would use more power and hit the rate limit sooner. Instead, we recommend calling respond
to generate the whole response.