Stream response

With the respond() methods, the foundation model works well enough. With the streamResponse() methods, however, the responses are very repetitive, verbose, and messy.
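For context, the two call styles in question look roughly like this. This is a minimal sketch assuming the FoundationModels API shape shown at WWDC25; the prompt and function name are illustrative, and the exact element type yielded by the stream should be verified against the current SDK:

```swift
import FoundationModels

// Minimal sketch of the two call styles being compared; assumes the
// WWDC25 FoundationModels API shape. Verify the stream's element type
// against your SDK seed.
func compareCalls() async throws {
    let session = LanguageModelSession()
    let prompt = "Summarize the plot of Hamlet in three sentences."

    // One-shot: returns once the complete response has been generated.
    let response = try await session.respond(to: prompt)
    print(response.content)

    // Streaming: yields partial responses while generation is in progress.
    for try await partial in session.streamResponse(to: prompt) {
        print(partial) // each element is the response generated so far
    }
}
```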

My app, which uses the foundation model, consumes more than 500 MB of memory on an iPad Pro when running from Xcode. Devices that support Apple Intelligence have at least 8 GB of memory. Should Apple use a bigger model (one using 3–4 GB of memory) for better streamed responses?

Hi @yvsong,

In terms of the discrepancy you're seeing between the quality of the respond and streamResponse methods, please file a report via Feedback Assistant, and let me know the FB number once you've done so, so that I can keep track of it.

Currently, the Foundation Models framework's maximum context window is around 4,000 tokens. It's designed for use cases like summarization, extraction, classification, and tagging. For tasks that require advanced reasoning, broad world knowledge, or a much larger context window, consider using a server-based LLM.
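For reference, a basic call for one of those intended use cases (summarization) might look like this. This is a minimal sketch assuming the WWDC25 API shape; the instructions string and function name are illustrative, not Apple-provided guidance:

```swift
import Foundation
import FoundationModels

// Minimal sketch of an on-device summarization call; assumes the
// WWDC25 FoundationModels API shape.
func summarize(_ text: String) async throws -> String {
    // Check that the on-device model is available before using it.
    guard case .available = SystemLanguageModel.default.availability else {
        throw NSError(domain: "ModelUnavailable", code: 1)
    }
    // The instructions string here is a hypothetical example.
    let session = LanguageModelSession(
        instructions: "Summarize the user's text in two sentences."
    )
    let response = try await session.respond(to: text)
    return response.content
}
```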

We've also shared that the on-device models will be updated in sync with operating system updates, so there may be improvements over time.

Best,

-J

It turned out to be a programming error on my part. The PartiallyGenerated responses are sequentially improving snapshots of the final response; initially I thought they were sequential chunks to be concatenated together into the final response. With the correct handling, the streamed responses are just as good as the results of the respond() method.
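Concretely, the fix amounts to replacing the displayed text with each new snapshot rather than appending to it. A minimal sketch, assuming the WWDC25 streaming API; the prompt and function name are illustrative:

```swift
import FoundationModels

// Minimal sketch of the fix described above: each streamed element is a
// progressively more complete snapshot of the whole response, not a
// delta to append.
func streamCorrectly() async throws -> String {
    let session = LanguageModelSession()
    var latest = ""
    for try await partial in session.streamResponse(to: "Write a haiku about streams.") {
        // Wrong: latest += "\(partial)" -- concatenating snapshots yields
        // the repetitive, verbose output described at the top of the thread.
        latest = "\(partial)" // Right: keep only the newest snapshot.
    }
    return latest
}
```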
