CoreML memory allocation logic

Hello, I have a question about Core ML. I load a Core ML model in my project and set the compute units to CPU+GPU. When I profile performance with Instruments, I see a "prepare GPU request" overhead before each inference. I also checked the allocation graph and found that memory is being allocated frequently. Is this expected? Is there a way to avoid these repeated prepares? I have tried a few approaches, such as sharing memory for the prediction input parameters, but they don't seem to help.
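
For reference, this is roughly how I load the model and try to reuse buffers between predictions. It is only a minimal sketch; the model URL, feature names ("input", "output"), and shapes below are placeholders for my actual model:

```swift
import CoreML

func runRepeatedPredictions(modelURL: URL) throws {
    // Load the model once with CPU+GPU compute units and reuse it for every inference.
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndGPU
    let model = try MLModel(contentsOf: modelURL, configuration: config)

    // Pre-allocate the input array once and refill it in place before each prediction.
    let input = try MLMultiArray(shape: [1, 3, 224, 224], dataType: .float32)

    // Ask Core ML to write the result into a caller-owned buffer instead of allocating a new one.
    let outputBacking = try MLMultiArray(shape: [1, 1000], dataType: .float32)
    let options = MLPredictionOptions()
    options.outputBackings = ["output": outputBacking]

    for _ in 0..<100 {
        // ... update the contents of `input` in place ...
        let provider = try MLDictionaryFeatureProvider(dictionary: ["input": input])
        _ = try model.prediction(from: provider, options: options)
    }
}
```

Even with the model loaded once and the input/output buffers reused like this, Instruments still shows a prepare step and fresh allocations before each prediction.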
