While building an app that runs large language model inference on device, I got gibberish output. After carefully examining every detail, I found that it is caused by the fused scaledDotProductAttention operation. When I switched back to the discrete operations, the problem went away. To reproduce the bug, please check https://github.com/zhoudan111/MPSGraph_SDPA_bug
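For context, here is a minimal sketch of what the discrete path is generally understood to compute: softmax(Q·Kᵀ·scale)·V. It is plain Python, not MPSGraph code. The helper names (`softmax`, `matmul`, `transpose`, `sdpa_discrete`) and the default scale of 1/sqrt(d) are assumptions for illustration and are not taken from the linked repository.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Naive matrix product of two lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def sdpa_discrete(q, k, v, scale=None):
    # Hypothetical reference: attention built from separate steps.
    if scale is None:
        scale = 1.0 / math.sqrt(len(q[0]))  # assumed default: 1/sqrt(head dim)
    scores = matmul(q, transpose(k))                       # Q · K^T
    scaled = [[s * scale for s in row] for row in scores]  # apply scale
    weights = [softmax(row) for row in scaled]             # row-wise softmax
    return matmul(weights, v)                              # weights · V
```

Feeding the same small Q, K, and V through the fused operation and through a reference like this one is one way to show the discrepancy in a bug report.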
Hello,
Thank you for letting us know.
Please take a moment to send a bug report with the Feedback Assistant. You can use your message above as content.