[MPSGraph runWithFeeds:targetTensors:targetOperations:] crashes randomly

I'm implementing an LLM with Metal Performance Shaders Graph and have run into a very strange behavior: occasionally, the model reports the following error message:

LLVM ERROR: SmallVector unable to grow. Requested capacity (9223372036854775808) is larger than maximum value for size type (4294967295)
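For reference, the two numbers in that message can be decoded with a little arithmetic. The sketch below is purely illustrative and has nothing to do with MPS itself. It shows that the requested capacity is exactly 2^63, which is the bit pattern of INT64_MIN when read as an unsigned 64-bit value, while the limit is UINT32_MAX, the maximum of the unsigned 32-bit size type. A capacity of exactly 2^63 looks less like a real element count and more like a negative or sentinel 64-bit value being used as a size. In upstream MLIR, for example, ShapedType::kDynamic (the marker for an unknown dimension) is defined as std::numeric_limits<int64_t>::min(). Whether Apple's closed-source MPS fork uses the same sentinel, and whether that is what happens here, is an unconfirmed assumption.

```python
import struct

# Values copied verbatim from the LLVM error message.
REQUESTED = 9223372036854775808
MAX_SIZE = 4294967295


def as_int64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed two's-complement int64."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


print(REQUESTED == 2**63)     # True: only the sign bit is set
print(as_int64(REQUESTED))    # -9223372036854775808, i.e. INT64_MIN
print(MAX_SIZE == 2**32 - 1)  # True: UINT32_MAX, the 32-bit size type limit
print(REQUESTED > MAX_SIZE)   # True, so the grow request cannot succeed
```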

and then crashes. A screenshot of the stack backtrace is attached. Note that the 5th frame is

mlir::getIntValues<long long>

and the 6th frame is

llvm::SmallVectorBase<unsigned int>::grow_pod
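Frame 6 is where the message is produced. In upstream (open-source) LLVM, llvm/lib/Support/SmallVector.cpp computes the new capacity for SmallVectorBase<Size_T> and aborts with exactly this wording when the requested minimum size exceeds the maximum of Size_T (here unsigned int). The following is a simplified Python model of that check, assuming Apple's fork behaves like upstream, which cannot be verified because it is closed source. SizeOverflow and new_capacity are hypothetical names standing in for LLVM's fatal-error path and capacity computation.

```python
UINT32_MAX = 2**32 - 1  # max of Size_T = unsigned int, as in frame 6


class SizeOverflow(Exception):
    """Hypothetical stand-in for LLVM's fatal error in this illustration."""


def new_capacity(min_size: int, old_capacity: int, max_size: int = UINT32_MAX) -> int:
    # Simplified model of upstream LLVM's capacity computation:
    # first reject any request that the size type cannot represent.
    if min_size > max_size:
        raise SizeOverflow(
            f"SmallVector unable to grow. Requested capacity ({min_size}) "
            f"is larger than maximum value for size type ({max_size})"
        )
    # Otherwise grow to roughly 2x, at least min_size, at most max_size.
    return min(max(2 * old_capacity + 1, min_size), max_size)


try:
    # A 64-bit "count" of 2^63 (INT64_MIN's bit pattern) reproduces the log line.
    new_capacity(2**63, old_capacity=0)
except SizeOverflow as err:
    print(err)
```

Under this model the failure is not in the growth logic itself: any caller that hands grow_pod a corrupted or sentinel 64-bit count will hit the same abort, which is consistent with the problem originating one frame earlier, in mlir::getIntValues<long long>.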

It looks like MLIR mistakenly treated a 64-bit value as a 32-bit size. Unfortunately, I could not find the source code of mlir::getIntValues; maybe it lives in Apple's closed-source fork of LLVM used for the MPS implementation? Does anyone have an opinion or a suggestion on this?
