Metal

App crashes after enabling the graphics HUD

Hi everyone, we have found a Metal-related crash. Our app uses Metal APIs, and during performance testing we enabled Settings > Developer > Show Graphics HUD. After launching the app, the HUD is displayed normally, but the app crashes shortly afterwards. The main information from the log is as follows:

Incident Identifier: 1F093635-2DB8-4B29-9DA5-488A6609277B
CrashReporter Key: 233e54398e2a0266d95265cfb96c5a89eb3403fd
Hardware Model: iPhone14,3
Process: waimai [16584]
Path: /private/var/containers/Bundle/Application/CCCFC0AE-EFB8-4BD8-B674-ED089B776221/waimai.app/waimai
Identifier:
Version: 61488 (8.53.0)
Code Type: ARM-64
Parent Process: ? [1]
Date/Time: 2025-06-12 14:41:45.296 +0800
OS Version: iOS 18.0 (22A3354)
Report Version: 104
Monitor Type: Mach Exception
Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x000000014fffae00
Crashed Thread: 57

Thread 57 Crashed:
0 libMTLHud.dylib esfm_GenerateTriangesForString + 408
1 libMTLHud.dylib esfm_GenerateTriangesForString + 92
2 libMTLHud.dylib Renderer::DrawText(char const*, int, unsigned int) + 204
3 libMTLHud.dylib Overlay::onPresent(id<CAMetalDrawable>) + 1656
4 libMTLHud.dylib CAMetalDrawable_present(void (*)(), objc_object*, objc_selector*) + 72
5 libMTLHud.dylib invocation function for block in void replaceMethod<void>(objc_class*, objc_selector*, void (*)(void (*)(), objc_object*, objc_selector*)) + 56
6 Metal __45-[_MTLCommandBuffer presentDrawable:options:]_block_invoke + 104
7 Metal MTLDispatchListApply + 52
8 Metal -[_MTLCommandBuffer didScheduleWithStartTime:endTime:error:] + 312
9 IOGPU IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 136
10 IOGPU __IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64
11 libdispatch.dylib _dispatch_client_callout4 + 20
12 libdispatch.dylib _dispatch_mach_msg_invoke + 464
13 libdispatch.dylib _dispatch_lane_serial_drain + 368
14 libdispatch.dylib _dispatch_mach_invoke + 456
15 libdispatch.dylib _dispatch_lane_serial_drain + 368
16 libdispatch.dylib _dispatch_lane_invoke + 432
17 libdispatch.dylib _dispatch_lane_serial_drain + 368
18 libdispatch.dylib _dispatch_lane_invoke + 380
19 libdispatch.dylib _dispatch_root_queue_drain_deferred_wlh + 288
20 libdispatch.dylib _dispatch_workloop_worker_thread + 540
21 libsystem_pthread.dylib _pthread_wqthread + 288

We tested several device models, and only the iPhone 13 Pro Max crashes.

Q1: Why does this crash occur?
Q2: With the same logic, why does the crash occur only on the iPhone 13 Pro Max?

We look forward to your answer.

Graphics & Games Metal Metal MetalKit

0

16

14h

'tangents' was deprecated in visionOS 2.0: Use cp_drawable_compute_projection instead

I'm using a class that relies on tangents to render with RealityKit on visionOS, but on visionOS 26 it crashes the app, and there is no documentation on how to implement cp_drawable_compute_projection. I have tried a few options without success. Could you help me implement it? The relevant part of the code is:

return drawable.views.map { view in
    let userViewpointMatrix = (simdDeviceAnchor * view.transform).inverse
    let projectionMatrix = ProjectiveTransform3D(
        leftTangent: Double(view.tangents[0]),
        rightTangent: Double(view.tangents[1]),
        topTangent: Double(view.tangents[2]),
        bottomTangent: Double(view.tangents[3]),
        nearZ: Double(drawable.depthRange.y),
        farZ: Double(drawable.depthRange.x),
        reverseZ: true
    )
    let screenSize = SIMD2(x: Int(view.textureMap.viewport.width),
                           y: Int(view.textureMap.viewport.height))
    return ModelRendererViewportDescriptor(
        viewport: view.textureMap.viewport,
        projectionMatrix: .init(projectionMatrix),
        viewMatrix: userViewpointMatrix * translationMatrix * rotationMatrix * scalingMatrix * commonUpCalibration,
        screenSize: screenSize)
}

Graphics & Games RealityKit Metal RealityKit visionOS

0

18

1d

MapKit with MKTileOverlay Crashes After a Time

I'm building a weather map that shows rain on the map. I retrieve PNG images and use them as tiles on the map, then reload all the tiles for each timeframe (one tile set for every 10 minutes). I can load the map, place the tiles, and reload the data for each time slot.

I preload all the PNG data needed for the tiles and keep that NSData in memory so the tiles load and display quickly. Timers reload the overlay with the next set of tiles for each time slot, giving the view of a precipitation map moving over time (just like you'd see on any weather map). I have 12 time slots (timestamps), one every 10 minutes for the past 2 hours. I show each in sequence and then repeat.

Over time I get a crash with this error, reported as Thread 1: signal SIGABRT.

Failed to acquire drawable, rendering to temporary texture
validateRenderPassDescriptor:782: failed assertion `RenderPass Descriptor Validation MTLRenderPassAttachmentDescriptor MTLStoreActionMultisampleResolve store action at attachment 0 requires resolve texture '
validateRenderPassDescriptor:782: failed assertion `RenderPass Descriptor Validation MTLRenderPassAttachmentDescriptor MTLStoreActionMultisampleResolve store action at attachment 0 requires resolve texture '

Through some searching I've discovered that this seems to be console output from Metal. I assume MapKit uses Metal to render the overlay tiles? I'm using the same custom overlay each time: I set the timestamp on it and then tell it to reload. I also reuse the same MKOverlayRenderer, as shown here:

- (MKOverlayRenderer*)mapView:(MKMapView*)mapView rendererForOverlay:(id<MKOverlay>)overlay {
    if ([overlay isKindOfClass:[MKTileOverlay class]]) {
        if (!self.rainRenderer) {
            self.rainRenderer = [[MKTileOverlayRenderer alloc] initWithTileOverlay:overlay];
            self.rainRenderer.alpha = 0.5;
        }
        return self.rainRenderer;
    }
    return nil;
}

And here's the function that reloads the overlay:

- (void) updateRainFrame {
    self.currentFrameIndex = (self.currentFrameIndex + 1) % self.timestamps.count;
    if ((self.currentFrameIndex >= 0) && (self.timestamps.count > self.currentFrameIndex)) {
        NSLog (@"self.currentFrameIndex = %lu", self.currentFrameIndex);
        NSString *timestamp = self.timestamps[self.currentFrameIndex];
        [self.overlay setTimestamp:timestamp];
        [self.rainRenderer reloadData];
    }
}

The time it takes to crash seems arbitrary. Sometimes it's very quick, less than a minute, but usually it's several minutes: 10 or 20 minutes or more. It feels like some sort of race condition. Perhaps ARC is not able to release the tile images quickly enough for each overlay reload? That's a wild guess, but I think it's something deeper in Metal, as I would otherwise expect to see other errors related to memory availability.

Some of my searches suggest that MSAA needs to be turned off in Metal to resolve this. However, I have no idea how I would do that through MapKit. Any suggestions? Let me know if there is a way to capture more from the crash to give more insight.

App & System Services Maps & Location Metal MapKit

2

0

43

15h

[CoreImage] OS 26 breaks Metal kernels for CIFilters

I maintain a couple of CoreImage libraries that provide custom Metal kernel backed CIFilters. In iOS/iPadOS 26, the CIColorKernel.apply() method invoked in the CIFilter subclass fails to add the coreimage::destination parameter to the Metal function call: -[CIColorKernel applyWithExtent:arguments:options:] argument count mismatch for kernel 'FractalNoise3D', expected 13 but saw 12. I've compiled the code with Xcode 26 and deployed to iOS 18 devices without any breakage, so this is definitely an iOS problem, not an Xcode problem. Library here: https://github.com/JoshuaSullivan/SimplexNoiseFilter Feedback ID: FB17874311

Media Technologies General Metal Core Image

0

49

1w

Xcode 26 Metal Compiler build error

I just downloaded Xcode 26 and my build fails even though Metal Toolchain 26.0 is downloaded. What am I missing?

cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain

Developer Tools & Services Xcode Metal Xcode

2

112

1w

HDR video & screen brightness

When I play an HDR video in the iPhone Photos app, the HDR effect is clearly visible. But if the HDR video plays continuously for more than 30-40 minutes, the HDR effect disappears and the brightness is compressed to the SDR range. This issue appears on any iPhone. Depending on the phone, it may take 20-30 minutes, 30-40 minutes, or even just a few minutes, as on the iPhone 12 mini.

Similarly, if I use AVPlayer to play and preview an HDR video for more than 30-40 minutes, the HDR effect disappears and the screen brightness dims. The currentEDRHeadroom value also gradually decreases to 1.

Note: test with an HDR video longer than 1 hour; if the video is short, loop it.

My question is how to avoid losing the HDR effect after 30-40 minutes when I use CAMetalLayer to render any HDR video.

Media Technologies Video Metal AVFoundation EDR

0

38

1w

Unable to compile Core Image filter on Xcode 26 due to missing Metal toolchain

I have a Core Image filter in my app that uses Metal. I cannot compile it because it complains that the executable tool metal is not available, but I have installed it in Xcode. If I go to the "Components" section of Xcode Settings, it shows it as downloaded. And if I run the suggested command, it also shows it as installed. Any advice?

Xcode Version
Version 26.0 beta (17A5241e)

Build Output
Showing All Errors Only
Build target Lessons of project StudyJapanese with configuration Light
RuleScriptExecution /Users/chris/Library/Developer/Xcode/DerivedData/StudyJapanese-glbneyedpsgxhscqueifpekwaofk/Build/Intermediates.noindex/StudyJapanese.build/Light-iphonesimulator/Lessons.build/DerivedSources/OtsuThresholdKernel.ci.air /Users/chris/Code/SerpentiSei/Shared/iOS/CoreImage/OtsuThresholdKernel.ci.metal normal undefined_arch (in target 'Lessons' from project 'StudyJapanese')
    cd /Users/chris/Code/SerpentiSei/StudyJapanese
    /bin/sh -c xcrun\ metal\ -w\ -c\ -fcikernel\ \"\$\{INPUT_FILE_PATH\}\"\ -o\ \"\$\{SCRIPT_OUTPUT_FILE_0\}\"' '
error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
/Users/chris/Code/SerpentiSei/StudyJapanese/error:1:1: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain
Build failed 6/9/25, 8:31 PM 27.1 seconds

Result of xcodebuild -downloadComponent MetalToolchain (after switching Xcode-beta.app with xcode-select)
xcodebuild -downloadComponent MetalToolchain
Beginning asset download...
Downloaded asset to: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/4d77809b60771042e514cfcf39662c6d1c195f7d.asset/AssetData/Restore/022-19457-035.dmg
Done downloading: Metal Toolchain (17A5241c).

Screenshots from Xcode

Result of "Copy Information"
Metal Toolchain 26.0 [com.apple.MobileAsset.MetalToolchain: 17.0 (17A5241c)] (Installed)

Graphics & Games Metal Metal Core Image

7

0

323

1d

RealityKit/ARKit Memory Not Fully Released After AR Session Cleanup

Hi, I'm developing a SwiftUI app using RealityKit and ARKit for an AR measuring feature. I’ve noticed that after navigating away from my AR view and performing extensive cleanup (including removing all anchors/entities, pausing the ARSession, and nil-ing out all references), memory usage remains elevated and sometimes grows with repeated AR sessions.

Each time I enter and exit the AR view, memory increases. The memory does not return to the baseline after cleanup, even though all custom objects are deallocated.

Are there best practices beyond what I’ve described to ensure all ARKit/RealityKit resources are released after an AR session?

Media Technologies Photos & Camera Metal ARKit SwiftUI RealityKit

0

27

1w

Regarding Smoothing in Spectrogram using Metal

Hey, I need to know how to use texture mapping to render a spectrogram in Metal, because I need to smooth the spectrogram. My current project uses a vertex-based approach, which produces blocky transitions between quads. I need to smooth across the quads so that the colors form a continuous gradient.
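For readers unfamiliar with what texture sampling would add here, the sketch below is a CPU-side illustration in plain Swift of bilinear interpolation, which is roughly what a texture sampler configured for linear filtering performs between neighbouring texels. It is not Metal code and not the poster's renderer: the function name `bilinearSample`, the row-major `[[Float]]` grid, and the mapping of 0...1 onto the first and last samples are assumptions made for this example.

```swift
/// Samples a row-major grid of magnitudes with bilinear interpolation.
/// `u` and `v` are normalized coordinates in 0...1 (a convention assumed for this sketch).
/// The grid is assumed to be non-empty and rectangular.
func bilinearSample(_ grid: [[Float]], u: Float, v: Float) -> Float {
    let rows = grid.count
    let cols = grid[0].count
    // Map normalized coordinates onto the grid, clamped to the valid range.
    let x = min(max(u, 0), 1) * Float(cols - 1)
    let y = min(max(v, 0), 1) * Float(rows - 1)
    let x0 = Int(x), y0 = Int(y)
    let x1 = min(x0 + 1, cols - 1), y1 = min(y0 + 1, rows - 1)
    let fx = x - Float(x0), fy = y - Float(y0)
    // Blend horizontally on the two neighbouring rows, then blend those results vertically.
    let top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    let bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
}

// Four spectrogram bins; the centre of the grid blends all four equally.
let bins: [[Float]] = [[0, 1], [2, 3]]
print(bilinearSample(bins, u: 0.5, v: 0.5))
```

On the GPU, the same effect would normally come from storing the magnitudes in a texture and sampling it in the fragment shader with a sampler whose filter is set to linear, rather than computing the blend by hand.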

App & System Services Hardware Metal MetalKit Metal Performance Shaders

0

46

1w

Creating a voxel mesh and render it using metal within a RealityKit ImmersiveView

Hi everyone, I'm creating an educational app that allows doing computational design in an immersive environment with the Vision Pro. The app is free and can be found here: https://apps.apple.com/us/app/arcade-topology/id6742103633

The problem I have is that the voxel mesh I currently create uses ModelEntity, and I recently read that this is horrible for scalability. I already start to see issues when I try to use thousands of voxels. I also read somewhere that I should take advantage of the GPU and use Metal to that end. I was wondering if someone could point me to a tutorial or article that discusses this. In essence, I need to create a 3D voxel mesh, and those voxels have to update their opacity within an iterative loop.

Thanks!
Alejandro

Spatial Computing General Metal MetalKit RealityKit

3

0

70

1w

SCNTechnique clearColor Always Shows sceneBackground When Passes Share Depth Buffer

Problem Description

I'm encountering an issue with SCNTechnique where the clearColor setting is being ignored when multiple passes share the same depth buffer. The clear color always appears as the scene background, regardless of what value I set.

The minimal project for reproducing the issue: https://www.dropbox.com/scl/fi/30mx06xunh75wgl3t4sbd/SCNTechniqueCustomSymbols.zip?rlkey=yuehjtk7xh2pmdbetv2r8t2lx&st=b9uobpkp&dl=0

Problem Details

In my SCNTechnique configuration, I have two passes that need to share the same depth buffer for proper occlusion handling:

"passes": [
    "box1_pass": [
        "draw": "DRAW_SCENE",
        "includeCategoryMask": 1,
        "colorStates": [
            "clear": true,
            "clearColor": "0 0 0 0" // Expecting transparent black
        ],
        "depthStates": [
            "clear": true,
            "enableWrite": true
        ],
        "outputs": [
            "depth": "box1_depth",
            "color": "box1_color"
        ],
    ],
    "box2_pass": [
        "draw": "DRAW_SCENE",
        "includeCategoryMask": 2,
        "colorStates": [
            "clear": true,
            "clearColor": "0 0 0 0" // Also expecting transparent black
        ],
        "depthStates": [
            "clear": false,
            "enableWrite": false
        ],
        "outputs": [
            "depth": "box1_depth", // Sharing the same depth buffer
            "color": "box2_color",
        ],
    ],
    "final_quad": [
        "draw": "DRAW_QUAD",
        "metalVertexShader": "myVertexShader",
        "metalFragmentShader": "myFragmentShader",
        "inputs": [
            "box1_color": "box1_color",
            "box2_color": "box2_color",
        ],
        "outputs": [
            "color": "COLOR"
        ]
    ]
]

And the Metal shader used to display box1_color and box2_color side by side:

fragment half4 myFragmentShader(VertexOut in [[stage_in]],
                                texture2d<half, access::sample> box1_color [[texture(0)]],
                                texture2d<half, access::sample> box2_color [[texture(1)]]) {
    half4 color1 = box1_color.sample(s, in.texcoord);
    half4 color2 = box2_color.sample(s, in.texcoord);
    if (in.texcoord.x < 0.5) {
        return color1;
    }
    return color2;
};

Expected Behavior

- Both passes should clear their color targets to transparent black (0, 0, 0, 0)
- The depth buffer should be shared between passes for proper occlusion

Actual Behavior

- Both box1_color and box2_color targets contain the scene background instead of being cleared to transparent (see attached image)
- This happens even when I explicitly set clearColor: "0 0 0 0" for both passes
- Setting scene.background.contents = UIColor.clear makes the clearColor work as expected, but I need to keep the scene background for other purposes

What I've Tried

- Setting different clearColor values: all are ignored when sharing the depth buffer
- Using DRAW_NODE instead of DRAW_SCENE: didn't solve the issue
- Creating a separate pass to capture the background: the background still appears in the other passes
- Various combinations of clear flags and render orders

Environment

- iOS/macOS, running with "My Mac (Designed for iPad)"
- Xcode 16.2

Question

Is this a known limitation of SceneKit when passes share a depth buffer? Is there a workaround to achieve truly transparent clear colors while maintaining a shared depth buffer for occlusion testing?

The core issue seems to be that SceneKit automatically renders the scene background in every DRAW_SCENE pass when a shared depth buffer is detected, overriding any clearColor settings.

Any insights or workarounds would be greatly appreciated. Thank you!

Graphics & Games SceneKit Metal Swift MetalKit SceneKit

0

51

1w

Sparse Texture Writes

Hey, I've been struggling with this for some days now. I am trying to write to a sparse texture in a compute shader. I'm performing the following steps:

1. Set up a sparse heap and create a texture from it
2. Map the whole area of the sparse texture using updateTextureMapping(..)
3. Overwrite every value with the value "4" in a compute shader
4. Blit the texture to a shared buffer
5. Assert that the values in the buffer are "4"

I have a minimal example (which is still pretty long unfortunately). It works perfectly when removing the line heapDesc.type = .sparse. What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated.

import Metal

func sparseTexture64x64Demo() throws {
    // ── Metal objects
    guard let device = MTLCreateSystemDefaultDevice() else {
        throw NSError(domain: "SparseNotSupported", code: -1)
    }
    let queue = device.makeCommandQueue()!
    let lib = device.makeDefaultLibrary()!
    let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!)

    // ── Texture descriptor
    let width = 64, height = 64
    let format: MTLPixelFormat = .r32Uint // 4 B per texel
    let desc = MTLTextureDescriptor()
    desc.textureType = .type2D
    desc.pixelFormat = format
    desc.width = width
    desc.height = height
    desc.storageMode = .private
    desc.usage = [.shaderWrite, .shaderRead]

    // ── Sparse heap
    let bytesPerTile = device.sparseTileSizeInBytes
    let meta = device.heapTextureSizeAndAlign(descriptor: desc)
    let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile
    let heapDesc = MTLHeapDescriptor()
    heapDesc.type = .sparse
    heapDesc.storageMode = .private
    heapDesc.size = heapBytes
    let heap = device.makeHeap(descriptor: heapDesc)!
    let tex = heap.makeTexture(descriptor: desc)!

    // ── CPU buffers
    let bytesPerPixel = MemoryLayout<UInt32>.stride
    let rowStride = width * bytesPerPixel
    let totalBytes = rowStride * height
    let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)!

    let cb = queue.makeCommandBuffer()!
    let fence = device.makeFence()!

    // 2. Map the sparse tile, then signal the fence
    let rse = cb.makeResourceStateCommandEncoder()!
    rse.updateTextureMapping(
        tex, mode: .map,
        region: MTLRegionMake2D(0, 0, width, height),
        mipLevel: 0, slice: 0)
    rse.update(fence) // ← capture all work so far
    rse.endEncoding()

    let ce = cb.makeComputeCommandEncoder()!
    ce.waitForFence(fence)
    ce.setComputePipelineState(pipeline)
    ce.setTexture(tex, index: 0)
    let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1)
    let tgCount = MTLSize(width: (width + 7) / 8, height: (height + 7) / 8, depth: 1)
    ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG)
    ce.updateFence(fence)
    ce.endEncoding()

    // Blit texture into shared buffer
    let blit = cb.makeBlitCommandEncoder()!
    blit.waitForFence(fence)
    blit.copy(
        from: tex, sourceSlice: 0, sourceLevel: 0,
        sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
        sourceSize: MTLSize(width: width, height: height, depth: 1),
        to: dstBuf, destinationOffset: 0,
        destinationBytesPerRow: rowStride,
        destinationBytesPerImage: totalBytes)
    blit.endEncoding()

    cb.commit()
    cb.waitUntilCompleted()
    assert(cb.error == nil, "GPU error: \(String(describing: cb.error))")

    // ── Verify a few texels
    let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height)
    print("first three texels:", out[0], out[1], out[width]) // 0 1 64
    assert(out[0] == 4 && out[1] == 4 && out[width] == 4)
}

Metal shader:

#include <metal_stdlib>
using namespace metal;

kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]],
                   uint2 gid [[thread_position_in_grid]]) {
    tex.write(4, gid);
}
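One thing that may be worth checking in a sample like this: as far as I read the Metal documentation, the region passed to updateTextureMapping is measured in sparse tiles rather than pixels, so a 64x64 pixel region would need converting before it is mapped. The sketch below shows only that arithmetic in plain Swift, under that assumption. `Region2D` and `tileRegion(coveringPixels:tileWidth:tileHeight:)` are hypothetical names for this example, and the 32x32 tile size is made up; real code would ask the device for the tile size (for instance through sparseTileSize(with:pixelFormat:sampleCount:)).

```swift
/// A plain 2D region; a stand-in for MTLRegion so the sketch runs without Metal.
struct Region2D: Equatable {
    var x: Int
    var y: Int
    var width: Int
    var height: Int
}

/// Converts a pixel region into the smallest tile region that fully covers it.
/// Tile dimensions are assumed to be positive.
func tileRegion(coveringPixels pixels: Region2D, tileWidth: Int, tileHeight: Int) -> Region2D {
    // The first tile touched by the region (round the origin down).
    let firstTileX = pixels.x / tileWidth
    let firstTileY = pixels.y / tileHeight
    // One past the last tile touched by the region (round the far edge up).
    let endTileX = (pixels.x + pixels.width + tileWidth - 1) / tileWidth
    let endTileY = (pixels.y + pixels.height + tileHeight - 1) / tileHeight
    return Region2D(x: firstTileX, y: firstTileY,
                    width: endTileX - firstTileX, height: endTileY - firstTileY)
}

// A 64x64 pixel texture with a hypothetical 32x32 tile size covers a 2x2 block of tiles.
let whole = Region2D(x: 0, y: 0, width: 64, height: 64)
print(tileRegion(coveringPixels: whole, tileWidth: 32, tileHeight: 32))
```

Metal also offers a conversion helper for this purpose (convertSparsePixelRegions on MTLDevice), which is probably preferable to hand-rolled arithmetic in production code; the version above is only meant to make the rounding visible.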

Graphics & Games Metal Metal

1

0

71

3w

Trouble with MDLMesh.newBox()

I'm trying to build an MDLMesh and then add normals:

let mdlMesh = MDLMesh.newBox(withDimensions: SIMD3<Float>(1, 1, 1),
                             segments: SIMD3<UInt32>(2, 2, 2),
                             geometryType: MDLGeometryType.triangles,
                             inwardNormals: false,
                             allocator: allocator)
mdlMesh.addNormals(withAttributeNamed: MDLVertexAttributeNormal, creaseThreshold: 0)

When I render the mesh, some normals are (0,0,0). I don't know if the problem is in the mesh or in the conversion to MTKMesh. Is there a way to examine an MDLMesh with the geometry viewer? When I look at the variable values for my mdlMesh I get this: Not too useful. I don't know how to track down the normals. What's the best way to find out where the normals are getting broken?
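One way to narrow this kind of problem down without a viewer is to scan the raw vertex data for zero-length normals before and after the MTKMesh conversion. The sketch below is plain Swift over a `[Float]` array and makes several assumptions: the function name `verticesWithZeroNormals` is hypothetical, and the interleaved layout (three position floats followed by three normal floats) is invented for the example. A real mesh's layout comes from its vertex descriptor, and the floats would have to be copied out of the mesh's vertex buffer first.

```swift
/// Returns the indices of vertices whose normal is (approximately) zero length.
/// `floatsPerVertex` is the stride in floats and `normalOffset` is where the normal starts
/// within each vertex; both are assumptions about the layout, not values read from Model I/O.
func verticesWithZeroNormals(in data: [Float],
                             floatsPerVertex: Int,
                             normalOffset: Int,
                             epsilon: Float = 1e-6) -> [Int] {
    var bad: [Int] = []
    let vertexCount = data.count / floatsPerVertex
    for i in 0..<vertexCount {
        let base = i * floatsPerVertex + normalOffset
        let nx = data[base], ny = data[base + 1], nz = data[base + 2]
        // Compare squared length against squared epsilon to avoid a square root.
        if nx * nx + ny * ny + nz * nz < epsilon * epsilon {
            bad.append(i)
        }
    }
    return bad
}

// Two vertices: position followed by normal. The second vertex has a zero normal.
let interleaved: [Float] = [0, 0, 0,  0, 1, 0,
                            1, 0, 0,  0, 0, 0]
print(verticesWithZeroNormals(in: interleaved, floatsPerVertex: 6, normalOffset: 3))
```

Running the same scan on the MDLMesh data and on the MTKMesh data would show whether the zero normals already exist after addNormals or only appear after conversion.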

Graphics & Games General Metal MetalKit Xcode Debugging

1

0

62

3w

Trouble with MDLMesh.newBox()

I made a box with MDLMesh.newBox(). I added normals.

let mdlMesh = MDLMesh.newBox(withDimensions: SIMD3<Float>(1, 1, 1),
                             segments: SIMD3<UInt32>(2, 2, 2),
                             geometryType: MDLGeometryType.triangles,
                             inwardNormals: false,
                             allocator: allocator)
mdlMesh.addNormals(withAttributeNamed: MDLVertexAttributeNormal, creaseThreshold: 0.25)

After I convert to MTKMesh the normals are (0,0,0) for a group of vertices. I can only inspect the geometry after I convert to MTKMesh. Is there a way you can use Geometry Viewer on an MDLMesh?

Developer Tools & Services Instruments Metal MetalKit

0

28

May ’25

vsync, drawable present, instrument gui

Hi, when analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of Drawable Present seems to be when the CPU calls commandbuffer:present, rather than when the actual encoding is completed on the GPU. Also, what does Drawable Presented specifically mean?

In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been rendered is not displayed. Why is this happening?

Graphics & Games Metal Graphics and Games Metal

0

60

May ’25

Xcode cannot find any frameworks

I am new to Xcode and trying to learn how to use Metal for my internship. I am trying to link the binaries of Foundation.framework, Metal.framework, and QuartzCore.framework, but whenever I try to build, it fails to find any of them. I have my Header Search Path set to $(PROJECT_DIR)/metal-cpp. I tried adding some paths for the frameworks, but that did not work either. I do have the binaries linked in the Build Phases, so I don't know what else I could be missing.

Developer Tools & Services Xcode Metal Quartz

2

0

57

May ’25

CMake unable to generate the Xcode file described in this tutorial

In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step I execute this command:

cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/

I keep getting an error:

CMake Error at CMakeLists.txt:5 (include): include could not find requested file: /Users/macuser/USDInstall/bin/pxrConfig.cmake

I've followed the instructions in the README.md file included in the project files at least 5 times, and I've also tried moving the pxrConfig.cmake file around and copying it into different folders before executing the command, but I was still unsuccessful in generating the file needed to compile and render the HydraPlayer renderer. How do I get cmake to generate the Xcode file to create the HydraPlayer renderer?

Graphics & Games Metal Metal MetalKit USDZ OpenGL

1

0

74

May ’25

CoreML Model Conversion Help

I’m trying to follow Apple’s “WWDC24: Bring your machine learning and AI models to Apple Silicon” session to convert the Mistral-7B-Instruct-v0.2 model into a Core ML package, but I’ve run into a roadblock that I can’t seem to overcome. I’ve uploaded my full conversion script here for reference: https://pastebin.com/T7Zchzfc When I run the script, it progresses through tracing and MIL conversion but then fails at the backend_mlprogram stage with this error: https://pastebin.com/fUdEzzKM The core of the error is: ValueError: Op "keyCache_tmp" (op_type: identity) Input x="keyCache" expects list, tensor, or scalar but got state[tensor[1,32,8,2048,128,fp16]] I’ve registered my KV-cache buffers in a StatefulMistralWrapper subclass of nn.Module, matching the keyCache and valueCache state names in my ct.StateType definitions, but Core ML’s backend pass reports the state tensor as an invalid input. I’m using Core ML Tools 8.3.0 on Python 3.9.6, targeting iOS18, and forcing CPU conversion (MPS wasn’t available). Any pointers on how to satisfy the handle_unused_inputs pass or properly declare/cache state for GQA models in Core ML would be greatly appreciated! Thanks in advance for your help, Usman Khan

Machine Learning & AI Core ML Metal Metal Performance Shaders Core ML tensorflow-metal

0

92

May ’25

Support for clock() shader instruction in MSL similar to VK_KHR_shader_clock instructions

Hi, it seems MSL is missing support for a clock() shader instruction, which is available in other graphics APIs such as Vulkan or OpenGL. It is useful for measuring the cost, in clock cycles, of some code inside a shader, with much finer granularity than launching a micro kernel with the same instructions and measuring the cycle cost from the CPU. It would also be useful for MoltenVK to support that extension. Thanks.

Graphics & Games Metal Graphics and Games Metal

1

0

77

Apr ’25

iOS Metal system delayed one Vsync period to really display the frame on the screen

View Layout

Add the following views in a view controller:
- Label
- View A, with a subview of the same size: MTKView A
- View B, with a subview of the same size: MTKView B

Refresh Rates of Each View

- The label view refreshes at 60fps (driven by CADisplayLink).
- MTKView A and B refresh at 15fps.

MTKView Implementation Details

- The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changing it to double buffering.
- The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame.
  self.metalView.enableSetNeedsDisplay = NO;
  self.metalView.paused = YES;
- A new high-priority queue is created for drawing, instead of handling it on the main queue.

MTKView Latency Tracking

- The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer.
- The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView.

Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTKView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that. I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism.

Observation from Instruments

From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details.

Questions

- According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display the frame (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer?
- The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens?

Explanation of the Reasoning Behind Some MTKView Code Details

- Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering.
- Manual triggering of the draw method is used instead of MTKView's own scheduling mechanism because that mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
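The T2 - T1 reasoning above can be made concrete with a little integer arithmetic. The sketch below counts how many Vsync boundaries fall between GPU completion and presentation. It is an illustration only: the function name `vsyncBoundariesCrossed` is hypothetical, timestamps are in microseconds, and Vsyncs are assumed to land on exact multiples of the period measured from time zero, which a real display does not guarantee.

```swift
/// Counts the Vsync boundaries in the half-open interval (t1, t2],
/// assuming Vsyncs occur at exact multiples of `periodMicros` (an assumption of this sketch).
/// A result of 1 means the frame appeared on the first Vsync after the GPU finished;
/// a result of 2 means it waited one extra Vsync period.
func vsyncBoundariesCrossed(fromMicros t1: Int, toMicros t2: Int, periodMicros: Int) -> Int {
    return t2 / periodMicros - t1 / periodMicros
}

let period = 16_667 // roughly 60 Hz, in microseconds
// GPU finished 5 ms into a frame interval; the frame showed up two Vsyncs later.
print(vsyncBoundariesCrossed(fromMicros: 5_000, toMicros: 33_334, periodMicros: period))
```

Feeding the measured T1 and T2 values through a check like this separates "presented on the next Vsync" from "presented one Vsync late" without relying on the raw difference alone, since T2 - T1 can be under one period and still span a boundary.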

Graphics & Games Metal Metal

0

54

Apr ’25

Post

Replies

Boosts

Views

Activity

