Go further with Metal 4 games
Dive deeper into the latest improvements in Metal 4. We'll introduce the new ray tracing features that help you bring your most complex, visually rich workloads to Apple silicon. Learn how MetalFX can scale your workloads with upscaling, frame interpolation, and scene denoising. To get the most out of this session, we recommend first watching “Discover Metal 4” and “Explore Metal 4 games”.
Chapters
- 0:00 - Introduction
- 2:13 - Upscaling
- 7:17 - Frame interpolation
- 13:50 - Improve ray tracing with Metal 4
- 19:25 - Denoise and upscale at the same time
- 26:08 - Next steps
Resources
Related Videos
WWDC25
WWDC23
WWDC22
-
Hi, my name is Matias Koskela. Today I'm going to introduce you to the techniques and best practices that will help you go further with your advanced games and pro apps on Apple platforms.
Before this talk, you might want to watch “Discover Metal 4” for an overview of Metal 4, and “Explore Metal 4 games” to learn how to use the latest iteration of Metal. This “go further” talk is the second part of our Metal 4 gaming series. You can also learn how Metal 4 can help you combine machine learning and graphics in a separate talk.
Games like Cyberpunk 2077, shown here, are becoming more realistic with high-quality rendering. That makes each pixel more expensive to render, which makes high resolutions and frame rates even more challenging. With Metal, you can render high-quality frames across Apple platforms, ranging from iPhone to Mac. Whether you use rasterization or ray tracing, Metal provides easy-to-use APIs. And you can scale your workloads to even higher resolutions and frame rates with MetalFX Upscaling.
And if you want to go even further, you can now use the new MetalFX frame interpolator.
The latest games, like Cyberpunk 2077, offer realistic, real-time path tracing. These expanded real-time rendering capabilities are enabled by powerful new features in Metal 4, including ray tracing enhancements and the new MetalFX denoised upscaler, which makes it easy to scale by reducing the ray count your game requires.
The MetalFX Upscaler can help you achieve higher resolution and faster frame rates. You can smooth out your gameplay even more with the new MetalFX frame interpolator. New Metal 4 ray tracing features can improve your performance even further, and you can combine them with the new MetalFX denoised upscaler.
Upscaling is a widely used technique that can boost your performance in most scenarios. MetalFX includes a machine learning based upscaler that has been part of Apple platforms since 2022 and has gotten better every year.
MetalFX Upscaling now includes new tools and techniques you can use to improve your game's quality and performance. The first step is to apply temporal upscaling to your game properly; one part of that process is getting the exposure input parameter right. You can then improve performance further with dynamic resolution, and improve quality in certain scenarios using reactivity hints.
Imagine a typical rendering pipeline. First, the frame is rasterized or ray traced, then your game performs post-processing effects such as motion blur. Next, it applies exposure and tone mapping, renders the UI, and finally displays the frame to the player. The ideal place to add MetalFX upscaling is after jittered rendering and before post effects. You can watch “Boost performance with MetalFX Upscaling” for more details on integrating the upscaler. This year, you have even more tools and features available to improve your game's performance.
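If you haven't integrated the upscaler yet, the basic setup is compact. Here's a minimal sketch, assuming placeholder texture formats and render/display sizes; the denoised variant shown later in this session follows the same pattern.

// Minimal temporal upscaler setup (sketch; formats and sizes are placeholders)
MTLFXTemporalScalerDescriptor *desc = [MTLFXTemporalScalerDescriptor new];
desc.colorTextureFormat  = MTLPixelFormatBGRA8Unorm_sRGB;
desc.depthTextureFormat  = MTLPixelFormatDepth32Float;
desc.motionTextureFormat = MTLPixelFormatRG16Float;
desc.outputTextureFormat = MTLPixelFormatBGRA8Unorm_sRGB;
desc.inputWidth   = renderWidth;   // jittered, low-resolution render size
desc.inputHeight  = renderHeight;
desc.outputWidth  = displayWidth;  // presented size
desc.outputHeight = displayHeight;
id<MTLFXTemporalScaler> upscaler = [desc newTemporalScalerWithDevice:device];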
Setting a correct exposure value on the upscaler is essential for a high quality outcome.
If you pass in a drastically wrong value, it can lead to flickering and ghosting.
In the rendering pipeline, the input and output colors of the upscaler are in linear color space. The upscaler takes in a parameter called exposure, which, when multiplied with your color input, results in a brightness that should approximately match the exposure used in your tone mapping. This ensures that the upscaler understands what the visible features of the frame are when it is displayed to the player. Note that the value is just a hint for the upscaler and doesn't change the brightness of the output. MetalFX includes a new tool to help you tune the exposure value you send to the upscaler: the exposure debugger. To enable it, set the environment variable MTLFX_EXPOSURE_TOOL_ENABLED. The upscaler then renders a gray checkerboard on top of your frame and applies the inverse of your exposure value to it.
You can then review how the pattern looks at the very end of your pipeline on the display.
If the exposure value you pass to the upscaler doesn't match the tone mapper, the checkerboard will appear too dark or too bright.
Another indicator of a mismatch is the brightness of the checkerboard changing while your game is running.
When your exposure value is correct, the grid pattern is a constant mid gray.
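To tie these pieces together, here's a minimal sketch of enabling the debugger and providing the exposure hint. The setenv call and the 1x1 R32Float exposureTexture are assumptions for illustration; the environment variable name comes straight from this session.

// Enable the exposure debugger (sketch; usually set in your Xcode scheme instead)
setenv("MTLFX_EXPOSURE_TOOL_ENABLED", "1", 1);

// Provide the exposure hint as a 1x1 float texture (assumed R32Float) whose value
// approximately matches the exposure your tone mapper uses. It's a hint only;
// it doesn't change the brightness of the output.
float exposure = currentToneMappingExposure; // hypothetical value from your pipeline
[exposureTexture replaceRegion:MTLRegionMake2D(0, 0, 1, 1)
                   mipmapLevel:0
                     withBytes:&exposure
                   bytesPerRow:sizeof(float)];
upscaler.exposureTexture = exposureTexture;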
Because frame complexity can change a lot from scene to scene, many games have adopted dynamic resolution rendering.
When the frame is more complex, the upscaler input resolution is lowered, and when it is even more challenging, your game dynamically lowers the input resolution further. The MetalFX temporal upscaler now supports dynamically sized inputs, instead of requiring you to pass in the same-sized input every frame. To get the best scaling quality, your game should not set the maximum scale higher than 2x unless it's needed.
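A sketch of what that might look like, assuming the descriptor's content-scale properties and the upscaler's per-frame input size properties; the exact scale values are placeholders.

// At creation time: declare the range of scale factors you plan to use (sketch)
desc.inputContentMinScale = 1.0f; // easy scenes render at full resolution
desc.inputContentMaxScale = 2.0f; // don't go past 2x scaling unless you need to

// Every frame: tell the upscaler the size this frame was actually rendered at
upscaler.inputContentWidth  = dynamicRenderWidth;  // hypothetical per-frame values
upscaler.inputContentHeight = dynamicRenderHeight;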
Another new feature in the MetalFX temporal upscaler is optional functionality to hint to the upscaler about pixel reactivity.
When your game renders transparent effects, or particles like these fireworks, it doesn't render them into the motion and depth textures.
At high scaling ratios and low input resolutions, you might find that such particles blend in a bit with the background, or show ghosting. This happens because, in the rendered frame, they can appear just like texture details or specular highlights. To give you control over how your particles are handled, the upscaler now takes a new optional input called the reactive mask. This mask allows you to mark the areas covered by these effects.
To use it, set a reactive mask value in the shader, for example based on the material type in the G-buffer. In the host code, bind the texture to the temporal upscaler object before you encode it.
Only use the reactive mask if going to higher input resolutions is not an option. Also, don't reuse a reactive mask tuned for another upscaler, because it could be masking areas that already look great in the MetalFX upscaler output. Using the upscaler provides great performance with great quality, but sometimes you want to hit even higher refresh rates. This year, MetalFX brings frame interpolation to all Apple platforms. MetalFX frame interpolation is easy to integrate into your game: you'll first set up an interpolator object, then render UI into your interpolated frames, and finally present and pace your frames correctly.
Frame interpolation can help you use pixels you've already rendered to enable a smooth gaming experience.
Here is the same rendering pipeline, this time without UI rendering.
Interpolate your frames after your tone-mapping step. And note that for even higher resolutions and frame rates, you can have both upscaling and interpolation in the same pipeline.
To use the MetalFX frame interpolator, your app provides two rendered frames, motion vectors, and depth. If you have adopted the upscaler, the same motion vectors and depth can be used. Here, the motion texture has color because the objects have moved to the right. With these inputs, MetalFX generates a frame in between the two rendered frames.
To set up the interpolator, provide the upscaler object to the interpolator descriptor for higher combined performance. When you create the interpolator, define its motion scale and depth convention. Then bind all five required textures to the interpolator.
Once you're getting interpolated frames, it's time to think about UI rendering.
In a typical rendering pipeline, a game renders its UI at the end of each frame, around the same place where frame interpolation should happen.
UI rendering alpha-blends elements into the frame, may contain text that changes every frame, and doesn't modify the motion or depth textures.
You have multiple ways to achieve great looking UI with frame interpolation enabled.
The three most commonly used techniques for rendering UI with frame interpolation are composited UI, offscreen UI, and every-frame UI.
In composited UI, the interpolator gets the previous frame N-1, the current frame N without UI, and the same frame N with UI. Composited UI is the easiest to adopt. In this mode, the frame interpolator can see the delta between the texture with UI and the one without, so it can try to remove the UI and put it in the right location in the interpolated frame. But unblending an already blended pixel cannot be done perfectly. Therefore, you can help the interpolator out by using one of the other options.
One option is offscreen UI, where the UI is rendered into a completely separate UI texture, and the interpolator adds it on top of the interpolated frame. Inputting it to the interpolator saves you an extra load and store, since the interpolator can write the UI into its output.
Finally, in every-frame UI, the UI handling is left up to your code, which might require the biggest code changes on your side. But in this case, you can also update the UI for the interpolated frame, resulting in the smoothest experience for the player.
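For the offscreen UI option, the extra binding is small. Here's a sketch, reusing the uiTexture property that also appears in the present-helper sample below; the texture names are placeholders.

// Offscreen UI (sketch): UI is rendered into its own alpha-blended texture,
// and the interpolator composites it over the interpolated output.
interpolator.colorTexture     = frameWithoutUI;
interpolator.prevColorTexture = prevFrameWithoutUI;
interpolator.uiTexture        = uiTexture;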
Now you have nice-looking UI on top of the interpolated frames too, and it's time to think about how both the interpolated and natively rendered frames can be presented in the right order and with the right intervals.
Typically, your game's rendering involves the render thread, the GPU, and the present thread. The render thread sets up the necessary work for the GPU and presentation. When a frame is rendered, the interpolator can generate a frame with a timestamp in between the just-rendered frame and the previous one. Your game can then present the interpolated frame, and after a present interval, display the most recently rendered frame.
Determining the length of this interval in a consistent way can be tricky, but it's needed to get the pacing of your game just right.
The new Metal HUD can be a great tool to help you identify when your pacing is off. Watch the “Level up your games” talk for more details on how to enable it and on all the new features it has to offer.
Take a look at the frame interval graph, where the horizontal axis is time and the vertical axis is frame interval length.
If the graph shows an irregular pattern, and the spikes indicating longer frame update intervals seem random, your pacing is off.
Another sign that your pacing is off is having more than two frame interval histogram buckets.
Once your pacing is fixed, you should see a flat line if you are meeting your target display refresh rate, or a regular repeating pattern if you are below it, with a maximum of two histogram buckets.
Here's an example of how it can be done correctly with a handy present helper class. During the draw loop, everything is rendered into a low-resolution texture and upscaled by the MetalFX upscaler. UI is rendered after telling the helper that UI rendering starts. And finally, the interpolator call is handled by the helper class. Check out the sample code for implementation details.
In addition to pacing, it's also important to get the delta time and camera parameters right. The occlusion areas might have artifacts if not all the parameters are correct; with the correct parameters, the occlusion areas align perfectly.
This is because the interpolator can then adjust the motion vectors to match the length of the real simulation motion.
After getting all the inputs and pacing right, the interpolated frames should look great. Your interpolation input should also have a decently high frame rate: aim for a minimum of 30 frames per second before interpolation.
The upscaler and frame interpolator are techniques you can use to scale almost any rendering style. In contrast, ray tracing is typically used in higher-end rendering scenarios. Metal 4 adds a number of new ray tracing features around acceleration structure builds and intersection functions.
More and more games are using Metal ray tracing on Apple platforms.
In this demo, the lighting is realistic, and the drone is visible in the reflections on the floor. Ray tracing techniques and scene complexity vary from game to game, which calls for more flexibility in intersection function management and more options for acceleration structure builds.
Metal 4 introduces new features to help streamline both of these.
To learn the basics of Metal ray tracing, such as building acceleration structures and intersection functions, watch "Your guide to Metal ray tracing".
Consider a game that ray traces a simple scene with grass around a tree.
Even in this simple scene, there are multiple material types, such as the alpha-tested foliage and the opaque trunk of the tree. As a result, many different ray tracing intersection functions are required, separately for primary rays and shadow rays. An intersection function buffer is an argument buffer that contains handles to your scene's intersection functions.
For example, the grass and the leaves might need similar functionality to trace primary rays. Intersection function buffers allow your game to easily have multiple entries pointing to the same intersection function.
Setting up intersection function buffer indices requires setting state at the instance level, where this example scene has two instances, and at the geometry level, where the grass has only one geometry and the tree has two. The intersector needs to know which intersection function to use for shadow rays that hit the trunk.
When you’re creating your instance acceleration structures, specify the intersectionFunctionTableOffset on each instance descriptor.
When building your primitive acceleration structure, you also set the intersectionFunctionTableOffset on the geometry descriptors.
When you set up the intersector in your shader, add “intersection_function_buffer” to its tags.
Next, set the geometry multiplier on the intersector. The multiplier is the number of ray types in your intersection function buffer; our example has two ray types for each geometry, so the correct value here is two. Within those two ray types, you also need to provide the base index for the ray type you are tracing. In this example, the base index for tracing primary rays would be 0.
And for tracing shadows, the base ID is 1.
When the instance and geometry contributions of the tree trunk, the geometry multiplier, and the base ID of the shadow ray type are combined, the pointer lands on the desired intersection function.
Finish your code off by passing the intersection function buffer arguments to the intersect method, specifying the buffer, its size, and its stride. These give you some extra flexibility compared to what you might be used to in other APIs. If you're porting from DirectX, you can port your shader binding tables to Metal intersection function buffers easily.
In DirectX, you set the intersection function buffer address and stride on the host, when creating the descriptor to dispatch the rays. In Metal, you set them in the shader. All the threads in the SIMD group should set the same value, or the behavior is undefined.
The ray type index and geometry multiplier are handled the same way in DirectX and Metal: your app sets them in the shader. In both DirectX and Metal, you set the instance offset index per instance when creating your instance acceleration structure. But while the geometry offset index is generated automatically in DirectX, Metal gives you the flexibility to set this geometry offset yourself.
Intersection function buffers greatly improve the Metal porting experience for your ray traced game. Once you're up and running, Metal 4 also gives you the ability to optimize how Metal builds your acceleration structures. Metal already provides a lot of control over acceleration structure builds: besides the default behavior, you can optimize for refit, enable larger scenes, or build the acceleration structure more quickly. This year, you get even more flexibility and can prefer fast intersection to reduce the time it takes to trace rays.
Or you can opt to minimize the memory usage of your acceleration structure as well.
Usage flags can be set per acceleration structure build and don't have to be the same for all acceleration structures.
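As a sketch, the usage flags go on the acceleration structure descriptor at build time. The two new flag names below are written out as described in this session and may not match the final headers exactly, so treat them as assumptions.

// Per-build usage flags (sketch; the two new flag names are assumptions)
MTLPrimitiveAccelerationStructureDescriptor *heroAssetDesc = ...;
heroAssetDesc.usage = MTLAccelerationStructureUsagePreferFastIntersection; // faster ray traversal

MTLPrimitiveAccelerationStructureDescriptor *backgroundDesc = ...;
backgroundDesc.usage = MTLAccelerationStructureUsageMinimizeMemory; // smaller memory footprint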
The new acceleration structure flags let you tailor the ray tracing part of your rendering pipeline even more closely to your needs. If you use ray tracing for stochastic effects, you'll need a denoiser. And now, denoising can be part of your MetalFX upscaler.
Real-time ray tracing is used more and more, from simpler hybrid ray tracing all the way to complex path tracing. In this example image, ray tracing makes everything look more grounded and improves the reflections significantly. The best quality-performance trade-off in ray tracing can be achieved by using denoising with fewer rays.
With the new MetalFX API, combining upscaling and denoising can be as easy as adding a couple of extra inputs. But you can improve the quality further by helping the denoised upscaler with additional inputs and by getting the details right.
Before you combine your upscaler and denoiser, let's take a look at how these steps are traditionally done.
Typical real-time and interactive ray traced rendering pipelines trace multiple effects separately, denoise them separately, and compose the results into one noise-free jittered texture, which is upscaled by the MetalFX temporal upscaler and then post-processed.
Traditional denoisers require separate artistic parameter tuning for each scene. Here you can see how some denoisers look without artist-tuned parameters. In contrast, there is no need to tune parameters with the MetalFX denoised upscaler, which is applied after the main rendering and just before post processing. The machine learning based techniques in MetalFX provide robust, high-performance, high-quality denoising and upscaling across many scenarios, and they are easier to integrate. Integrating the upscaler is a good starting point on the way to integrating the denoised upscaler. Here we can see the inputs to the upscaler: color, motion, and depth. The new combined API is a superset of the upscaler API.
For the new API, you need to add extra noise-free auxiliary buffers, shown here on the left. Most of these are textures your app might already have. Let's dive deeper into each one of them next.
The first new input is normals. For best results, these should be in world space.
Then diffuse albedo, which is the base color of the diffuse radiance of the material.
Next is roughness, which represents how smooth or rough the surface is, as a linear value. And the last input is specular albedo. This should be a noise-free approximation of the specular radiance of your rendering, and it should include a Fresnel component. In code, adding these new inputs is simple.
Creating a typical temporal upscaler takes only about 10 lines of code. To enable the denoised version, you change the scaler type and add the formats of the additional textures.
Similarly, when encoding the scaler, this would be the upscaler call; here too, the only difference is that you bind the extra input textures.
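The auxiliary textures themselves come from your material system. As one example, here's a hypothetical Metal shader helper for the specular albedo input, using Schlick's approximation for the Fresnel component; f0, normal, and viewDir are assumed to come from your G-buffer.

// Hypothetical helper: noise-free specular albedo estimate with a Fresnel term
inline float3 specular_albedo(float3 f0, float3 normal, float3 viewDir)
{
    float nDotV = saturate(dot(normal, viewDir));
    // Schlick's approximation of the Fresnel component
    return f0 + (1.0f - f0) * powr(1.0f - nDotV, 5.0f);
}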
After setting up the basic usage of the denoised upscaler, you can improve it by using some of the optional inputs, and by avoiding some typical integration pitfalls.
There are some optional input textures you can use to improve the quality.
First is specular hit distance, which gives the ray length from the pixel's primary visibility point to the secondary bounce point. Then there's the denoiser strength mask, which can be used to mark areas that don't need denoising. And finally, the transparency overlay, whose alpha channel is used to blend in color that is only upscaled, not denoised.
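Binding them might look like the sketch below. The three property names are assumptions based on the input names above, so verify them against the MetalFX headers.

// Optional denoised upscaler inputs (sketch; property names are assumptions)
temporalScaler.specularHitDistanceTexture = _specularHitDistanceTexture;
temporalScaler.denoiseStrengthMaskTexture = _denoiserStrengthMask; // areas to skip denoising
temporalScaler.transparencyOverlayTexture = _transparencyOverlay;  // blended in, upscaled only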
The most typical integration issue is input that is too noisy. To fix this, use all the standard path tracing sampling improvements, like next event estimation and importance sampling techniques, and in a bigger scene with many light sources, mostly sample the light sources that actually contribute to the area. Another aspect of ray tracing sample quality is correlated random numbers: don't use random number generators that are too correlated, because both spatial and temporal correlation can cause artifacts.
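For example, one cheap way to decorrelate samples is to seed a per-pixel hash with both the pixel coordinate and the frame index. This PCG-style hash is a generic sketch, not part of MetalFX.

// Generic sketch: decorrelate random numbers across space and time
inline uint pcg_hash(uint v)
{
    uint state = v * 747796405u + 2891336453u;
    uint word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

inline float random_sample(uint2 pixel, uint frameIndex)
{
    // Chain the hashes so neighboring pixels and consecutive frames diverge
    uint seed = pcg_hash(pixel.x ^ pcg_hash(pixel.y ^ pcg_hash(frameIndex)));
    return float(seed) / 4294967295.0f;
}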
One potential pitfall related to the auxiliary data involves the diffuse albedo of metallic materials. In this example, the chess pieces are metallic and therefore have color in their specular albedo. In that case, the diffuse albedo for the chess pieces should be darker.
And finally, there are some common pitfalls related to the normals. The MetalFX denoised upscaler expects normals in world space, so it can make better denoising decisions. You also need to use a texture data type that has a sign bit; otherwise, the quality can be suboptimal, depending on the orientation of the camera.
After getting all these details right, you should have nice denoised and upscaled frames.
Let's take a look at what happens when you put all of these features into a single renderer.
My colleagues put together a demo that uses the rendering pipeline I talked about earlier. The demo uses the new Metal 4 ray tracing features to optimize the ray tracing part of the rendering. It denoises and upscales at the same time with the MetalFX denoised upscaler, and after exposure and tone mapping, the frames are interpolated by the MetalFX frame interpolator.
This demo uses advanced ray traced lighting effects, such as global illumination, reflections, shadows, and ambient occlusion, to bring to life a scene showing two robots playing chess. In the upper-right view, you can see the rendering before any MetalFX processing, and the other MetalFX inputs in the remaining views.
We adopted both the MetalFX denoised upscaler and the frame interpolator. The denoiser also greatly simplified the rendering by eliminating all manual tuning of the final look.
If you have already integrated the MetalFX upscaler, this is your opportunity to upgrade to frame interpolation. If you're new to MetalFX, take a look at the upscaler first. Then make sure your ray tracing effects use best practices, like the intersection function buffers covered today, and reduce your game's ray budget with the denoised upscaler.
I can't wait to see these new features in action in your games, and to see what you'll create using Metal 4. Thanks for watching!
-
6:46 - Reactive Mask
// Create reactive mask setup in shader
out.reactivity = (m_material_id == eRain)  ? 0.8f
               : (m_material_id == eSpark) ? 1.0f
               : 0.0f;

// Set reactive mask before encoding upscaler on host
temporalUpscaler.reactiveMask = reactiveMaskTexture;
-
8:35 - MetalFX Frame Interpolator
// Create and configure the interpolator descriptor
MTLFXFrameInterpolatorDescriptor* desc = [MTLFXFrameInterpolatorDescriptor new];
desc.scaler = temporalScaler;
// ...

// Create the effect and configure your effect
id<MTLFXFrameInterpolator> interpolator = [desc newFrameInterpolatorWithDevice:device];
interpolator.motionVectorScaleX = mvecScaleX;
interpolator.motionVectorScaleY = mvecScaleY;
interpolator.depthReversed = YES;

// Set input textures
interpolator.colorTexture = colorTexture;
interpolator.prevColorTexture = prevColorTexture;
interpolator.depthTexture = depthTexture;
interpolator.motionTexture = motionTexture;
interpolator.outputTexture = outputTexture;
-
12:45 - Interpolator present helper class
#include <thread>
#include <mutex>
#include <sys/event.h>
#include <mach/mach_time.h>

class PresentThread
{
    int m_timerQueue;
    std::thread m_encodingThread, m_pacingThread;
    std::mutex m_mutex;
    std::condition_variable m_scheduleCV, m_threadCV, m_pacingCV;
    float m_minDuration;
    uint32_t m_width, m_height;
    MTLPixelFormat m_pixelFormat;
    const static uint32_t kNumBuffers = 3;
    uint32_t m_bufferIndex, m_inputIndex;
    bool m_renderingUI, m_presentsPending;
    CAMetalLayer *m_metalLayer;
    id<MTLCommandQueue> m_presentQueue;
    id<MTLEvent> m_event;
    id<MTLSharedEvent> m_paceEvent, m_paceEvent2;
    uint64_t m_eventValue;
    uint32_t m_paceCount;
    int32_t m_numQueued, m_framesInFlight;
    id<MTLTexture> m_backBuffers[kNumBuffers];
    id<MTLTexture> m_interpolationOutputs[kNumBuffers];
    id<MTLTexture> m_interpolationInputs[2];
    id<MTLRenderPipelineState> m_copyPipeline;
    std::function<void(id<MTLRenderCommandEncoder>)> m_uiCallback = nullptr;

    void PresentThreadFunction();
    void PacingThreadFunction();
    void CopyTexture(id<MTLCommandBuffer> commandBuffer, id<MTLTexture> dest, id<MTLTexture> src, NSString *label);

public:
    PresentThread(float minDuration, CAMetalLayer *metalLayer);
    ~PresentThread()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_numQueued = -1;
        m_threadCV.notify_one();
        m_encodingThread.join();
    }

    void StartFrame(id<MTLCommandBuffer> commandBuffer)
    {
        [commandBuffer encodeWaitForEvent:m_event value:m_eventValue++];
    }

    void StartUI(id<MTLCommandBuffer> commandBuffer)
    {
        assert(m_uiCallback == nullptr);
        if(!m_renderingUI)
        {
            CopyTexture(commandBuffer, m_interpolationInputs[m_inputIndex], m_backBuffers[m_bufferIndex], @"Copy HUDLESS");
            m_renderingUI = true;
        }
    }

    void Present(id<MTLFXFrameInterpolator> frameInterpolator, id<MTLCommandQueue> queue);

    id<MTLTexture> GetBackBuffer()
    {
        return m_backBuffers[m_bufferIndex];
    }

    void Resize(uint32_t width, uint32_t height, MTLPixelFormat pixelFormat);

    void DrainPendingPresents()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while(m_presentsPending)
            m_scheduleCV.wait(lock);
    }

    bool UICallbackEnabled() const
    {
        return m_uiCallback != nullptr;
    }

    void SetUICallback(std::function<void(id<MTLRenderCommandEncoder>)> callback)
    {
        m_uiCallback = callback;
    }
};

PresentThread::PresentThread(float minDuration, CAMetalLayer *metalLayer)
    : m_encodingThread(&PresentThread::PresentThreadFunction, this)
    , m_pacingThread(&PresentThread::PacingThreadFunction, this)
    , m_minDuration(minDuration)
    , m_numQueued(0)
    , m_metalLayer(metalLayer)
    , m_inputIndex(0u)
    , m_bufferIndex(0u)
    , m_renderingUI(false)
    , m_presentsPending(false)
    , m_framesInFlight(0)
    , m_paceCount(0)
    , m_eventValue(0)
{
    id<MTLDevice> device = metalLayer.device;
    m_presentQueue = [device newCommandQueue];
    m_presentQueue.label = @"presentQ";
    m_timerQueue = kqueue();
    metalLayer.maximumDrawableCount = 3;
    Resize(metalLayer.drawableSize.width, metalLayer.drawableSize.height, metalLayer.pixelFormat);
    m_event = [device newEvent];
    m_paceEvent = [device newSharedEvent];
    m_paceEvent2 = [device newSharedEvent];
}

void PresentThread::Present(id<MTLFXFrameInterpolator> frameInterpolator, id<MTLCommandQueue> queue)
{
    id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
    if(m_renderingUI)
    {
        frameInterpolator.colorTexture = m_interpolationInputs[m_inputIndex];
        frameInterpolator.prevColorTexture = m_interpolationInputs[m_inputIndex^1];
        frameInterpolator.uiTexture = m_backBuffers[m_bufferIndex];
    }
    else
    {
        frameInterpolator.colorTexture = m_backBuffers[m_bufferIndex];
        frameInterpolator.prevColorTexture = m_backBuffers[(m_bufferIndex + kNumBuffers - 1) % kNumBuffers];
        frameInterpolator.uiTexture = nullptr;
    }
    frameInterpolator.outputTexture = m_interpolationOutputs[m_bufferIndex];
    [frameInterpolator encodeToCommandBuffer:commandBuffer];
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> _Nonnull) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_framesInFlight--;
        m_scheduleCV.notify_one();
        m_paceCount++;
        m_pacingCV.notify_one();
    }];
    [commandBuffer encodeSignalEvent:m_event value:m_eventValue++];
    [commandBuffer commit];

    std::unique_lock<std::mutex> lock(m_mutex);
    m_framesInFlight++;
    m_numQueued++;
    m_presentsPending = true;
    m_threadCV.notify_one();
    while((m_framesInFlight >= 2) || (m_numQueued >= 2))
        m_scheduleCV.wait(lock);
    m_bufferIndex = (m_bufferIndex + 1) % kNumBuffers;
    m_inputIndex = m_inputIndex^1u;
    m_renderingUI = false;
}

void PresentThread::CopyTexture(id<MTLCommandBuffer> commandBuffer, id<MTLTexture> dest, id<MTLTexture> src, NSString *label)
{
    MTLRenderPassDescriptor *desc = [MTLRenderPassDescriptor new];
    desc.colorAttachments[0].texture = dest;
    desc.colorAttachments[0].loadAction = MTLLoadActionDontCare;
    desc.colorAttachments[0].storeAction = MTLStoreActionStore;
    id<MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor:desc];
    [renderEncoder setFragmentTexture:src atIndex:0];
    [renderEncoder setRenderPipelineState:m_copyPipeline];
    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:3];
    if(m_uiCallback)
        m_uiCallback(renderEncoder);
    renderEncoder.label = label;
    [renderEncoder endEncoding];
}

void PresentThread::PacingThreadFunction()
{
    NSThread *thread = [NSThread currentThread];
    [thread setName:@"PacingThread"];
    [thread setQualityOfService:NSQualityOfServiceUserInteractive];
    [thread setThreadPriority:1.f];

    mach_timebase_info_data_t info;
    mach_timebase_info(&info);

    // maximum delta (100 ms) in mach-time units
    const uint64_t maxDeltaInNanoSecs = 100000000;
    const uint64_t maxDelta = maxDeltaInNanoSecs * info.denom / info.numer;

    uint64_t time = mach_absolute_time();
    uint64_t paceEventValue = 0;
    for(;;)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while(m_paceCount == 0)
            m_pacingCV.wait(lock);
        m_paceCount--;
        lock.unlock();

        // we get signal...
        const uint64_t prevTime = time;
        time = mach_absolute_time();
        m_paceEvent.signaledValue = ++paceEventValue;

        const uint64_t delta = std::min(time - prevTime, maxDelta);
        const uint64_t timeStamp = time + ((delta*31)>>6);

        struct kevent64_s timerEvent, eventOut;
        struct timespec timeout;
        timeout.tv_nsec = maxDeltaInNanoSecs;
        timeout.tv_sec = 0;
        EV_SET64(&timerEvent, 0, EVFILT_TIMER,
                 EV_ADD | EV_ONESHOT | EV_ENABLE,
                 NOTE_CRITICAL | NOTE_LEEWAY | NOTE_MACHTIME | NOTE_ABSOLUTE,
                 timeStamp, 0, 0, 0);
        kevent64(m_timerQueue, &timerEvent, 1, &eventOut, 1, 0, &timeout);

        // main screen turn on...
        m_paceEvent2.signaledValue = ++paceEventValue;
    }
}

void PresentThread::PresentThreadFunction()
{
    NSThread *thread = [NSThread currentThread];
    [thread setName:@"PresentThread"];
    [thread setQualityOfService:NSQualityOfServiceUserInteractive];
    [thread setThreadPriority:1.f];

    uint64_t eventValue = 0;
    uint32_t bufferIndex = 0;
    uint64_t paceEventValue = 0;
    for(;;)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        if(m_numQueued == 0)
        {
            m_presentsPending = false;
            m_scheduleCV.notify_one();
        }
        while(m_numQueued == 0)
            m_threadCV.wait(lock);
        if(m_numQueued < 0)
            break;
        lock.unlock();

        @autoreleasepool
        {
            id<CAMetalDrawable> drawable = [m_metalLayer nextDrawable];
            lock.lock();
            m_numQueued--;
            m_scheduleCV.notify_one();
            lock.unlock();

            id<MTLCommandBuffer> commandBuffer = [m_presentQueue commandBuffer];
            [commandBuffer encodeWaitForEvent:m_event value:++eventValue];
            CopyTexture(commandBuffer, drawable.texture, m_interpolationOutputs[bufferIndex], @"Copy Interpolated");
            [commandBuffer encodeSignalEvent:m_event value:++eventValue];
            [commandBuffer encodeWaitForEvent:m_paceEvent value:++paceEventValue];
            if(m_minDuration > 0.f)
                [commandBuffer presentDrawable:drawable afterMinimumDuration:m_minDuration];
            else
                [commandBuffer presentDrawable:drawable];
            [commandBuffer commit];
        }

        @autoreleasepool
        {
            id<MTLCommandBuffer> commandBuffer = [m_presentQueue commandBuffer];
            id<CAMetalDrawable> drawable = [m_metalLayer nextDrawable];
            CopyTexture(commandBuffer, drawable.texture, m_backBuffers[bufferIndex], @"Copy Rendered");
            [commandBuffer encodeWaitForEvent:m_paceEvent2 value:++paceEventValue];
            if(m_minDuration > 0.f)
                [commandBuffer presentDrawable:drawable afterMinimumDuration:m_minDuration];
            else
                [commandBuffer presentDrawable:drawable];
            [commandBuffer commit];
        }
        bufferIndex = (bufferIndex + 1) % kNumBuffers;
    }
}

void PresentThread::Resize(uint32_t width, uint32_t height, MTLPixelFormat pixelFormat)
{
    if((m_width != width) || (m_height != height) || (m_pixelFormat != pixelFormat))
    {
        id<MTLDevice> device = m_metalLayer.device;
        if(m_pixelFormat != pixelFormat)
        {
            id<MTLLibrary> lib = [device newDefaultLibrary];
            MTLRenderPipelineDescriptor *pipelineDesc = [MTLRenderPipelineDescriptor new];
            pipelineDesc.vertexFunction = [lib newFunctionWithName:@"FSQ_VS_V4T2"];
            pipelineDesc.fragmentFunction = [lib newFunctionWithName:@"FSQ_simpleCopy"];
            pipelineDesc.colorAttachments[0].pixelFormat = pixelFormat;
            m_copyPipeline = [device newRenderPipelineStateWithDescriptor:pipelineDesc error:nil];
            m_pixelFormat = pixelFormat;
        }
        DrainPendingPresents();
        m_width = width;
        m_height = height;
        MTLTextureDescriptor *texDesc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:pixelFormat width:width height:height mipmapped:NO];
        texDesc.storageMode = MTLStorageModePrivate;
        for(uint32_t i = 0; i < kNumBuffers; i++)
        {
            texDesc.usage = MTLTextureUsageShaderRead|MTLTextureUsageShaderWrite|MTLTextureUsageRenderTarget;
            m_backBuffers[i] = [device newTextureWithDescriptor:texDesc];
            texDesc.usage = MTLTextureUsageShaderRead|MTLTextureUsageRenderTarget;
            m_interpolationOutputs[i] = [device newTextureWithDescriptor:texDesc];
        }
        texDesc.usage = MTLTextureUsageShaderRead|MTLTextureUsageRenderTarget;
        m_interpolationInputs[0] = [device newTextureWithDescriptor:texDesc];
        m_interpolationInputs[1] = [device newTextureWithDescriptor:texDesc];
    }
}
-
13:00 - Set intersection function table offset
// Set intersection function table offset on host-side geometry descriptors
NSMutableArray<MTLAccelerationStructureGeometryDescriptor *> *geomDescs = ...;
for (auto g = 0; g < geomList.size(); ++g)
{
    MTLAccelerationStructureGeometryDescriptor *descriptor = ...;
    descriptor.intersectionFunctionTableOffset = g;
    ...
    [geomDescs addObject:descriptor];
}
-
13:01 - Set up the intersector
// Set up the intersector
metal::raytracing::intersector<intersection_function_buffer, instancing, triangle> trace;
trace.set_geometry_multiplier(2); // Number of ray types, defaults to 1
trace.set_base_id(1);             // Set ray type index, defaults to 0
-
13:02 - Ray trace intersection function buffers
// Ray trace intersection function buffers
// Set up intersection function buffer arguments
intersection_function_buffer_arguments ifb_arguments;
ifb_arguments.intersection_function_buffer = raytracingResources.ifbBuffer;
ifb_arguments.intersection_function_buffer_size = raytracingResources.ifbBufferSize;
ifb_arguments.intersection_function_stride = raytracingResources.ifbBufferStride;

// Set up the ray and finish intersecting
metal::raytracing::ray r = { origin, direction };
auto result = trace.intersect(r, ads, ifb_arguments);
-
13:02 - Change of temporal scaler setup to denoised temporal scaler setup
// Change of temporal scaler setup to denoised temporal scaler setup
MTLFXTemporalScalerDescriptor* desc = [MTLFXTemporalScalerDescriptor new];
desc.colorTextureFormat = MTLPixelFormatBGRA8Unorm_sRGB;
desc.outputTextureFormat = MTLPixelFormatBGRA8Unorm_sRGB;
desc.depthTextureFormat = DepthStencilFormat;
desc.motionTextureFormat = MotionVectorFormat;
desc.diffuseAlbedoTextureFormat = DiffuseAlbedoFormat;
desc.specularAlbedoTextureFormat = SpecularAlbedoFormat;
desc.normalTextureFormat = NormalVectorFormat;
desc.roughnessTextureFormat = RoughnessFormat;
desc.inputWidth = _mainViewWidth;
desc.inputHeight = _mainViewHeight;
desc.outputWidth = _screenWidth;
desc.outputHeight = _screenHeight;

temporalScaler = [desc newTemporalDenoisedScalerWithDevice:_device];
-
13:04 - Change temporal scaler encode to denoised temporal scaler encode
// Change temporal scaler encode to denoised temporal scaler encode
temporalScaler.colorTexture = _mainView;
temporalScaler.motionTexture = _motionTexture;
temporalScaler.diffuseAlbedoTexture = _diffuseAlbedoTexture;
temporalScaler.specularAlbedoTexture = _specularAlbedoTexture;
temporalScaler.normalTexture = _normalTexture;
temporalScaler.roughnessTexture = _roughnessTexture;
temporalScaler.depthTexture = _depthTexture;
temporalScaler.jitterOffsetX = _pixelJitter.x;
temporalScaler.jitterOffsetY = -_pixelJitter.y;
temporalScaler.outputTexture = _upscaledColorTarget;
temporalScaler.motionVectorScaleX = (float)_motionTexture.width;
temporalScaler.motionVectorScaleY = (float)_motionTexture.height;
[temporalScaler encodeToCommandBuffer:commandBuffer];
-
16:04 - Creating instance descriptors for instance acceleration structure
// Creating instance descriptors for instance acceleration structure
MTLAccelerationStructureInstanceDescriptor *grassInstanceDesc, *treeInstanceDesc = . . .;
grassInstanceDesc.intersectionFunctionTableOffset = 0;
treeInstanceDesc.intersectionFunctionTableOffset = 1;

// Create buffer for instance descriptors of as many trees/grass instances the scene holds
id<MTLBuffer> instanceDescs = . . .;
for (auto i = 0; i < scene.instances.size(); ++i)
    . . .
-