Streaming is available in most browsers and in the Developer app.
-
Capture cinematic video in your app
Learn how to use the Cinematic Video API to let your app easily capture stunning, cinema-style video. We'll cover how to configure a Cinematic capture session and walk through the basics of building a video capture UI. We'll also explore advanced Cinematic features, such as applying depth-of-field effects to achieve tracking shots and elegant rack-focus transitions.
Chapters
- 0:00 - Introduction
- 0:33 - Cinematic video
- 3:44 - Build a great cinematic capture experience
Resources
Related Videos
WWDC25
WWDC24
WWDC23
-
Hi, I'm Roy. I’m an engineer on the Camera Software team. Today, I’m excited to talk about how your apps can easily capture pro-level cinema-style videos with the Cinematic Video API.
With iPhone 13 and 13 Pro, we introduced Cinematic mode. With its intuitive user interface and powerful algorithms, it transformed iPhone into a cinematography powerhouse. In this talk, we will have a look at what makes Cinematic video magical and walk through some code together to see how to build a great Cinematic capture experience.
So, what is Cinematic video? At its heart are classic storytelling tools like rack focus and tracking focus. With a shallow depth of field, the director guides the viewer's attention to the key subjects in the scene, enhancing narrative impact. When subjects move, as they often do in films, tracking focus keeps them sharply in view.
Though powerful, in the real world, these focus techniques require a great deal of expertise, which is why on a movie set, there are focus pullers whose main responsibility is to carry out these powerful but challenging shots. Cinematic video drastically simplifies this by intelligently driving focus decisions. For example, when a subject enters the frame, the algorithm automatically racks the focus to them and starts tracking. When a subject looks away, the focus automatically transitions to another point, returning to the subject when appropriate. This year, we're making these amazing capabilities available as the Cinematic Video API, so your apps can easily capture these amazing cinema-style videos. Let’s explore how we can build a great capture experience for Cinematic videos using the new API. Let’s start with a typical capture session for a video app.
Firstly, we select the device from which we want to capture movies.
Then we add it to a device input. Depending on the use cases, multiple outputs can be added to the session. Connection objects will be created when these outputs are added to the capture session.
This is not a trivial setup, but enabling Cinematic video capture is really easy. In iOS 26, we're adding a new property, isCinematicVideoCaptureEnabled, on the AVCaptureDeviceInput class. By setting it to true, we configure the whole capture session to output Cinematic video, and each of the outputs will now receive the Cinematic treatment.
The movie file produced by the movie file output will be Cinematic. It contains the disparity data, metadata, and the original video that enables non-destructive editing. To play it back with the bokeh rendered or edit the bokeh effect, you can use the Cinematic Framework we introduced in 2023. To learn more about this framework, please check out the WWDC23 session Support Cinematic mode videos in your app. The video data output will produce frames with a shallow depth of field effect baked in. This is useful when you need direct access to the frames, such as when sending them to a remote device.
Similarly, the preview layer will have the bokeh rendered into it in real time. It's an easy way to build a viewfinder. With this high-level architecture in mind, let’s walk through some code in these following areas.
We will configure an AVCaptureSession with all its components required for Cinematic capture.
Then we build an interface for video capture using SwiftUI.
We will walk through how to get metadata like face detections and how to draw them on the screen.
With different ways to manually drive focus, we tap into the full power of Cinematic video.
And we finish off with some advanced features to make our app more polished.
Let’s get started with the capture session. First, let’s find the video device from which we want to capture the movie. To find the device, we create an AVCaptureDevice.DiscoverySession object.
Cinematic video is supported on both the Dual Wide camera in the back and the TrueDepth camera in the front. In this case, we specify .builtInDualWideCamera in the array of device types. Since we’re shooting a movie, we use .video as the mediaType.
And we request the camera in the back of the device.
As we’re only requesting a single device type, we can just get the first element in the discovery session's devices array.
In order to enable Cinematic video capture, a format that supports this feature must be used.
To find such formats, we can iterate through all the device’s formats and use the one whose isCinematicVideoCaptureSupported property returns true.
Here are all the supported formats.
For both Dual Wide and TrueDepth cameras, both 1080p and 4K are supported at 30 frames per second.
If you are interested in recording SDR or EDR content, you can use either 420 video range or full range. If you prefer 10-bit HDR video content, use x420 instead.
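As a sketch of how you might programmatically prefer the 10-bit HDR ('x420') variant, you can inspect each format's pixel format through CoreMedia; this snippet is an assumption-based illustration, not code from the session, and it assumes `camera` is the AVCaptureDevice found earlier.

import AVFoundation
import CoreMedia

// Sketch only: prefer a 10-bit HDR ('x420') format that also supports Cinematic video capture.
let hdrFormat = camera.formats.first { format in
    format.isCinematicVideoCaptureSupported &&
    CMFormatDescriptionGetMediaSubType(format.formatDescription) ==
        kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange // 'x420'
}

if let hdrFormat {
    try! camera.lockForConfiguration()
    camera.activeFormat = hdrFormat
    camera.unlockForConfiguration()
}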
Since we’re not making a silent film, we want sound as well. We will use the same DiscoverySession API to find the microphone.
With our devices in hand, we create the inputs for each one of them. Then we add these inputs to the capture session. At this point, we can turn on Cinematic video capture on the video input. To enhance the Cinematic experience, we can capture spatial audio by simply setting first order ambisonics as the multichannelAudioMode.
To learn more about spatial audio, please check out this year's session, “Enhance your app’s audio content creation capabilities.” Moving on to the outputs, we create an AVCaptureMovieFileOutput object and add it to the session.
Our hands are never as steady as a tripod, so we recommend enabling video stabilization. To do so, we first find the video connection of the movieFileOutput and set its preferredVideoStabilizationMode. In this case, we use cinematicExtendedEnhanced.
Lastly, we need to associate our preview layer with the capture session. We’re done with the capture session for now. Let's move on to the user interface.
Since AVCaptureVideoPreviewLayer is a subclass of CALayer, which is not part of SwiftUI, to make them interoperate, we need to wrap the preview layer into a struct that conforms to the UIViewRepresentable protocol. Within this struct, we make a UIView subclass CameraPreviewUIView.
We override its layerClass property to make the previewLayer the backing layer for the view.
And we make a previewLayer property to make it easily accessible as an AVCaptureVideoPreviewLayer type.
We can then put our preview view into a ZStack, where it can be easily composed with other UI elements like camera controls.
As mentioned in the intro, shallow depth of field is an important tool for storytelling. By changing the simulatedAperture property on the device input, we can adjust the global strength of the bokeh effect. As shown on the right, by driving this property with a slider, we change the global strength of the blur.
This value is expressed in the industry standard f-stops, which is simply the ratio between the focal length and the aperture diameter.
Rearranging the terms, the aperture diameter is the focal length divided by the f-number.
Therefore, the smaller the f-number, the larger the aperture, and the stronger the bokeh will be.
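As a quick worked example of that relationship (basic optics, not something specific to the API):

N = \frac{f}{D} \quad\Longrightarrow\quad D = \frac{f}{N}
% e.g. a 50 mm lens at f/2 has D = 50/2 = 25 mm, but only D = 50/8 = 6.25 mm at f/8,
% which is why the smaller f-number produces the stronger bokeh.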
We can find the minimum, maximum, and default simulated aperture on the format.
We use them to populate the appropriate UI elements, like a slider.
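Here is a rough SwiftUI sketch of such a slider. The view itself (ApertureSlider) is a hypothetical name and not from the session; simulatedAperture and the min/max/default properties are the API discussed here.

import SwiftUI
import AVFoundation

// Sketch only: a slider that drives the global bokeh strength.
struct ApertureSlider: View {
    let videoInput: AVCaptureDeviceInput
    let minAperture: Float
    let maxAperture: Float
    @State private var aperture: Float

    init(videoInput: AVCaptureDeviceInput, format: AVCaptureDevice.Format) {
        self.videoInput = videoInput
        self.minAperture = format.minSimulatedAperture
        self.maxAperture = format.maxSimulatedAperture
        self._aperture = State(initialValue: format.defaultSimulatedAperture)
    }

    var body: some View {
        Slider(value: $aperture, in: minAperture...maxAperture) {
            Text("Aperture (f-stop)")
        }
        .onChange(of: aperture) { _, newValue in
            // Smaller f-number means stronger bokeh.
            videoInput.simulatedAperture = newValue
        }
    }
}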
Now, let’s build some affordances that allow the user to manually interact with Cinematic video. For users to manually drive focus, we need to show visual indicators for focus candidates like faces. And to do that, we need some detection metadata.
We will use an AVCaptureMetadataOutput to get these detections so we can draw their bounds on the screen for users to interact with. The Cinematic video algorithm requires certain metadataObjectTypes to work optimally. They are communicated through the new requiredMetadataObjectTypesForCinematicVideoCapture property. An exception is thrown if the metadataObjectTypes provided differ from this list when Cinematic video is enabled.
Lastly, we need to provide a delegate to receive the metadata and a queue on which the delegate is called.
We receive metadata objects in the metadata output delegate callback.
To easily communicate this metadata to our view layer in SwiftUI, we use an observable class.
When we update its property, the observing view will automatically refresh.
In our view layer, whenever our observable object is updated, the view is automatically redrawn. And we draw a rectangle for each metadataObject.
When creating these rectangles, it’s important that we transform the metadata’s bounds into the preview layer’s coordinate space, using the layerRectConverted(fromMetadataOutputRect:) method.
Note that X and Y in the position method refer to the center of the view, instead of the upper left corner used by AVFoundation. So we need to adjust accordingly by using the midX and midY of the rect.
With metadata rectangles drawn on the screen, we can use them to manually drive focus.
The Cinematic Video API offers three ways to manually focus. Let's now walk through them one by one. The setCinematicVideoTrackingFocus(detectedObjectID:focusMode:) method can be used to rack focus to a particular subject identified by the detectedObjectID, which is available on the AVMetadataObjects that you get from the metadata output. focusMode configures Cinematic video’s tracking behavior. The CinematicVideoFocusMode enum has three cases: none, strong, and weak. Strong tells Cinematic video to keep tracking a subject even when there are focus candidates that would have been otherwise automatically selected.
In this case, although the cat became more prominent in the frame, the strong focus, as indicated by the solid yellow rectangle, stayed locked on the subject in the back. Weak focus, on the other hand, lets the algorithm retain focus control. It automatically racks the focus when it sees fit. In this case, as the cat turned around, he was considered more important, and the weak focus shifted automatically to him, as indicated by the dashed rectangle.
The none case is only useful when determining whether a metadata object currently has the focus, so it should not be used when setting the focus.
The second focus method takes a different first parameter. Instead of a detected object ID, it takes a point in a view.
It tells Cinematic video to look for any interesting object at the specified point. When it finds one, it will create a new metadata object with the type salient object. So we can draw the rectangle around it on the screen.
The third focus method is setCinematicVideoFixedFocus, which takes a point and a focus mode. It sets the focus at a fixed distance which is computed internally using signals such as depth. Paired with a strong focus mode, this method effectively locks the focus at a particular plane in the scene, ignoring other activities even in the foreground. Any app can implement the focus logic that makes sense for its respective use cases. In our app, we do the following: Tapping on a detection rectangle not in focus, we rack the focus to it with a weak focus. With this, we can switch the focus back and forth between subjects in and out of focus.
Tapping on a metadata object already being weakly focused on turns it into a strong focus, indicated by the solid yellow rectangle.
Tapping at a point where there are no existing detections, we want Cinematic video to try to find any salient object and weakly focus on that. With a long press, we set a strong fixed focus. Here is how we can implement this logic in code. Firstly, we need to make two gestures.
The regular tap gesture can be easily done with a SpatialTapGesture, which provides the tap location that we need to set focus.
When tapped, we call the focusTap method on our camera model object, where we have access to the underlying AVCaptureDevice.
Long press, on the other hand, is a bit more complicated, as the built-in longPressGesture doesn’t provide the tap location we need, so we simulate a long press with a DragGesture.
When pressed, we start a 0.3-second timer.
When it fires, we call the focusLongPress method on the camera model.
Then we create a rectangle view to receive the gestures. It’s inserted at the end of the ZStack, which puts it on top of all the detection rectangles so the user’s gesture input is not blocked.
As we already saw in the previous videos, it's important to visually differentiate the focused rectangles between weak focus, strong focus, and no focus to help the user take the right action.
We do this by implementing a method that takes an AVMetadataObject and returns a focused rectangle view. Let’s not forget that we need to transform the bounds of the metadata from the metadata output’s coordinate space to that of the preview layer.
By setting different stroke styles and colors, we can easily create visually distinct rectangles for each focus mode.
With the point passed from the view layer, we can determine which focus method to use. First, we need to figure out whether the user has tapped on a metadata rectangle. And we do this in the helper method, findTappedMetadataObject.
Here, we iterate through all the metadata that we cache for each frame and check whether the point specified falls into one of their bounds. Again, we make sure the point and the rect are in the same coordinate space.
Coming back to the focusTap method, if a metadata object is found and is already in weak focus, then we turn it into a strong focus.
If it’s not already in focus, we focus on it weakly.
If the user didn’t tap on a metadata rectangle, then we tell the framework to try to find a salient object at this point. With a long press, we simply set a strong fixed focus at the specified point. At this point, we have a fully functional app that can capture Cinematic video. Let’s polish it up with a few more details. Currently, our video capture graph looks like this. We have three outputs to capture the movie, receive metadata, and the preview. If we want to support still image capture during the recording, we can do so by simply adding an AVCapturePhotoOutput to the session.
Since our graph is already configured to be Cinematic, the photo output will get a Cinematic treatment automatically.
The image returned by the photo output will have the bokeh effect burned in.
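A minimal sketch of adding that photo output and capturing a still mid-recording might look like the following; photoCaptureDelegate is a hypothetical object conforming to AVCapturePhotoCaptureDelegate that you would implement to receive the photo.

// Sketch only: add a photo output for stills during Cinematic recording.
let photoOutput = AVCapturePhotoOutput()
guard captureSession.canAddOutput(photoOutput) else {
    print("Can't add the photo output to the session")
    return
}
captureSession.addOutput(photoOutput)

// Later, while recording: the session is already Cinematic,
// so the still comes back with the bokeh effect rendered in.
photoOutput.capturePhoto(with: AVCapturePhotoSettings(), delegate: photoCaptureDelegate)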
Lastly, the Cinematic video algorithm requires a sufficient amount of light to function properly. So when the room is too dark or the camera is covered, we want to inform users of the problem in the UI. In order to be notified when this condition occurs, you can key-value observe the new property cinematicVideoCaptureSceneMonitoringStatuses on the AVCaptureDevice class. Currently, the only supported status for Cinematic video is not enough light.
In the KVO handler, we can update the UI properly when we see insufficient light.
An empty set means that everything is back to normal.
In today’s talk, we had a recap on how Cinematic video enables our users to capture gorgeous pro-level movies, even for everyday moments like hanging out with their pets. And we had a detailed walkthrough on how to build a great Cinematic capture experience with the Cinematic Video API. We can’t wait to see how your apps can tap into these capabilities to deliver richer, more cinematic content. Thank you for watching.
-
-
4:26 - Select a video device
// Select a video device
let deviceDiscoverySession = AVCaptureDevice.DiscoverySession(
    deviceTypes: [.builtInDualWideCamera],
    mediaType: .video,
    position: .back)

guard let camera = deviceDiscoverySession.devices.first else {
    print("Failed to find the capture device")
    return
}
-
5:07 - Select a format that supports Cinematic Video capture
// Select a format that supports Cinematic Video capture
for format in camera.formats {
    if format.isCinematicVideoCaptureSupported {
        try! camera.lockForConfiguration()
        camera.activeFormat = format
        camera.unlockForConfiguration()
        break
    }
}
-
5:51 - Select a microphone
// Select a microphone
let audioDeviceDiscoverySession = AVCaptureDevice.DiscoverySession(
    deviceTypes: [.microphone],
    mediaType: .audio,
    position: .unspecified)

guard let microphone = audioDeviceDiscoverySession.devices.first else {
    print("Failed to find a microphone")
    return
}
-
6:00 - Add devices to input & add inputs to the capture session & enable Cinematic Video capture
// Add devices to inputs
let videoInput = try! AVCaptureDeviceInput(device: camera)
guard captureSession.canAddInput(videoInput) else {
    print("Can't add the video input to the session")
    return
}

let audioInput = try! AVCaptureDeviceInput(device: microphone)
guard captureSession.canAddInput(audioInput) else {
    print("Can't add the audio input to the session")
    return
}

// Add inputs to the capture session
captureSession.addInput(videoInput)
captureSession.addInput(audioInput)

// Enable Cinematic Video capture
if videoInput.isCinematicVideoCaptureSupported {
    videoInput.isCinematicVideoCaptureEnabled = true
}
-
6:17 - Capture spatial audio
// Configure spatial audio
if audioInput.isMultichannelAudioModeSupported(.firstOrderAmbisonics) {
    audioInput.multichannelAudioMode = .firstOrderAmbisonics
}
-
// Add outputs to the session
let movieFileOutput = AVCaptureMovieFileOutput()
guard captureSession.canAddOutput(movieFileOutput) else {
    print("Can't add the movie file output to the session")
    return
}
captureSession.addOutput(movieFileOutput)

// Configure video stabilization
if let connection = movieFileOutput.connection(with: .video),
   connection.isVideoStabilizationSupported {
    connection.preferredVideoStabilizationMode = .cinematicExtendedEnhanced
}

// Add a preview layer as the view finder
let previewLayer = AVCaptureVideoPreviewLayer()
previewLayer.session = captureSession
-
7:11 - Display the preview layer with SwiftUI
// Display the preview layer with SwiftUI
struct CameraPreviewView: UIViewRepresentable {

    func makeUIView(context: Context) -> CameraPreviewUIView {
        return CameraPreviewUIView()
    }

    class CameraPreviewUIView: UIView {
        override class var layerClass: AnyClass {
            AVCaptureVideoPreviewLayer.self
        }

        var previewLayer: AVCaptureVideoPreviewLayer {
            layer as! AVCaptureVideoPreviewLayer
        }

        ...
    }

    ...
}
-
7:54 - Display the preview layer with SwiftUI
// Display the preview layer with SwiftUI
@MainActor
struct CameraView: View {

    var body: some View {
        ZStack {
            CameraPreviewView()
            CameraControlsView()
        }
    }
}
-
8:05 - Adjust bokeh strength with simulated aperture
// Adjust bokeh strength with simulated aperture
open class AVCaptureDeviceInput : AVCaptureInput {

    open var simulatedAperture: Float

    ...
}
-
8:40 - Find min, max, and default simulated aperture
// Adjust bokeh strength with simulated aperture
extension AVCaptureDeviceFormat {

    open var minSimulatedAperture: Float { get }

    open var maxSimulatedAperture: Float { get }

    open var defaultSimulatedAperture: Float { get }

    ...
}
-
9:12 - Add a metadata output
// Add a metadata output
let metadataOutput = AVCaptureMetadataOutput()

guard captureSession.canAddOutput(metadataOutput) else {
    print("Can't add the metadata output to the session")
    return
}
captureSession.addOutput(metadataOutput)

metadataOutput.metadataObjectTypes =
    metadataOutput.requiredMetadataObjectTypesForCinematicVideoCapture

metadataOutput.setMetadataObjectsDelegate(self, queue: sessionQueue)
-
9:50 - Update the observed manager object
// Update the observed manager object
func metadataOutput(_ output: AVCaptureMetadataOutput,
                    didOutput metadataObjects: [AVMetadataObject],
                    from connection: AVCaptureConnection) {
    self.metadataManager.metadataObjects = metadataObjects
}

// Pass metadata to SwiftUI
@Observable
class CinematicMetadataManager {
    var metadataObjects: [AVMetadataObject] = []
}
-
10:12 - Observe changes and update the view
// Observe changes and update the view
struct FocusOverlayView : View {

    var body: some View {
        ForEach(metadataManager.metadataObjects, id: \.objectID) { metadataObject in
            rectangle(for: metadataObject)
        }
    }
}
-
10:18 - Make a rectangle for a metadata
// Make a rectangle for a metadata
private func rectangle(for metadata: AVMetadataObject) -> some View {
    let transformedRect = previewLayer.layerRectConverted(
        fromMetadataOutputRect: metadata.bounds)

    return Rectangle()
        .frame(width: transformedRect.width, height: transformedRect.height)
        .position(x: transformedRect.midX, y: transformedRect.midY)
}
-
10:53 - Focus methods
open func setCinematicVideoTrackingFocus(detectedObjectID: Int,
                                         focusMode: AVCaptureDevice.CinematicVideoFocusMode)

open func setCinematicVideoTrackingFocus(at point: CGPoint,
                                         focusMode: AVCaptureDevice.CinematicVideoFocusMode)

open func setCinematicVideoFixedFocus(at point: CGPoint,
                                      focusMode: AVCaptureDevice.CinematicVideoFocusMode)
-
10:59 - Focus method 1 & CinematicVideoFocusMode
// Focus methods
open func setCinematicVideoTrackingFocus(detectedObjectID: Int,
                                         focusMode: AVCaptureDevice.CinematicVideoFocusMode)

public enum CinematicVideoFocusMode : Int, @unchecked Sendable {

    case none = 0

    case strong = 1

    case weak = 2
}

extension AVMetadataObject {

    open var cinematicVideoFocusMode: AVCaptureDevice.CinematicVideoFocusMode { get }
}
-
12:19 - Focus method no.2
// Focus method no.2
open func setCinematicVideoTrackingFocus(at point: CGPoint,
                                         focusMode: AVCaptureDevice.CinematicVideoFocusMode)
-
12:41 - Focus method no.3
// Focus method no.3
open func setCinematicVideoFixedFocus(at point: CGPoint,
                                      focusMode: AVCaptureDevice.CinematicVideoFocusMode)
-
13:54 - Create the spatial tap gesture
var body: some View {

    let spatialTapGesture = SpatialTapGesture()
        .onEnded { event in
            Task {
                await camera.focusTap(at: event.location)
            }
        }

    ...
}
-
14:15 - Simulate a long press gesture with a drag gesture
@State private var pressLocation: CGPoint = .zero
@State private var isPressing = false
private let longPressDuration: TimeInterval = 0.3

var body: some View {
    ...
    let longPressGesture = DragGesture(minimumDistance: 0)
        .onChanged { value in
            if !isPressing {
                isPressing = true
                pressLocation = value.location
                startLongPressTimer()
            }
        }
        .onEnded { _ in
            isPressing = false
        }
    ...
}

private func startLongPressTimer() {
    DispatchQueue.main.asyncAfter(deadline: .now() + longPressDuration) {
        if isPressing {
            Task {
                await camera.focusLongPress(at: pressLocation)
            }
        }
    }
}
-
14:36 - Create a rectangle view to receive gestures.
var body: some View {

    let spatialTapGesture = ...

    let longPressGesture = ...

    ZStack {
        ForEach(metadataManager.metadataObjects, id: \.objectID) { metadataObject in
            rectangle(for: metadataObject)
        }

        Rectangle()
            .fill(Color.clear)
            .contentShape(Rectangle())
            .gesture(spatialTapGesture)
            .gesture(longPressGesture)
    }
}
-
15:03 - Create the rectangle view
private func rectangle(for metadata: AVMetadataObject) -> some View {
    let transformedRect = previewLayer.layerRectConverted(
        fromMetadataOutputRect: metadata.bounds)
    var color: Color
    var strokeStyle: StrokeStyle

    switch metadata.cinematicVideoFocusMode {
    case .weak:
        color = .yellow
        strokeStyle = StrokeStyle(lineWidth: 2, dash: [5, 4])
    case .strong:
        color = .yellow
        strokeStyle = StrokeStyle(lineWidth: 2)
    case .none:
        color = .white
        strokeStyle = StrokeStyle(lineWidth: 2)
    }

    return Rectangle()
        .stroke(color, style: strokeStyle)
        .contentShape(Rectangle())
        .frame(width: transformedRect.width, height: transformedRect.height)
        .position(x: transformedRect.midX, y: transformedRect.midY)
}
-
15:30 - Implement focusTap
func focusTap(at point: CGPoint) {
    try! camera.lockForConfiguration()

    if let metadataObject = findTappedMetadataObject(at: point) {
        if metadataObject.cinematicVideoFocusMode == .weak {
            camera.setCinematicVideoTrackingFocus(
                detectedObjectID: metadataObject.objectID,
                focusMode: .strong)
        } else {
            camera.setCinematicVideoTrackingFocus(
                detectedObjectID: metadataObject.objectID,
                focusMode: .weak)
        }
    } else {
        let transformedPoint = previewLayer.metadataOutputRectConverted(
            fromLayerRect: CGRect(origin: point, size: .zero)).origin
        camera.setCinematicVideoTrackingFocus(at: transformedPoint, focusMode: .weak)
    }

    camera.unlockForConfiguration()
}
-
15:42 - Implement findTappedMetadataObject
private func findTappedMetadataObject(at point: CGPoint) -> AVMetadataObject? {
    var metadataObjectToReturn: AVMetadataObject?

    for metadataObject in metadataObjectsArray {
        let layerRect = previewLayer.layerRectConverted(
            fromMetadataOutputRect: metadataObject.bounds)
        if layerRect.contains(point) {
            metadataObjectToReturn = metadataObject
            break
        }
    }

    return metadataObjectToReturn
}
-
16:01 - focusTap implementation continued
func focusTap(at point: CGPoint) {
    try! camera.lockForConfiguration()

    if let metadataObject = findTappedMetadataObject(at: point) {
        if metadataObject.cinematicVideoFocusMode == .weak {
            camera.setCinematicVideoTrackingFocus(
                detectedObjectID: metadataObject.objectID,
                focusMode: .strong)
        } else {
            camera.setCinematicVideoTrackingFocus(
                detectedObjectID: metadataObject.objectID,
                focusMode: .weak)
        }
    } else {
        let transformedPoint = previewLayer.metadataOutputRectConverted(
            fromLayerRect: CGRect(origin: point, size: .zero)).origin
        camera.setCinematicVideoTrackingFocus(at: transformedPoint, focusMode: .weak)
    }

    camera.unlockForConfiguration()
}
-
16:23 - Implement focusLongPress
func focusLongPress(at point: CGPoint) {
    try! camera.lockForConfiguration()

    let transformedPoint = previewLayer.metadataOutputRectConverted(
        fromLayerRect: CGRect(origin: point, size: .zero)).origin

    camera.setCinematicVideoFixedFocus(at: transformedPoint, focusMode: .strong)

    camera.unlockForConfiguration()
}
-
17:10 - Introduce cinematicVideoCaptureSceneMonitoringStatuses
extension AVCaptureDevice {

    open var cinematicVideoCaptureSceneMonitoringStatuses: Set<AVCaptureSceneMonitoringStatus> { get }
}

extension AVCaptureSceneMonitoringStatus {

    public static let notEnoughLight: AVCaptureSceneMonitoringStatus
}
-
17:42 - KVO handler for cinematicVideoCaptureSceneMonitoringStatuses
private var observation: NSKeyValueObservation?

observation = camera.observe(\.cinematicVideoCaptureSceneMonitoringStatuses,
                             options: [.new, .old]) { _, value in
    if let newStatuses = value.newValue {
        if newStatuses.contains(.notEnoughLight) {
            // Update UI (e.g., "Not enough light")
        } else if newStatuses.count == 0 {
            // Back to normal.
        }
    }
}
-
-
- 0:00 - Introduction
Use the Cinematic Video API to capture pro-level cinema-style videos in your apps. iPhone 13 and 13 Pro introduced Cinematic mode, which transformed iPhone into a cinematography powerhouse.
- 0:33 - Cinematic video
Cinematic video uses shallow depth of field and tracking focus to guide viewer attention, mimicking film techniques. The Cinematic Video API in iOS 26 simplifies this process, enabling apps to automatically rack and track focus. To build a Cinematic capture experience, set up a capture session, select a device, and then enable Cinematic video capture by setting 'isCinematicVideoCaptureEnabled' to true on the 'AVCaptureDeviceInput' class. This configures the session to output Cinematic video with disparity data, metadata, and the original video, allowing for non-destructive editing. You can play back or edit the bokeh rendering with the Cinematic Framework.
- 3:44 - Build a great cinematic capture experience
The example begins by setting up an 'AVCaptureSession' to enable Cinematic video capture on compatible devices, such as the Dual Wide camera on the back and the TrueDepth camera on the front. The example selects a video format whose 'isCinematicVideoCaptureSupported' property returns true, and then adds audio input from the microphone to the session. Spatial audio capture is enabled to enhance the Cinematic experience. To learn more about Spatial Audio, see "Enhance your app's audio content creation capabilities". Next, video stabilization is enabled to enhance the user experience, and the capture session is previewed using a SwiftUI view. Then, the example creates a custom representable struct to wrap the 'AVCaptureVideoPreviewLayer', allowing it to be integrated seamlessly into the SwiftUI interface. The example then delves into controlling the Cinematic video effect, specifically the shallow depth of field. By adjusting the 'simulatedAperture', the bokeh effect can be strengthened or weakened, providing more creative control over the video. To enable manual focus control, the example implements metadata detection to identify focus candidates, such as faces. Then, it draws rectangles on the screen to represent these candidates, allowing users to tap and focus on specific subjects. The Cinematic Video API provides several methods to control focus during video recording. The API outputs metadata objects that include information about detected subjects in the frame. The 'focusMode' configuration parameter determines the tracking behavior of Cinematic video. There are three cases for this enum: 'none', 'strong', and 'weak'. Strong focus locks onto a subject, ignoring other potential focus candidates. Weak focus allows the algorithm to automatically rack focus based on the scene. The none case is primarily used to determine focus status rather than set it. The API offers three focus methods: 'setCinematicVideoTrackingFocus' takes a detected object ID as input and sets the focus to that object. An overload of 'setCinematicVideoTrackingFocus' takes a point in the view as input; Cinematic video then searches for an interesting object at that point and creates a new metadata object of type 'salient object', which can then be focused on. 'setCinematicVideoFixedFocus' sets a fixed focus at a specific point in the scene, computing the distance internally using depth signals. When paired with a strong focus mode, this locks the focus on a particular plane, ignoring other activities in the scene. You can implement custom focus logic in your apps. For example, tapping on a detection rectangle can switch focus between subjects, and a long press can set a strong fixed focus. The app visually differentiates between weak, strong, and no focus to guide the user. Additionally, the API allows for still image capture during recording, which will automatically receive the Cinematic treatment with the bokeh effect. The app can also use key-value observation to observe the 'cinematicVideoCaptureSceneMonitoringStatuses' property, and inform the user when there is insufficient light for proper Cinematic video capture.