Matter commissioning issue with Matter support extension

My team has developed an app with a Matter commissioner feature (for our own ecosystem) using the Matter framework in a MatterSupport extension.

Recently, we've noticed that commissioning Matter devices with the MatterSupport extension has become very unstable. Occasionally, HomeUIService stops the flow after successfully commissioning the device to the first fabric, displaying the error: "Failed to perform Matter device setup: Error Domain=HMErrorDomain Code=2." (Normally, it should send an open commissioning window command to the device and then add the device to the second fabric.) We had never seen this issue until the last few weeks, and there have been no code changes in the app. We suspect that some data fails to download from iCloud or the Apple Account, and that this causes the problem.

For evaluation, we tried removing the MatterSupport extension and running the Matter framework directly in developer mode. With that setup the issue disappears, and commissioning works without any problems.

Answered by DTS Engineer in 835748022

So, let me start with that error here:

displaying the error: "Failed to perform Matter device setup: Error Domain=HMErrorDomain Code=2."

Error 2 is "HMErrorCodeNotFound". Unfortunately, that's a fairly general error that's used in a large number of different contexts to basically mean "I didn't have/get something I expected". Given that you seem to be able to reproduce the issue, here's what I would suggest doing next.
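Because code 2 is so generic, it helps to log the domain, the code, and any underlying error together wherever such an error reaches your own code. A minimal sketch, assuming the error arrives as an ordinary Swift `Error`; the helper names are hypothetical, and the literal domain string and code are taken from the error text above (HomeKit exposes the same code as `HMError.Code.notFound`).

```swift
import Foundation

// Hypothetical helpers; the domain string and code come from the error text in this thread.
let homeKitErrorDomain = "HMErrorDomain"
let homeKitNotFoundCode = 2  // HMErrorCodeNotFound

/// True when the error is HomeKit's generic "not found" error.
func isHomeKitNotFound(_ error: Error) -> Bool {
    let nsError = error as NSError
    return nsError.domain == homeKitErrorDomain && nsError.code == homeKitNotFoundCode
}

/// Puts domain, code, and any underlying error on one log line, since the
/// top-level code alone says little about what was actually missing.
func describeForLog(_ error: Error) -> String {
    let nsError = error as NSError
    var text = "\(nsError.domain) code=\(nsError.code)"
    if let underlying = nsError.userInfo[NSUnderlyingErrorKey] as? NSError {
        text += " underlying=\(underlying.domain) code=\(underlying.code)"
    }
    return text
}
```

In the HomeUIService log quoted later in this thread the underlying error is `(null)`, so in that case the helper would only print the top-level domain and code.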

First off, if at all possible, do this testing on a dedicated test device and with the minimum possible home configuration. This isn't always feasible, but every additional app, accessory, or configuration choice introduces more log activity. Log activity is what makes this process difficult, so anything you can do to reduce that noise is helpful.

Next, please install the following profiles on the device that's failing:

That's obviously a lot of profiles, but the goal here is to get "all" of the necessary information in a single pass, so that we don't end up in a situation where the log tells us which component the failure happened in but not what the actual problem is.

Once those profiles are installed, do the following:

  • Turn the device off.

  • Leave it alone for "a while". The exact amount of time doesn't matter, but longer is always better. As little as 10-15 minutes is fine; overnight is fabulous.

The goal here is to create a large time gap in the console log, making it easier to cut out/ignore old data.
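Once the log has been exported and parsed, that gap can also be used mechanically. A minimal sketch, assuming the entries have already been turned into timestamps; the `LogEntry` type and `entriesAfterLargestGap` function are hypothetical and not part of any Apple tool. It treats the largest gap between consecutive timestamps as the power-off period and keeps only what follows it.

```swift
import Foundation

// Hypothetical representation of one exported log line.
struct LogEntry {
    let timestamp: Date
    let message: String
}

/// Treats the largest gap between consecutive timestamps as the power-off
/// period and returns only the entries recorded after it.
func entriesAfterLargestGap(_ entries: [LogEntry]) -> [LogEntry] {
    let sorted = entries.sorted { $0.timestamp < $1.timestamp }
    guard sorted.count > 1 else { return sorted }
    var splitIndex = 0
    var largestGap: TimeInterval = 0
    for i in 1..<sorted.count {
        let gap = sorted[i].timestamp.timeIntervalSince(sorted[i - 1].timestamp)
        if gap > largestGap {
            largestGap = gap
            splitIndex = i
        }
    }
    return Array(sorted[splitIndex...])
}
```

The heuristic only works if the power-off gap really is the largest one in the capture, which is one reason a longer wait helps.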

When you're ready to start testing, do the following:

  • Turn the device on and unlock it.

  • Give the device a few minutes, then start testing.

  • When the problem occurs, note the time it occurred, then wait a few minutes.

  • Trigger a sysdiagnose and collect the data.

Obviously that's the "ideal" flow when the problem is relatively easy to replicate. If the problem is more intermittent (or, for example, it only happens on an end user device), then you should do the same setup as above and then do your normal testing until the failure happens. Once the failure happens, the critical points are:

  • When exactly the log is captured isn't that important. Eventually the system does purge data, but as long as the device has plenty of storage, there isn't very much difference between a log collected immediately after the failure and a log captured several hours later.

  • It is important that you NOT reboot the device until after you've collected the sysdiagnose. Many components purge their log data when the device reboots, which makes that log data largely useless.

Once you've got a sysdiagnose, please file a bug describing what happened and what time the failure occurred, and then upload the sysdiagnose. After that's done, please post the bug number back here and I'll see what I can determine. If it takes a while to get the bug data, you can also file a code-level support request that includes my name and a link to this post.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware


I am a colleague of the original poster, MarcoKhoo. I would like to supplement this thread with the complete log of the issue we have encountered.

Logs:

Error 11:03:30.791227+0800 homed [188914BD-5163-425C-9E59-CAE9BFA1A288] Could not find system commissioner pairing for newly staged server with identifier <private> in all pairings: <private>
Default 11:03:30.791335+0800 homed Answering incoming message HMASC.m.confirmDeviceCredential (602718BC-314B-4555-9D32-2874A9D6207F) from client 'HomeUIService' that expects a response with error Error Domain=HMErrorDomain Code=2 "(null)"
Error 11:03:30.791488+0800 homed [188914BD-5163-425C-9E59-CAE9BFA1A288] tag="stagedPairingFailure" desc="Could not find system commissioner pairing for newly staged server" errorDomain="HMErrorDomain" errorCode="2"
Error 11:03:30.791779+0800 HomeUIService [C27DF567] Failed to stage CHIP accessory pairing in steps: Error Domain=HMErrorDomain Code=2
Error 11:03:30.791920+0800 HomeUIService -[HSSetupStateMachineCHIPPartnerConfiguration stageCHIPAccessory]_block_invoke Failed to stage CHIP accessory: Error Domain=HMErrorDomain Code=2
Default 11:03:30.792021+0800 HomeUIService -[HSSetupStateMachineConfiguration setPairingError:] *** Setting pairingError *** = Error Domain=HMErrorDomain Code=2
Default 11:03:30.792114+0800 HomeUIService -[HSSetupStateMachineCHIPPartnerConfiguration setPhase:] old phase: PairingInProgress new phase: PermanentlyFailed
Error 11:03:30.792260+0800 HomeUIService Unexpected accessory <private> setup error Error Domain=HMErrorDomain Code=2
Default 11:03:30.793215+0800 HomeUIService Updating status title: "<private>" description: "<private>"
Default 11:03:30.793335+0800 HomeUIService stageCHIPAccessory hit error Error Domain=HMErrorDomain Code=2
Default 11:03:30.793417+0800 HomeUIService -[HSSetupStateMachineConfiguration setPairingError:] *** Setting pairingError *** = Error Domain=HMErrorDomain Code=2
Default 11:03:30.793627+0800 HomeUIService -[HSSetupStateMachineConfiguration setPairingError:] *** Setting pairingError *** = Error Domain=HMErrorDomain Code=2 UserInfo={HFErrorUserInfoOptionsKey=<private>}
Default 11:03:30.793715+0800 HomeUIService <private> 318 -[HSProximityCardHostViewController coordinator:updatedConfiguration:]: Pairing
Default 11:03:30.793790+0800 HomeUIService Calculating potential skip of: Error
Default 11:03:30.793850+0800 HomeUIService Show step: Error
Default 11:03:30.793931+0800 HomeUIService <private> tuple 0xd0c3c1a70 accessory (null) nextViewController Pairing->Error
Default 11:03:30.794018+0800 HomeUIService Presenting error card with error Error Domain=HMErrorDomain Code=2 UserInfo={HFErrorUserInfoOptionsKey=<private>}
Default 11:03:30.803741+0800 HomeUIService Presenting error screen w/ title <private>, subtitle <private>, pairing error Error Domain=HMErrorDomain Code=2 UserInfo={HFErrorUserInfoOptionsKey=<private>}, underlyingError (null), errorLocalizedDescription <private>
Default 11:03:30.803822+0800 HomeUIService Prox Card UI Transition -> Error | VC: <private>
Default 11:03:30.804111+0800 HomeUIService stagePairingFuture result length 0
Default 11:03:30.804227+0800 HomeUIService <private> 318 -[HSProximityCardHostViewController coordinator:updatedConfiguration:]: Error

I am a colleague of the original poster, MarcoKhoo. I would like to supplement this thread with the complete log of the issue we have encountered.

If you still have access to the log data, please look earlier in the log for either (or both) of these two messages from homed:

Changing System Commissioner Mode from YES to NO

Default thread operational dataset is available. Proceed with informing browser to start thread if needed

I've tracked down an existing bug that looks similar to what you're describing, and those log messages would confirm that we're looking at the same issue.
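If the log has been exported to text (for example with `log show --archive` on a Mac), scanning for those two messages is easy to script. A minimal sketch, assuming plain-text log lines and simple substring matching; `findSuspectMessages` is a hypothetical helper, and the two strings are copied verbatim from the reply above.

```swift
import Foundation

// The two homed messages quoted in the reply above.
let suspectMessages = [
    "Changing System Commissioner Mode from YES to NO",
    "Default thread operational dataset is available. Proceed with informing browser to start thread if needed",
]

/// Returns, for each suspect message that appears, the full log lines containing it.
func findSuspectMessages(in logText: String) -> [String: [String]] {
    var hits: [String: [String]] = [:]
    for line in logText.split(separator: "\n") {
        for message in suspectMessages where line.contains(message) {
            hits[message, default: []].append(String(line))
        }
    }
    return hits
}
```

An empty result only means the exported text doesn't contain those strings; it doesn't rule the issue out if the log was captured late or after a reboot.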

More broadly, please file a bug, attach whatever log data you have available, and then post the bug number back here. The log data you posted here is somewhat "late" in the failure process: it basically shows the system routing the failure, but doesn't include what actually caused it. It's likely that the team will want a full sysdiagnose to investigate in detail, but it's possible that the logs you already have will be enough to prove or disprove that this is the same issue I found.

Finally, a few other points:

  • What system versions have you tested with/seen this on?

  • Can you provide more details about the controller configuration you're testing with? What's the pairing device (the device you're using the pairing UI on), is there a Home Hub, and which device(s) are acting as the bridge to Thread?

  • What's the ratio of success/failure when pairing?

  • Again, please get a bug filed on this as soon as possible and please include the failure ratio you're seeing in that bug.

On the bug side, keep in mind that even if this issue is known, one of the critical factors in evaluating whether a fix should ship in a software update or wait for the next major system release is how it's impacting users and developers. That's what makes developer bugs critical: they're the best tool you have for letting us know the problems an issue is causing.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Hello. I would appreciate it if you could kindly help to solve this problem. Thank you very much!

I have already filled in the relevant information in Feedback Assistant and uploaded the sysdiagnose file: FB17317942

please file a bug describing what happened

--When using our app to scan the QR code for adding a Thermostat (Matter Over Thread), an error occurred, stating: "Unable to Add Accessory".

what time the failure occurred

--2025-4-22 16-17 GMT+8

What system versions have you tested with/seen this on?

--iOS 18.4.1

Can you provide more details about the controller configuration you're testing with? What's the pairing device (the device you're using the pairing UI on), is there a Home Hub, and which device(s) are acting as the bridge to Thread?

--We don't have a Home Hub. Instead, we're using our own Matter Controller, which is also a Thread Border Router, and our Thermostat that works with Matter Over Thread.

What's the ratio of success/failure when pairing?

--When the network's good, the success rate's really high. Nine out of ten attempts are successful, and there's just one failure.

--When the network's bad, the success rate's really low. Nine out of ten attempts are failures, and there's only one success.

Our software development environment:

  1. macOS 15.3.2
  2. Xcode 16.2
  3. iOS 18.4
  4. Dart 3.4.1
  5. Flutter 3.22.1
  6. Matter Controller (Matter 1.3)
  7. Thermostat (Matter 1.3)

what time the failure occurred --2025-4-22 16-17 GMT+8

Looking at the logging from homed, that timing tracks with these two log messages:


2025-04-22 16:16:01.705228+0800 0x3f8e     info        0x124cc              164    0    homed: (HomeKitMatter) [com.apple.HomeKit:hmmtr.accessory.server] [462492007/3938950945(1835160791)] Request to open the pairing window with PIN for a duration of 300
...
2025-04-22 16:17:31.721119+0800 0x432c     error       0x0                  164    0    homed: (Matter) [com.csa.matter:Controller] Failed to open pairing window on the device. Status src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:124: CHIP Error 0x00000032: Timeout

That leads to here:

--When the network's bad, the success rate's really low. Nine out of ten attempts are failures, and there's only one success.

As far as the log is concerned, it looks like networking was simply so poor that pairing was impossible. I'm not sure how you disrupted the network in this particular case, but it's certainly true that you can disrupt a network to the point where pairing simply cannot occur.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Sometimes, it doesn't even send the "open pairing window" command. It just gives an error right away. Do you know why that is?

Error 13:59:42.963124+0800 homed Mdns: Resolve failure (src/platform/Darwin/DnssdImpl.cpp:697: CHIP Error 0x00000074: The operation has been cancelled)

Error 13:59:42.963193+0800 homed OperationalSessionSetup[1:0000000023C84E43]: operational discovery failed: src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:124: CHIP Error 0x00000032: Timeout

Error 13:59:42.963254+0800 homed Creating NSError from src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:124: CHIP Error 0x00000032: Timeout (context: (null))

Sorry about that. The log I posted before might have given you the wrong idea. Actually, the error we run into most often is "Could not find system commissioner pairing for newly staged server with identifier 6f:d6:63:43:b5:b9 in all pairings:", not "Failed to open pairing window on the device".

I have uploaded the log with the ID: FB17357284

Could you assist us in resolving this issue? Thank you.

@DTS Engineer, sorry, we may have given you the wrong log on 19 April. The latest log, in FB17357284, should show the case we are describing.

Fabric 1 complete (Matter Thread network created):

Default 2025-04-25 11:14:33.354895 +0800 homed Received Command Response Data, Endpoint=0 Cluster=0x0000_0030 Command=0x0000_0005
Default 2025-04-25 11:14:33.354903 +0800 homed Received CommissioningComplete response, errorCode=0
. . .
(The app then reports that the device cannot be added.)

Error 2025-04-25 11:14:33.539798 +0800 homed [188914BD-5163-425C-9E59-CAE9BFA1A288] Could not find system commissioner pairing for newly staged server with identifier 6f:d6:63:43:b5:b9 in all pairings: ( "<MTSSystemCommissionerPairing, Uuid: 34355356-AAFA-4604-B

There are only about 0.2 s between the two events, so we suspect there is some problem in homed during this process. Thank you, and we look forward to your reply.
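The gap between the CommissioningComplete response and the "Could not find system commissioner pairing" error can be checked directly from the two homed timestamps. A minimal sketch with hypothetical helper names, assuming both timestamps are in `HH:mm:ss.ffffff` form and fall on the same day:

```swift
import Foundation

/// Converts an "HH:mm:ss.ffffff" log timestamp into seconds since midnight.
/// Returns nil if the string doesn't have exactly three numeric components.
func secondsSinceMidnight(_ time: String) -> Double? {
    let parts = time.split(separator: ":")
    guard parts.count == 3,
          let hours = Double(parts[0]),
          let minutes = Double(parts[1]),
          let seconds = Double(parts[2]) else { return nil }
    return hours * 3600 + minutes * 60 + seconds
}

/// Elapsed time in seconds between two same-day log timestamps.
func elapsed(from start: String, to end: String) -> Double? {
    guard let a = secondsSinceMidnight(start), let b = secondsSinceMidnight(end) else { return nil }
    return b - a
}
```

For 11:14:33.354903 and 11:14:33.539798 this returns about 0.185 s. Applied to the 16:16:01.705228 "open the pairing window" request and the 16:17:31.721119 timeout in the earlier homed log, it returns about 90 s.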

@DTS Engineer, sorry, we may have given you the wrong log on 19 April. The latest log, in FB17357284, should show the case we are describing.

So, looking over both bugs, I have a few questions:

  • Are you able to reliably pair with this configuration when JUST pairing through Home.app? As part of that, please make sure your accessory is fully functional, as it looks like there may be a bug that could cause pairing to appear successful when it wasn't.

  • As part of that, make sure you validate from your border router that the accessory is fully paired and communicating through your border router.

  • Related to that point, has your app confirmed that the correct Thread credentials are configured using THClient? Note that this validation should generally be done before starting MatterSupport pairing, since your extension can't really handle the error if that data is unavailable.

The concern here is that your border router was never fully configured, so you don't actually have a working Thread network during pairing.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thank you for your detailed response and questions. Here are my answers:

Pairing through Home.app: We cannot set up this configuration in Home.app because we did not form a Thread mesh network with the HomePod. The Thread network is created by our border router with its own dataset, and, as we understand it, Home.app uses the HomePod's dataset. While it is possible to change our border router's dataset to match the HomePod's, we prefer not to: the Thread network may already exist before the user starts using iOS (for example, set up from Android with Thread devices already connected to our border router using the original dataset), and changing it would require reconfiguring all of those devices. As a result, using Home.app only adds the Thread device to the HomePod's network instead of our border router's, even though the HomePod and our border router are on the same local Wi-Fi network.

Thread credentials configuration: Our app has confirmed that the correct Thread credentials are configured using THClient. This validation is done before starting MatterSupport pairing, since, as I understand it, the extension cannot proceed if this data is unavailable.

Could you please advise how we should verify that the Thread credentials are configured correctly, or whether there is anything we cannot check? The dataset and Border Agent ID given to the device during commissioning are definitely correct; otherwise, the device would not join our border router's Thread network and the first Matter fabric could not be created. Or does MatterSupport check whether the device's Thread network matches the preferred network in THClient (or other requirements) before moving forward? For example, the Google Matter SDK requires using the preferred network only, and we have to change the preferred network in GMS before setup.
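One way to probe the last question empirically is to compare the dataset the system reports as preferred with the dataset your own border router uses. This is a sketch under stated assumptions, not a documented MatterSupport requirement: it treats two datasets as the same network when their Extended PAN ID (MeshCoP TLV type 2) and Network Key (TLV type 5) match, reads the system side through ThreadNetwork's `THClient.retrievePreferredCredentials`, and leaves obtaining `routerDataset` from your border router to your own code. The helper names are hypothetical; the ThreadNetwork part compiles only where that framework is available (iOS 15 or later) and needs the app's Thread credentials entitlement.

```swift
import Foundation
#if canImport(ThreadNetwork)
import ThreadNetwork
#endif

// MeshCoP TLV types used for the comparison.
let extendedPANIDTLV: UInt8 = 2
let networkKeyTLV: UInt8 = 5

/// Splits a Thread operational dataset (1-byte type, 1-byte length, value)
/// into a dictionary keyed by TLV type. Returns nil if the buffer is truncated.
func parseDatasetTLVs(_ dataset: Data) -> [UInt8: Data]? {
    let bytes = [UInt8](dataset)
    var tlvs: [UInt8: Data] = [:]
    var index = 0
    while index < bytes.count {
        guard index + 2 <= bytes.count else { return nil }
        let tlvType = bytes[index]
        let length = Int(bytes[index + 1])
        let start = index + 2
        guard start + length <= bytes.count else { return nil }
        tlvs[tlvType] = Data(bytes[start..<(start + length)])
        index = start + length
    }
    return tlvs
}

/// True when both datasets carry the same Extended PAN ID and Network Key.
/// Other TLVs (for example the Active Timestamp) are allowed to differ.
func sameThreadNetwork(_ a: Data, _ b: Data) -> Bool {
    guard let lhs = parseDatasetTLVs(a), let rhs = parseDatasetTLVs(b),
          let xpan = lhs[extendedPANIDTLV], let key = lhs[networkKeyTLV] else {
        return false
    }
    return xpan == rhs[extendedPANIDTLV] && key == rhs[networkKeyTLV]
}

#if canImport(ThreadNetwork)
/// Reports whether the system's preferred Thread network matches `routerDataset`,
/// which your own code obtains from your border router (assumption).
func preferredNetworkMatchesRouter(routerDataset: Data,
                                   completion: @escaping (Bool) -> Void) {
    let client = THClient()
    client.retrievePreferredCredentials { credentials, error in
        _ = client  // capture the client so it stays alive until the callback runs
        guard error == nil, let stored = credentials?.activeOperationalDataSet else {
            completion(false)
            return
        }
        completion(sameThreadNetwork(stored, routerDataset))
    }
}
#endif
```

A `false` result here would only show that the preferred network differs from your border router's network; whether MatterSupport actually relies on the preferred network is exactly the open question above.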

And this issue happens randomly.

And this issue happens randomly.

Unfortunately, all I can really say about this is that, yes, this is happening. There are timing bugs in the logic that manages how the MatterSupport pairing engine "chains" between ecosystem controllers, the net result of which is exactly what you're seeing. If you haven't already, I would test on iOS 18.5 (which may see a modest improvement) but the issues won't be fully resolved until additional fixes ship/seed.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
