DriverKit assertion in OSAction::Cancel() for timer handler

I have a dext that creates a periodic timer on its own dispatch queue. The callback is declared as follows:

virtual void HandleTimer( OSAction *action, uint64_t time ) TYPE(IOTimerDispatchSource::TimerOccurred);

The timer is allocated as follows:

CreateActionHandleTimer( size, &ivars->TimerHandler );
IODispatchQueue::Create( "TimerQueue", 0, 0, &ivars->TimerDispatchQueue );
IOTimerDispatchSource::Create( ivars->TimerDispatchQueue, &ivars->TimerDispatchSrc );

I can start up the timer and it works just fine. However, in my Stop() method, when trying to shut the timer down, I get an assertion in OSAction::Cancel() for TimerHandler: Assertion failed: (queue), function Cancel, file uioserver.cpp, line 4401.

What does this assertion indicate or is the source code available? If so, where? I'm using macOS 15.5.

Note I am attempting to cancel the handler after the dispatch source and queue are canceled and the cleanup methods have been called (which is working). But, cancelling TimerHandler first also asserts.

Answered by DTS Engineer in 840324022

I can start up the timer and it works just fine. However, in my Stop() method, when trying to shut the timer down, I get an assertion in OSAction::Cancel()

Please file a bug on this and post the bug number back here. I have a reasonable idea what's going on and it's an easy fix (see below) but at a minimum we should document it better and it's possible we should change something.

In any case, getting into specifics:

TimerHandler: Assertion failed: (queue), function Cancel, file uioserver.cpp, line 4401.

The assert itself basically indicates that it can't find the queue/"target" the action would target.

Related to that point:

But, cancelling TimerHandler first also asserts.

Do you hit exactly the same assert or is it a different crash?

Moving on to what's actually happening...

What does this assertion indicate...

In terms of exactly what's happening, the first case is the easy one:

Note I am attempting to cancel the handler after the dispatch source and queue are canceled

A few different things can/will go wrong here:

  • The dispatch source holds a reference to its action, which it destroys at free. If you wanted to do this, you'd need to use OSSharedPtr to hold a reference to the action.

  • When canceling, the action's cancelation handler is called on the queue which it targets, which is a problem if the queue is already gone.

The second case is a bit harder to guess at:

But, cancelling TimerHandler first also asserts.

...however, there is a hint that applies to both cases here in the header doc:

/*!
* @brief Cancel all callbacks from the action.
* @discussion After cancellation, the action can only be freed. It cannot be reactivated.
* @param handler Handler block to be invoked after any callbacks have completed.
* @return kIOReturnSuccess on success. See IOReturn.h for error codes.
*/
kern_return_t
Cancel(OSActionCancelHandler handler) LOCALONLY;

The key phrase there being "from the action". That is, the intention is that this method is how an action can be canceled from inside its own handler, not a generic cancellation method. Note that this solves both of the failure points I mentioned above, since the source and queue both must be fully "functional" for the action to fire at all.
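
As a rough illustration of that reasoning, here is a toy model in plain C++. It is not DriverKit code and makes no claim about how uioserver.cpp is actually implemented. ToyQueue, ToyAction, and both functions are hypothetical names, and the weak reference from the action to its queue is an assumption made only to reproduce the "can't find the queue" condition. It models the case where the queue is torn down before the cancel, and the case where the cancel happens from inside the action's own callback.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only. These are NOT DriverKit classes.
struct ToyQueue {
    std::vector<std::function<void()>> pending;

    void post(std::function<void()> work) { pending.push_back(std::move(work)); }

    // Run everything that has been posted so far.
    void drain() {
        std::vector<std::function<void()>> work;
        work.swap(pending);
        for (auto& item : work) item();
    }
};

struct ToyAction {
    std::weak_ptr<ToyQueue> target;  // assumption: the action does not keep its queue alive
    std::function<void(ToyAction&)> callback;
    bool canceled = false;

    // Returns false where the real Cancel() would assert: there is no queue
    // left on which to run the cancellation handler.
    bool cancel(std::function<void()> handler) {
        std::shared_ptr<ToyQueue> queue = target.lock();
        if (!queue) return false;
        canceled = true;
        queue->post(std::move(handler));
        return true;
    }
};

// Case 1: the queue is torn down first, then the action is canceled.
bool cancelAfterQueueTeardown() {
    auto queue = std::make_shared<ToyQueue>();
    auto action = std::make_shared<ToyAction>();
    action->target = queue;
    queue.reset();                 // the queue is gone
    return action->cancel([] {});  // no target left to run the handler on
}

// Case 2: the action is canceled from inside its own callback. The callback
// can only run while the queue exists, so the target is always found.
bool cancelFromInsideCallback() {
    auto queue = std::make_shared<ToyQueue>();
    auto action = std::make_shared<ToyAction>();
    action->target = queue;

    bool cancelSucceeded = false;
    bool handlerRan = false;
    action->callback = [&](ToyAction& self) {
        cancelSucceeded = self.cancel([&] { handlerRan = true; });
    };

    queue->post([action] { action->callback(*action); });  // the timer "fires"
    queue->drain();  // runs the callback, which posts the cancel handler
    queue->drain();  // runs the cancel handler
    return cancelSucceeded && handlerRan;
}
```

In this model, cancelAfterQueueTeardown() fails because nothing is left to target, while cancelFromInsideCallback() succeeds because the callback could not have fired without a live queue.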

Finally, as for what you should be doing here, here's what I found from reviewing our own usage:

  1. A comfortable majority of our code actually used a local variable for the OSAction object, which means they couldn't cancel from this context even if they wanted to. Outside of very specific requirements, this is the approach I'd recommend.

  2. Calling OSAction->Cancel() at all is pretty rare in our code. In the cases I looked at, it was necessary because of a complicated architectural issue that I don't think would ever really apply to a standard DEXT.

So, the solution here is basically "don't cancel the action". Keep in mind that unless you're doing something "weird", your DEXT process is going to be destroyed shortly after you return from stop. That means your goals here are basically:

  • Don't panic the kernel.

  • Don't leave your hardware in a weird/broken state.

  • Don't crash, since that might compromise your other two goals.

By design, "perfect" resource management is far less critical than it would be in a KEXT. For example, your DEXT cannot "leak" its own local memory (that will be destroyed at process death) and it shouldn't be able to leak the kernel memory it does have access to (the IOMemoryDescriptors it has access to should all have kernel owners that can destroy them). Proper resource management is obviously still important, but not in the same way it would be for a KEXT.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Feedback Item FB17684395 has been filed.

To answer your question:

Do you hit exactly the same assert or is it a different crash?

It is the same assert.

So, the solution here is basically "don't cancel the action".

Understood. Thanks for the help.
