Hi everyone,
I was following the video "Modernize PCI and SCSI controller drivers with DriverKit" and the documentation to implement UserMapHBAData(). Here's my current implementation:
// kern_return_t DRV_MAIN_CLASS_NAME::UserMapHBAData_Impl(uint32_t *uniqueTaskID)
kern_return_t IMPL(DRV_MAIN_CLASS_NAME, UserMapHBAData)
{
    Log("UserMapHBAData() - Start");

    // Define the vm_page_size explicitly
    const uint32_t vm_page_size = 4096;
    kern_return_t ret;
    IOBufferMemoryDescriptor *buffer = nullptr;
    IOMemoryMap *memMap = nullptr;
    void *taskData = nullptr;

    // Create a buffer for HBA-specific task data
    ret = IOBufferMemoryDescriptor::Create(kIOMemoryDirectionOutIn, ivars->fTaskDataSize, vm_page_size, &buffer);
    __Require((kIOReturnSuccess == ret), Exit);

    // Map memory to the driver extension's memory space
    ret = buffer->CreateMapping(0, 0, 0, 0, 0, &memMap);
    __Require((kIOReturnSuccess == ret), Exit);

    // Retrieve mapped memory address
    taskData = reinterpret_cast<void *>(memMap->GetAddress());
    __Require(taskData, Exit);
    // WARNING: Potential leak of an object stored into 'buffer'
    // WARNING: Potential leak of an object stored into 'memMap'

    // Assign a unique task ID
    ivars->fTaskID++; // ERROR: No member named 'fTaskID' in 'DriverKitAcxxx_IVars'
    ivars->fTaskArray[ivars->fTaskID] = taskData;
    *uniqueTaskID = ivars->fTaskID;

    Log("UserMapHBAData() - End");
    return kIOReturnSuccess;

Exit:
    // Cleanup in case of failure
    if (memMap) {
        memMap->free(); // Correct method for releasing memory maps
    }
    if (buffer) {
        buffer->free(); // Correct method for releasing buffer memory
    }
    LogErr("ret = 0x%0x", ret);
    Log("UserMapHBAData() - End");
    return ret;
}
For reference, in a KEXT, memory allocation is typically done using:
IOBufferMemoryDescriptor *buffer = IOBufferMemoryDescriptor::inTaskWithOptions(
    kernel_task,        // Task in which memory is allocated
    kIODirectionOutIn,  // Direction (read/write)
    1024,               // Size of the buffer in bytes
    4);                 // Alignment requirements
However, after installing the dext, macOS hangs and I have to do a hardware reset. After rebooting, the systemextensionsctl list output shows:
% sectl list
1 extension(s)
--- com.apple.system_extension.driver_extension
enabled active teamID bundleID (version) name [state]
* - com.accusys.DriverKitAcxxx (5.0/11) com.accusys.DriverKitAcxxx [activated waiting for user]
Questions:
- What could be causing macOS to halt?
- How should I approach debugging and resolving this issue?
Looking forward to your insights. Any suggestions would be greatly appreciated!
Best regards, Charles
What could be causing macOS to halt?
First off, as a general note, I think it's important to emphasize that DEXT development is FAR closer to KEXT development than it is to standard app development. The DEXT architecture is VASTLY more secure than KEXTs and the impact "scope" of a DEXT is far smaller. However, DEXTs are fundamentally still drivers and are FULLY capable of hanging or panicking the kernel, particularly in the lower level PCI and SCSI families.
Looking at your specific code, there are a few things I notice:
- I don't think it will make any difference, but you can use IOVMPageSize to check the page size and, strictly speaking, it's 16k on Apple Silicon, not 4k.
- You're retrieving the address using GetAddress(), but that means you didn't retrieve or check the length. Both the address and the length are more typically retrieved through the returned arguments of "Map()".
- You should be "fully" processing the memory in UserMapHBAData, not just mapping it. Our own driver takes the map all the way through IODMACommand->PrepareForDMA so that it has the final physical address.
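To make the first two points concrete, here is a minimal sketch in plain C++, not DriverKit code. It shows how a hard-coded 4k page size changes size rounding compared to 16k, and why it helps to keep a mapping's length next to its address. RoundUpToPage, MappedRegion, and the two page-size constants are hypothetical names made up for illustration; in a real dext the page size would come from IOVMPageSize and the address/length pair from the mapping call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only. In a dext, use IOVMPageSize
// instead of hard-coding either of these values.
constexpr uint64_t kPage4K  = 4096;
constexpr uint64_t kPage16K = 16384;

// Round a byte count up to the next multiple of a power-of-two page size.
constexpr uint64_t RoundUpToPage(uint64_t size, uint64_t pageSize)
{
    return (size + pageSize - 1) & ~(pageSize - 1);
}

// The same request occupies a different amount of memory depending on page size.
static_assert(RoundUpToPage(5000, kPage4K) == 8192, "two 4k pages");
static_assert(RoundUpToPage(5000, kPage16K) == 16384, "one 16k page");

// Hypothetical record that keeps a mapping's address and length together,
// so later accesses can be bounds-checked instead of trusting a bare pointer.
struct MappedRegion {
    uint64_t address;
    uint64_t length;

    // True only if [offset, offset + count) lies entirely inside the mapping.
    // Written to avoid overflow in offset + count.
    bool Contains(uint64_t offset, uint64_t count) const
    {
        return offset <= length && count <= length - offset;
    }
};
```

With only an address in hand, a check like Contains() is impossible, which is one practical reason to retrieve the length at the same time as the address.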
Having said all that, I suspect that you didn't hang here, but instead stalled at a later point when trying to start I/O.
That leads to here:
How should I approach debugging and resolving this issue?
A few different comments/suggestions:
- Keep in mind that a significant amount of log data tends to be lost when the machine panics or is power cycled, so don't assume that the log data you retrieve after a reboot is necessarily accurate.
- I would try creating an ssh session into the machine, using that to monitor the system log, and then triggering the hang. It's possible that enough of the system is still functioning normally that you'll be able to see the kernel log data, at which point you can then use standard "print" debugging to investigate further.
- You can also use two-machine kernel debugging to "actively" debug your DEXT. The instructions for that, as well as kernel symbols, are in the KDKs (Kernel Debug Kits) found in the "More Downloads" section of the developer portal. Note that there are issues with the "standard" atos symbolication process (this does not affect lldb), the details of which are described here.
Finally, as a general comment, my experience has been that what's often helpful is to slow down and carefully validate every step of the process instead of focusing on getting it "all" to work. In the case here, I would start by simply getting the I/O "pipeline" working WITHOUT actually doing any I/O. That is, implementing every method and having each step "succeed" without actually touching hardware. Next, I'd try to get I/O working, but I'd "rig" the process so that I'm only actually doing a single I/O to a "fixed" address, not trying to set up the fully functional I/O pipeline. Finally, I'd then expand that approach to the "final" full I/O pipeline. The goal here is to ensure that at every point the failure "range" is narrow enough that it's relatively easy to figure out what actually failed, avoiding the problem where "everything" has been implemented, but nothing actually works.
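One possible way to structure that staged bring-up is a small gate that every request passes through before it can reach the hardware. The sketch below is plain C++ under stated assumptions, not DriverKit API: BringUpStage, IOGate, and kFixedTestAddress are hypothetical names, and the fixed address is an arbitrary placeholder.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bring-up stages matching the three steps described above.
enum class BringUpStage {
    PipelineOnly,   // every method "succeeds", hardware is never touched
    SingleFixedIO,  // exactly one I/O to one fixed address is allowed
    FullIO          // the final, fully functional pipeline
};

// Hypothetical placeholder for the single "fixed" address used in stage two.
constexpr uint64_t kFixedTestAddress = 0x1000;

// Hypothetical gate consulted before any request is handed to the hardware.
struct IOGate {
    BringUpStage stage;
    bool fixedIOIssued;

    explicit IOGate(BringUpStage s) : stage(s), fixedIOIssued(false) {}

    // Returns true if the request may reach the hardware at the current stage.
    bool MayTouchHardware(uint64_t address)
    {
        switch (stage) {
        case BringUpStage::PipelineOnly:
            return false;
        case BringUpStage::SingleFixedIO:
            // Allow only the first request, and only to the fixed address.
            if (fixedIOIssued || address != kFixedTestAddress) {
                return false;
            }
            fixedIOIssued = true;
            return true;
        case BringUpStage::FullIO:
            return true;
        }
        return false;
    }
};
```

Moving from one stage to the next is then a one-line change, and any request the gate rejects can be logged and completed without touching the device, which keeps the failure "range" narrow.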
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware