I am able to symbolicate kernel backtraces for addresses that belong to my kext.
Is it possible to symbolicate kernel backtraces for addresses that lie beyond my kext and reference kernel code?
Let me step back to the beginning and explain this in a bit more detail:
When you use "-l", the value atos actually expects is the load address of the first __TEXT segment in the executable.
On arm64, (very generally) things shifted so that code is loaded out of __TEXT_EXEC, NOT __TEXT.
SO, to correctly symbolicate, you need to shift the load address to account for that. Starting with loadable KEXTs, this is fairly easy. Their __TEXT segment is at 0, so the math here is:
(__TEXT_EXEC.vmaddr) - (__TEXT.vmaddr) ->
KEXTs also tend to have a fixed opening __TEXT size, so the value of __TEXT_EXEC.vmaddr ends up being "0x4000":
0x4000 - 0 -> 0x4000
That is where my original command came from:
atos -arch arm64e -o <symbol file path> -l <KEXT load address - 0x4000> <address to symbolicate>
NOTE: My earlier post was wrong, you need to subtract not add.
For a KEXT, that load address comes directly from the panic log. Here is what that looks like in the panic from the other thread on this issue:
lr: 0xfffffe0013723894 fp: 0x0000000000000000
Kernel Extensions in backtrace:
com.company.product(1.4.21d119)[92BABD94-80A4-3F6D-857A-3240E4DA8009]@0xfffffe001203bfd0->0xfffffe00120533ab
dependency: com.apple.iokit.IOPCIFamily(2.9)[6D6666E6-340F-3A5E-9464-DE05164C0658]@0xfffffe0015e65e90->0xfffffe0015e93b3f
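The KEXT arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper name kext_atos_load_address is made up for this example, and the 0x4000 default is the typical __TEXT_EXEC offset described above, which you should confirm with otool -l on your own KEXT.

```python
import re

# Typical (__TEXT_EXEC.vmaddr - __TEXT.vmaddr) for a loadable KEXT, per the
# explanation above. Assumption: confirm it with `otool -l` on your own binary.
KEXT_TEXT_EXEC_OFFSET = 0x4000


def kext_atos_load_address(backtrace_line, offset=KEXT_TEXT_EXEC_OFFSET):
    """Return the value to pass to `atos -l` for a KEXT line from a panic log.

    Hypothetical helper: it reads the first address after '@' (the KEXT load
    address) and subtracts the __TEXT_EXEC offset.
    """
    match = re.search(r"@(0x[0-9a-fA-F]+)->", backtrace_line)
    if match is None:
        raise ValueError("no '@<start>-><end>' range found in line")
    return int(match.group(1), 16) - offset


line = ("com.company.product(1.4.21d119)[92BABD94-80A4-3F6D-857A-3240E4DA8009]"
        "@0xfffffe001203bfd0->0xfffffe00120533ab")
print(hex(kext_atos_load_address(line)))  # 0xfffffe0012037fd0
```

The printed value is what would go after -l in the atos command above when symbolicating addresses inside com.company.product.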
Now, the math is actually exactly the same for the kernel, but the numbers are totally different. Running otool -l on kernel.release.t8112 gets you:
__TEXT.vmaddr ->
...
Load command 0
cmd LC_SEGMENT_64
cmdsize 552
segname __TEXT
vmaddr 0xfffffe0007004000
vmsize 0x0000000000100000
...
AND:
__TEXT_EXEC.vmaddr:
...
Load command 2
cmd LC_SEGMENT_64
cmdsize 312
segname __TEXT_EXEC
vmaddr 0xfffffe00072b8000
vmsize 0x00000000008f0000
...
(You may notice there are multiple __TEXT_EXEC sections; in this case, all we care about is the first.)
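If you would rather not read the otool -l output by eye, the two vmaddr values can be pulled out with a small parser. This is a rough sketch that assumes the "segname" / "vmaddr" line layout shown above; the function name first_segment_vmaddrs is invented for this example, and it keeps only the first occurrence of each segment name.

```python
def first_segment_vmaddrs(otool_output):
    """Map each segment name to the vmaddr of its first LC_SEGMENT_64 entry.

    Assumes `otool -l` style text where a 'segname X' line is followed
    (within the same load command) by a 'vmaddr 0x...' line.
    """
    vmaddrs = {}
    current = None
    for raw in otool_output.splitlines():
        fields = raw.split()
        if len(fields) != 2:
            continue  # skips "Load command N", "..." and similar lines
        key, value = fields
        if key == "segname":
            current = value
        elif key == "vmaddr" and current is not None:
            # setdefault keeps the FIRST segment with this name
            vmaddrs.setdefault(current, int(value, 16))
            current = None
    return vmaddrs


sample = """\
Load command 0
cmd LC_SEGMENT_64
cmdsize 552
segname __TEXT
vmaddr 0xfffffe0007004000
vmsize 0x0000000000100000
Load command 2
cmd LC_SEGMENT_64
cmdsize 312
segname __TEXT_EXEC
vmaddr 0xfffffe00072b8000
vmsize 0x00000000008f0000
"""
addrs = first_segment_vmaddrs(sample)
print(hex(addrs["__TEXT_EXEC"] - addrs["__TEXT"]))  # 0x2b4000
```

In practice you would feed it the text captured from running otool -l on the kernel binary instead of the pasted sample.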
SO, plugging that into our math above:
(__TEXT_EXEC.vmaddr) - (__TEXT.vmaddr) ->
0xfffffe00072b8000 - 0xfffffe0007004000 -> 0x2B4000
That number is then subtracted from the "Kernel text exec base" value in the panic log:
Kernel text exec base: 0xfffffe001cd8c000
0xfffffe001cd8c000 - 0x2B4000 -> 0xFFFFFE001CAD8000
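The same two subtractions, written out as a small Python sketch so the numbers are easy to re-run for a different kernel or panic log. The function name atos_load_address is just a label chosen for this example; the inputs are the values quoted above.

```python
def atos_load_address(text_vmaddr, text_exec_vmaddr, runtime_text_exec_base):
    """Compute the `atos -l` value from static vmaddrs and the runtime base.

    offset = __TEXT_EXEC.vmaddr - __TEXT.vmaddr   (from otool -l)
    load   = runtime __TEXT_EXEC base - offset    (base from the panic log)
    """
    offset = text_exec_vmaddr - text_vmaddr
    return runtime_text_exec_base - offset


load = atos_load_address(
    text_vmaddr=0xfffffe0007004000,             # __TEXT.vmaddr
    text_exec_vmaddr=0xfffffe00072b8000,        # first __TEXT_EXEC.vmaddr
    runtime_text_exec_base=0xfffffe001cd8c000,  # "Kernel text exec base"
)
print(hex(load))  # 0xfffffe001cad8000
```

For a loadable KEXT the same function applies with text_vmaddr=0 and text_exec_vmaddr=0x4000, with the KEXT load address from the panic log as the runtime base.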
Plugging that value into your atos command:
atos -o kernel.release.t8112 -arch arm64e -l 0xFFFFFE001CAD8000 0xfffffe001cde8ab8 0xfffffe001cf44894 0xfffffe001cf42a90 0xfffffe001cd938c0 0xfffffe001cde83b0 0xfffffe001d66d670 0xfffffe001d86a708 0xfffffe001d86941c 0xfffffe001d866504 0xfffffe001e234bc8 0xfffffe001cf4611c 0xfffffe001cd93938 0xfffffe001cee0a80 0xfffffe001cee0a80 0xfffffe001d38e408 0xfffffe001d3e9304 0xfffffe001ce8c24c 0xfffffe001cedf514 0xfffffe001cd9cc04
Gives us:
handle_debugger_trap (in kernel.release.t8112) (debug.c:1839)
handle_uncategorized (in kernel.release.t8112) (sleh.c:1591)
sleh_synchronous (in kernel.release.t8112) (sleh.c:1477)
fleh_synchronous (in kernel.release.t8112) + 44
panic_trap_to_debugger (in kernel.release.t8112) (debug.c:1400)
panic_with_options_and_initiator (in kernel.release.t8112) (debug.c:1213)
0xfffffe001d86a708
0xfffffe001d86941c
0xfffffe001d866504
0xfffffe001e234bc8
sleh_irq (in kernel.release.t8112) (sleh.c:3840)
fleh_irq (in kernel.release.t8112) + 64
vm_object_upl_request (in kernel.release.t8112) (vm_pageout.c:6213)
vm_object_upl_request (in kernel.release.t8112) (vm_pageout.c:6213)
ubc_create_upl_kernel (in kernel.release.t8112) (ubc_subr.c:2643)
vnode_pageout (in kernel.release.t8112) (vnode_pager.c:407)
vnode_pager_data_return (in kernel.release.t8112) (bsd_vm.c:451)
vm_pageout_iothread_external_continue (in kernel.release.t8112) (vm_pageout.c:4237)
Call_continuation (in kernel.release.t8112) + 196
Note the references to "panic...", which are an easy way to validate that the symbolication is valid. By its nature, the kernel cannot "crash" in the same sense any other process can, as the kernel IS the core component that controls all of the mechanisms that cause OTHER processes to crash. Because of that, "all" kernel panics end in a call to "panic", because that's the function that ends up generating the log data you're looking at and then stopping normal execution.
Anticipating your question, no, I don't know why the other 4 frames aren't symbolicating, but I don't think it's an issue with atos. Apple obviously has its own internal symbolication tools and they're getting exactly the same result. Are they from your KEXT and you just didn't include the KEXT address info?
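One quick sanity check on those four frames is to see whether they even fall inside the kernel's first __TEXT_EXEC segment. The sketch below assumes that segment occupies the range starting at "Kernel text exec base" and running for the vmsize that otool -l reported (0x8f0000); the helper name in_first_text_exec is made up for this example.

```python
TEXT_EXEC_BASE = 0xfffffe001cd8c000  # "Kernel text exec base" from the panic log
TEXT_EXEC_SIZE = 0x8f0000            # vmsize of the first __TEXT_EXEC (otool -l)


def in_first_text_exec(address, base=TEXT_EXEC_BASE, size=TEXT_EXEC_SIZE):
    """True if address lies in [base, base + size).

    Assumption: the first __TEXT_EXEC segment is mapped contiguously at the
    runtime base with its on-disk vmsize.
    """
    return base <= address < base + size


unsymbolicated = [
    0xfffffe001d86a708,
    0xfffffe001d86941c,
    0xfffffe001d866504,
    0xfffffe001e234bc8,
]
for addr in unsymbolicated:
    print(hex(addr), in_first_text_exec(addr))
```

With the numbers from this thread, all four print False, while a frame that did symbolicate (for example 0xfffffe001cde8ab8) is inside the range. That only shows those addresses sit outside that one segment; it does not say what they do belong to.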
Having said that, they also don't really matter for this particular panic. That's because of this message at the top of the log:
panic(cpu 4 caller 0xfffffe001d86a708): watchdog timeout: no checkins from watchdogd in 90 seconds (25 total checkins since monitoring last enabled)
This message basically means exactly what it sounds like. There's a system in place (managed by "watchdogd") that requires specific, critical user space system daemons to "check in" with the kernel. If they fail to do so, then the system times out and the kernel panics. That dynamic also means that the specific stack trace is largely irrelevant, as it simply shows what happened to be executing at the time the panic occurred, NOT what actually caused the panic.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware