Crashes with Rosetta after Sonoma update (crash reports).

Hello.

We have an app and a custom dylib that seems to be crashing only when Rosetta is involved.

I believe it's the custom DYLIB that crashes.

Here are some observations.

  1. The issue happens with the older DYLIB built in 2022 (Intel-only).
  2. It also happens with the newer DYLIB build (Universal).
  3. The Universal DYLIB works fine natively on both Intel and M1 machines. It's only when we access it through an Intel-only .app running under Rosetta that we see the crash.
  4. The older Intel-only DYLIB worked perfectly with the same testing .app on macOS versions before Sonoma; now it crashes with the same .app, same build.

Crash reports have been all over the place; they vary but repeat themselves. It has been a little confusing to work out how to approach this issue, and we would appreciate any input that can help us understand what is going wrong and how to move forward.

Crash reports are attached.

The crash occurs as a SIGSEGV:

Version:               2.0 (1)
Code Type:             X86-64 (Translated)
Parent Process:        launchd [1]
User ID:               501

Date/Time:             2025-04-14 19:55:06.0103 +0200
OS Version:            macOS 14.6 (23G5075b)
Report Version:        12
Anonymous UUID:        A08ECCFA-BF01-8636-7453-E4476586D3A8

Time Awake Since Boot: 3900 seconds

System Integrity Protection: enabled

Notes:                 PC register does not match crashing frame (0x0 vs 0x102920144)

Crashed Thread:        10

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000011d052840
Exception Codes:       0x0000000000000001, 0x000000011d052840

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [3174]

VM Region Info: 0x11d052840 is not in any region.

Thank you so much for all the attention and effort.

Crash reports have been all over the place; they vary but repeat themselves. It has been a little confusing to work out how to approach this issue, and we would appreciate any input that can help us understand what is going wrong and how to move forward.

So, looking at the logs you sent, the first thing I notice is that they're all from exactly the same machine. Note that this value is the same across all logs:

Anonymous UUID:        A08ECCFA-BF01-8636-7453-E4476586D3A8

The "Anonymous UUID" that's generate by the local machine so that we can correlate crashes without having to collect any information that could be tied to the user.

Looking more closely, you can see that these three crashes actually line up in sequence over a fairly narrow period of time:

1)
Process:               RangerFlex [2437]
...
Date/Time:             2025-04-14 19:17:07.2594 +0200

2)
Process:               RangerFlex [2516]
...
Date/Time:             2025-04-14 19:26:14.9826 +0200

3)
Process:               RangerFlex [3174]
...
Date/Time:             2025-04-14 19:55:06.0103 +0200

In other words, those crashes all happened on the same machine over ~38 minutes. Now, it's possible you picked these crashes by chance, but one issue I've seen a few times is crashes that seem widespread/random when they are actually happening to a very small number of users with very specific characteristics.

In terms of the direct cause of the crash, some kind of memory corruption seems to be involved, and I'd specifically be looking at what's happening here:

21  libRanger.dylib               	       0x118f43a9d std::__1::function::operator()(void*) const + 29 (function.h:981)
22  libRanger.dylib               	       0x118f43320 RangerDriverBase::ValidateLicense() + 3120 (RangerDriverBase.cpp:714)
23  libRanger.dylib               	       0x118dea4a0 ScannerPlugIn_XptDriver::ValidateLicense() + 3296 (ScannerPlugIn_XptDriver.cpp:3614)
24  libRanger.dylib               	       0x118f3d982 RangerDriverBase::SetTransportState(RangerTransportStates) + 562 (RangerDriverBase.cpp:138)
25  libRanger.dylib               	       0x118ded597 ScannerPlugIn_XptDriver::OnPluginDevicesInitialized(unsigned char*, unsigned char*) + 343 (ScannerPlugIn_XptDriver.cpp:4753)

Two out of your three crashes directly include that code. On that point, one thing to be careful about when looking at memory corruption is assuming that the crashing thread is the direct cause. Memory corruption often has a pattern where one thread/modification creates the problem, which then causes a different thread to crash. One of your logs shows these two threads:

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   runtime                       	    0x7ff7ffddfdf8 rosetta::runtime::TranslationCache::jit_translation_for_x86_address(unsigned int, unsigned long long, rosetta::runtime::TranslationCache::SingleStepMode, rosetta::runtime::BranchSlot*, rosetta::runtime::CacheSlot*, ExecutionMode) + 7576 (/Volumes/BuildRootMonorailSunburstG/Library/Caches/com.apple.xbs/Sources/1eef975d-aae8-4388-aeae-43b1c2e4a403/Rosetta-318.9/src/runtime/TranslationCacheJit.cpp:696)
1   runtime                       	    0x7ff7ffddb980 rosetta::runtime::TranslationCache::translation_for_x86_address(unsigned int, unsigned long long, ExecutionMode) + 300 (/Volumes/BuildRootMonorailSunburstG/Library/Caches/com.apple.xbs/Sources/1eef975d-aae8-4388-aeae-43b1c2e4a403/Rosetta-318.9/src/runtime/TranslationCache.cpp:172)
2   Rosetta Runtime Routines	       0x104af04ac ???
3   	       0x104d058b4 ???
4   ???                           	    0x2085f81f8085 ???
5   Foundation                    	    0x7ff80bfbefe6 NSKVODeallocate + 56
6   libobjc.A.dylib               	    0x7ff80abf4017 AutoreleasePoolPage::releaseUntil(objc_object**) + 185 (/Volumes/BuildRootMonorailSunburstG/Library/Caches/com.apple.xbs/Sources/545b3bb6-9568-42a1-af21-9f685efba273/objc4-912.7/runtime/NSObject.mm:918)


Thread 5:
0   runtime                       	    0x7ff7ffdd52dc __ulock_wait + 8
1   runtime                       	    0x7ff7ffddca90 rosetta::runtime::TranslationCache::remove_translations_in_x86_interval(unsigned int, Interval<unsigned long long>) + 80 (/Volumes/BuildRootMonorailSunburstG/Library/Caches/com.apple.xbs/Sources/1eef975d-aae8-4388-aeae-43b1c2e4a403/Rosetta-318.9/src/runtime/TranslationCacheJit.cpp:219)
2   runtime                       	    0x7ff7ffde7900 rosetta::runtime::ThreadContext::munmap(void*, unsigned long) + 44 (/Volumes/BuildRootMonorailSunburstG/Library/Caches/com.apple.xbs/Sources/1eef975d-aae8-4388-aeae-43b1c2e4a403/Rosetta-318.9/src/runtime/darwin/ThreadContextVm.cpp:503)
3   ???                           	    0x7ff89b26e070 ???
4   libsystem_kernel.dylib        	    0x7ff80af7edfe __munmap + 10
5   libRanger.dylib               	       0x118ef8215 SBT::SharedMemory::SharedMemory::Impl::~Impl() + 21 (SharedMemory.cpp:81)
...
21  libRanger.dylib               	       0x118f43a9d std::__1::function::operator()(void*) const + 29 (function.h:981)
22  libRanger.dylib               	       0x118f43320 RangerDriverBase::ValidateLicense() + 3120 (RangerDriverBase.cpp:714)
23  libRanger.dylib               	       0x118dea4a0 ScannerPlugIn_XptDriver::ValidateLicense() + 3296 (ScannerPlugIn_XptDriver.cpp:3614)
24  libRanger.dylib               	       0x118f3d982 RangerDriverBase::SetTransportState(RangerTransportStates) + 562 (RangerDriverBase.cpp:138)

Based on the data in front of me, I'd start with the assumption that "Thread 5" is more likely to be the problem source, not "Thread 0".
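To make that pattern concrete, here's a deliberately minimal, contrived sketch of the kind of lifetime race that produces this shape. None of these types are your code; it just shows one thread tearing down a mapping in a destructor while another thread still holds a raw pointer into it, so the thread that faults is not the thread that caused the damage:

// Illustrative only: a contrived lifetime race, not the Ranger code.
#include <sys/mman.h>
#include <thread>

struct SharedBlock {
    void*  base = nullptr;
    size_t size = 0;

    explicit SharedBlock(size_t n) : size(n) {
        base = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_PRIVATE, -1, 0);
    }
    // The destructor unmaps unconditionally -- it has no way of knowing
    // whether another thread still holds a raw pointer into the region.
    ~SharedBlock() { ::munmap(base, size); }
};

int main() {
    auto* block = new SharedBlock(4096);

    std::thread reader([p = static_cast<volatile char*>(block->base)] {
        // Thread A keeps touching the mapping through a raw pointer.
        // Once the mapping is gone, *this* thread takes the EXC_BAD_ACCESS...
        for (long i = 0; i < 1000000000; ++i) { p[i % 4096] = 1; }
    });

    std::thread destroyer([block] {
        delete block;   // ...but *this* thread is the one that caused it.
    });

    destroyer.join();
    reader.join();
}

That's the general shape I'd be looking for around the SharedMemory teardown on Thread 5: the munmap itself is perfectly legal, the question is whether anything else in the process still believes that memory is alive.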

In terms of Rosetta's direct involvement, that's hard to speculate on. On the one hand, dynamically loaded plugins are not a common case, so you are interacting with parts of Rosetta that are more complex and less heavily exercised. On the other hand, plugin architectures open up a whole range of other failure/corruption opportunities.

As a side note here, if you haven't already started doing so, I would strongly suggest rearchitecting to move your plugins out of process. While this could be done through an entirely new plugin API, it can also be done by creating an XPC service that replicates the environment your plugins expect and then loading each plugin into its own service. One particular place this helps is if you intend to support plugin unloading, as you can simply destroy the service instead of having to hope/trust that plugins can unload cleanly.
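As a rough sketch of what the service side of that could look like using the C XPC API: everything specific below (the "plugin-path" and "command" dictionary keys, and the "RangerPluginMain" entry point) is a placeholder invented for illustration, not an existing API.

// Sketch only: a per-plugin XPC service that dlopen()s the plugin in its own
// process. "plugin-path", "command", and RangerPluginMain are invented names.
#include <cstdint>
#include <dlfcn.h>
#include <xpc/xpc.h>

// Hypothetical entry point each plugin dylib is assumed to export.
using PluginEntry = int (*)(const char* command);

static void handle_connection(xpc_connection_t peer) {
    xpc_connection_set_event_handler(peer, ^(xpc_object_t event) {
        if (xpc_get_type(event) != XPC_TYPE_DICTIONARY) {
            return; // connection-level errors (XPC_TYPE_ERROR) arrive here
        }
        const char* path    = xpc_dictionary_get_string(event, "plugin-path");
        const char* command = xpc_dictionary_get_string(event, "command");

        int64_t result = -1;
        if (path && command) {
            // The plugin lives and dies with this service process. Note that
            // we never dlclose(); "unloading" is simply tearing the service down.
            if (void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL)) {
                if (auto entry = reinterpret_cast<PluginEntry>(
                        dlsym(handle, "RangerPluginMain"))) {
                    result = entry(command);
                }
            }
        }

        if (xpc_object_t reply = xpc_dictionary_create_reply(event)) {
            xpc_dictionary_set_int64(reply, "result", result);
            xpc_connection_send_message(peer, reply);
            xpc_release(reply);
        }
    });
    xpc_connection_resume(peer);
}

int main() {
    // Standard XPC service main loop; never returns.
    xpc_main(handle_connection);
}

The host app would then talk to each service through xpc_connection_create and xpc_connection_send_message_with_reply (or the _sync variant), and "unloading" a plugin becomes xpc_connection_cancel plus letting that service exit, rather than relying on the dylib tearing itself down cleanly inside your process.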

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
