Significant Performance Regression in Apple Clang 16 for Assembly File Processing

I've observed a significant performance regression in Apple Clang 16 (Xcode 16.0/16.2) compared to Clang 15 (Xcode 15.2) during Flutter AOT compilation. Further investigation shows that the clang -cc1as process has become extremely slow: compilation time has increased by approximately 4x.

Environment

  • Machine: Apple M2 (8C8T)
  • Memory: 16GB
  • macOS Version: 14.7.2
  • Target: Flutter AOT compilation (snapshot_assembly.o)

Performance Comparison

Xcode Version   iOS SDK   Duration (m:ss)
15.2            17.2      1:08.90
15.2            18.2      1:03.98
16.2            17.2      4:11.07
16.2            18.2      4:08.43
16.0            18.2      4:29.32
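
To make the "approximately 4x" figure concrete, here is a minimal sketch that converts the durations in the table to seconds and prints the slowdown ratios. The helper names `to_seconds` and `slowdown` are made up for illustration; only the timings come from the table above.

```shell
# Convert an "M:SS.ss" duration (as listed in the table) to seconds.
to_seconds() {
  echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Print slow/baseline as a ratio with two decimals.
slowdown() {
  awk -v slow="$(to_seconds "$1")" -v base="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", slow / base }'
}

slowdown 4:11.07 1:08.90   # Xcode 16.2 vs 15.2, iOS SDK 17.2
slowdown 4:08.43 1:03.98   # Xcode 16.2 vs 15.2, iOS SDK 18.2
slowdown 4:29.32 1:03.98   # Xcode 16.0 vs 15.2, iOS SDK 18.2
```

On these numbers the ratios fall between roughly 3.6x and 4.2x.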

Reproduction Steps

The issue can be reproduced with the following command, which is generated by Flutter's aot_assembly_release process:

time ${xcode_app_path}/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc \
-arch arm64 \
-miphoneos-version-min=12.0 \
-v \
-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS18.2.sdk \
-c ${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.S \
-o ${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.o
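
For comparing toolchains side by side, a loop like the following may help. It is only a sketch: the Xcode install locations are hypothetical (adjust them to wherever each version is installed), the `toolchain_cc` helper is invented for illustration, and it assumes each Xcode bundle contains an `iPhoneOS.sdk` under its own platform directory. It times the same assembly file with each toolchain that is present and skips the rest.

```shell
# Hypothetical install locations; change these to match your machine.
XCODE_APPS="/Applications/Xcode-15.2.app /Applications/Xcode-16.2.app /Applications/Xcode-16.3.app"

# Input file from the original command; project_path defaults to the current directory.
asm="${project_path:-.}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.S"

# Build the path to the default toolchain's cc inside a given Xcode.app.
toolchain_cc() {
  echo "$1/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc"
}

for app in $XCODE_APPS; do
  cc="$(toolchain_cc "$app")"
  # Assumption: use the SDK bundled with the same Xcode as the compiler.
  sdk="$app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk"
  if [ ! -x "$cc" ] || [ ! -f "$asm" ]; then
    echo "skipping $app (toolchain or input file not found)"
    continue
  fi
  echo "== $app =="
  "$cc" --version | head -n 1
  time "$cc" -arch arm64 -miphoneos-version-min=12.0 \
    -isysroot "$sdk" -c "$asm" -o /tmp/snapshot_assembly.o
done
```

Writing the object file to `/tmp` keeps the timing runs from overwriting the build output in the project directory.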

Additional Information

  • This issue specifically affects large assembly files generated by Flutter's AOT compilation
  • The performance regression appears to be consistent across different iOS SDK versions
  • The same assembly file compiles significantly faster with Xcode 15.2
  • Same performance regression observed on M4 Mac mini, suggesting this is not hardware-specific
  • Size of object:
size -m  ${project_path}/.dart_tool/flutter_build/f9ebf46f040933de7c8d103c84d38156/arm64/snapshot_assembly.o
Segment : 64577616
	Section (__TEXT, __text): 26603344
	Section (__DATA, __bss): 48 (zerofill)
	Section (__TEXT, __const): 21292928
	Section (__DWARF, __debug_abbrev): 61
	Section (__DWARF, __debug_info): 8934534
	Section (__DWARF, __debug_line): 4464443
	Section (__LD, __compact_unwind): 3282208
	total 64577566
total 64577616

Questions

  1. Is this a known issue with Apple Clang 16?
  2. Are there any workarounds or compiler flags we can use to improve the performance?
  3. Is this behavior expected or should it be considered a regression?

Any insights or suggestions would be greatly appreciated.

Please open a bug report with the details you shared here. Don't forget to include an example project demonstrating the performance changes that you describe. Once you open the bug report, please post the FB number here for my reference.

If you have any questions about filing a bug report, take a look at Bug Reporting: How and Why?

— Ed Ford,  DTS Engineer

Thanks for your reply. I have submitted an issue with the project files through Feedback Assistant. Feedback ID: FB16999991

Can you see how Xcode 16.3 performs here? It was released just this week.

— Ed Ford,  DTS Engineer

@DTS Engineer

Hi, I have run this compilation on an M4 MacBook Pro. Xcode 16.3 performs excellently. Here are the results:

  • Xcode 16.3 (Clang 17.0.0): 16.888s
  • Xcode 15.2 (Clang 15.0.0): 52.073s
  • Xcode 16.2 (Clang 16.0.0): 3m11.971s

The new compiler has achieved an impressive speedup.

Thank you for your attention to this issue and for delivering such an excellent solution.

Thanks for sharing the numbers!

— Ed Ford,  DTS Engineer
