Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

darwin.stdenv: use CoreFoundation instead of CF #265102

Merged
merged 4 commits into from
Nov 11, 2023

Conversation

reckenrode
Copy link
Contributor

@reckenrode reckenrode commented Nov 2, 2023

Description of changes

This patch switches the CoreFoundation on x86_64-darwin from the open source swift-corelibs-foundation (CF) to the system CoreFoundation. The changes were tested by building the stdenv and Nix, which depends on aws-sdk-cpp.

This change was motivated by failures building packages for the current staging-next cycle #263535 due to an apparent incompatibility with the rpath-based approach to choosing CF or CoreFoundation and macOS 14. This error often manifests as a crash with an Illegal Instruction.

For example, building aws-sdk-cpp for building Nix will fail this way.

https://hydra.nixos.org/build/239459417/nixlog/1

Application Specific Information:
CF objects must have a non-zero isa

Error Formulating Crash Report:
PC register does not match crashing frame (0x0 vs 0x7FF8094DD640)

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   CoreFoundation                	    0x7ff8094dd640 CF_IS_OBJC.cold.1 + 14
1   CoreFoundation                	    0x7ff8094501d0 CF_IS_OBJC + 60
2   CoreFoundation                	    0x7ff8093155e8 CFRelease + 40
3   ???                           	       0x10c7a2c61 s_aws_secure_transport_ctx_destroy + 65
4   ???                           	       0x10c87ba32 aws_ref_count_release + 34
5   ???                           	       0x10c7b7adb aws_tls_connection_options_clean_up + 27
6   ???                           	       0x10c596db4 Aws::Crt::Io::TlsConnectionOptions::~TlsConnectionOptions() + 20
7   ???                           	       0x10c2d249c Aws::CleanupCrt() + 92
8   ???                           	       0x10c2d1ff0 Aws::ShutdownAPI(Aws::SDKOptions const&) + 64
9   ???                           	       0x102d9bc6f main + 335
10  dyld                          	       0x202f333a6 start + 1942

According to a post on the Apple developer forums, hardening was added to CoreFoundation, and this particular message occurs when you attempt to release an object it does not recognize as a valid CF object. (Thank you to @lilyinstarlight for finding this post).

When I switched aws-sdk-cpp to link against CoreFoundation instead of CF, the error went away. Somehow both libraries were being used. To prevent dependent packages from linking the wrong CoreFoundation, it would need to be added as a propagated build input.

Note that there are other issues related to mixing CF and CoreFoundation frameworks. #264503 fixes an issue with abseil-cpp where it propagates CF, causing issues when using a different SDK version. Mixing versions can also cause crashes with Python when a shared object is loaded that is linked to the “wrong” CoreFoundation.

NIX_COREFOUNDATION_RPATH is supposed to make sure the right CoreFoundation is being used, but it does not appear to be enough on macOS 14 (presumably due to the hardening). While it is possible to propagate CoreFoundation manually, the cleaner solution is to make it the default. CF remains available as darwin.swift-corelibs-foundation.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Drop ninja and use a minimal Python to avoid an infinite recursion due
to dependencies that depend on configd, which also depends on this.
@github-actions github-actions bot added the 6.topic: stdenv Standard environment label Nov 2, 2023
@reckenrode reckenrode force-pushed the darwin-corefoundation branch from 3e5ce60 to 19cc661 Compare November 2, 2023 22:25
@ofborg ofborg bot added 6.topic: darwin Running or building packages on Darwin 10.rebuild-darwin-stdenv This PR causes stdenv to rebuild 8.has: clean-up 8.has: package (new) This PR adds a new package labels Nov 2, 2023
@ofborg ofborg bot requested a review from copumpkin November 2, 2023 23:00
This patch switches the CoreFoundation on x86_64-darwin from the open
source swift-corelibs-foundation (CF) to the system CoreFoundation.

This change was motivated by failures building packages for the current
staging-next cycle NixOS#263535 due to an apparent incompatibility with the
rpath-based approach to choosing CF or CoreFoundation and macOS 14. This
error often manifests as a crash with an Illegal Instruction.

For example, building aws-sdk-cpp for building Nix will fail this way.

https://hydra.nixos.org/build/239459417/nixlog/1

    Application Specific Information:
    CF objects must have a non-zero isa

    Error Formulating Crash Report:
    PC register does not match crashing frame (0x0 vs 0x7FF8094DD640)

    Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
    0   CoreFoundation                	    0x7ff8094dd640 CF_IS_OBJC.cold.1 + 14
    1   CoreFoundation                	    0x7ff8094501d0 CF_IS_OBJC + 60
    2   CoreFoundation                	    0x7ff8093155e8 CFRelease + 40
    3   ???                           	       0x10c7a2c61 s_aws_secure_transport_ctx_destroy + 65
    4   ???                           	       0x10c87ba32 aws_ref_count_release + 34
    5   ???                           	       0x10c7b7adb aws_tls_connection_options_clean_up + 27
    6   ???                           	       0x10c596db4 Aws::Crt::Io::TlsConnectionOptions::~TlsConnectionOptions() + 20
    7   ???                           	       0x10c2d249c Aws::CleanupCrt() + 92
    8   ???                           	       0x10c2d1ff0 Aws::ShutdownAPI(Aws::SDKOptions const&) + 64
    9   ???                           	       0x102d9bc6f main + 335
    10  dyld                          	       0x202f333a6 start + 1942

According to a [post][1] on the Apple developer forums, hardening was
added to CoreFoundation, and this particular message occurs when you
attempt to release an object it does not recognize as a valid CF object.
(Thank you to @lilyinstarlight for finding this post).

When I switched aws-sdk-cpp to link against CoreFoundation instead of
CF, the error went away. Somehow both libraries were being used. To
prevent dependent packages from linking the wrong CoreFoundation, it
would need to be added as a propagated build input.

Note that there are other issues related to mixing CF and CoreFoundation
frameworks. NixOS#264503 fixes an issue with abseil-cpp where it propagates
CF, causing issues when using a different SDK version. Mixing versions
can also cause crashes with Python when a shared object is loaded that
is linked to the “wrong” CoreFoundation.

`NIX_COREFOUNDATION_RPATH` is supposed to make sure the right
CoreFoundation is being used, but it does not appear to be enough on
macOS 14 (presumably due to the hardening). While it is possible to
propagate CoreFoundation manually, the cleaner solution is to make it
the default. CF remains available as `darwin.swift-corelibs-foundation`.

[1]: https://developer.apple.com/forums/thread/739355
@reckenrode reckenrode force-pushed the darwin-corefoundation branch from 19cc661 to daa79a1 Compare November 3, 2023 01:21
@reckenrode
Copy link
Contributor Author

@toonn

@vcunat
Copy link
Member

vcunat commented Nov 3, 2023

you suggest to rebuild darwin stdenv in staging-next?

@reckenrode
Copy link
Contributor Author

you suggest to rebuild darwin stdenv in staging-next?

Yes. This is the CoreFoundation change I’ve been discussing in the staging channel on Matrix. @K900 suggested I should probably submit it now if I don’t expect more stdenv rebuilds.

@vcunat
Copy link
Member

vcunat commented Nov 3, 2023

staging-next..staging currently does change stdenv.outPath on x86_64-darwin, on a quick check.

@K900
Copy link
Contributor

K900 commented Nov 3, 2023

AFAICT we have 3k regressions on Darwin which can't really be fixed without this, so we basically have to eat the rebuild count.

@reckenrode
Copy link
Contributor Author

I’m going to retarget this to staging. While I think it’s an improvement for the maintainability and reliability of Darwin, I’m not seeing where those 3k failures are due to this issue, and not having to rebuild the stdenv is highly preferable right now.

@reckenrode reckenrode marked this pull request as draft November 3, 2023 15:56
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-macos-monthly/12330/45

@reckenrode reckenrode changed the base branch from staging-next to staging November 9, 2023 13:56
@reckenrode reckenrode marked this pull request as ready for review November 9, 2023 13:57
@reckenrode
Copy link
Contributor Author

I’m flipping this back out of draft. python3Packages.tokenizers is another example that crashes (along with aws-sdk-cpp and arrow-cpp). The common thread is they all use the AWS SDK and link against both CF and CoreFoundation. While it is possible to fix all the aws-* packages to link CoreFoundation, my concern is that would be kicking the can down the road. In this particular case, only the packages that link other frameworks are linking CoreFoundation. When those are mixed with others that don’t, that’s when things break.

@reckenrode reckenrode changed the base branch from staging to staging-next November 11, 2023 16:11
@reckenrode reckenrode marked this pull request as draft November 11, 2023 16:18
@reckenrode
Copy link
Contributor Author

Converting to draft while I figure why retargeting to staging-next is causing an infinite recursion.

@reckenrode reckenrode marked this pull request as ready for review November 11, 2023 16:48
@reckenrode
Copy link
Contributor Author

reckenrode commented Nov 11, 2023

Marking this ready. It actually evals. I missed a commit when cherry-picking to confirm locally that there were no eval issues.

@reckenrode
Copy link
Contributor Author

reckenrode commented Nov 11, 2023

Merging to unblock the Darwin staging-next rebuilds.

@reckenrode reckenrode merged commit 9401804 into NixOS:staging-next Nov 11, 2023
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request Nov 16, 2023
Using flat namespaces causes libstdc++ to link CoreFoundation, but
that fails after NixOS#265102. Since CoreFoundation is not actually needed,
disable flat namespaces to avoid linking it unnecessarily.

Disabling flat namespaces matches the behavior of newer versions of
libstdc++ (GCC 7+) when building for newer Darwin hosts (10.5+).
ghost pushed a commit that referenced this pull request Nov 30, 2023
Using flat namespaces causes libstdc++ to link CoreFoundation, but
that fails after #265102. Since CoreFoundation is not actually needed,
disable flat namespaces to avoid linking it unnecessarily.

Disabling flat namespaces matches the behavior of newer versions of
libstdc++ (GCC 7+) when building for newer Darwin hosts (10.5+).
github-actions bot pushed a commit that referenced this pull request Nov 30, 2023
Using flat namespaces causes libstdc++ to link CoreFoundation, but
that fails after #265102. Since CoreFoundation is not actually needed,
disable flat namespaces to avoid linking it unnecessarily.

Disabling flat namespaces matches the behavior of newer versions of
libstdc++ (GCC 7+) when building for newer Darwin hosts (10.5+).

(cherry picked from commit 647b2db)
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request Jan 28, 2024
As of NixOS#265102, x86_64-darwin no
longer uses the open-source CF release. Since there is no longer a need
to switch between implementations, and the hook causes problems with
cross-compilation (see NixOS#278348),
drop the hook and make both darwin.CF an alias for
darwin.apple_sdk.frameworks.CoreFoundation.
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request May 31, 2024
As of NixOS#265102, x86_64-darwin no
longer uses the open-source CF release. Since there is no longer a need
to switch between implementations, and the hook causes problems with
cross-compilation (see NixOS#278348),
drop the hook and make both darwin.CF an alias for
darwin.apple_sdk.frameworks.CoreFoundation.
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request Jun 3, 2024
As of NixOS#265102, x86_64-darwin no
longer uses the open-source CF release. Since there is no longer a need
to switch between implementations, and the hook causes problems with
cross-compilation (see NixOS#278348),
drop the hook and make both darwin.CF an alias for
darwin.apple_sdk.frameworks.CoreFoundation.
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request Jul 16, 2024
As of NixOS#265102, x86_64-darwin no
longer uses the open-source CF release. Since there is no longer a need
to switch between implementations, and the hook causes problems with
cross-compilation (see NixOS#278348),
drop the hook and make both darwin.CF an alias for
darwin.apple_sdk.frameworks.CoreFoundation.
reckenrode added a commit to reckenrode/nixpkgs that referenced this pull request Jul 26, 2024
Darwin no longer supports switching between CoreFoundation implementations as of NixOS#265102. The setup hook is vestigial and was mostly harmless until NixOS#329526 was necessitated to fix build failures on staging-next NixOS#328673.

The correct fix is to remove the hook. It’s not used by the 11.0 or 12.3 SDKs. This makes the 10.12 SDK a bit more like the other SDKs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants