Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically configure cross-compilation toolchain #794

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Popax21
Copy link

@Popax21 Popax21 commented Feb 16, 2025

Motivation

See previous discussion in #786.

This PR makes crane automatically configure cargo / the cc crate's toolchain when cross-compiling a mkCraneDerivation derivation. It also extends the existing stdenv argument to take a function instead of just a single stdenv instance, to support specifying the stdenv used when cross-compiling. Additionally, the noCrossToolchainEnv argument can be used to disable this new behavior. Any user-specified environment variables always take priority over those set by crane, and no environment variables are set when the derivation isn't cross compiled.

I have tested that this PR works both on my own codebase, as well as for the cross-compilation example; as such I would already consider it in a mergeable state. However, considering the scope of the problem domain + some of the earlier discussion in #786, this might need to undergo a few more rounds of bikeshedding / adjustments - this PR should serve as a suitable basis for this work.

Checklist

  • added tests to verify new behavior
  • added an example template or updated an existing one
  • updated docs/API.md (or general documentation) with changes
  • updated CHANGELOG.md

lib.mergeAttrsList [
{
# Set the target we want to build for (= our host platform)
CARGO_BUILD_TARGET = pkgs.hostPlatform.rust.rustcTarget;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think forcibly setting this will end up always taking precedence over whatever is set in .cargo/config.toml: https://doc.rust-lang.org/cargo/reference/config.html#buildtarget

Not sure of the best way to handle this as there isn't a good/easy way to introspect .cargo/config.tomls, but i also don't want to break people's projects without an obvious indicator...

It would be nice if, at the very least, we logged out all variables that we are setting defaults for (as part of the derivation build, not at evaluation time) so its a lot more obvious to anyone debugging a failing build. Though that may require NOT setting the variables directly on the derivation and instead passing them through a slightly different means (then having a setup hook which echos and sets them up as appropriate).

We can address this comment last once all other feedback has stabilized (so don't worry about addressing this right away)

Comment on lines +41 to +44
(varsForPlatform "HOST" (stdenvSelector pkgs.pkgsBuildHost))

# NOTE: "target" here isn't the nixpkgs platform; it's a "build kind" corresponding to the "host" nixpkgs platform
(varsForPlatform "TARGET" (stdenvSelector pkgs.pkgsHostTarget))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we're hard coding checks to stdenv.cc.targetPrefix above but then using different stdenv instantiations to check what the values should be. I think we can avoid the complexity of having a stdenvSelector function by using stdenv.cc.nativePrefix for all the HOST variables.

Similar comment for cargoEnv: we can use stdenv.buildPlatform.rust.cargoEnvVarTarget for HOST variables as well

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a leftover from my original implementation which attempted to also configure a target platform toolchain. I took this out of the PR since it seemed rather pointless in retrospective (since target isn't really relevant for Rust builds), but left in the stdenvSelector function since it could be potentially used to, for example, cross-compile using clang / MinGW, while building host-binaries (like custom C build tools) using the host's own toolchain. However, I agree that it probably makes sense to ditch that flexibility in return for reducing the complexity here; I'll get onto that in a bit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I've looked into switching out stdenvSelector for nativePrefix for a bit, and turns out that this isn't something that works - each stdenv.cc instance only contains binaries / wrappers for a single compilation target (note that this means that the current cross compilation example is broken in this regard; if any crate would have required a host C compiler to be present, it would have not built successfully). Additionally, I've found no docs for nativePrefix anywhere inside the nixpkgs reference manual / other nixpkgs packages. As such I think that we'll continue to need two stdenv instances, requiring something like an stdenvSelector.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the gcc world, usually different tools (e.g. cc etc) will have a prefix (e.g. aarch64-unknown-freebsd-) when cross compiling, and that's what stdenv.cc.targetPrefix represents. The stdenv.cc.nativePrefix is what exists to disambiguate which tools are targeting the build system: usually that value is empty since there's no prefix to be had (i.e. you'd want to call cc directly, and not aarch64-unknown-freebsd-cc)

When using a non-cross-compiling toolchain, the prefixes tend to be blank, so I don't think we need to evaluate two different stdenvs:

cat <<EOF | nix eval -f -
let
  pkgs = import <nixpkgs> {
    localSystem = "x86_64-linux";
    crossSystem = "aarch64-linux";
  };

in {
  nativePrefixForbuildHost = pkgs.pkgsBuildHost.stdenv.cc.nativePrefix;
  targetPrefixForBuildHost = pkgs.pkgsBuildHost.stdenv.cc.targetPrefix;

  nativePrefixForHostTarget = pkgs.pkgsHostTarget.stdenv.cc.nativePrefix;
  targetPrefixForHostTarget = pkgs.pkgsHostTarget.stdenv.cc.targetPrefix;
}
EOF

Yields

{
  nativePrefixForHostTarget = "";
  nativePrefixForbuildHost = "";
  targetPrefixForBuildHost = "";
  targetPrefixForHostTarget = "aarch64-unknown-linux-gnu-";
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am aware; however, the resulting stdenv.cc derivation does not contain any unprefixed compiler wrappers targeting the build platform in its outputs; as such we have to pull the non-cross compiling compiler in from another stdenv.cc instantiation.

Example (notice how there are no unprefixed wrappers in the built cc derivation):

nix-repl> pkgsCross.aarch64-multiplatform.stdenv.cc.nativePrefix
""

nix-repl> :b pkgsCross.aarch64-multiplatform.stdenv.cc          

This derivation produced the following outputs:
  info -> /nix/store/sr8aclvahhqrrl3xx4cfxd549k0h6i08-aarch64-unknown-linux-gnu-gcc-wrapper-14-20241116-info
  man -> /nix/store/jmyny8wil4biyn7468lb8v5sg3m59dc6-aarch64-unknown-linux-gnu-gcc-wrapper-14-20241116-man
  out -> /nix/store/whzdsaaq7kdfqq0ck7jvjjf6bxnkiq26-aarch64-unknown-linux-gnu-gcc-wrapper-14-20241116
[13 copied (301.6 MiB), 63.7 MiB DL]

sh-5.2$ ls /nix/store/whzdsaaq7kdfqq0ck7jvjjf6bxnkiq26-aarch64-unknown-linux-gnu-gcc-wrapper-14-20241116/bin
aarch64-unknown-linux-gnu-addr2line  aarch64-unknown-linux-gnu-cc       aarch64-unknown-linux-gnu-elfedit  aarch64-unknown-linux-gnu-ld       aarch64-unknown-linux-gnu-objcopy  aarch64-unknown-linux-gnu-size
aarch64-unknown-linux-gnu-ar         aarch64-unknown-linux-gnu-c++filt  aarch64-unknown-linux-gnu-g++      aarch64-unknown-linux-gnu-ld.bfd   aarch64-unknown-linux-gnu-objdump  aarch64-unknown-linux-gnu-strings
aarch64-unknown-linux-gnu-as         aarch64-unknown-linux-gnu-cpp      aarch64-unknown-linux-gnu-gcc      aarch64-unknown-linux-gnu-ld.gold  aarch64-unknown-linux-gnu-ranlib   aarch64-unknown-linux-gnu-strip
aarch64-unknown-linux-gnu-c++        aarch64-unknown-linux-gnu-dwp      aarch64-unknown-linux-gnu-gprof    aarch64-unknown-linux-gnu-nm       aarch64-unknown-linux-gnu-readelf

Additionally, the full stdenv also does not put an unprefixed cc into the path either; see the following example (notice how only the last derivation builds successfully):

nix-repl> :b pkgsCross.aarch64-multiplatform.runCommandCC "cc-test" {} "cc --version > $out"                                                                       
error: builder for '/nix/store/lbp9n9kydiqf1ygfmcj5qkqqcl50a32r-cc-test-aarch64-unknown-linux-gnu.drv' failed with exit code 127;
       last 1 log lines:
       > /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 1: cc: command not found
       For full logs, run 'nix log /nix/store/lbp9n9kydiqf1ygfmcj5qkqqcl50a32r-cc-test-aarch64-unknown-linux-gnu.drv'.
[1 built (5 failed), 0.0 MiB DL]
nix-repl> :b pkgsCross.aarch64-multiplatform.runCommandCC "cc-nativePrefix-test" {} "${pkgsCross.aarch64-multiplatform.stdenv.cc.nativePrefix}cc --version > $out"
error: builder for '/nix/store/f4nw4byhzpcycx52bdwvk4s4hiclpq3j-cc-nativePrefix-test-aarch64-unknown-linux-gnu.drv' failed with exit code 127;
       last 1 log lines:
       > /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 1: cc: command not found
       For full logs, run 'nix log /nix/store/f4nw4byhzpcycx52bdwvk4s4hiclpq3j-cc-nativePrefix-test-aarch64-unknown-linux-gnu.drv'.
[1 built (6 failed), 0.0 MiB DL]
nix-repl> :b pkgsCross.aarch64-multiplatform.runCommandCC "cc-targetPrefix-test" {} "${pkgsCross.aarch64-multiplatform.stdenv.cc.targetPrefix}cc --version > $out" 

This derivation produced the following outputs:
  out -> /nix/store/97rsn15cc8mv2rcjcrqsbqn6xcdsskm6-cc-targetPrefix-test-aarch64-unknown-linux-gnu
[2 built (6 failed), 0.0 MiB DL]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, the stdenv only has compilers for the host environment but not the build environment... which I guess it makes sense in the sense that nixpkgs would expect anything needed for the host to be in nativeBuildDependencies which themselves should have been compiled with pkgsBuildBuild.stdenv...

But this ultimately begs the question: is there even a utility in automatically setting the the TARGET_CC/HOST_CC variables if we explicitly need to be pulling in two different stdenvs? The current diff sets the variables based on two different stdenvs but we aren't actually pulling them into the derivation's inputs (should we? also starts becoming complicated to answer, cross compilation isn't something that you can easily bolt on 😅 )

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there even a utility in automatically setting the the TARGET_CC/HOST_CC variables if we explicitly need to be pulling in two different stdenvs?

It depends honestly; setting those variables is only required for crates which require a C/C++ program to be built and then also executed during the build process, which I figure should be fairly rare (the most likely scenario I can think of would be a crate which vendors a complex upstream C/C++ library which ships its own non-standard build tools). I would personally error on the side of caution and configure all supported toolchains a crate might need, but we could theoretically also only configure the TARGET_CC toolchain and save ourselves the hassle of pulling in a second toolchain for the host.

The current diff sets the variables based on two different stdenvs but we aren't actually pulling them into the derivation's inputs (should we? )

This works just fine in terms of how Nix derivations work - any reference in the derivation's build output to some other derivation's outputs will result in a dependency. As such the buildInputs / ... attributes aren't strictly required - instead they're provided by the stdenv for executing setup hooks / configuring other aspects of the build environment (see the nixpkgs reference manual). So, in practice the only real effect of the current setup not adding the CC wrappers to nativeBuildInputs is that the setup hook of the CC wrapper for the build platform's toolchain is not executed (the target's toolchain is not affected, since it is pulled in by default by the stdenv).

My initial assumption was that this is fine, since the CC wrapper's executables are self-contained, pulling in e.g. libc themselves. However, looking into it some more revealed that this might indeed cause some issues with dependency resolution, since the CC setup hooks aren't generic like I assumed them to be (just having one CC setup hook in the stdenv isn't enough). So I agree that we should ideally be adding the CC wrappers we use to nativeBuildInputs. I'm gonna be away over the weekend, but I'll be looking into ways to do this cleanly once I'm back.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I agree that we should ideally be adding the CC wrappers we use to nativeBuildInputs

The APIs we expose are largely intended to be thin(ish) wrappers around stdenv.mkDerivation, and if that doesn't set up the standard environment out of the box to work for cross-compiling I'd be really wary of us trying for forge a brand new path here. I really don't want to start picking the "wrong" toolchains for downstream users (e.g. what if they want to use a clangStdenv instead of gcc? We can't just blindly assume pulling in pkgs.pkgsBuildBuild.stdenv is what the caller intends)

Copy link
Author

@Popax21 Popax21 Feb 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for having been a bit unclear earlier; most of what I described was just my rationale as to why I initially believed my original approach (which didn't use nativeBuildInputs) would work. What we will (/ should) be doing (pulling in another stdenv's CC into the derivation to have a toolchain targeting the build platform) is in fact well supported and even explicitly documented in the nixpkgs reference manual; see here. Additionally, I agree that we should not blindly assume that the standard stdenv will be appropriate for callers - that's why I introduced the stdenv selector function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants