Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adds tools for hermetic nix installation #48

Closed
wants to merge 1 commit into from

Conversation

filmil
Copy link

@filmil filmil commented Apr 10, 2024

Applies the device introduced at: https://github.com/filmil/bazel-nix-flakes to produce a hermetic nix installation when bazel is started.

Made some changes to the clodl runner and pinned bazel version and options to ensure this still compiles.

Try it as follows:

bazel run :clotestbin

or

bazel run :clotestbin-cc

@filmil filmil requested a review from facundominguez as a code owner April 10, 2024 09:02
@filmil
Copy link
Author

filmil commented Apr 10, 2024

I'm not sure if you care for this device, but in case someone is interested, here it is.

@facundominguez
Copy link
Member

Hello @filmil, thanks for your contribution.

I think the feature would deserve consideration for inclusion in tweag/rules_nixpkgs#75, rather than in setup by users of rules_nixpkgs.

What do you think?

@filmil
Copy link
Author

filmil commented Apr 10, 2024 via email

Applies the device introduced at: https://github.com/filmil/bazel-nix-flakes
to produce a hermetic nix installation when bazel is started.

Made some changes to the clodl runner and pinned bazel version and
options to ensure this still compiles.

Try it as follows:

```
bazel run :clotestbin
```

or

```
bazel run :clotestbin-cc
```
@filmil
Copy link
Author

filmil commented Apr 13, 2024

I modified the approach to make it not require elevated privileges for chrooting. Based on existing solutions (in this case, nix-portable). The glue is the only contribution.

@aherrmann
Copy link
Member

@filmil Thank you for your contribution! This does look very interesting! Also thanks for documenting your approach so well. I've dug through it to get a better understanding of it (raw observations attached below in case they might be useful to others).

As far as I can see the following questions are open at the moment:

  • where to contribute this work?
  • in what form can this be contributed?
  • is there anything that should maybe be changed?

where to contribute this work?

I agree with @facundominguez that this could fit very well into rules_nixpkgs. As far as I understand this addresses issues that are relevant to all rules_nixpkgs users, not just clodl users. In particular, this provides support to build a Nix + Bazel project without a global Nix installation, not just to generate a relocatable and distributable artifact from a (Nix + ) Bazel project.

cc @benradf

in what form can this be contributed?

As you point out this does not provide a set of rules. So, it's not immediately clear how to contribute it to rules_nixpkgs. Perhaps there are ways to turn some of this into rules of some kind. But, before we go there. Maybe a good first step would be to contribute this as an example that is checked on CI. This way it provides documentation on how to use rules_nixpkgs without a global Nix installation right in rules_nixpkgs, and CI verifies that this approach continues to work going forward and is not inadvertently broken.

is there anything that should maybe be changed?

Two main things stand out to me that I think should change before this can be contributed:

  1. The use of git submodules. Git submodules tend to complicate the development and deployment flow and make it hard to keep versions in sync. And GitHub generated HTTP provided source tarballs don't include git submodules, which is a problem in Bazel, where the main distribution method is http_archive or variants thereof. Ideally, all the relevant code would be committed directly into the target repository, or downloaded through standard Nix or Bazel download methods, e.g. http_archive.
  2. Currently this checks in a binary blob, namely nix-portable. That's a supply chain risk for downstream users. How should they know which upstream version the blob corresponds to, and if no one has tampered with it. If a binary artifact is necessary, then it should be fetched from upstream releases and checked against a checked in integrity hash.

raw observations

@filmil
Copy link
Author

filmil commented Apr 16, 2024

Thanks for taking the time to look through this.

I agree with your findings.

As far as I understand this addresses issues that are relevant to all rules_nixpkgs users, not just clodl users. In particular, this provides support to build a Nix + Bazel project without a global Nix installation,

That seems to be the case. And that was the original idea.

Maybe a good first step would be to contribute this as an example that is checked on CI.

There is one at https://github.com/filmil/bazel-nix-example now.

  1. The use of git submodules. Git submodules tend to complicate the development and deployment flow and make it hard to keep versions in sync.

I understand that this is not ideal. It might be possible to bootstrap the approach somehow.

The issue is that the //tools/bazel script isn't really part of the build process, but is rather part of the bazel startup protocol. So our options to fix it may be limited.

For example, adding a self-contained repository implementing this, then using Alex Eagle's trick (used to be at 1, now 404s :/ ) to run a script from it that adapts the "main" workspace correctly.

  1. Currently this checks in a binary blob, namely nix-portable.

This is true. It should be removed.

  > /home/runner/work/bazel-nix-flakes/bazel-nix-flakes/tools/x86_64-Linux/proot: error while loading shared libraries: libpython3.11.so.1.0: cannot open shared object file: No such file or directory

FWIW: proot is not a viable approach. While I was eventually able to patch proot build to produce a fully static binary by removing the dependency on python3, proot has issues with handling the bazel server process because it gets daemonized, and proot waits forever for it to exit.

@filmil
Copy link
Author

filmil commented Apr 17, 2024

I reworked the approach to address the issues. I moved most of the setup into a separate repository, started downloading nix-portable via http_file, and removed the need to use submodules.
It is now available at filmil/bazel_local_nix#3. Note that the integration test passes.

I also minimized the amount of scripting that ends up injected into the top-level repository. I hope this relative opaqueness is not too problematic, since the scripts come from an integrity-checked regular repository anyways.
If you worry about supply chain attacks (and it's everyone's top of mind these days, I wonder why), you should be careful about updating the integrity checks. Which one should always do anyways.

I also added an integration test repository, which is currently "just" a nix-based hello world program, but which compiles and runs as one would expect.

There seems to be a bug with nix-portable, where it insists to be invoked from the root of the repository on install.
Which might not be too onerous since it must be done once on installation, and once when new nix deps are introduced.

It is also not strictly necessary to tack this code to rules_nixpkgs. It can live its separate life, and can be only brought in when needed. But it remains an option.

@aherrmann
Copy link
Member

Sorry for the delay and thank you for addressing the comments and updating the code!

I moved most of the setup into a separate repository, started downloading nix-portable via http_file, and removed the need to use submodules.

Nice! That looks very good.

The issue is that the //tools/bazel script isn't really part of the build process, but is rather part of the bazel startup protocol. So our options to fix it may be limited.

I guess another simple solution could be a similar approach to what buildozer does: Download and cache a release binary on the fly and check it against an expected digest within the Bazel wrapper. That way users that decide to go with this approach could check in their tools/bazel wrapper without having to check in a binary but also without having to run an install step at every use-site.

It is also not strictly necessary to tack this code to rules_nixpkgs. It can live its separate life, and can be only brought in when needed. But it remains an option.

True. The way this is looking it could be a standalone Bazel module or it could be a part of rules_nixpkgs. I'd be very open to the latter.

@filmil
Copy link
Author

filmil commented May 8, 2024

I guess another simple solution could be a similar approach to what buildozer does: Download and cache a release binary on the fly and check it against an expected digest within the Bazel wrapper.
That way users that decide to go with this approach could check in their tools/bazel wrapper without having to check in a binary but also without having to run an install step at every use-site.

Not sure I understand this bit.
There is no need to check a binary in anymore. Nor is there an issue with checking in the tools/bazel script.

True. The way this is looking it could be a standalone Bazel module or it could be a part of rules_nixpkgs. I'd be very open to the latter.

Thinking a bit, I'm not sure we should require people who don't need an ephemeral nix installation to have to the support in.

I'm OK if the contribution stays as-is, and would welcome the folks at tweag/rules_nixpkgs#75 to try out the ephemeral nix support at: https://github.com/filmil/bazel_local_nix

@aherrmann
Copy link
Member

Not sure I understand this bit.

Sorry, that wording was a bit convoluted. The main point was "without having to run an install step at every use-site". IIUC, right now, every user has to run the install step by invoking bazel run @bazel_local_nix//:install. With the bazelisk alike approach that wouldn't be the case.

I'm not sure we should require people who don't need an ephemeral nix installation to have to the support in.

The way I was imagining it, it would still be an opt-in choice for users. If they don't set up the tools/bazel wrapper, then they won't be using the ephemeral Nix installation.

@filmil
Copy link
Author

filmil commented May 8, 2024

IIUC, right now, every user has to run the install step by invoking bazel run @bazel_local_nix//:install.

No, this is not necessary. As soon as someone checks in the results of the installation, it is done. No additional quirks. Binaries are fetched via external repositories as usual. The bazelisk or buildifier approaches are not needed. (since the binary in the rule does not need to exist outside of a specific project scope, while bazelisk or buildifier both do.)

Maybe I misunderstood your point?

The way I was imagining it, it would still be an opt-in choice for users. If they don't set up the tools/bazel wrapper, then they won't be using the ephemeral Nix installation.

While yes, you could do that, note that it is unnecessary. The ephemeral nix installation in rules_local_nix is fully optional. One opts in by adding the stanzas to bring the rules in, just as one usually does. It is also additional complexity that doesn't need to be introduced everywhere.

That said, maybe you'd want to take over the maintenance of the rules, in case I slide under a bus or something. In which case feel free to put them wherever you think they fit.

Copy link
Member

@facundominguez facundominguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a pass over the changes after learning some more about the approach of bazel_local_nix, and left some questions.

Also, there seems to be some CI failures.

@@ -1,3 +1,4 @@
build --host_javabase=@local_jdk//:jdk
build --host_platform=@io_tweag_rules_nixpkgs//nixpkgs/platforms:host
build --incompatible_require_linker_input_cc_api=false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have something to do with bazel_local_nix. I cannot fathom from the documentation of the flag how it relates to the PR. It isn't mentioned in the setup instructions of bazel_local_nix either.

Copy link
Author

@filmil filmil May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we go too much into the weeds. This PR was not meant as a contribution to clodl. It was instead meant as an illustration of the approach. I took an arbitrary repo that uses rules_nixpkgs and converted it into a working repo with an ephemeral nix store with relatively few changes overall.

If you wanted to use the bazel_local_nix approach, someone else would need to apply it to your repo. If nobody else is interested, it might end up being me, once I get to a point where I wanted to use clodl imminently. (This may be soon enough, but perhaps not soon enough for folks who want an ephemeral nix bazel build yesterday.)

Now, on to why this flag sits here. I think this flag ended up here because I pegged the bazel version to 5.4.0 using .bazelverson, for other reasons; but the clodl default bazel version used is 3.x.x, which then fights with this version choice. Sadly, bazel is wonky like this, and changing major versions ends up too disruptive.

I suspect that MacOS side would be broken too. But I still think that the correct choice of the interpreter is essential, since the target system may use a different interpreter version. Otherwise clodl-produced archives would not be portable in the way that clodl claims them to be.

trap "rm -rf '\\$$tmpdir'" EXIT
unzip -q "\\$$0" -d "\\$$tmpdir" 2> /dev/null || true
"\\$$tmpdir/clodl-top0"
"\\$$tmpdir/ld-linux-x86-64.so.2" --library-path "\\$$tmpdir" "\\$$tmpdir/clodl-top0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to work in OSX? Is the current invocation of clodl-top0 problematic for bazel_local_nix?

Copy link
Author

@filmil filmil May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It certainly doesn't work without this on linux. The issue is that otherwise the interpreter is taken to live in /nix/..., which is not true on systems with an ephemeral nix installation. But since today you can only build clodl on systems that do have an ephemeral nix installation, you may never see this issue. (I.e. it works today for clodl only by accident.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would probably want to package a nix interpreter in to get things to work reliably everywhere, not just on systems that already have the nix store installed.
But that may complicate the licensing story, so... IANAL, not sure what to tell you.

@@ -0,0 +1 @@
third_party/bazel_local_nix/tools
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this folder populated by bazel --max_idle_secs=1 run @bazel_local_nix//:install?

Looks empty to me otherwise, and the installation instructions in bazel_local_nix don't mention adding a submodule and symlinking it like this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the modified approach does not require this at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify.

This PR is the first approach I made here. I have then modified the approach based on the feedback I got on the PR.

I thought I had updated the PR files as well, but apparently I have not. Which made this review more confusing than it should have been. Sorry about that. The PR should not contain .gitmodules nor any other submodule additions. Y'all noted that it's too onerous, and it is. So I removed it.

The tools symlink is also suspect.

@@ -0,0 +1,3 @@
[submodule "third_party/bazel_local_nix"]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A submodule is no longer needed with the modified approach.

@@ -66,7 +66,6 @@ binary_closure(
excludes = [
"^/System/",
"^/usr/lib/",
"ld-linux-x86-64\\.so.*",
Copy link
Author

@filmil filmil May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ELF interpreter on my machine (/bin/ld-linux-x86-64.ld.so or whatever) was incompatible with the closure bundle built by clodl. Running the clodl bundle using it ended with SIGSEGV.

So I had to include the interpreter binary from the ephemeral nix store into the clodl-generated closure and instruct clodl to use it instead of the system ELF interpreter.

I think this is a bug in clodl that should be fixed. You probably won't see it if you use this on systems that already have nix installed. But ephemeral nix support brings the issue up to the surface, since an ephemeral nix store may not be available at all outside a bazel sandbox, on systems that don't have a system-wide nix installation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to know. It is worth considering how to deal with this ELF interpreters separately.

I'll close this PR then, given that it is not intended to be merged.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me. Thanks for all the advice you all gave in this PR. I think the goal has been achieved, the ephemeral approach is now way better than what I originally had, and anyone who needs it can use it today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants