Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libnvidia-container: 1.9.0 -> 1.16.2 #347867

Merged
merged 2 commits into from
Oct 21, 2024

Conversation

msanft
Copy link
Contributor

@msanft msanft commented Oct 11, 2024

Update to 1.16.2.

Possibly we should be dropping the binary lookup patch, as /run/nvidia-docker seems to be unused now.

Also includes security fixes.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@msanft msanft added the 1.severity: security Issues which raise a security issue, or PRs that fix one label Oct 11, 2024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly fail to remember what the difference was, but I'll just dump this here for reference: https://github.com/NixOS/nixpkgs/pull/279235/files#diff-2b4dc4504c07052fdeb991c058ab1cd1b3fc215f2475fddab960ebea2db772e7R94-R98. With the original patch, does libnvidia-container still run ldconfig in non-nixos environments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would. But I think it's correct that it doesn't. On non NixOS-systems, we'd probably don't want the impurities caused by this neither, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of drivers we actually kind of do, because they usually aren't at the nixos' predictable location. The dynamic loading behaviour we want pretty much for all apps is:

  1. Test LD_PRELAOD and LD_LIBRARY_PATH because these are meant to be the "overrides"
  2. Try the "pure" /nix/store paths flashed into the DT_RUNPATHs (or equivalents, in future)
  3. If loading a kernel-locked userspace driver, try nixos' predictable impure locations (@driverLink@/lib)
  4. If loading a kernel-locked userspace driver, try the normal "fhs" flow with the global /etc/ld.so.{cache,conf} but ensure some kind of isolation (cf. e.g. the libcapsule discussions elsewhere on github and on matrix). We never implemented this last bit in any consistent manner because complications, but in principle it's something to consider

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think I need to give this another go to actually understand it more thoroughly. The current patch should have the exact same behaviour as the old one, so it would at least not cause any further regressions.

Copy link
Contributor Author

@msanft msanft Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a more thorough review and think that ldconfig is ran on all systems (NixOS and non-NixOS), but the system cache (i.e. /etc/ld.so.conf and /etc/ld.so.cache) are not used. Instead, the special /tmp/ld.so.conf.nvidia-host and /tmp/ld.so.cache.nvidia-host directories are used for the resolving of nvidia libraries. This should (under the assumption of nothing else using this cache) not cause any impurities on either system. This essentially ensures the precedence of steps 0 through 2 in your list, but not 3. I would also advocate to not implement step 3 here, as I really wouldn't like to maintain such a complex patch (for a library which is unnecessarily complex already).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also advocate to not implement step 3

Oh this definitely wouldn't be implement here, it's a much larger scale effort.

not cause any impurities on either system

I think for libcuda.so on non-NixOS we do want an impurity, but let's merge this PR regardless and address failures as they come

@ofborg ofborg bot requested a review from cpcloud October 11, 2024 10:00
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Oct 11, 2024
@msanft msanft force-pushed the msanft/libnvidia-container/1.16.2 branch from 73187a5 to 7fad305 Compare October 11, 2024 12:17
@msanft msanft force-pushed the msanft/libnvidia-container/1.16.2 branch from 7fad305 to 60b56ab Compare October 14, 2024 06:49
@msanft msanft marked this pull request as ready for review October 14, 2024 06:49
@msanft msanft requested a review from katexochen October 14, 2024 06:50
@SomeoneSerge
Copy link
Contributor

@ofborg eval

- argv = (char * []){cnt->cfg.ldconfig, cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};

- argv = (char * []){cnt->cfg.ldconfig, "-f", "/etc/ld.so.conf", "-C", "/etc/ld.so.cache", cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};
+ argv = (char * []){cnt->cfg.ldconfig, "-f", "/tmp/ld.so.conf.nvidia-host", "-C", "/tmp/ld.so.cache.nvidia-host", cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};
Copy link
Contributor

@SomeoneSerge SomeoneSerge Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw this is not guaranteed to be writable either /tmp/ld.so.conf.nvidia-host
EDIT: but it's existing code

Comment on lines +64 to +65
-e 's/^GIT_TAG ?=.*/GIT_TAG = ${version}/' \
-e 's/^GIT_COMMIT ?=.*/GIT_COMMIT = ${src.rev}/' \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same as makeFlags = [ "GIT_TAG=..." ...]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(can improve in a follow-up)

@SomeoneSerge SomeoneSerge merged commit 99729d5 into NixOS:master Oct 21, 2024
29 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1.severity: security Issues which raise a security issue, or PRs that fix one 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants