libnvidia-container: 1.9.0 -> 1.16.2 #347867
Conversation
pkgs/by-name/li/libnvidia-container/fix-library-resolving.patch
I honestly fail to remember what the difference was, but I'll just dump this here for reference: https://github.com/NixOS/nixpkgs/pull/279235/files#diff-2b4dc4504c07052fdeb991c058ab1cd1b3fc215f2475fddab960ebea2db772e7R94-R98. With the original patch, does libnvidia-container still run ldconfig in non-NixOS environments?
I don't think it would. But I think it's correct that it doesn't. On non-NixOS systems, we probably don't want the impurities caused by this either, right?
In the case of drivers we actually kind of do, because they usually aren't at NixOS's predictable location. The dynamic loading behaviour we want for pretty much all apps is:
- Test LD_PRELOAD and LD_LIBRARY_PATH, because these are meant to be the "overrides"
- Try the "pure" /nix/store paths flashed into the DT_RUNPATHs (or equivalents, in future)
- If loading a kernel-locked userspace driver, try NixOS's predictable impure location (@driverLink@/lib)
- If loading a kernel-locked userspace driver, try the normal "FHS" flow with the global /etc/ld.so.{cache,conf}, but ensure some kind of isolation (cf. e.g. the libcapsule discussions elsewhere on GitHub and on Matrix)

We never implemented this last bit in any consistent manner because of complications, but in principle it's something to consider (a minimal sketch of this precedence follows below).
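A minimal sketch of that lookup order, using illustrative names only (libcuda.so.1 as the driver library and /run/opengl-driver/lib as the driverLink target); this is not libnvidia-container's actual code:

```c
/* Hypothetical illustration of the precedence above, not actual
 * libnvidia-container logic. Paths and sonames are assumptions. */
#include <dlfcn.h>
#include <stdio.h>

static void *try_load(const char *name)
{
    void *h = dlopen(name, RTLD_NOW);
    fprintf(stderr, "%-45s %s\n", name, h ? "loaded" : dlerror());
    return h;
}

int main(void)
{
    /* Steps 0-1: a bare soname defers to LD_PRELOAD, LD_LIBRARY_PATH and
     * the caller's DT_RUNPATH before anything else is consulted. */
    void *h = try_load("libcuda.so.1");

    /* Step 2: NixOS's predictable (impure) driver location. */
    if (h == NULL)
        h = try_load("/run/opengl-driver/lib/libcuda.so.1");

    /* Step 3 (not shown): fall back to the FHS flow via the global
     * /etc/ld.so.cache, ideally behind some kind of isolation. */
    return h != NULL ? 0 : 1;
}
```

Compile with `cc sketch.c -ldl` and run it with and without LD_LIBRARY_PATH set to see the order change.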
Thanks! I think I need to give this another go to actually understand it more thoroughly. The current patch should have the exact same behaviour as the old one, so it would at least not cause any further regressions.
I gave this a more thorough review and think that ldconfig is run on all systems (NixOS and non-NixOS), but the system cache (i.e. /etc/ld.so.conf and /etc/ld.so.cache) is not used. Instead, the special /tmp/ld.so.conf.nvidia-host and /tmp/ld.so.cache.nvidia-host files are used for resolving the NVIDIA libraries. Under the assumption that nothing else uses this cache, this should not cause any impurities on either system. It essentially ensures the precedence of steps 0 through 2 in your list, but not 3. I would also advocate not implementing step 3 here, as I really wouldn't like to maintain such a complex patch (for a library which is unnecessarily complex already).
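For reference, a rough standalone sketch of that mechanism (the /sbin/ldconfig path and the library directory are assumptions; this is not the actual libnvidia-container source). glibc's ldconfig accepts -f for an alternative configuration file and -C for an alternative cache file, so pointing both at paths under /tmp leaves the host's /etc/ld.so.{conf,cache} untouched:

```c
/* Minimal sketch, assuming glibc's ldconfig with its -f/-C flags; not the
 * actual libnvidia-container source. ldconfig is run against a private
 * conf/cache pair so the system's /etc/ld.so.{conf,cache} stay untouched. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *const argv[] = {
        "/sbin/ldconfig",                     /* assumed ldconfig location */
        "-f", "/tmp/ld.so.conf.nvidia-host",  /* private configuration file */
        "-C", "/tmp/ld.so.cache.nvidia-host", /* private cache file */
        "/usr/lib64",                         /* illustrative libs_dir */
        NULL,
    };

    execv(argv[0], argv); /* replaces this process; returns only on error */
    perror("execv ldconfig");
    return 1;
}
```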
> I would also advocate to not implement step 3

Oh, this definitely wouldn't be implemented here; it's a much larger-scale effort.

> not cause any impurities on either system

I think for libcuda.so on non-NixOS we do want an impurity, but let's merge this PR regardless and address failures as they come.
Force-pushed from 73187a5 to 7fad305
Force-pushed from 7fad305 to 60b56ab
@ofborg eval
- argv = (char * []){cnt->cfg.ldconfig, cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};
- argv = (char * []){cnt->cfg.ldconfig, "-f", "/etc/ld.so.conf", "-C", "/etc/ld.so.cache", cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};
+ argv = (char * []){cnt->cfg.ldconfig, "-f", "/tmp/ld.so.conf.nvidia-host", "-C", "/tmp/ld.so.cache.nvidia-host", cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};
FWIW, /tmp/ld.so.conf.nvidia-host is not guaranteed to be writable either.
EDIT: but it's existing code.
-e 's/^GIT_TAG ?=.*/GIT_TAG = ${version}/' \
-e 's/^GIT_COMMIT ?=.*/GIT_COMMIT = ${src.rev}/' \
Isn't this the same as makeFlags = [ "GIT_TAG=..." ... ]?
(can improve in a follow-up)
Update to 1.16.2. Possibly we should be dropping the binary lookup patch, as /run/nvidia-docker seems to be unused now. Also includes security fixes.
Things done
- Is sandboxing enabled in nix.conf? (See Nix manual)
  - sandbox = relaxed
  - sandbox = true
- Tested with nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage.
- Tested basic functionality of the resulting binaries (./result/bin/)

Add a 👍 reaction to pull requests you find important.