Add kmod-5.15-nvidia sources #2455
soname="$(%{_cross_target}-readelf -d "${lib}" | awk '/SONAME/{print $5}' | tr -d '[]')"
[ -n "${soname}" ] || continue
[ "${lib}" == "${soname}" ] && continue
[ -e %{buildroot}/%{tesla_515_libdir}/"${soname}" ] && continue
What file is this guard catching?
There are a few libraries in the binary for which the soname'd file is provided in the archive:
libnvidia-gtk3.so.515.65.01
libnvidia-eglcore.so.515.65.01
libnvidia-rtcore.so.515.65.01
libGLX.so.0
libnvidia-glvkspirv.so.515.65.01
libnvidia-tls.so.515.65.01
libnvidia-gtk2.so.515.65.01
libGLdispatch.so.0
libnvidia-wayland-client.so.515.65.01
libnvidia-compiler.so.515.65.01
libnvidia-glsi.so.515.65.01
libnvidia-glcore.so.515.65.01
libOpenGL.so.0
However, there are a few that still require the symlink:
libGLESv2_nvidia.so.515.65.01
libnvoptix.so.515.65.01
libnvidia-egl-wayland.so.1.1.9
libGL.so.1.7.0
libnvidia-allocator.so.515.65.01
libvdpau_nvidia.so.515.65.01
libnvidia-ngx.so.515.65.01
libEGL.so.1.1.0
libnvidia-nvvm.so.515.65.01
libnvidia-encode.so.515.65.01
libGLX_nvidia.so.515.65.01
libGLESv2.so.2.1.0
libnvidia-egl-gbm.so.1.1.0
libEGL.so.515.65.01
libOpenCL.so.1.0.0
libnvidia-fbc.so.515.65.01
libnvidia-opticalflow.so.515.65.01
libGLESv1_CM.so.1.2.0
libGLESv1_CM_nvidia.so.515.65.01
libnvidia-cfg.so.515.65.01
libnvidia-ptxjitcompiler.so.515.65.01
libnvidia-opencl.so.515.65.01
libnvidia-ml.so.515.65.01
libcuda.so.515.65.01
libEGL_nvidia.so.515.65.01
libnvcuvid.so.515.65.01
> There are a few libraries in the binary for which the soname'd file is provided in the archive
The "file" or the "symlink"? If we test (via `[ -L ${link} ]`) that it is already a link, then OK. If it is a regular file with the same name as the library, then that is not really OK, because it creates ambiguity as to which one the dynamic loader will select.
I would tend to prefer testing that it's a link, then removing it and recreating our own. If it exists and it's not a link, do something else: diff it against the target and remove it if they're the same, then create our link. Otherwise it's an exceptional case and I need more details to advise.
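For illustration, the handling suggested above could be sketched roughly as follows. This is not code from the PR: `ensure_soname_link` and its three arguments are hypothetical names, and it assumes the library and its SONAME entry live in the same directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the spec file): make "${dir}/${soname}"
# a symlink to "${lib}". Returns 1 for the exceptional case that needs review.
ensure_soname_link() {
  local dir="$1" lib="$2" soname="$3"
  local link="${dir}/${soname}"

  # Nothing to do when the library file is already named after its SONAME.
  [ "${lib}" = "${soname}" ] && return 0

  if [ -L "${link}" ]; then
    # An existing symlink is removed so that we can recreate our own.
    rm -f "${link}"
  elif [ -e "${link}" ]; then
    # Not a link: only drop it when it is byte-identical to the target.
    if cmp -s "${link}" "${dir}/${lib}"; then
      rm -f "${link}"
    else
      echo "refusing to replace '${link}': differs from '${lib}'" >&2
      return 1
    fi
  fi

  # Relative link, so it stays valid if the directory is relocated.
  ln -s "${lib}" "${link}"
}
```

A call such as `ensure_soname_link "${libdir}" libcuda.so.515.65.01 libcuda.so.1` would then cover the link, identical-file, and missing cases, and fail loudly on a differing regular file.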
I made a mistake while evaluating which symlinks existed for each library (fish-shell bit me 💢). Anyway, I confirmed that neither the symlinks nor the files exist in the NVIDIA run archives for these libraries:
Missing 'libnvoptix.so.1' for 'libnvoptix.so.515.65.01'
Missing 'libnvidia-egl-wayland.so.1' for 'libnvidia-egl-wayland.so.1.1.9'
Missing 'libGL.so.1' for 'libGL.so.1.7.0'
Missing 'libnvidia-allocator.so.1' for 'libnvidia-allocator.so.515.65.01'
Missing 'libvdpau_nvidia.so.1' for 'libvdpau_nvidia.so.515.65.01'
Missing 'libnvidia-ngx.so.1' for 'libnvidia-ngx.so.515.65.01'
Missing 'libEGL.so.1' for 'libEGL.so.1.1.0'
Missing 'libnvidia-nvvm.so.4' for 'libnvidia-nvvm.so.515.65.01'
Missing 'libnvidia-encode.so.1' for 'libnvidia-encode.so.515.65.01'
Missing 'libGLX_nvidia.so.0' for 'libGLX_nvidia.so.515.65.01'
Missing 'libGLESv2.so.2' for 'libGLESv2.so.2.1.0'
Missing 'libnvidia-egl-gbm.so.1' for 'libnvidia-egl-gbm.so.1.1.0'
Missing 'libEGL.so.1' for 'libEGL.so.515.65.01'
Missing 'libOpenCL.so.1' for 'libOpenCL.so.1.0.0'
Missing 'libnvidia-fbc.so.1' for 'libnvidia-fbc.so.515.65.01'
Missing 'libnvidia-opticalflow.so.1' for 'libnvidia-opticalflow.so.515.65.01'
Missing 'libGLESv1_CM.so.1' for 'libGLESv1_CM.so.1.2.0'
Missing 'libGLESv1_CM_nvidia.so.1' for 'libGLESv1_CM_nvidia.so.515.65.01'
Missing 'libnvidia-cfg.so.1' for 'libnvidia-cfg.so.515.65.01'
Missing 'libnvidia-ptxjitcompiler.so.1' for 'libnvidia-ptxjitcompiler.so.515.65.01'
Missing 'libnvidia-opencl.so.1' for 'libnvidia-opencl.so.515.65.01'
Missing 'libnvidia-ml.so.1' for 'libnvidia-ml.so.515.65.01'
Missing 'libcuda.so.1' for 'libcuda.so.515.65.01'
Missing 'libEGL_nvidia.so.0' for 'libEGL_nvidia.so.515.65.01'
Missing 'libnvcuvid.so.1' for 'libnvcuvid.so.515.65.01'
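This is the script that I used to verify which libraries are missing their SONAME symlink, and which don't need it:
#!/usr/bin/env bash
# Report every versioned library whose SONAME-named file or symlink is absent.
for lib in $(find . -maxdepth 1 -type f -name 'lib*.so.*' -printf '%P\n'); do
  # Extract the SONAME from the dynamic section and strip the brackets.
  soname="$(readelf -d "${lib}" | awk '/SONAME/{print $5}' | tr -d '[]')"
  # Skip files that have no SONAME.
  [ -n "${soname}" ] || continue
  # Skip libraries already named after their SONAME; no link is needed.
  [ "${lib}" == "${soname}" ] && continue
  [ ! -e "${soname}" ] && echo "Missing '${soname}' for '${lib}'"
done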
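As a side note on why the script prints field 5: `readelf -d` emits one line per dynamic entry, and on a SONAME line the bracketed name is the fifth whitespace-separated field. The sketch below isolates that step so it can be tried on captured output; `soname_from_dynamic` is a hypothetical name, not something from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `readelf -d` output on stdin, print the SONAME.
soname_from_dynamic() {
  # A SONAME entry looks like:
  #   0x000000000000000e (SONAME)   Library soname: [libcuda.so.1]
  # Fields: tag, "(SONAME)", "Library", "soname:", "[name]".
  awk '/\(SONAME\)/ { print $5; exit }' | tr -d '[]'
}
```

It would be used as `readelf -d libcuda.so.515.65.01 | soname_from_dynamic`. A library without a SONAME entry yields an empty string, which is what the `[ -n "${soname}" ] || continue` guard relies on.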
%global tesla_515 515.65.01
%global tesla_515_libdir %{_cross_libdir}/nvidia/tesla/%{tesla_515}
%global tesla_515_bindir %{_cross_libexecdir}/nvidia/tesla/bin/%{tesla_515}
%global tesla_515_firmwaredir %{_cross_libdir}/firmware/nvidia/%{tesla_515}
Note for other reviewers: I checked out this branch and then ran:
git diff --no-index --word-diff packages/kmod-*-nvidia/kmod-*-nvidia.spec
There's a lot of churn in the diff that stems from changing `%{tesla_470}` to `%{tesla_515}`. That's somewhat unavoidable here, but in the interest of simplifying future diffs, it might be good to go ahead and rename this macro to `tesla_ver`. That way the next diff will be easier to examine.
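One rough way to preview such a rename is a `sed` filter over the spec text. This is only a sketch: `rename_tesla_macro` is a hypothetical name, and renaming the derived macros (for example `tesla_515_libdir` to `tesla_ver_libdir`) is an assumption about how the follow-up might look, not something the comment spells out.

```shell
#!/usr/bin/env bash
# Hypothetical filter: rewrite version-specific macro names such as
# tesla_470 or tesla_515 (and derived names like tesla_515_libdir)
# to the version-neutral tesla_ver prefix. Reads stdin, writes stdout.
rename_tesla_macro() {
  sed -E 's/tesla_[0-9]+/tesla_ver/g'
}
```

Running the spec through it, e.g. `rename_tesla_macro < kmod-5.15-nvidia.spec | diff kmod-5.15-nvidia.spec -`, would show the one-time churn before committing to the rename. The version value itself (`515.65.01`) is left untouched because it is not prefixed with `tesla_`.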
I can change this for both kmod packages in another PR 👍
From the commit message:
Have you confirmed this works as expected?
Yes, I'll update the description with my testing.
Force-pushed from bea365e to ef4c390.
Forced push includes:
This adds the sources to compile the 515 NVIDIA driver for the 5.15 kernel. This version only supports the Maxwell, Pascal, Volta, Turing, and Ampere GPU architectures, and newer ones. The driver will use the GPU System Processor (GSP) feature if the underlying hardware supports it by loading the binary file `/lib/firmware/nvidia/<version>/gsp.bin`.

Signed-off-by: Arnaldo Garcia Rincon <[email protected]>
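As a quick sanity check on the firmware path named in the commit message, something like the following could confirm that the file is present for a given driver version. It is only a sketch: both function names and the optional root-prefix argument are hypothetical, and a present file does not by itself prove that the driver loaded GSP.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the package.
# Print the GSP firmware path for a driver version, optionally under a
# root prefix (useful when inspecting an unpacked image).
gsp_firmware_path() {
  local version="$1" root="${2:-}"
  printf '%s/lib/firmware/nvidia/%s/gsp.bin\n' "${root}" "${version}"
}

# Succeed only when the firmware file exists as a regular file.
has_gsp_firmware() {
  [ -f "$(gsp_firmware_path "$@")" ]
}
```

For example, `has_gsp_firmware 515.65.01 && echo "gsp.bin present"` checks the running system, while passing a second argument checks a mounted or extracted root instead.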
Force-pushed from ef4c390 to 4552243.
(Forced push removed a strange commit.)
Issue number:
Part of #2374
Description of changes:
This change is required to release k8s-1.24-nvidia variants, since the 470 NVIDIA driver does not work with kernels > 5.10.
Testing done:
This is just a cherry-pick from #2286; the only difference is the driver version, and the same testing was applied. I ran a local variant that uses this driver and confirmed that the pods can access the GPUs:
For aarch64, I'm having problems getting a node to join a cluster (unrelated to this PR), but I can see the driver is working:
I verified that the GSP firmware was loaded in the supported architectures, as part of #2286:
[ 32.672564] NVRM: The NVIDIA Tesla K80 GPU installed in this system is
[ 32.672564] NVRM: supported through the NVIDIA 470.xx Legacy drivers
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.