2. Issue or feature description
When I install the vGPU Manager, it fails with the error "/usr/local/bin/nvidia-driver: line 1: popd: directory stack empty". When I check the dmesg log on the node, it shows "Direct firmware load for nvidia/550.54.10/gsp_ga10x.bin failed with error -2". It looks like the firmware is not being loaded properly.
Can somebody help me resolve this error, with step-by-step instructions?
3. Information to attach (optional if deemed irrelevant)
Logs from the nvidia-vgpu-manager-daemonset pod:
+ DRIVER_VERSION=550.54.10
+ DRIVER_ARCH=x86_64
+ DRIVER_RESET_RETRIES=10
++ uname -r
+ KERNEL_VERSION=5.15.0-117-generic
+ RUN_DIR=/run/nvidia
+ export DEBIAN_FRONTEND=noninteractive
+ DEBIAN_FRONTEND=noninteractive
+ '[' 1 -eq 0 ']'
+ command=init
+ shift
+ case "${command}" in
++ getopt -l accept-license -o a --
+ options=' --'
+ '[' 0 -ne 0 ']'
+ eval set -- ' --'
++ set -- --
+ ACCEPT_LICENSE=
++ uname -r
+ KERNEL_VERSION=5.15.0-117-generic
+ PRIVATE_KEY=
+ PACKAGE_TAG=
+ for opt in ${options}
+ case "$opt" in
+ shift
+ break
+ '[' 0 -ne 0 ']'
+ init
+ trap 'echo '\''Caught signal'\''; exit 1' HUP INT QUIT PIPE TERM
+ trap _shutdown EXIT
+ _unload_driver
+ rmmod_args=()
+ local rmmod_args
+ local nvidia_deps=0
+ local nvidia_refs=0
+ local nvidia_vgpu_vfio_refs=0
+ echo 'Stopping NVIDIA vGPU Manager...'
+ '[' -f /var/run/nvidia-vgpu-mgr/nvidia-vgpu-mgr.pid ']'
+ echo 'Unloading NVIDIA driver kernel modules...'
+ '[' -f /sys/module/nvidia_vgpu_vfio/refcnt ']'
Stopping NVIDIA vGPU Manager...
Unloading NVIDIA driver kernel modules...
+ '[' -f /sys/module/nvidia/refcnt ']'
+ nvidia_refs=0
+ rmmod_args+=("nvidia")
+ '[' 1 -gt 0 ']'
+ rmmod nvidia
+ '[' 0 '!=' 0 ']'
+ return 0
+ _unmount_rootfs
Unmounting NVIDIA driver rootfs...
+ echo 'Unmounting NVIDIA driver rootfs...'
+ findmnt -r -o TARGET
+ grep /run/nvidia/driver
+ umount -l -R /run/nvidia/driver
Updating the package cache...
+ _update_package_cache
+ '[' '' '!=' builtin ']'
+ echo 'Updating the package cache...'
+ apt-get -qq update
+ _resolve_kernel_version
++ apt-cache show linux-headers-5.15.0-117-generic
++ sed -nE 's/^Version:\s+(([0-9]+\.){2}[0-9]+)[-.]([0-9]+).*/\1-\3/p'
++ head -1
+ local version=5.15.0-117
++ echo 5.15.0-117-generic
++ sed 's/[^a-z]*//'
++ grep -Ev '^generic|virtual'
+ local flavor=
+ echo 'Resolving Linux kernel version...'
+ '[' -z 5.15.0-117 ']'
Resolving Linux kernel version...
+ KERNEL_VERSION=5.15.0-117-generic
+ echo 'Proceeding with Linux kernel version 5.15.0-117-generic'
+ return 0
Proceeding with Linux kernel version 5.15.0-117-generic
+ _install_prerequisites
++ mktemp -d
+ local tmp_dir=/tmp/tmp.kG79fBJSB3
+ trap 'popd; rm -rf /tmp/tmp.kG79fBJSB3' RETURN EXIT
+ pushd /tmp/tmp.kG79fBJSB3
/tmp/tmp.kG79fBJSB3 /driver
+ rm -rf /lib/modules/5.15.0-117-generic
+ mkdir -p /lib/modules/5.15.0-117-generic/proc
+ echo 'Installing Linux kernel headers...'
Installing Linux kernel headers...
+ apt-get -qq install --no-install-recommends linux-headers-5.15.0-117-generic
+ echo 'Installing Linux kernel module files...'
+ apt-get -qq download linux-image-5.15.0-117-generic
Installing Linux kernel module files...
+ dpkg -x linux-image-5.15.0-117-generic_5.15.0-117.127_amd64.deb .
+ mv lib/modules/5.15.0-117-generic/modules.builtin lib/modules/5.15.0-117-generic/modules.builtin.modinfo lib/modules/5.15.0-117-generic/modules.order /lib/modules/5.15.0-117-generic
+ mv lib/modules/5.15.0-117-generic/kernel /lib/modules/5.15.0-117-generic
+ depmod 5.15.0-117-generic
+ echo 'Generating Linux kernel version string...'
Generating Linux kernel version string...
+ file boot/vmlinuz-5.15.0-117-generic
+ awk 'BEGIN { RS="," } $1=="version" { print $2 }' -
+ '[' -z 5.15.0-117-generic ']'
+ mv version /lib/modules/5.15.0-117-generic/proc
/driver
++ popd
++ rm -rf /tmp/tmp.kG79fBJSB3
Creating '/dev/char' directory
+ _create_dev_char_directory
+ '[' '!' -d /dev/char ']'
+ echo 'Creating '\''/dev/char'\'' directory'
+ mkdir -p /dev/char
+ _install_driver
++ mktemp -d
+ local tmp_dir=/tmp/tmp.GGwiHUdShK
+ sh NVIDIA-Linux-x86_64-550.54.10-vgpu-kvm.run --ui=none --no-questions --tmpdir /tmp/tmp.GGwiHUdShK --no-systemd
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.54.10......................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Welcome to the NVIDIA Software Installer for Unix/Linux
Detected 128 CPUs online; setting concurrency level to 32.
Unable to locate any tools for listing initramfs contents.
Unable to scan initramfs: no tool found
This system requires use of the NVIDIA open kernel modules; these will be selected by default.
Installing NVIDIA driver version 550.54.10.
Performing CC sanity check with CC="/usr/bin/cc".
Performing CC check.
Kernel source path: '/lib/modules/5.15.0-117-generic/build'
Kernel output path: '/lib/modules/5.15.0-117-generic/build'
Performing Compiler check.
Performing Dom0 check.
Performing Xen check.
Performing PREEMPT_RT check.
Performing vgpu_kvm check.
Cleaning kernel module build directory.
Building kernel modules:
[##############################] 100%
Kernel module compilation complete.
Kernel messages:
[ 3941.692907] nvidia 0000:03:00.0: driver left SR-IOV enabled after remove
[ 3941.693251] nvidia 0000:64:00.0: driver left SR-IOV enabled after remove
[ 3941.693477] nvidia 0000:63:00.0: driver left SR-IOV enabled after remove
[ 3941.693818] nvidia 0000:e4:00.0: driver left SR-IOV enabled after remove
[ 3941.694212] nvidia 0000:e3:00.0: driver left SR-IOV enabled after remove
[ 3941.694639] NVOC: __nvoc_objDelete: Child class OBJIOVASPACE not freed from parent class OBJVMM.
[ 3941.694790] nvidia-nvlink: Unregistered Nvlink Core, major device number 499
[ 3989.137665] nvidia-nvlink: Nvlink Core is being initialized, major device number 499
[ 3989.137675] NVRM: The NVIDIA probe routine was not called for 256 device(s).
[ 3989.570567] NVRM: This can occur when another driver was loaded and
NVRM: obtained ownership of the NVIDIA device(s).
[ 3989.570570] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[ 3989.570590] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 550.54.10 Release Build (dvs-builder@U16-I3-B13-2-1) Wed Feb 14 16:21:59 UTC 2024
[ 3989.716774] nvidia 0000:84:00.0: driver left SR-IOV enabled after remove
[ 3989.717546] nvidia 0000:83:00.0: driver left SR-IOV enabled after remove
[ 3989.718001] nvidia 0000:04:00.0: driver left SR-IOV enabled after remove
[ 3989.718288] nvidia 0000:03:00.0: driver left SR-IOV enabled after remove
[ 3989.718588] nvidia 0000:64:00.0: driver left SR-IOV enabled after remove
[ 3989.719319] nvidia 0000:63:00.0: driver left SR-IOV enabled after remove
[ 3989.719774] nvidia 0000:e4:00.0: driver left SR-IOV enabled after remove
[ 3989.720154] nvidia 0000:e3:00.0: driver left SR-IOV enabled after remove
[ 3989.720839] nvidia-nvlink: Unregistered Nvlink Core, major device number 499
Searching for conflicting files:: Searching
[##############################] 100%
Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (550.54.10):: Installing
[# ] 0%
Unable to determine whether NVIDIA kernel modules are present in the initramfs. Existing NVIDIA kernel modules in the initramfs, if any, may interfere with the newly installed driver.
[##############################] 100%
Driver file installation is complete.
Running distribution scripts: Executing /usr/lib/nvidia/post-install
[##############################] 100%
Running post-install sanity check:: Checking
[##############################] 100%
Post-install sanity check passed.
Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 550.54.10) is now complete.
+ _load_driver
+ /usr/bin/nvidia-vgpud
+ '[' '!' -f /sys/module/nvidia_vgpu_vfio/refcnt ']'
+ /usr/bin/nvidia-vgpu-mgr
+ '[' '!' -f /sys/module/nvidia/refcnt ']'
+ return 0
+ _mount_rootfs
+ echo 'Mounting NVIDIA driver rootfs...'
+ mount -o remount,rw /sys
Mounting NVIDIA driver rootfs...
+ mount --make-runbindable /sys
+ mount --make-private /sys
+ mkdir -p /run/nvidia/driver
+ mount --rbind / /run/nvidia/driver
+ _enable_vfs
+ local retry
+ (( retry = 0 ))
+ (( retry <= 10 ))
+ /usr/lib/nvidia/sriov-manage -e ALL
GPU at 0000:03:00.0 already has VFs enabled.
GPU at 0000:04:00.0 already has VFs enabled.
GPU at 0000:63:00.0 already has VFs enabled.
GPU at 0000:64:00.0 already has VFs enabled.
GPU at 0000:83:00.0 already has VFs enabled.
GPU at 0000:84:00.0 already has VFs enabled.
GPU at 0000:e3:00.0 already has VFs enabled.
GPU at 0000:e4:00.0 already has VFs enabled.
+ return 0
+ pgrep nvidia-vgpu-mgr
+ nvidia-vgpud
+ echo 'Restarting nvidia-vgpu-mgr after previously killed'
+ nvidia-vgpu-mgr
Restarting nvidia-vgpu-mgr after previously killed
+ set +x
Done, now waiting for signal
ERROR: nvidia-vgpu-mgr daemon is no longer running. Exiting.
/usr/local/bin/nvidia-driver: line 1: popd: directory stack empty
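The "popd: directory stack empty" message may be a side effect rather than the root cause. The trace shows _install_prerequisites registering trap 'popd; rm -rf /tmp/tmp.kG79fBJSB3' RETURN EXIT. The RETURN trap already popped the directory stack when that function returned, and the same handler also replaced the earlier "trap _shutdown EXIT". So when the script exits after "nvidia-vgpu-mgr daemon is no longer running", popd would run a second time on an empty stack, which would also explain the "line 1" (the trap string) in the message. Below is a minimal bash reproduction of that pattern; it is a hypothetical script, not the actual nvidia-driver source, and reproduce_popd_error is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction of the trap pattern seen in the trace above.
reproduce_popd_error() {
    bash -c '
        install_prerequisites() {
            local tmp_dir
            tmp_dir=$(mktemp -d)
            # One handler for BOTH RETURN and EXIT. Registering EXIT here
            # also replaces any EXIT trap set earlier (e.g. a shutdown handler).
            trap "popd >/dev/null; rm -rf ${tmp_dir}" RETURN EXIT
            pushd "${tmp_dir}" >/dev/null
        }
        install_prerequisites   # RETURN trap fires: stack popped, tmp dir removed
        echo "main work done"
        # Shell exits: EXIT trap fires and popd runs again on an empty stack,
        # printing "popd: directory stack empty" to stderr.
    '
}

reproduce_popd_error
```

If this reading is right, the popd message can be ignored and the real question is why nvidia-vgpu-mgr exited. The missing GSP firmware reported in the dmesg log below is a plausible candidate, but that link is not confirmed by these logs alone.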
dmesg log:
Direct firmware load for nvidia/550.54.10/gsp_ga10x.bin failed with error -2
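Error -2 is -ENOENT ("No such file or directory"): the kernel looked for nvidia/550.54.10/gsp_ga10x.bin and did not find it. By default the Linux firmware loader searches an optional custom directory (the firmware_class.path parameter), then /lib/firmware/updates/<kernel release>, /lib/firmware/updates, /lib/firmware/<kernel release> and /lib/firmware. The sketch below checks those locations in that order. find_firmware and the FW_ROOT prefix are hypothetical names for illustration. One assumption worth verifying: the loader resolves these paths in the host's filesystem, not inside the driver container, so the check is most meaningful when run on the node itself.

```shell
#!/usr/bin/env bash
# Sketch: report where (if anywhere) the kernel's default firmware search
# would find a file. Assumes the standard Linux firmware search order;
# FW_ROOT is a hypothetical prefix (default: empty, i.e. "/").
# Assumption: the kernel searches the HOST filesystem, so run this on the node.
find_firmware() {
    local rel="$1"                     # e.g. nvidia/550.54.10/gsp_ga10x.bin
    local kver="${2:-$(uname -r)}"     # kernel release used in versioned dirs
    local root="${FW_ROOT:-}"          # optional root prefix
    local custom=""
    # Extra directory set via firmware_class.path (empty when unset)
    if [[ -r /sys/module/firmware_class/parameters/path ]]; then
        custom=$(cat /sys/module/firmware_class/parameters/path)
    fi
    local dir
    for dir in ${custom:+"$custom"} \
               "$root/lib/firmware/updates/$kver" \
               "$root/lib/firmware/updates" \
               "$root/lib/firmware/$kver" \
               "$root/lib/firmware"; do
        if [[ -f "$dir/$rel" ]]; then
            echo "found: $dir/$rel"
            return 0
        fi
    done
    echo "not found: $rel (consistent with error -2 / ENOENT)" >&2
    return 1
}
```

Usage on the node: find_firmware nvidia/550.54.10/gsp_ga10x.bin "$(uname -r)". A "not found" result would match the dmesg error; a "found" result would suggest looking elsewhere (for example, at the file's contents or version).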
Kernel version: 5.15.0-117-generic (as reported by uname -r in the log above)
Checking the GSP firmware version (the value is N/A for every GPU):
for gpu in /proc/driver/nvidia/gpus/*/information; do
echo "File: $gpu"
cat "$gpu"
echo "-----------------------------"
done
File: /proc/driver/nvidia/gpus/0000:03:00.0/information
Model: NVIDIA L40S
IRQ: 94
GPU UUID: GPU-d03ff6db-34c7-dc00-484c-3adc1cc61b03
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:03:00.0
Device Minor: 4
GPU Firmware: N/A
GPU Excluded: No
-----------------------------
File: /proc/driver/nvidia/gpus/0000:04:00.0/information
Model: NVIDIA L40S
IRQ: 58
GPU UUID: GPU-0418f843-80fe-7d93-cb41-72ecf0a117de
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:04:00.0
Device Minor: 5
GPU Firmware: N/A
GPU Excluded: No
-----------------------------
File: /proc/driver/nvidia/gpus/0000:63:00.0/information
Model: NVIDIA L40S
IRQ: 91
GPU UUID: GPU-6c383654-2e10-7167-a1e0-fb8e8ba4b7bc
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:63:00.0
Device Minor: 2
GPU Firmware: N/A
GPU Excluded: No
-----------------------------
File: /proc/driver/nvidia/gpus/0000:64:00.0/information
Model: NVIDIA L40S
IRQ: 51
GPU UUID: GPU-6d0940a2-6511-6aa0-2255-ce36f96b530b
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:64:00.0
Device Minor: 3
GPU Firmware: N/A
GPU Excluded: No
-----------------------------
File: /proc/driver/nvidia/gpus/0000:83:00.0/information
Model: NVIDIA L40S
IRQ: 890
GPU UUID: GPU-9a36f396-473f-b5b7-ba8c-b6e6c2cfd93e
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:83:00.0
Device Minor: 6
GPU Firmware: N/A
GPU Excluded: No
-----------------------------
File: /proc/driver/nvidia/gpus/0000:84:00.0/information
Model: NVIDIA L40S
IRQ: 70
GPU UUID: GPU-f6916d2a-2c75-c840-5106-af1e5b80f25c
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:84:00.0
Device Minor: 7
GPU Firmware: N/A
GPU Excluded: No
-----------------------------
File: /proc/driver/nvidia/gpus/0000:e3:00.0/information
Model: NVIDIA L40S
IRQ: 889
GPU UUID: GPU-71d50d8a-31d7-2028-4bca-e728fe84441c
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:e3:00.0
Device Minor: 0
GPU Firmware: N/A
GPU Excluded: No
-----------------------------
File: /proc/driver/nvidia/gpus/0000:e4:00.0/information
Model: NVIDIA L40S
IRQ: 44
GPU UUID: GPU-e580ef25-9fe7-74f7-33c9-03bfa563ebb2
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:e4:00.0
Device Minor: 1
GPU Firmware: N/A
GPU Excluded: No
-----------------------------
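To compare the firmware field across all GPUs without reading every file, the loop above can be condensed into one line per GPU. This is a sketch assuming the "Key: value" layout of the information files shown above (with spaces or tabs after the colon); gpu_firmware_summary is a hypothetical helper, and the base directory is a parameter so it can be pointed at a copy of the files.

```shell
#!/usr/bin/env bash
# Sketch: print "<PCI bus id> <GPU Firmware value>" for each GPU.
# Assumes the /proc/driver/nvidia/gpus/<bus id>/information layout shown above.
gpu_firmware_summary() {
    local base="${1:-/proc/driver/nvidia/gpus}"
    local info bus fw
    for info in "$base"/*/information; do
        [[ -f "$info" ]] || continue            # glob matched nothing
        bus=$(basename "$(dirname "$info")")    # directory name is the bus id
        # Split on ":" plus any following spaces/tabs; take the value
        fw=$(awk -F':[ \t]*' '$1 == "GPU Firmware" { print $2 }' "$info")
        echo "$bus ${fw:-unknown}"
    done
}
```

On the node in this report, every line would be expected to end in N/A, matching the output above.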