Hi, I have a six-node MicroK8s cluster, running as VMs on a three-node Proxmox environment:
mk8s1 and mk8s4 are on proxmox-node1; mk8s1 has a GPU passed through.
mk8s2 and mk8s5 are on a second Proxmox node, the one without a GPU.
mk8s3 and mk8s6 are on a third Proxmox node; mk8s3 has a GPU passed through.
I expect nfd/gpu-operator to find and label mk8s1 and mk8s3 with the GPU, but not mk8s4 and mk8s6 (which sit on the same Proxmox nodes).
mk8s2 and mk8s5, on the Proxmox node without a GPU, are not labeled.
The thing is that mk8s4 has no GPU but gets labeled anyway:
ubuntu@mk8s4:~$ modinfo nvidia | grep ^version
modinfo: ERROR: Module nvidia not found.
ubuntu@mk8s4:~$ nvidia-smi
Command 'nvidia-smi' not found, but can be installed with:
sudo apt install nvidia-utils-525 # version 525.147.05-0ubuntu1, or
sudo apt install nvidia-utils-525-server # version 525.147.05-0ubuntu1
sudo apt install nvidia-utils-470 # version 470.256.02-0ubuntu0.24.04.1
sudo apt install nvidia-utils-470-server # version 470.256.02-0ubuntu0.24.04.1
sudo apt install nvidia-utils-535 # version 535.183.01-0ubuntu0.24.04.1
sudo apt install nvidia-utils-535-server # version 535.216.01-0ubuntu0.24.04.1
sudo apt install nvidia-utils-550 # version 550.120-0ubuntu0.24.04.1
sudo apt install nvidia-utils-550-server # version 550.127.05-0ubuntu0.24.04.1
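Before looking at the labels, it may also be worth checking whether the VM exposes an NVIDIA PCI device at all, since that is what NFD inspects (a minimal check; 10de is NVIDIA's PCI vendor ID):
# On the guest: list any NVIDIA PCI devices (vendor ID 10de).
# On a VM without the GPU passed through, this should print nothing.
lspci -nn -d 10de: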
ubuntu@mk8s4:~$ kubectl get nodes mk8s4 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
mk8s4 Ready <none> 27d v1.31.5 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,feature.node.kubernetes.io/cpu-cpuid.ADX=true,feature.node.kubernetes.io/cpu-cpuid.AESNI=true,feature.node.kubernetes.io/cpu-cpuid.AVX2=true,feature.node.kubernetes.io/cpu-cpuid.AVX=true,feature.node.kubernetes.io/cpu-cpuid.AVXVNNI=true,feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8=true,feature.node.kubernetes.io/cpu-cpuid.FLUSH_L1D=true,feature.node.kubernetes.io/cpu-cpuid.FMA3=true,feature.node.kubernetes.io/cpu-cpuid.FSRM=true,feature.node.kubernetes.io/cpu-cpuid.FXSR=true,feature.node.kubernetes.io/cpu-cpuid.FXSROPT=true,feature.node.kubernetes.io/cpu-cpuid.GFNI=true,feature.node.kubernetes.io/cpu-cpuid.HYPERVISOR=true,feature.node.kubernetes.io/cpu-cpuid.IA32_ARCH_CAP=true,feature.node.kubernetes.io/cpu-cpuid.IBPB=true,feature.node.kubernetes.io/cpu-cpuid.IBRS=true,feature.node.kubernetes.io/cpu-cpuid.LAHF=true,feature.node.kubernetes.io/cpu-cpuid.MD_CLEAR=true,feature.node.kubernetes.io/cpu-cpuid.MOVBE=true,feature.node.kubernetes.io/cpu-cpuid.MOVDIR64B=true,feature.node.kubernetes.io/cpu-cpuid.MOVDIRI=true,feature.node.kubernetes.io/cpu-cpuid.OSXSAVE=true,feature.node.kubernetes.io/cpu-cpuid.SERIALIZE=true,feature.node.kubernetes.io/cpu-cpuid.SHA=true,feature.node.kubernetes.io/cpu-cpuid.SPEC_CTRL_SSBD=true,feature.node.kubernetes.io/cpu-cpuid.STIBP=true,feature.node.kubernetes.io/cpu-cpuid.STOSB_SHORT=true,feature.node.kubernetes.io/cpu-cpuid.SYSCALL=true,feature.node.kubernetes.io/cpu-cpuid.SYSEE=true,feature.node.kubernetes.io/cpu-cpuid.VAES=true,feature.node.kubernetes.io/cpu-cpuid.VMX=true,feature.node.kubernetes.io/cpu-cpuid.VPCLMULQDQ=true,feature.node.kubernetes.io/cpu-cpuid.WAITPKG=true,feature.node.kubernetes.io/cpu-cpuid.X87=true,feature.node.kubernetes.io/cpu-cpuid.XGETBV1=true,feature.node.kubernetes.io/cpu-cpuid.XSAVE=true,feature.node.kubernetes.io/cpu-cpuid.XSAVEC=true,feature.node.kubernetes.io/cpu-cpuid.XSAVEOPT=true,feature.node.kubernetes.io/cpu-cpuid.XSAVES=true,feature.node.kubernetes.io/cpu-hardware_multithreading=false,feature.node.kubernetes.io/cpu-model.family=6,feature.node.kubernetes.io/cpu-model.id=186,feature.node.kubernetes.io/cpu-model.vendor_id=Intel,feature.node.kubernetes.io/kernel-config.NO_HZ=true,feature.node.kubernetes.io/kernel-config.NO_HZ_FULL=true,feature.node.kubernetes.io/kernel-version.full=6.8.0-53-generic,feature.node.kubernetes.io/kernel-version.major=6,feature.node.kubernetes.io/kernel-version.minor=8,feature.node.kubernetes.io/kernel-version.revision=0,feature.node.kubernetes.io/pci-1234.present=true,feature.node.kubernetes.io/pci-1af4.present=true,feature.node.kubernetes.io/storage-nonrotationaldisk=true,feature.node.kubernetes.io/system-os_release.ID=ubuntu,feature.node.kubernetes.io/system-os_release.VERSION_ID.major=24,feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=04,feature.node.kubernetes.io/system-os_release.VERSION_ID=24.04,kubernetes.io/arch=amd64,kubernetes.io/hostname=mk8s4,kubernetes.io/os=linux,microk8s.io/cluster=true,node.kubernetes.io/microk8s-controlplane=microk8s-controlplane,nvidia.com/cuda.driver-version.full=550.127.08,nvidia.com/cuda.driver-version.major=550,nvidia.com/cuda.driver-version.minor=127,nvidia.com/cuda.driver-version.revision=08,nvidia.com/cuda.driver.major=550,nvidia.com/cuda.driver.minor=127,nvidia.com/cuda.driver.rev=08,nvidia.com/cuda.runtime-version.full=12.4,nvidia.com/cuda.runtime-version.major=12,nvidia.com/cuda.runtime-version.minor=4,nvidia.com/cuda.runtime.major=12,nvidia.com/cuda.runtime.minor=4,nvidia.com/gfd.timestamp=1737098640,nvidia.com/gpu.compute.major=7,nvidia.com/gpu.compute.minor=5,nvidia.com/gpu.count=1,nvidia.com/gpu.family=turing,nvidia.com/gpu.machine=Standard-PC-i440FX-PIIX-1996,nvidia.com/gpu.memory=8192,nvidia.com/gpu.mode=graphics,nvidia.com/gpu.product=NVIDIA-T1000-8GB,nvidia.com/gpu.replicas=4,nvidia.com/gpu.sharing-strategy=time-slicing,nvidia.com/mig.capable=false,nvidia.com/mig.strategy=single,nvidia.com/mps.capable=false,nvidia.com/vgpu.present=false
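To make that dump easier to read, the labels can be filtered down to the GPU-related ones; the feature.node.kubernetes.io/pci-* labels are set by NFD, while the nvidia.com/* labels come from GPU Feature Discovery (a small sketch using only standard kubectl and shell tools):
# Show only the PCI labels (managed by NFD) and the nvidia.com labels
# (managed by GPU Feature Discovery) for mk8s4.
kubectl get node mk8s4 --show-labels --no-headers | tr ',' '\n' | grep -E 'pci-|nvidia.com/'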
The actual gpu-operator pods run only on the right nodes (mk8s1 and mk8s3).
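For reference, a quick way to check that placement (the gpu-operator namespace below is an assumption; it depends on how the operator was installed):
# List the GPU Operator pods together with the node each one runs on.
kubectl get pods -n gpu-operator -o wide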
In practice there seems to be no real effect on functionality (only the right MicroK8s nodes get GPU workloads scheduled), but I do not understand why the labels get attached to nodes without a GPU when they run on Proxmox hosts that have one, as if nfd/gpu-operator could see the host's GPU.
Thank you.
GFD will respond based on the labels added by NFD, and NFD labels based on the PCI devices it discovers. VMs typically surface the host's PCI devices even if one hasn't been explicitly passed through, which is probably why you're seeing that result.
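If you want to dig further, here is a rough sketch of how to inspect what NFD actually discovered on mk8s4 and how to clear stale labels (this assumes a recent NFD release that publishes NodeFeature objects; GFD may re-add the labels if it still runs on that node):
# Newer NFD releases publish per-node discoveries as NodeFeature objects,
# which show exactly which PCI devices NFD saw on mk8s4.
kubectl get nodefeatures -A | grep mk8s4

# If the nvidia.com/* labels on mk8s4 turn out to be stale, they can be
# removed manually (a trailing "-" deletes a label):
kubectl label node mk8s4 nvidia.com/gpu.count- nvidia.com/gpu.product-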