Here are some useful commands while working with PCI devices in Harvester.
When you enable a PCI device for passthrough, Harvester creates a PCIDeviceClaim. The PCIDeviceClaim controller sees the new claim and then:
- It gets the PCIDevice for the new claim
- It permits the new host device in KubeVirt
- It enables PCI passthrough on the device by binding the underlying PCI device to the vfio-pci driver
- It creates a DevicePlugin, which uses a UNIX domain socket to allow KubeVirt to request devices for VMs:
  - if the device's resourceName already has a DevicePlugin, it adds that device to the existing DevicePlugin
  - if the device's resourceName does not have a DevicePlugin yet, it creates a new one
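As a quick sanity check right after enabling a device, you can list what the controller has created so far. This is a minimal sketch; the object name janus-000004000 and node name janus used throughout this section are just from this example cluster:
# List the PCIDevices the controller has discovered, and the claims created so far
% kubectl get pcidevice
% kubectl get pcideviceclaim
Once the claim for your device exists, inspect the underlying PCIDevice: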
% kubectl get pcidevice janus-000004000
NAME              ADDRESS        VENDOR ID   DEVICE ID   NODE NAME   DESCRIPTION                                                                   KERNEL DRIVER IN USE
janus-000004000   0000:04:00.0   10de        1c02        janus       VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB]   vfio-pci
Notice the KERNEL DRIVER IN USE column: if it says vfio-pci, then the underlying PCI device is bound to the vfio-pci driver and is ready for PCI passthrough. But for Harvester to recognize it as enabled, there also needs to be a PCIDeviceClaim with the same name as the PCIDevice, so run:
% kubectl get pcideviceclaim janus-000004000
NAME              ADDRESS        NODE NAME   USER NAME   KERNEL DRIVER TO UNBIND   PASSTHROUGH ENABLED
janus-000004000   0000:04:00.0   janus       admin                                 true
The existence of this PCIDeviceClaim with a passthrough enabled value of true is sufficient for Harvester to recognize that this device is ready for passthrough to a VM.
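If the table is hard to read, the same information is available from the claim's YAML; .spec.address and .spec.nodeName in particular are reused in the lspci check further down:
# Dump just the spec of the claim (the address and node name fields are used again below)
% kubectl get pcideviceclaim janus-000004000 -o yaml | yq '.spec'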
The next diagnostic is checking KubeVirt's config to see if the device has been permitted to be attached to a VM.
% kubectl get kubevirts.kubevirt.io -n harvester-system kubevirt -o yaml | yq .spec.configuration.permittedHostDevices.pciHostDevices
- externalResourceProvider: true
  pciVendorSelector: 10de:1c02
  resourceName: nvidia.com/GP106_GEFORCE_GTX_1060_3GB
- externalResourceProvider: true
  pciVendorSelector: 10de:10f1
  resourceName: nvidia.com/GP106_HIGH_DEFINITION_AUDIO_CONTROLLER
To get the resourceName of your device, run:
% kubectl get pcidevice janus-000004000 -o yaml | yq '.status.resourceName'
nvidia.com/GP106_GEFORCE_GTX_1060_3GB
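With the resourceName in hand, you can filter the permitted list directly instead of scanning it by eye. A small sketch, assuming the yq v4 syntax used in the commands above:
# Print only the permittedHostDevices entry whose resourceName matches our device (no output means it's not permitted)
% kubectl get kubevirts.kubevirt.io -n harvester-system kubevirt -o yaml | \
    yq '.spec.configuration.permittedHostDevices.pciHostDevices[] | select(.resourceName == "nvidia.com/GP106_GEFORCE_GTX_1060_3GB")'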
So we can see that the device is permitted. If it's not in there, you can work around this by running kubectl edit kubevirts.kubevirt.io -n harvester-system kubevirt and just cowboy-editing the pciHostDevices yourself. Make sure to set externalResourceProvider to true so that our custom DevicePlugins are used.
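For reference, a hand-added entry should look just like the ones shown above; here is the entry for the example GTX 1060 (substitute the pciVendorSelector and resourceName reported for your own device):
# Entry to append under spec.configuration.permittedHostDevices.pciHostDevices
- externalResourceProvider: true
  pciVendorSelector: 10de:1c02
  resourceName: nvidia.com/GP106_GEFORCE_GTX_1060_3GB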
Now, the information in a PCIDeviceClaim object might in principle be misleading, if some unexpected condition occurs and the object becomes stale. To check what the Linux kernel says, get the PCI device's address and then query lspci to see if the device is actually bound to vfio-pci:
# Get the PCI address
% kubectl get pcideviceclaim janus-000004000 -o yaml | yq '.spec.address'
0000:04:00.0
# SSH Into the Node
% ssh rancher@$(kubectl get pcideviceclaim janus-000004000 -o yaml | yq '.spec.nodeName')
rancher@janus:~> sudo su
janus:/home/rancher # lspci -s 0000:04:00.0 -v | tail -5
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] #19
Kernel driver in use: vfio-pci
Notice how it says vfio-pci is currently in use. This means that the PCIDevice's kernelDriverInUse: "vfio-pci" entry is correct.
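If lspci isn't handy on the node, the same information can be read straight from sysfs; a small sketch using the example address:
# The driver symlink under the device's sysfs entry names the currently bound driver
janus:/home/rancher # basename $(readlink /sys/bus/pci/devices/0000:04:00.0/driver)
vfio-pci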
DevicePlugins are little programs that manage a set of devices with the same resourceName. In our example, that would be nvidia.com/GP106_GEFORCE_GTX_1060_3GB. To make this more concrete, assume you did the SSH step in part 3 above, and you are currently SSHed into the node with root privileges through sudo su:
# Change directory to where the kubelet keeps the device plugins
janus:/home/rancher # cd /var/lib/kubelet/device-plugins/
# Look at all the device plugin sockets:
janus:/var/lib/kubelet/device-plugins # ls
DEPRECATION kubelet.sock kubelet_internal_checkpoint kubevirt-kvm.sock kubevirt-nvidia.com-GP106_GEFORCE_GTX_1060_3GB.sock kubevirt-nvidia.com-GP106_HIGH_DEFINITION_AUDIO_CONTROLLER.sock kubevirt-tun.sock kubevirt-vhost-net.sock
Notice the kubevirt-nvidia.com-GP106_GEFORCE_GTX_1060_3GB.sock file: that's the socket the DevicePlugin listens on and the kubelet uses to expose the local PCI device to KubeVirt.
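To confirm that something is actually serving that socket (and not just a stale file left behind), you can check for a listener on the node; this assumes ss from iproute2 is installed, which it normally is:
# Look for a process listening on the GPU's device plugin socket
janus:/var/lib/kubelet/device-plugins # ss -xlp | grep GP106_GEFORCE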
The RPC messages that get sent on the socket are:
- ListAndWatch to see which devices are available
- Allocate to take a device and attach it to a VM
Those two methods do the bulk of the work on the DevicePlugin side. Another way to see whether the DevicePlugins are behaving is to check the node status:
% kubectl get nodes janus -o yaml | yq .status.capacity
cpu: "8"
devices.kubevirt.io/kvm: 1k
devices.kubevirt.io/tun: 1k
devices.kubevirt.io/vhost-net: 1k
ephemeral-storage: 102626232Ki
hugepages-2Mi: "0"
memory: 24575392Ki
nvidia.com/GP106_GEFORCE_GTX_1060_3GB: "1"
nvidia.com/GP106_HIGH_DEFINITION_AUDIO_CONTROLLER: "1"
pods: "110"
Notice the resourceName on the left and the count on the right; that shows the DevicePlugin status. If you had two GTX 1060 cards on that node, then when the second one was enabled, it should look like nvidia.com/GP106_GEFORCE_GTX_1060_3GB: "2".
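To pull out just the count for a single resourceName, for example while enabling a second card and waiting for it to show up, you can filter the same capacity map; a trivial sketch:
# Show only the GTX 1060 entry of the node's capacity
% kubectl get nodes janus -o yaml | yq .status.capacity | grep GP106_GEFORCE_GTX
nvidia.com/GP106_GEFORCE_GTX_1060_3GB: "1"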
Finally, the capacity just shows the number of devices. When KubeVirt calls Allocate (see above) to attach the device to a VM, the corresponding entry in .status.allocatable needs to be nonzero. Here's how to check that:
% kubectl get nodes janus -o yaml | yq .status.allocatable
cpu: "8"
devices.kubevirt.io/kvm: 1k
devices.kubevirt.io/tun: 1k
devices.kubevirt.io/vhost-net: 1k
ephemeral-storage: "99834798412"
hugepages-2Mi: "0"
memory: 24575392Ki
nvidia.com/GP106_GEFORCE_GTX_1060_3GB: "1"
nvidia.com/GP106_HIGH_DEFINITION_AUDIO_CONTROLLER: "1"
pods: "110"
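For completeness, this is where the resource finally gets consumed: when a VM asks for the device, the resourceName shows up under the VM's hostDevices and the scheduler counts it against the node's allocatable. Harvester normally wires this up when you attach the device in the UI, but the relevant fragment of the KubeVirt VirtualMachine spec looks roughly like this (the hostdev name gpu1 is arbitrary):
# Fragment of a VirtualMachine spec that requests the example device (sketch, not a complete VM)
spec:
  template:
    spec:
      domain:
        devices:
          hostDevices:
            - deviceName: nvidia.com/GP106_GEFORCE_GTX_1060_3GB
              name: gpu1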