GPU Support #1293
Comments
I think we can start working on this now. I know it has been in the backlog for a long time. I will move it to the current active project. Questions I need to research:
GPUs can be attached to a VM using

I have an NVIDIA GPU and couldn't unbind and rebind it dynamically while the machine was running. Instead, I gave vfio control over the GPU (and its neighboring devices) through kernel parameters, as described here. I imagine this won't be necessary on the node, since the GPU shouldn't be bound to any driver there, but I haven't gotten to this yet.

About the "neighboring devices": the GPU belongs to an IOMMU group, and the VM must control all devices in that group. In my case these were an audio device and a USB device. It's possible to bypass this, but it carries risks (I haven't read up on them yet).

The GPU shows up successfully in the VM, but a driver must then be installed before it can be used. AFAIK, the kernel we use doesn't allow loading modules dynamically, so either that must be enabled or the driver must be pre-installed (?), which looks like a complicated solution.

All of this was tried on my machine, not on a node. I think the node's kernel must be updated to include vfio support. TLDR:
Next:
Notes:
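For anyone who wants to check the IOMMU group constraint mentioned above on their own machine, here is a minimal Python sketch. It assumes the standard Linux sysfs layout (`/sys/bus/pci/devices/<addr>/iommu_group` and `/sys/bus/pci/devices/<addr>/driver` as symlinks). The function names and the `sysfs_root` parameter are made up for illustration, and the PCI address in the usage note is hypothetical. This is not how the node does it, just a way to inspect a host.

```python
from pathlib import Path


def iommu_group_devices(pci_addr, sysfs_root="/sys"):
    """Return (group_id, [pci addresses]) for the IOMMU group of pci_addr.

    Returns (None, []) if the device has no iommu_group link, e.g. when the
    IOMMU is disabled or the address does not exist.
    """
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
    group_link = dev / "iommu_group"
    if not group_link.exists():
        return None, []
    # The symlink points at <sysfs>/kernel/iommu_groups/<id>
    group_dir = group_link.resolve()
    members = sorted(p.name for p in (group_dir / "devices").iterdir())
    return group_dir.name, members


def current_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the driver bound to pci_addr, or None if unbound."""
    link = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr / "driver"
    return link.resolve().name if link.exists() else None
```

Calling `iommu_group_devices("0000:01:00.0")` with the GPU's address (as shown by `lspci -D`) lists every device that would have to be handed to the VM together with it, and `current_driver(...)` on each of them shows whether it is bound to `vfio-pci` or still held by a host driver.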
We get a lot of questions on GPU support from AI/ML users and farmers. So the demand is definitely there.
We had GPU support on the roadmap a while ago, but I do not know where it stands right now. Can we open a discussion about it?