Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PyNVML 12 #215

Merged
merged 2 commits into from
Dec 21, 2024
Merged

Use PyNVML 12 #215

merged 2 commits into from
Dec 21, 2024

Conversation

jakirkham
Copy link
Member

Bump pynvml from 11 to 12. This version of pynvml also now depends on nvidia-ml-py for core functionality.

@jakirkham jakirkham requested a review from rjzamora December 19, 2024 04:20
@jakirkham
Copy link
Member Author

/ok to test

@jakirkham
Copy link
Member Author

Seeing the following test failures on CI:

______ ERROR collecting jupyterlab_nvdashboard/tests/test_cpu_handlers.py ______
jupyterlab_nvdashboard/apps/gpu.py:15: in <module>
    nvlink_ver = pynvml.nvmlDeviceGetNvLinkVersion(gpu_handles[0], 0)
/opt/conda/envs/test/lib/python3.13/site-packages/pynvml.py:3846: in nvmlDeviceGetNvLinkVersion
    _nvmlCheckReturn(ret)
/opt/conda/envs/test/lib/python3.13/site-packages/pynvml.py:979: in _nvmlCheckReturn
    raise NVMLError(ret)
E   pynvml.NVMLError_NotSupported: Not Supported

During handling of the above exception, another exception occurred:
jupyterlab_nvdashboard/tests/test_cpu_handlers.py:5: in <module>
    from jupyterlab_nvdashboard.apps.cpu import CPUResourceWebSocketHandler
jupyterlab_nvdashboard/__init__.py:13: in <module>
    from .handlers import setup_handlers
jupyterlab_nvdashboard/handlers.py:2: in <module>
    from . import apps
jupyterlab_nvdashboard/apps/__init__.py:2: in <module>
    from . import gpu
jupyterlab_nvdashboard/apps/gpu.py:32: in <module>
    except (IndexError, pynvml.nvml.NVMLError_NotSupported):
E   AttributeError: module 'pynvml' has no attribute 'nvml'
______ ERROR collecting jupyterlab_nvdashboard/tests/test_gpu_handlers.py ______
jupyterlab_nvdashboard/apps/gpu.py:15: in <module>
    nvlink_ver = pynvml.nvmlDeviceGetNvLinkVersion(gpu_handles[0], 0)
/opt/conda/envs/test/lib/python3.13/site-packages/pynvml.py:3846: in nvmlDeviceGetNvLinkVersion
    _nvmlCheckReturn(ret)
/opt/conda/envs/test/lib/python3.13/site-packages/pynvml.py:979: in _nvmlCheckReturn
    raise NVMLError(ret)
E   pynvml.NVMLError_NotSupported: Not Supported

During handling of the above exception, another exception occurred:
jupyterlab_nvdashboard/tests/test_gpu_handlers.py:5: in <module>
    from jupyterlab_nvdashboard.apps.gpu import (
jupyterlab_nvdashboard/__init__.py:13: in <module>
    from .handlers import setup_handlers
jupyterlab_nvdashboard/handlers.py:2: in <module>
    from . import apps
jupyterlab_nvdashboard/apps/__init__.py:2: in <module>
    from . import gpu
jupyterlab_nvdashboard/apps/gpu.py:32: in <module>
    except (IndexError, pynvml.nvml.NVMLError_NotSupported):
E   AttributeError: module 'pynvml' has no attribute 'nvml'

@rjzamora
Copy link
Member

rjzamora commented Dec 19, 2024

E AttributeError: module 'pynvml' has no attribute 'nvml'

There is no nvml module in nvidia-ml-py. The check should probably be pynvml.NVMLError_NotSupported (this should work before pynvml<12 as well).

It looks like we need to update 3 lines in jupyterlab_nvdashboard/apps/gpu.py

The `pynvml.nvml` namespace wasn't needed in PyNVML 11 and does not
exist in PyNVML 12. So drop the `nvml` and use `pynvml` directly.
@jakirkham
Copy link
Member Author

Thanks Rick! 🙏

How do the changes above look?

@jakirkham jakirkham marked this pull request as ready for review December 20, 2024 07:34
Copy link
Member

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me, based on my read of https://github.com/gpuopenanalytics/pynvml/blob/38da53a9bb031277690d6135dd3be02deb9296ac/pynvml_utils/smi.py and @rjzamora 's comments.

Would be good if @rjzamora can approve too (or at least, comment on this before the next release of jupyterlab-nvdashboard).

@jakirkham
Copy link
Member Author

Asked Rick offline, but didn't hear back. Am guessing he is already on holiday

Given the issue above was caused by a test failure, which is resolved by the change, and Rick gave such a good description of the change required there wasn't another way to interpret it, think we have addressed his review

Also want to make sure these PRs are not lingering after the holidays. So am going to go ahead and merge

If Rick has more feedback, am happy to follow up with him

@jakirkham jakirkham merged commit b5fdd67 into rapidsai:branch-0.13 Dec 21, 2024
11 checks passed
@jakirkham jakirkham deleted the use_pynvml_12 branch December 21, 2024 00:09
@jakirkham
Copy link
Member Author

Fixed issue: #150

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants