This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Build failure with CUDA 11.4 and NVML #20467

Open
matteosal opened this issue Jul 24, 2021 · 3 comments

@matteosal
Contributor

Building with CUDA 11.4 on Linux produces these errors:

/home/matteo/Git/mxnet-build/Build/Linux-x86-64/CUDA/mxnet/src/profiler/storage_profiler.cc: In member function ‘void mxnet::profiler::GpuDeviceStorageProfiler::DumpProfile() const’:
/home/matteo/Git/mxnet-build/Build/Linux-x86-64/CUDA/mxnet/src/profiler/storage_profiler.cc:113:78: error: cannot convert ‘nvmlProcessInfo_st*’ to ‘nvmlProcessInfo_v1_t*’ {aka ‘nvmlProcessInfo_v1_st*’}
  113 |     nvmlDeviceGetComputeRunningProcesses(nvml_device, &info_count, infos.data());
      |                                                                    ~~~~~~~~~~^~
      |                                                                              |
      |                                                                              nvmlProcessInfo_st*
In file included from /home/matteo/Git/mxnet-build/Build/Linux-x86-64/CUDA/mxnet/src/profiler/storage_profiler.cc:22:
/usr/local/cuda/include/nvml.h:7973:127: note:   initializing argument 3 of ‘nvmlReturn_t nvmlDeviceGetComputeRunningProcesses(nvmlDevice_t, unsigned int*, nvmlProcessInfo_v1_t*)’
 7973 | nvmlReturn_t DECLDIR nvmlDeviceGetComputeRunningProcesses(nvmlDevice_t device, unsigned int *infoCount, nvmlProcessInfo_v1_t *infos);
      |                                                                                                         ~~~~~~~~~~~~~~~~~~~~~~^~~~~
In file included from /home/matteo/Git/mxnet-build/Build/Linux-x86-64/CUDA/mxnet/src/profiler/storage_profiler.cc:31:
/home/matteo/Git/mxnet-build/Build/Linux-x86-64/CUDA/mxnet/src/profiler/storage_profiler.cc:115:88: error: cannot convert ‘nvmlProcessInfo_st*’ to ‘nvmlProcessInfo_v1_t*’ {aka ‘nvmlProcessInfo_v1_st*’}
  115 |     NVML_CALL(nvmlDeviceGetComputeRunningProcesses(nvml_device, &info_count, infos.data()));
      |                                                                              ~~~~~~~~~~^~
      |                                                                                        |
      |                                                                                        nvmlProcessInfo_st*
/home/matteo/Git/mxnet-build/Build/Linux-x86-64/CUDA/mxnet/src/profiler/../common/cuda/utils.h:190:28: note: in definition of macro ‘NVML_CALL’
  190 |     nvmlReturn_t result = (func);                       \
      |                            ^~~~
In file included from /home/matteo/Git/mxnet-build/Build/Linux-x86-64/CUDA/mxnet/src/profiler/storage_profiler.cc:22:
/usr/local/cuda/include/nvml.h:7973:127: note:   initializing argument 3 of ‘nvmlReturn_t nvmlDeviceGetComputeRunningProcesses(nvmlDevice_t, unsigned int*, nvmlProcessInfo_v1_t*)’
 7973 | nvmlReturn_t DECLDIR nvmlDeviceGetComputeRunningProcesses(nvmlDevice_t device, unsigned int *infoCount, nvmlProcessInfo_v1_t *infos);

Is there a place where one can check for a list of supported/recommended CUDA versions?
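
For readers skimming the log above: the compiler rejects the call because the buffer holds one struct type while the function is declared with a pointer to a different, unrelated struct type. Below is a minimal self-contained sketch of that situation. `ProcessInfoV1`, `ProcessInfoV2`, `GetRunningProcesses` and `QueryProcesses` are hypothetical stand-ins invented for illustration, not the real definitions from nvml.h, and this is not presented as the fix adopted by the project.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical, simplified stand-ins for two versions of a process-info
// struct. The real NVML types are defined in nvml.h.
struct ProcessInfoV1 {
  unsigned int pid;
  unsigned long long usedGpuMemory;
};
struct ProcessInfoV2 {
  unsigned int pid;
  unsigned long long usedGpuMemory;
  unsigned int extraField;  // assumed extra member, for illustration only
};

// Stand-in for a query function that, like the declaration quoted in the
// log, takes a pointer to the v1 struct. It pretends two processes exist.
int GetRunningProcesses(unsigned int* count, ProcessInfoV1* infos) {
  const unsigned int available = 2;
  if (*count < available) {
    *count = available;  // report the required buffer size
    return 1;            // non-zero: buffer too small
  }
  for (unsigned int i = 0; i < available; ++i) {
    infos[i].pid = 100 + i;
    infos[i].usedGpuMemory = 0;
  }
  *count = available;
  return 0;
}

// Pointers to unrelated struct types do not convert implicitly, which is
// the kind of mismatch the error message reports.
static_assert(!std::is_convertible<ProcessInfoV2*, ProcessInfoV1*>::value,
              "a V2 buffer cannot be passed where a V1 pointer is expected");

// Declaring the buffer with the type the function expects compiles cleanly.
unsigned int QueryProcesses() {
  std::vector<ProcessInfoV1> infos(4);
  unsigned int count = static_cast<unsigned int>(infos.size());
  GetRunningProcesses(&count, infos.data());
  return count;  // number of entries written
}
```

In this sketch the mismatch goes away once the element type of the buffer matches the parameter type of the function being called. Whether that is the right change for the real code depends on which NVML struct and function versions the installed header exposes.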

@leezu
Contributor

leezu commented Jul 27, 2021

Do you need NVML? If not, you can work around this issue by setting USE_NVML=0.
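
A build switch like this usually works because the code that depends on the library sits behind a preprocessor guard, so with the switch off the failing call is never compiled. The following is a rough sketch of that pattern under stated assumptions: the macro `DEMO_USE_NVML` and the function `DescribeGpuProcesses` are hypothetical names chosen for illustration, not identifiers taken from the MXNet sources.

```cpp
#include <string>

// Hypothetical feature flag. In a real project the build system would
// normally define it (for example through a compiler -D option); it
// defaults to 0 here so the sketch compiles without the library.
#ifndef DEMO_USE_NVML
#define DEMO_USE_NVML 0
#endif

std::string DescribeGpuProcesses() {
#if DEMO_USE_NVML
  // Library-dependent calls would live here. With the flag set to 0 the
  // preprocessor drops this branch, so a type mismatch inside it cannot
  // break the build.
  return "queried via NVML";
#else
  return "NVML support disabled at build time";
#endif
}
```

The trade-off is that whatever functionality sits inside the guarded branch is unavailable in such a build.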

@matteosal
Contributor Author

Yes, USE_NVML=0 fixes the problem. Thanks!

leezu reopened this on Jul 28, 2021
leezu changed the title from "Build failure with CUDA 11.4" to "Build failure with CUDA 11.4 and NVML" on Jul 28, 2021

@leezu
Contributor

leezu commented Jul 28, 2021

FYI @TristonC @ptrendx
