
hwloc2a/configure.m4: be more careful in with_cuda->enable_cuda #4257

Merged
jsquyres merged 1 commit into open-mpi:master from pr/moar-hwloc-cuda-cleanup on Sep 25, 2017

Conversation

jsquyres (Member)

Be a little more deliberate about converting OMPI's --with-cuda CLI value to hwloc's --enable-cuda configure option.

Also, unconditionally disable hwloc NVML support (because Open MPI is not currently using it).

Signed-off-by: Jeff Squyres [email protected]

This is a followup to the original master PR #4245, and the subsequent comments from @ggouaillardet, @bgoglin, and @sjeaugey on #4251.

Be a little more deliberate about converting OMPI's --with-cuda CLI
value to hwloc's --enable-cuda configure option.

Also, unconditionally disable hwloc NVML support (because Open MPI is
not currently using it).

Signed-off-by: Jeff Squyres <[email protected]>
@jsquyres merged commit db10da9 into open-mpi:master on Sep 25, 2017
@jsquyres deleted the pr/moar-hwloc-cuda-cleanup branch on September 25, 2017 at 15:35
@sjeaugey (Member)

@jsquyres sorry I failed to reply to your question in #4251 ... To my knowledge, we don't use the CUDA part of hwloc in Open MPI. @bgoglin may be able to explain what --enable-cuda changes in hwloc, I personally don't know.

Now, with that patch, we're no longer adding new dependencies nor creating mismatches between Open MPI and hwloc, so I guess it doesn't really matter.

@jsquyres (Member, Author)

@sjeaugey Got it. Is there a reason we're not using the CUDA stuff from hwloc? I.e., is the hwloc CUDA information not useful, or has no one simply implemented anything that uses it? (This is a curiosity question.)

@sjeaugey (Member)

It could be useful; we just never used it because it was hard to do so. The main reason is that we're using the same BTLs for CPU and GPU communication. The openib BTLs are chosen during MPI_Init based on their distance from the CPU.

When the CUDA-aware code is initialized (during the first CUDA transfer), we should rediscover the BTLs and select those that are close to the GPU for CUDA transfers. But that would mean a separate BTL selection for CPU and GPU transfers.

UCX may implement that better.
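To make "close to the GPU" concrete, here is a minimal sketch (hypothetical, not Open MPI code; the function name and inputs are made up for illustration) of comparing the locality cpusets that hwloc reports for a NIC and a GPU:

```c
/* Hypothetical sketch, not Open MPI code: decide whether a NIC is "close"
 * to a GPU by comparing the locality cpusets hwloc reports for each device.
 * Both cpusets are assumed to have been obtained elsewhere (e.g. from the
 * devices' PCI locality in the hwloc topology). */
#include <hwloc.h>

static int device_is_close_to_gpu(hwloc_const_cpuset_t nic_cpuset,
                                  hwloc_const_cpuset_t gpu_cpuset)
{
    /* If the two masks share CPUs, the devices sit under the same
     * socket/NUMA node and can be treated as local to each other. */
    return hwloc_bitmap_intersects(nic_cpuset, gpu_cpuset);
}
```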

@bgoglin (Contributor) commented Sep 25, 2017

@sjeaugey @jsquyres Enabling CUDA in hwloc adds hwloc objects that correspond to GPUs, giving you their locality in the hwloc tree. This is mostly useful in two cases:

  1. choosing which GPU to use based on which cores or NUMA node it should be close to
  2. allocating host data buffers close to a specific GPU

(1) is likely done in the application, not in OMPI itself.
(2) is also likely done in the application; OMPI could use it if there were large OMPI-internal buffers to allocate close to a GPU.

Aside from this, I don't see many use cases for OMPI using a CUDA-aware hwloc. Moreover, there are also ways to get GPU affinity from hwloc without enabling CUDA support (using hwloc/cuda.h or hwloc/cudart.h).
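For illustration, a minimal sketch of the hwloc/cudart.h route just mentioned. It assumes the CUDA runtime headers and library are available when building, and a CUDA device with runtime index 0 at run time; it does not require hwloc itself to be configured with --enable-cuda:

```c
/* Sketch: get the CPUs close to CUDA runtime device 0 via hwloc/cudart.h.
 * This helper resolves the device's PCI locality, so hwloc does not need
 * to be built with --enable-cuda; the CUDA runtime headers/library are
 * only needed to build this program. */
#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>
#include <hwloc/cudart.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_cpuset_t cpuset;
    char *str = NULL;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    cpuset = hwloc_bitmap_alloc();
    /* Fill cpuset with the CPUs physically near CUDA runtime device 0. */
    if (hwloc_cudart_get_device_cpuset(topology, 0, cpuset) == 0) {
        hwloc_bitmap_asprintf(&str, cpuset);
        printf("CPUs near CUDA device 0: %s\n", str);
        free(str);
    }

    hwloc_bitmap_free(cpuset);
    hwloc_topology_destroy(topology);
    return 0;
}
```

Building the sketch would look roughly like `cc affinity.c -lhwloc -lcudart`, with the exact flags depending on where hwloc and CUDA are installed.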

@bgoglin (Contributor) commented Sep 25, 2017

Indeed, selecting a NIC/HCA close to the GPU is a good use case extending (2).

@jsquyres (Member, Author)

Ok. Based on these comments, it sounds like I shouldn't whack the logic I just added (and replace it with enable_cuda=no). Right?

@bgoglin (Contributor) commented Sep 25, 2017

+1

@sjeaugey (Member)

More on this: we need to set enable_cuda=no in hwloc in all cases.

CUDA support in Open MPI needs CUDA includes to build, but it is in fact using the CUDA driver API and loading symbols dynamically (so as I understand it, the library can work on any system, even those without CUDA).
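As an illustration of that pattern, here is a generic dlopen/dlsym sketch (not Open MPI's actual CUDA glue code; only the standard driver-API library and symbol names are assumed):

```c
/* Sketch of loading the CUDA driver API at runtime instead of linking
 * against it, so the binary still starts on machines without CUDA.
 * Not Open MPI's actual code; just the general dlopen/dlsym pattern. */
#include <stdio.h>
#include <dlfcn.h>

typedef int (*cuInit_t)(unsigned int flags);

int main(void)
{
    /* libcuda.so.1 is the usual soname of the CUDA driver library. */
    void *handle = dlopen("libcuda.so.1", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "No CUDA driver found; continuing without CUDA\n");
        return 0;
    }

    cuInit_t my_cuInit = (cuInit_t) dlsym(handle, "cuInit");
    if (my_cuInit != NULL && my_cuInit(0) == 0) {
        printf("CUDA driver initialized\n");
    }

    dlclose(handle);
    return 0;
}
```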

Enabling CUDA in hwloc creates a hard dependency on CUDA, so this is clearly a regression.

Also, configure currently seems to fail to include the relevant -L flag, so master is broken. https://mtt.open-mpi.org/index.php?do_redir=2485

jsquyres added a commit to jsquyres/ompi that referenced this pull request Oct 10, 2017
There is no usage of CUDA hwloc objects in the v2.0.x branch, and
linking in CUDA can cause problems (per
open-mpi#4257 (comment)).

Partially cherry-picked from c341b53.

Signed-off-by: Jeff Squyres <[email protected]>
jsquyres added a commit to jsquyres/ompi that referenced this pull request Oct 10, 2017
There is no usage of CUDA hwloc objects in the v3.0.x branch, and
linking in CUDA can cause problems (per
open-mpi#4257 (comment)).

Partially cherry-picked from c341b53.

Signed-off-by: Jeff Squyres <[email protected]>
jsquyres added a commit to jsquyres/ompi that referenced this pull request Oct 10, 2017
There is no usage of CUDA hwloc objects in the v2.x branch, and
linking in CUDA can cause problems (per
open-mpi#4257 (comment)).

Partially cherry-picked from c341b53.

Signed-off-by: Jeff Squyres <[email protected]>
jsquyres added a commit to jsquyres/ompi that referenced this pull request Oct 10, 2017
There is no usage of CUDA hwloc objects in the v2.0.x branch, and
linking in CUDA can cause problems (per
open-mpi#4257 (comment)).

Partially cherry-picked from c341b53.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit f28fcbe)