
hwloc2a/configure.m4: be more careful in with_cuda->enable_cuda #4257

Merged
jsquyres merged 1 commit into open-mpi:master from pr/moar-hwloc-cuda-cleanup on Sep 25, 2017

Conversation

jsquyres (Member)

Be a little more deliberate about converting OMPI's --with-cuda CLI value to hwloc's --enable-cuda configure option.

Also, unconditionally disable hwloc NVML support (because Open MPI is not currently using it).

Signed-off-by: Jeff Squyres [email protected]

This is a followup to the original master PR #4245, and the subsequent comments from @ggouaillardet, @bgoglin, and @sjeaugey on #4251.

Be a little more deliberate about converting OMPI's --with-cuda CLI
value to hwloc's --enable-cuda configure option.

Also, unconditionally disable hwloc NVML support (because Open MPI is
not currently using it).

Signed-off-by: Jeff Squyres <[email protected]>
@jsquyres merged commit db10da9 into open-mpi:master on Sep 25, 2017
@jsquyres deleted the pr/moar-hwloc-cuda-cleanup branch on September 25, 2017 at 15:35
@sjeaugey (Member)

@jsquyres sorry I failed to reply to your question in #4251 ... To my knowledge, we don't use the CUDA part of hwloc in Open MPI. @bgoglin may be able to explain what --enable-cuda changes in hwloc, I personally don't know.

Now, with that patch, we're no longer adding new dependencies nor creating mismatches between Open MPI and hwloc, so I guess it doesn't really matter.

@jsquyres (Member, Author)

@sjeaugey Got it. Is there a reason we're not using the CUDA stuff from hwloc? I.e., is the hwloc CUDA information not useful, or has no one simply implemented anything that uses it? (This is a curiosity question.)

@sjeaugey (Member)

It could be useful; we just never used it because it was hard to do so. The main reason is that we're using the same BTLs for CPU and GPU communication. The openib BTLs are chosen during MPI_Init based on their distance from the CPU.

When the CUDA-aware code is initialized (during the first CUDA transfer), we should rediscover the BTLs and select those that are close to the GPU for CUDA transfers. But that would mean a separate BTL selection for CPU and GPU transfers.

UCX may implement that better.
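To make "close to the GPU" concrete, here is a minimal sketch (hypothetical, not Open MPI code; the function name and inputs are made up for illustration) of comparing the locality cpusets that hwloc reports for a NIC and a GPU:

```c
/* Hypothetical sketch, not Open MPI code: decide whether a NIC is "close"
 * to a GPU by comparing the locality cpusets hwloc reports for each device.
 * Both cpusets are assumed to have been obtained elsewhere (e.g. from the
 * devices' PCI locality in the hwloc topology). */
#include <hwloc.h>

static int device_is_close_to_gpu(hwloc_const_cpuset_t nic_cpuset,
                                  hwloc_const_cpuset_t gpu_cpuset)
{
    /* If the two masks share CPUs, the devices sit under the same
     * socket/NUMA node and can be treated as local to each other. */
    return hwloc_bitmap_intersects(nic_cpuset, gpu_cpuset);
}
```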

@bgoglin (Contributor) commented Sep 25, 2017

@sjeaugey @jsquyres Enabling CUDA in hwloc adds hwloc objects that correspond to GPUs, giving you their locality in the hwloc tree. This is mostly useful in two cases:

  1. choosing which GPU to use based on which cores or NUMA node it should be close to
  2. allocating host data buffers close to a specific GPU

(1) is likely done in the application, not in OMPI itself.
(2) is also likely done in the application; OMPI could use it if there were large OMPI-internal buffers to allocate close to a GPU.

Aside from this, I don't see many use cases for OMPI using a CUDA-aware hwloc. Moreover, there are also ways to get GPU affinity from hwloc without enabling CUDA support (using hwloc/cuda.h or hwloc/cudart.h).
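For illustration, a minimal sketch of the hwloc/cudart.h route just mentioned. It assumes the CUDA runtime headers and library are available when building, and a CUDA device with runtime index 0 at run time; it does not require hwloc itself to be configured with --enable-cuda:

```c
/* Sketch: get the CPUs close to CUDA runtime device 0 via hwloc/cudart.h.
 * This helper resolves the device's PCI locality, so hwloc does not need
 * to be built with --enable-cuda; the CUDA runtime headers/library are
 * only needed to build this program. */
#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>
#include <hwloc/cudart.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_cpuset_t cpuset;
    char *str = NULL;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    cpuset = hwloc_bitmap_alloc();
    /* Fill cpuset with the CPUs physically near CUDA runtime device 0. */
    if (hwloc_cudart_get_device_cpuset(topology, 0, cpuset) == 0) {
        hwloc_bitmap_asprintf(&str, cpuset);
        printf("CPUs near CUDA device 0: %s\n", str);
        free(str);
    }

    hwloc_bitmap_free(cpuset);
    hwloc_topology_destroy(topology);
    return 0;
}
```

Building the sketch would look roughly like `cc affinity.c -lhwloc -lcudart`, with the exact flags depending on where hwloc and CUDA are installed.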

@bgoglin (Contributor) commented Sep 25, 2017

Indeed, selecting a NIC/HCA close to the GPU is a good use case extending (2).

@jsquyres (Member, Author)

Ok. Based on these comments, it sounds like I shouldn't whack the logic I just added (and replace it with enable_cuda=no). Right?

@bgoglin (Contributor) commented Sep 25, 2017

+1

@sjeaugey (Member)

More on this: we need to set enable_cuda=no in hwloc in all cases.

CUDA support in Open MPI needs CUDA includes to build, but it is in fact using the CUDA driver API and loading symbols dynamically (so as I understand it, the library can work on any system, even those without CUDA).
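As an illustration of that pattern, here is a generic dlopen/dlsym sketch (not Open MPI's actual CUDA glue code; only the standard driver-API library and symbol names are assumed):

```c
/* Sketch of loading the CUDA driver API at runtime instead of linking
 * against it, so the binary still starts on machines without CUDA.
 * Not Open MPI's actual code; just the general dlopen/dlsym pattern. */
#include <stdio.h>
#include <dlfcn.h>

typedef int (*cuInit_t)(unsigned int flags);

int main(void)
{
    /* libcuda.so.1 is the usual soname of the CUDA driver library. */
    void *handle = dlopen("libcuda.so.1", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "No CUDA driver found; continuing without CUDA\n");
        return 0;
    }

    cuInit_t my_cuInit = (cuInit_t) dlsym(handle, "cuInit");
    if (my_cuInit != NULL && my_cuInit(0) == 0) {
        printf("CUDA driver initialized\n");
    }

    dlclose(handle);
    return 0;
}
```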

Enabling CUDA in hwloc creates a hard dependency on CUDA, so this is clearly a regression.

Also, configure currently seems to fail to include the relevant -L flag, so master is broken. https://mtt.open-mpi.org/index.php?do_redir=2485

jsquyres added a commit to jsquyres/ompi that referenced this pull request Oct 10, 2017
There is no usage of CUDA hwloc objects in the v2.0.x branch, and
linking in CUDA can cause problems (per
open-mpi#4257 (comment)).

Partially cherry-picked from c341b53.

Signed-off-by: Jeff Squyres <[email protected]>
jsquyres added a commit to jsquyres/ompi that referenced this pull request Oct 10, 2017
There is no usage of CUDA hwloc objects in the v3.0.x branch, and
linking in CUDA can cause problems (per
open-mpi#4257 (comment)).

Partially cherry-picked from c341b53.

Signed-off-by: Jeff Squyres <[email protected]>
jsquyres added a commit to jsquyres/ompi that referenced this pull request Oct 10, 2017
There is no usage of CUDA hwloc objects in the v2.x branch, and
linking in CUDA can cause problems (per
open-mpi#4257 (comment)).

Partially cherry-picked from c341b53.

Signed-off-by: Jeff Squyres <[email protected]>
jsquyres added a commit to jsquyres/ompi that referenced this pull request Oct 10, 2017
There is no usage of CUDA hwloc objects in the v2.0.x branch, and
linking in CUDA can cause problems (per
open-mpi#4257 (comment)).

Partially cherry-picked from c341b53.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit f28fcbe)