
Cray MPI GPU compatibility #717

Closed · wants to merge 2 commits into from
Conversation

simonbyrne (Member)

Possible alternative to #716
@simonbyrne requested a review from JBlaschke on February 27, 2023 19:15
@@ -103,6 +103,10 @@ function __init__()
    end

    @static if Sys.isunix()
        # Cray MPICH apparently requires you to manually load a separate GPU Transport Layer (GTL) if you want GPU-aware MPI.
        if MPIPreferences.binary == "system" && MPIPreferences.abi == "MPICH" && get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"
            Libdl.dlopen_e(joinpath(dirname(libmpi), "libmpi_gtl_cuda"), Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)

simonbyrne (Member, Author)

This assumes that the library is in the same directory as libmpi: is this the case, @JBlaschke?

Also, is there any way to determine whether we should load libmpi_gtl_cuda or libmpi_gtl_hsa? (dlopen_e should fail silently, so we could just try opening both?)
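
For illustration, a minimal sketch of the "try both" idea, relying on dlopen_e returning C_NULL instead of throwing:

using Libdl
# Try each GTL variant; Libdl.dlopen_e returns C_NULL rather than throwing if
# the library is not available, so whichever one exists gets loaded.
for gtl in ("libmpi_gtl_cuda", "libmpi_gtl_hsa")
    handle = Libdl.dlopen_e(gtl, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
    handle != C_NULL && break
end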

Member

Or rely on the linker finding it automatically just by giving the basename?

@haampie (Feb 27, 2023)

The GTL libs come from a separate RPM, so they aren't necessarily in the same directory. You can have multiple libmpi.so files (one for every compiler, for different version numbers, different fabrics) and just one or a few libmpi_gtl_{cuda,hsa}.so:

/opt/cray/pe/mpich/8.1.18/ofi/gnu/9.1/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/intel/19.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/cray/10.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/nvidia/20.7/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/aocc/3.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/intel/19.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/cray/10.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/gnu/9.1/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/aocc/3.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/nvidia/20.7/lib/libmpi.so
/opt/cray/pe/mpich/8.1.21/ofi/gnu/9.1/lib/libmpi.so

...

/opt/cray/pe/mpich/8.1.18/gtl/lib/libmpi_gtl_cuda.so
/opt/cray/pe/mpich/8.1.21/gtl/lib/libmpi_gtl_cuda.so
/opt/cray/pe/mpich/8.1.23/gtl/lib/libmpi_gtl_cuda.so

However, there's also typically a symlink for both libmpi.so (oh, which one, you ask?! a random one, probably) and the GTL lib in /opt/cray/pe/lib64, which is in the ld cache (so no modules required).

Opening by name is probably better than opening by path.

Contributor

@simonbyrne see my comments below. To address the questions here:

  1. No, they are not necessarily in the same place.
  2. CC --cray-print-opts=libs will show which libraries get loaded. This doesn't actually tell you when they are needed, but a safe bet would be before MPI_Init (see the sketch below).
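
A hedged sketch of querying the wrapper at configuration time (assuming the Cray CC wrapper is on PATH and prints link flags such as -lmpi_gtl_cuda):

# Configuration-time only: ask the Cray compiler wrapper for its link flags
# and pick out the GTL libraries. Not something to run on every rank.
opts = split(readchomp(`CC --cray-print-opts=libs`))
gtl_libs = ["lib" * flag[3:end] for flag in opts if startswith(flag, "-lmpi_gtl")]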

Member

If they are weak symbols, I think they must be loaded beforehand; otherwise the dlopen of libmpi will try to resolve the weak symbol elsewhere.

@simonbyrne (Member, Author) Feb 28, 2023

@giordano pointed out that we actually dlopen libmpi in MPIPreferences.__init__, so we may need to do it there:

Libdl.dlopen(libmpi, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
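
If so, a sketch of the ordering there might look like (not the actual change):

# Sketch: preload the GTL library so its symbols are visible when libmpi is
# opened; dlopen_e fails silently if the library is not present.
Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
Libdl.dlopen(libmpi, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)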

Contributor

I ended up doing: https://github.com/JuliaParallel/MPI.jl/pull/716/files#diff-496cd019dfada6f9862870c26d09d6ea9f6a24aa6d0055fb03e916b8959b82a8R13 -- a list of preloads might be better, because @luraess just told me that other centers might also need cudart preloaded. 😔

Member

So on Perlmutter we must dlopen before __init__.

But a Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL) seems to do the trick.

@@ -103,6 +103,10 @@ function __init__()
        if MPIPreferences.binary == "system" && MPIPreferences.abi == "MPICH" && get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"

Member

I would prefer guarding this behind a vendor == "cray" check.

simonbyrne (Member, Author)

yeah, we don't currently expose that in MPIPreferences (we could though?)
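
For illustration, a sketch of such a guard, assuming a hypothetical MPIPreferences.vendor field (not something MPIPreferences currently exposes):

if MPIPreferences.binary == "system" && MPIPreferences.abi == "MPICH" &&
        MPIPreferences.vendor == "cray" &&  # hypothetical field
        get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"
    Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
end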

@giordano (Member)

I mentioned on Discord that I'm not super excited about doing this vendor-specific business. Would it be possible to simply have a list of unnamed libraries (we don't need to ccall into them) that we need to load before libmpi? This might be useful with other vendors as well (maybe also some MPI profilers? But those probably need to be loaded before the entire Julia process).

@simonbyrne (Member, Author)

That was my original thought too (make a preloads preference), but from an ease-of-use point of view, ideally we would be able to just "make it work" without any user intervention. We do some similar things for UCX, so it's not entirely unprecedented.
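
A minimal sketch of what such a preloads preference could look like (the field name MPIPreferences.preloads is an assumption, not an existing API):

# Hypothetical preloads preference: library names to dlopen before libmpi,
# without needing to ccall into them directly.
for lib in MPIPreferences.preloads  # hypothetical field, e.g. ["libmpi_gtl_cuda"]
    Libdl.dlopen(lib, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
end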

@JBlaschke (Contributor)

@simonbyrne I like this approach in spirit. I want to share some concerns -- and if they are overrated, we can go ahead with this:

  1. "Just working" is something that is difficult to do on HPC. One example is: what if environment variables are named differently on different systems (that can happen with the same vendor even). I think we can do this here also -- for example: @vchuravy 's idea of including search logic based on CC --cray-print-opts=libs to identify "the right" libraries to preload.
  2. HPC systems are frequently cutting edge -- We need to have a way for the user to overwrite any automatic logic.
  3. MPIPreferences is a much better venue for this than MPI's __init__ function IMO.

@JBlaschke mentioned this pull request on Feb 27, 2023

@JBlaschke (Contributor)

My preference would be to change gtl_names to a list of preloads in #716, and to have MPIPreferences populate this from the list of libraries given by CC --cray-print-opts=libs. This way someone could run something like MPIPreferences.use_system_binary(; vendor="cray"), which automatically finds the right libraries on a Cray machine. And if that breaks, they can still go the route of manually specifying preloads.

@simonbyrne (Member, Author)

My reluctance to use MPIPreferences for this is that it's much more difficult to maintain: to change it, we have three different things (the MPI.jl package, the MPIPreferences.jl package, and the preferences stored in the .toml) that need to be kept in sync and backwards compatible. It's also not clear that any changes we make to MPIPreferences would be compatible with whatever arbitrary "features" future HPC vendors ship.

@JBlaschke (Contributor)

I understand your reluctance, @simonbyrne. The way I see it, you have two options:

  1. MPIPreferences, which can do fancy things like interrogate CC to automatically detect which libraries to load.
  2. An environment variable which lists the names of libraries to preload.

Adding automation to option 2 is a bad idea (imagine 100k ranks calling CC --cray-print-opts; I've seen things like this fail spectacularly). Sadly, we also can't/shouldn't rely on these libraries being in standard locations (again, imagine 100k ranks traipsing over the file system looking for patterns in library names -- never mind that Perlmutter contains the ROCm version of GTL in addition to the CUDA one).

@simonbyrne closed this on Jul 13, 2023
@simonbyrne deleted the sb/gtl branch on July 13, 2023 17:47