
Cray MPI GPU compatibility #717

Closed · wants to merge 2 commits into from
Conversation

simonbyrne (Member)

Possible alternative to #716
@simonbyrne requested a review from JBlaschke on February 27, 2023 19:15
@@ -103,6 +103,10 @@ function __init__()
    end

    @static if Sys.isunix()
        # Cray MPICH apparently requires you to manually load a separate GPU Transport Layer (GTL) if you want GPU-aware MPI.
        if MPIPreferences.binary == "system" && MPIPreferences.abi == "MPICH" && get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"
            Libdl.dlopen_e(joinpath(dirname(libmpi), "libmpi_gtl_cuda"), Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)

simonbyrne (Member, Author)

This assumes that the library is in the same directory as libmpi: is this the case, @JBlaschke?

Also, is there any way to determine whether we should load libmpi_gtl_cuda or libmpi_gtl_hsa? (dlopen_e should fail silently, so we could just try opening both?)
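
For illustration, a minimal sketch of the "try both" idea, relying on dlopen_e returning C_NULL instead of throwing:

using Libdl
# Try each GTL variant; Libdl.dlopen_e returns C_NULL rather than throwing if
# the library is not available, so whichever one exists gets loaded.
for gtl in ("libmpi_gtl_cuda", "libmpi_gtl_hsa")
    handle = Libdl.dlopen_e(gtl, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
    handle != C_NULL && break
end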

Member

Or rely on the linker finding it automatically just by giving the basename?

@haampie (Feb 27, 2023)

The GTL libs come from a separate RPM, so they aren't necessarily in the same directory. You can have multiple libmpi.so files (one for every compiler, for different version numbers, different fabrics) and just one or a few libmpi_gtl_{cuda,hsa}.so:

/opt/cray/pe/mpich/8.1.18/ofi/gnu/9.1/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/intel/19.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/cray/10.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/nvidia/20.7/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/aocc/3.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/intel/19.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/cray/10.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/gnu/9.1/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/aocc/3.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/nvidia/20.7/lib/libmpi.so
/opt/cray/pe/mpich/8.1.21/ofi/gnu/9.1/lib/libmpi.so

...

/opt/cray/pe/mpich/8.1.18/gtl/lib/libmpi_gtl_cuda.so
/opt/cray/pe/mpich/8.1.21/gtl/lib/libmpi_gtl_cuda.so
/opt/cray/pe/mpich/8.1.23/gtl/lib/libmpi_gtl_cuda.so

However, there's also typically a symlink for both libmpi.so (oh, which one, you ask?! a random one, probably) and the GTL lib in /opt/cray/pe/lib64, which is in the ld cache (so no modules required).

Opening by name is probably better than opening by path.

Contributor

@simonbyrne see my comments below. To address the questions here:

  1. No, they are not necessarily in the same place.
  2. CC --cray-print-opts=libs will show which libraries get loaded. This doesn't actually tell you when they are needed, but a safe bet would be before MPI_Init (see the sketch below).
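
A hedged sketch of querying the wrapper at configuration time (assuming the Cray CC wrapper is on PATH and prints link flags such as -lmpi_gtl_cuda):

# Configuration-time only: ask the Cray compiler wrapper for its link flags
# and pick out the GTL libraries. Not something to run on every rank.
opts = split(readchomp(`CC --cray-print-opts=libs`))
gtl_libs = ["lib" * flag[3:end] for flag in opts if startswith(flag, "-lmpi_gtl")]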

Member

If they are weak symbols, I think they must be loaded beforehand; otherwise the dlopen of libmpi will try to resolve the weak symbol elsewhere.

@simonbyrne (Member, Author) Feb 28, 2023

@giordano pointed out that we actually dlopen libmpi in MPIPreferences.__init__, so we may need to do it there:

Libdl.dlopen(libmpi, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
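
If so, a sketch of the ordering there might look like (not the actual change):

# Sketch: preload the GTL library so its symbols are visible when libmpi is
# opened; dlopen_e fails silently if the library is not present.
Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
Libdl.dlopen(libmpi, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)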

Contributor

I ended up doing: https://github.com/JuliaParallel/MPI.jl/pull/716/files#diff-496cd019dfada6f9862870c26d09d6ea9f6a24aa6d0055fb03e916b8959b82a8R13 -- a list of preloads might be better, because @luraess just told me that other centers might also need cudart preloaded. 😔

Member

So on Perlmutter we must dlopen before __init__.

But a Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL) seems to do the trick.

@@ -103,6 +103,10 @@ function __init__()
        if MPIPreferences.binary == "system" && MPIPreferences.abi == "MPICH" && get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"

Member

I would prefer guarding this behind a vendor == "cray" check.

simonbyrne (Member, Author)

yeah, we don't currently expose that in MPIPreferences (we could though?)
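
For illustration, a sketch of such a guard, assuming a hypothetical MPIPreferences.vendor field (not something MPIPreferences currently exposes):

if MPIPreferences.binary == "system" && MPIPreferences.abi == "MPICH" &&
        MPIPreferences.vendor == "cray" &&  # hypothetical field
        get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"
    Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
end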

@giordano (Member)

I mentioned on Discord that I'm not super excited about doing this vendor-specific business. Would it be possible to simply have a list of unnamed libraries (we don't need to ccall into them) that we need to load before libmpi? This might be useful with other vendors as well (maybe also some MPI profilers? But those probably need to be loaded before the entire Julia process).

@simonbyrne (Member, Author)

That was my original thought too (make a preloads preference), but from an ease-of-use point of view, ideally we would be able to just "make it work" without any user intervention. We do some similar things for UCX, so it's not entirely unprecedented.
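
A minimal sketch of what such a preloads preference could look like (the field name MPIPreferences.preloads is an assumption, not an existing API):

# Hypothetical preloads preference: library names to dlopen before libmpi,
# without needing to ccall into them directly.
for lib in MPIPreferences.preloads  # hypothetical field, e.g. ["libmpi_gtl_cuda"]
    Libdl.dlopen(lib, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
end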

@JBlaschke (Contributor)

@simonbyrne I like this approach in spirit. I want to share some concerns -- and if they are overrated, we can go ahead with this:

  1. "Just working" is something that is difficult to do on HPC. One example is: what if environment variables are named differently on different systems (that can happen with the same vendor even). I think we can do this here also -- for example: @vchuravy 's idea of including search logic based on CC --cray-print-opts=libs to identify "the right" libraries to preload.
  2. HPC systems are frequently cutting edge -- We need to have a way for the user to overwrite any automatic logic.
  3. MPIPreferences is a much better venue for this than MPI's __init__ function IMO.

@JBlaschke mentioned this pull request on Feb 27, 2023

@JBlaschke (Contributor)

My preference would be to change gtl_names to a list of preloads in #716, and to have MPIPreferences populate this from the list of libraries given by CC --cray-print-opts=libs. This way someone could run something like MPIPreferences.use_system_binary(; vendor="cray"), which automatically finds the right libraries on a Cray machine. And if that breaks, they can still go the route of manually specifying preloads.

@simonbyrne (Member, Author)

My reluctance to use MPIPreferences for this is that it's much more difficult to maintain: to change it, we have three different things (the MPI.jl package, the MPIPreferences.jl package, and the preferences stored in the .toml) that need to be kept in sync and backwards compatible. It's also not clear that any changes we make to MPIPreferences would be compatible with whatever arbitrary "features" future HPC vendors ship.

@JBlaschke (Contributor)

I understand your reluctance, @simonbyrne. The way I see it, you have two options:

  1. MPIPreferences, which can do fancy things like interrogate CC to automatically detect which libraries to load.
  2. An environment variable which lists the names of libraries to preload.

Adding automation to option 2 is a bad idea (imagine 100k ranks calling CC --cray-print-opts; I've seen things like this fail spectacularly). Sadly, we also can't/shouldn't rely on these libraries being in standard locations (again, imagine 100k ranks traipsing over the file system looking for patterns in library names -- never mind that Perlmutter contains the ROCm version of GTL in addition to the CUDA one).

@simonbyrne closed this on Jul 13, 2023
@simonbyrne deleted the sb/gtl branch on July 13, 2023 17:47