Cray MPI GPU compatibility #717
Conversation
Possible alternative to #716
```diff
@@ -103,6 +103,10 @@ function __init__()
     end
 
+    @static if Sys.isunix()
+        # Cray MPICH apparently requires you to manually load a separate GPU Transport Layer (GTL) if you want GPU-aware MPI.
+        if MPIPreferences.binary == "system" && MPIPreferences.abi == "MPICH" && get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"
+            Libdl.dlopen_e(joinpath(dirname(libmpi), "libmpi_gtl_cuda"), Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
```
This assumes that the library is in the same directory as `libmpi`: is this the case, @JBlaschke?

Also, is there any way to determine whether we should load `libmpi_gtl_cuda` or `libmpi_gtl_hsa`? (`dlopen_e` should fail silently, so we could just try opening both?)
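A minimal sketch of that try-both idea (the loop is illustrative, not the PR's code; the two library names are the ones from this thread). Since `Libdl.dlopen_e` returns `C_NULL` instead of throwing when a library cannot be found, both GTL flavors can simply be attempted:

```julia
using Libdl

# dlopen_e returns C_NULL on failure rather than throwing, so a missing
# flavor is silently skipped; keep the first one that loads.
for gtl in ("libmpi_gtl_cuda", "libmpi_gtl_hsa")
    handle = Libdl.dlopen_e(gtl, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
    handle != C_NULL && break
end
```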
Or rely on the linker finding it automatically just by giving the basename?
The GTL libs are from a separate rpm, so they aren't necessarily in the same directory. You can have multiple `libmpi.so`s (for every compiler, for different version numbers, different fabrics) and just one or a few `libmpi_gtl_{cuda,hsa}.so`:

```
/opt/cray/pe/mpich/8.1.18/ofi/gnu/9.1/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/intel/19.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/cray/10.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/nvidia/20.7/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ofi/aocc/3.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/intel/19.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/cray/10.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/gnu/9.1/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/aocc/3.0/lib/libmpi.so
/opt/cray/pe/mpich/8.1.18/ucx/nvidia/20.7/lib/libmpi.so
/opt/cray/pe/mpich/8.1.21/ofi/gnu/9.1/lib/libmpi.so
...
/opt/cray/pe/mpich/8.1.18/gtl/lib/libmpi_gtl_cuda.so
/opt/cray/pe/mpich/8.1.21/gtl/lib/libmpi_gtl_cuda.so
/opt/cray/pe/mpich/8.1.23/gtl/lib/libmpi_gtl_cuda.so
```

However, there is typically also a symlink for both `libmpi.so` (to which one, you ask?! a random one, probably) and the GTL lib in `/opt/cray/pe/lib64`, which is in the ld cache (so no modules required).

Opening by name is probably better than opening by path.
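To make the by-path/by-name distinction concrete, a sketch under this thread's assumptions (the `/opt/cray/pe/lib64` symlinks are in the ld cache; the `libmpi` path here is just an example taken from the listing above):

```julia
using Libdl

libmpi = "/opt/cray/pe/mpich/8.1.18/ofi/gnu/9.1/lib/libmpi.so"  # example path

# By path: only finds the GTL lib if it sits next to libmpi, which the
# separate-rpm layout above does not guarantee.
Libdl.dlopen_e(joinpath(dirname(libmpi), "libmpi_gtl_cuda"),
               Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)

# By name: defers to the dynamic loader's search order, which includes
# the ld-cache entry for the /opt/cray/pe/lib64 symlink.
Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
```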
@simonbyrne see my comments below. To address the questions here:

- No, they are not necessarily in the same place.
- `CC --cray-print-opts=libs` will show which libraries get loaded. This doesn't actually contain any information about when they are needed, but a safe bet would be before `MPI_Init`.
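As an illustration of that query (assuming the Cray `CC` wrapper is on `PATH`, and guessing that GTL entries can be picked out of the link line by substring):

```julia
# Ask the Cray compiler wrapper for its link options and filter for GTL.
link_opts = split(readchomp(`CC --cray-print-opts=libs`))
gtl_opts  = filter(contains("gtl"), link_opts)
```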
If they are weak symbols, I think they must be loaded before: otherwise the dlopen on `libmpi` will try to resolve the weak symbols elsewhere.
@giordano pointed out that we actually dlopen `libmpi` in `MPIPreferences.__init__`, so we may need to do it there:

MPI.jl/lib/MPIPreferences/src/system.jl, line 12 in e09446f:

```julia
Libdl.dlopen(libmpi, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
```
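In other words, the ordering constraint would look roughly like this (a sketch, not the actual MPIPreferences code; `libmpi` stands in for the recorded library path):

```julia
using Libdl

libmpi = "libmpi.so"  # stand-in for the path recorded by MPIPreferences

# The GTL library must be in the process's global symbol table before
# libmpi is opened, so libmpi's weak GTL symbols resolve against it.
Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
Libdl.dlopen(libmpi, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
```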
I ended up doing: https://github.com/JuliaParallel/MPI.jl/pull/716/files#diff-496cd019dfada6f9862870c26d09d6ea9f6a24aa6d0055fb03e916b8959b82a8R13 -- a list of preloads might be better, because @luraess just told me that other centers might also need cudart preloaded. 😔
So on Perlmutter we must `dlopen` before `__init__`. But a `Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)` seems to do the trick.
```diff
@@ -103,6 +103,10 @@ function __init__()
     end
 
+    @static if Sys.isunix()
+        # Cray MPICH apparently requires you to manually load a separate GPU Transport Layer (GTL) if you want GPU-aware MPI.
+        if MPIPreferences.binary == "system" && MPIPreferences.abi == "MPICH" && get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"
```
I would prefer guarding this behind a `vendor == "cray"` check.
Yeah, we don't currently expose that in MPIPreferences (we could, though?)
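For concreteness, the suggested guard might look like this, assuming MPIPreferences gained a hypothetical `vendor` preference (it exposes no such field today):

```julia
using Libdl
using MPIPreferences

# MPIPreferences.vendor is hypothetical; only binary and abi exist here.
if MPIPreferences.binary == "system" && MPIPreferences.abi == "MPICH" &&
        MPIPreferences.vendor == "cray" &&
        get(ENV, "MPICH_GPU_SUPPORT_ENABLED", "0") == "1"
    Libdl.dlopen_e("libmpi_gtl_cuda", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
end
```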
I mentioned on Discord that I'm not super excited about doing this vendor-specific business. Would it be possible to simply have a list of unnamed libraries (we don't need to `ccall` into them) that we need to load before `libmpi`? This might be useful with other vendors as well (maybe also some MPI profilers? But those probably need to be loaded before the entire Julia process starts).
That was my original thought too (make a list of preloads).
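A sketch of that idea as it might sit in `MPIPreferences.__init__` (the `preloads` name and value are assumptions for illustration, not an existing preference):

```julia
using Libdl

preloads = ["libmpi_gtl_cuda"]  # hypothetical preference value
libmpi   = "libmpi.so"          # stand-in for the recorded library path

# Open each preload (in order) before libmpi itself, so weak symbols in
# libmpi resolve against the already-loaded libraries.
for lib in preloads
    Libdl.dlopen(lib, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
end
Libdl.dlopen(libmpi, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
```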
@simonbyrne I like this approach in spirit. I want to share some concerns -- and if they are overrated, we can go ahead with this:
My preference would be to change …
My reluctance about using MPIPreferences for this is that it's much more difficult to maintain: to change it, we have three different things (the MPI.jl package, the MPIPreferences.jl package, and the preferences stored in the .toml) that need to be kept in sync and backwards compatible. It's also not clear that any changes we make to MPIPreferences would be compatible with whatever arbitrary "feature" future HPC vendors ship.
I understand your reluctance, @simonbyrne. The way I see it, you have two options:
Adding automation to 2 is a bad idea (imagine 100k ranks calling `CC --cray-print-opts=libs` at startup).