Add GTL #716
Conversation
What would they have to do right now to make it work? Or asked differently: is this changing things from "inconvenient" to "even more inconvenient" for users on systems without sysadmin support, or from "impossible" to "doable but inconvenient"? If it is the latter, I think it will be an improvement nonetheless, wouldn't it?
This is quite a cumbersome patch just to deal with Cray MPI; I wish there were a better way. What does mpi4py do?
Since it's only required at runtime, we could just do it based on environment variables?
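As a sketch of the environment-variable idea: the decision could be made at launch time, gated on a switch like Cray MPICH's `MPICH_GPU_SUPPORT_ENABLED` (both the variable name and the library path below are assumptions for illustration, not something this PR defines):

```shell
# Hypothetical: only preload the GTL library when the GPU-support
# switch is on. Variable name and path are assumptions.
gtl_lib="/opt/cray/pe/mpich/default/gtl/lib/libmpi_gtl_cuda.so"
if [ "${MPICH_GPU_SUPPORT_ENABLED:-0}" = "1" ]; then
    preload="$gtl_lib"
else
    preload=""
fi
echo "preload=${preload:-none}"
```

The downside, as discussed below, is that this couples MPI.jl to a vendor-specific variable name.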
How is this logic handled for C programs?
Right now, they would either have to use:
or add
before the first
Re advice for users who can't ask the sysadmin, I would document an example that would cover both Perlmutter and Frontier.
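Such a documented example might look like the following `LocalPreferences.toml` sketch (hedged: the `gtl_names` option is the one introduced by this PR, but the exact schema, library names, and other keys here are illustrative):

```toml
[MPIPreferences]
_format = "1.1"
binary = "system"
libmpi = "libmpi_cray"
mpiexec = "srun"
# Candidate GTL library names; listing both the CUDA and ROCm flavors
# would cover Perlmutter and Frontier. Names are illustrative.
gtl_names = ["libmpi_gtl_cuda", "libmpi_gtl_hsa"]
```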
They do the same thing that Cray always tells us to do: "use the compiler wrappers to build mpi4py".
Urgh ... If it were up to me alone, then sure! Let's put in an env variable. But I kinda like the idea of having preferences managed by ... well ... Preferences (with a capital "P"). Anyway, GTL is part of using the system binary, so I think keeping this alongside libmpi makes sense.
They are compiled using the Cray compiler wrappers -- I don't know how the compiler wrappers work in detail (no / very limited documentation). I suspect they futz around with the linker to make sure that GTL is linked before MPI. Note: when you build a program with GTL enabled, it can't run on CPU nodes. So the compiler wrappers do insert something...
Ah, that's disappointing. I was hoping there was some magic environment variable set on your GPU nodes that we could rely upon. Ah well.
Why do you need two Preferences.toml files for each type of node?
I need to go to bed, but I can get on board with this if we make it a little less Cray-specific: what if we just called it
That's what I was hoping for also...
One with the preloads and one without.
I like this! It would be a bit more effort, but would cover a broader set of use cases. If we can define preloads that depend on an env var (…)
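For reference, the two-files approach could be wired up from a modulefile, by putting each node type's project (a `Project.toml` plus the matching `LocalPreferences.toml`) on the Julia load path. This is a hedged sketch; the directory names and hostname patterns are invented:

```shell
# Hypothetical modulefile logic: prepend a node-type-specific project
# directory (containing the right LocalPreferences.toml) to
# JULIA_LOAD_PATH. Directory names are invented for illustration.
case "$(hostname)" in
    *gpu*) prefs_dir="/site/julia/prefs-gpu" ;;  # preferences with GTL preloads
    *)     prefs_dir="/site/julia/prefs-cpu" ;;  # preferences without preloads
esac
export JULIA_LOAD_PATH="$prefs_dir:${JULIA_LOAD_PATH:-@:@v#.#:@stdlib}"
echo "$JULIA_LOAD_PATH"
```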
So I am wondering if we should do something:
Now the big question for me is how to deal with ROCm vs CUDA... and do we need something like a
Vendor flags might make a lot of sense for a different reason @vchuravy: as it stands right now, if a user loads, say,
I build different Julia modules for each PE, but if a user rolls their own Julia environment then it might rely on a specific PE. Having smarter logic in MPI.jl would fix that.
The ROCm version of the GTL is called
How is libgtl related to libmpi? The latter requires something (symbols?) from the former? If so, does libmpi dynamically link to libgtl (i.e. what's the output of
@giordano my assumption is that they use dlsym to see if the library is preloaded/linked into the binary. Right now I just want to shout at Cray and have everyone use OpenMPI.
Yea, there are no symbols that libmpi needs from
What @vchuravy says makes sense. I can't find any documentation on this (other than: if you see this error, recompile with the Cray compiler wrappers). I'm following this up at NERSC, to see if Cray would be willing to change the behavior of Cray MPICH. We should still work on vendor flags in the meantime.
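The dynamic-linking question above could be checked directly with `ldd` (the libmpi path below is an assumption for a Cray PE install; on a machine without it, the check just reports nothing found):

```shell
# Hypothetical check: if libgtl were a hard dependency of libmpi, it
# would appear among the NEEDED entries printed by ldd. Path assumed.
libmpi="/opt/cray/pe/mpich/default/lib/libmpi_cray.so"
deps=$(ldd "$libmpi" 2>/dev/null | grep -i gtl)
echo "${deps:-no gtl among NEEDED entries}"
```

An empty result would support the dlsym theory: libgtl is discovered at runtime rather than declared as a link-time dependency.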
I don't understand: wouldn't you always want the preloads for the GPU nodes, and no preloads on non-GPU?
You mean CPU? Mainly for sanity: I don't know what GTL will do on a system without GPUs ... This is also more general: NERSC has a history of systems with different kinds of nodes. @simonbyrne I like your approach of keeping it general. In general, different kinds of nodes might keep libraries in different places, etc. NERSC has been using Slurm and the module system to give users a way to deploy their codes on different hardware (e.g. Cori GPU). This also isn't unique to NERSC.
Are you able to join the JuliaHPC meeting on Tuesday? It might be easier to discuss there.
@vchuravy in 20 years we'll have compatible ABIs https://www.mpich.org/abi/ |
Sadly no. I can do an impromptu call at 9am PT tomorrow (Monday) |
Quick update: I just confirmed that no preloads are necessary when setting
This doesn't get us off the hook completely though, as we still need to preload GTL for GPU-aware MPI. The nice thing is that we don't strictly need to not preload GTL. I am going to work on vendor flags regardless, as they might still be useful for automatically adding vendor preloads (e.g. picking the "right" GTL for AMD vs Nvidia).
What if we were to load the GTL if
Possible alternative to #716
What is
Right, so that's the spirit behind #717 -- that would solve part of the problem, but it makes us vulnerable to env var names changing. Also it doesn't help with deciding between
@vchuravy Here you go:
So now the question is what is exported on Frontier/Crusher. My worry is that the symbols overlap and it would only be legal to preload one of them.
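The overlap worry could be checked by comparing the exported symbols of the two GTL flavors. A hedged sketch (the install path is an assumption; on a system without the Cray PE this simply reports zero overlap):

```shell
# Hypothetical: list the dynamic symbols each GTL flavor defines and
# count the names they both export; a non-empty intersection would
# make preloading the "wrong" one unsafe.
gtl_dir="/opt/cray/pe/mpich/default/gtl/lib"   # path is an assumption
for lib in libmpi_gtl_cuda.so libmpi_gtl_hsa.so; do
    nm -D --defined-only "$gtl_dir/$lib" 2>/dev/null |
        awk '{print $NF}' | sort -u > "$lib.syms"
done
overlap=$(comm -12 libmpi_gtl_cuda.so.syms libmpi_gtl_hsa.so.syms | wc -l | tr -d '[:space:]')
echo "overlapping symbols: $overlap"
```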
Ah! Looks like cleaning up the formatting solved the docs-build problem.
Can someone familiar with CI comment on what I should do about the failing tests? Right now I don't understand how, or if, my changes triggered these regressions.
@simonbyrne any chance you can merge this? |
To try to keep backward compatibility, we should only update the `_format` key if it uses the new features. Otherwise, we can keep it at `"1.0"`.
Only bump format for where the new version is needed
only require v1.1 if vendor is input
LGTM!
Ok, so I've cleaned things up a bit. I moved all the preload logic to
@simonbyrne @vchuravy feel free to merge.
LGTM. Just need to add the docstring to the docs:
@simonbyrne Docstring added |
Can you bump the patch version of MPIPreferences? |
This looks good -- @simonbyrne do you also want to bump the MPI.jl patch version? |
Yay!!! GPU-aware MPI breaks more ABIs. Here's an example of what happens without loading GTL before a `libmpi` that needs it:

(sorta makes sense I guess 🤨 ... vendors don't want to have to compile two different libmpis ... just insert a libgtl whenever GPUs are around ... yes, that's wayyyy better 😛 )
In the case of some system MPI libraries, GPU-aware MPI is implemented as another library -- bearing the fancy name of GPU Transport Layer (GTL). For example, on Perlmutter it's called `libmpi_gtl_cuda.so`. Often it's important that this library is loaded before libmpi. These changes do the following:

- `MPIPreferences` has an option `gtl_names`, which -- if not `nothing` -- is a list of possible names for the GTL library
- `MPI` will dlopen `libgtl` before `libmpi` (if not `nothing`)

I have tested this on Perlmutter. Will test on Crusher next. Also I don't know if I accidentally broke MPITrampoline, which I will check asap.
This PR represents a tradeoff. Clearly there is no standard way that GTL is defined. So I avoided creating a default search strategy. One could be tempted to look for Cray systems and then "just load GTL". This would cause problems on our CPU nodes, which have the GTL libraries installed (we want to have a single SW image for all nodes), but don't support it (what GPUs? this is a CPU node!).
This PR allows us (the helpful sysadmins) to provide two different `Preferences.toml` files, one for each type of node. It does come at the cost of users potentially having to manage different `LocalPreferences.toml` files if they have MPI in their `LocalPreferences`.
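Concretely, the two files might differ only in the GTL entry. A hedged sketch (key names beyond `gtl_names` follow MPIPreferences' system-binary settings; the library names are illustrative):

```toml
# GPU-node LocalPreferences.toml (illustrative)
[MPIPreferences]
_format = "1.1"
binary = "system"
libmpi = "libmpi_cray"
gtl_names = ["libmpi_gtl_cuda"]

# The CPU-node file would be identical, but with no gtl_names entry,
# so nothing is preloaded on nodes without GPUs.
```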