Proposal: provide mpiacc wrapper script #12228
Comments
We could make a gesture for a particular community, but where do we stop? I don't share your concerns here: configuration tools (autoconf, CMake) have their own ways to identify the compile and link flags that need to be passed to any compiler to build applications. For everything else you can always fall back on …
I understand the pain of users just trying to compile / link their applications, particularly when trying to mix multiple tools -- such as MPI and CUDA. However, I'm not sure that MPI needs to be the integration point for all compilation and linking. For example, if Open MPI includes an `mpiacc`-style wrapper, how does it keep up with the CUDA flags that are needed over time? And to @bosilca's point, how does Open MPI also keep up with the ROCm flags that are needed over time? ... etc.

Open MPI's release schedule is orthogonal to the release schedules of all the other HPC tools; what happens when incompatible changes are made and Open MPI now has stale / incorrect flags for a peer tool? That seems undesirable, and just creates more user confusion and frustration. Even if you flip the script and make CUDA the integration point, how would CUDA keep up with the changing set of Open MPI (and MPICH and ...) flags over time?

Rather than everyone having to keep up with everyone else's flags, Open MPI's approach has been to provide multiple mechanisms to extract the flags from our wrapper compilers, and also to allow nesting of wrapper compilers. We -- Open MPI -- can't know exactly what the end user will want from their other tools, or what systems they will want to compile / link against. As such, all we can do is provide both standardized and Open MPI-specific ways to extract what is needed to compile / link against Open MPI.
Are these existing mechanisms not sufficient? Note: I'm not asking if they're trivially easy to use -- I'm asking if they're insufficient to allow correct compiling and linking of Open MPI to other systems. I understand that compiling / linking large HPC applications can be challenging, but no matter how it is done, some level of expertise is going to be needed by the end user.

Perhaps better documentation and/or examples are needed...? If there's something that can be done in Open MPI's docs, for example, I'm open to suggestions (let's do this in the Open MPI v5.0.x docs and beyond -- i.e., https://docs.open-mpi.org/ -- there's not much point in doing this for v4.1.x and earlier).
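For illustration, the flag-extraction and wrapper-nesting mechanisms mentioned above might be used roughly as follows (a sketch: `--showme` and the `OMPI_CXX` override are Open MPI wrapper features, but the exact flags emitted depend on the installation):

```sh
# Ask the wrapper which flags it would add, and reuse them elsewhere:
mpicxx --showme:compile   # include/preprocessor flags needed to compile against Open MPI
mpicxx --showme:link      # linker flags and libraries needed to link against Open MPI

# Nest the wrappers: keep mpicxx as the entry point, but swap the underlying compiler.
# Note: nvcc may reject host-only flags (e.g. -pthread) that the wrapper adds, so this
# can require extra adjustment on some installs.
OMPI_CXX=nvcc mpicxx hello.cu -o hello
```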
FWIW, that can be achieved locally by the end users. From the install directory:
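(A reconstruction of what such a local setup might look like, assuming the usual Open MPI layout in which the wrapper compilers are symlinks to `opal_wrapper` and are configured by `share/openmpi/<name>-wrapper-data.txt` files; exact paths and file names may differ between installs.)

```sh
# From the Open MPI install directory: create a local "mpiacc" wrapper.
cd bin
ln -s opal_wrapper mpiacc                 # the wrappers are thin symlinks to opal_wrapper
cd ../share/openmpi
# Start from the C++ wrapper's data file and point it at the heterogeneous compiler:
sed 's/^compiler=.*/compiler=nvcc/' mpicxx-wrapper-data.txt > mpiacc-wrapper-data.txt
```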
As @jsquyres pointed out, some other adjustments might be required.
Is your feature request related to a problem? Please describe.
The problem is described in this issue: #12225. Users with applications written in heterogeneous programming languages, where all translation units are, e.g., `.cu` CUDA files (or HIP, or SYCL), often run into the issue that they can't compile their application using any of the provided wrappers. They try, and struggle, to compile them using the C++ wrapper directly.

The heterogeneous compilers often call different compilers themselves. For example, `nvcc` expands the source code into a device file compiled with a device-only compiler, and a host C++ file that is then - in the case of a CUDA C++ MPI application - compiled using a host C++ MPI compiler wrapper like `mpicxx` (and a host C++ compiler like g++ or clang++ otherwise).

The feedback I've gotten from users multiple times is that they struggle to do this: they spend time fiddling with compiler wrapper options and environment variables, end up modifying their application (e.g. splitting the code that uses an accelerator from the code that initializes the program, just to simplify compilation), or have to go grab a complex build system like CMake to compile a single-file "MPI + CUDA C++ hello world", since CMake will correctly query the include / link flags and preferred compiler from the wrapper and pass those to the heterogeneous compiler.
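For reference, the manual route users end up piecing together today looks roughly like this (a sketch; `-ccbin` is nvcc's host-compiler option, and details vary by toolchain and MPI build):

```sh
# One workaround: let nvcc drive compilation and use the MPI C++ wrapper as its host compiler.
nvcc -ccbin=mpicxx hello.cu -o hello

# Another: extract the MPI flags from the wrapper and pass them to nvcc explicitly.
# (Host-only flags such as -pthread may need to be forwarded via -Xcompiler.)
nvcc hello.cu $(mpicxx --showme:compile) $(mpicxx --showme:link) -o hello
```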
Describe the solution you'd like
Compiling an application that mixes MPI with a heterogeneous language (like CUDA C++, HIP, etc.) should be as easy as a single wrapper invocation, for example as sketched below.
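A hypothetical invocation of the proposed wrapper (the name `mpiacc` is taken from this proposal; no such wrapper exists today):

```sh
# Single-file MPI + CUDA C++ program, compiled in one step by the proposed wrapper:
mpiacc hello.cu -o hello
mpirun -n 2 ./hello
```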
Compiling multiple translation units should be as easy as compiling each of them with `mpiacc` and linking them together, as sketched below.
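Again, a hypothetical sketch of the proposed usage (not an existing tool):

```sh
# Separate compilation with the proposed wrapper, followed by a single link step:
mpiacc -c kernels.cu -o kernels.o
mpiacc -c main.cu -o main.o
mpiacc kernels.o main.o -o app
```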
Describe alternatives you've considered
See above. There are many workarounds, but none of them provides a smooth experience for beginner MPI programmers who want to extend a single-GPU application to multiple GPUs.
Additional context
This proposal was discussed in this week's MPICH developer call, and there is an issue tracking it here: pmodels/mpich#6867.
It would be best for users if the MPI wrappers for heterogeneous compilers had a similar API in both implementations.