Best Practices for Building/Running MPI Under Flux #3098
Replies: 2 comments
-
The details for running Spectrum MPI on the CORAL systems are already documented in our RTD: https://flux-framework.readthedocs.io/en/latest/coral.html#launching-spectrum-mpi-within-flux. The documentation does not, however, cover alternate MPIs that could potentially run on the CORAL systems, or how to build and run them.
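For reference, a sketch of what the linked RTD page describes: enabling the Spectrum MPI shell plugin at launch time. The option name comes from that page; the job shape and binary name are illustrative assumptions.

```shell
# Enable Flux's Spectrum MPI personality when launching
# (option per the linked CORAL docs; -N/-n values and ./my_app are illustrative):
flux mini run -N 2 -n 8 -o mpi=spectrum ./my_app
```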
-
My experiences with running various MPIs on NERSC's Cori:

Intel MPI: Lots of errors unless you use the shared-memory "fabric", but multi-node runs do work.
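A sketch of the workaround, assuming Intel MPI's standard fabric-selection variable `I_MPI_FABRICS`; the exact value needed on Cori is an assumption, not something the thread specifies.

```shell
# Force Intel MPI onto the shared-memory fabric before launching
# (value is an assumption; consult the Intel MPI reference for your version):
export I_MPI_FABRICS=shm
# then launch as usual, e.g.:  flux mini run -n 4 ./a.out
echo "I_MPI_FABRICS=$I_MPI_FABRICS"
```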
MPICH: Works without warnings or errors, but the only system-provided MPICH is a debug build.
OpenMPI: Works without warnings or errors, but requires environment variables to be set.
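A sketch of the manual workaround: the two MCA variables are the same ones the patch in the next comment exports automatically, and their values ("flux" for both the pmix and schizo components) are taken from that patch.

```shell
# Point Open MPI at its Flux components so it can bootstrap under Flux:
export OMPI_MCA_pmix=flux     # use the Flux PMI/pmix component
export OMPI_MCA_schizo=flux   # use the Flux personality (schizo) component
# then launch as usual, e.g.:  flux mini run -n 4 ./a.out
echo "OMPI_MCA_pmix=$OMPI_MCA_pmix OMPI_MCA_schizo=$OMPI_MCA_schizo"
```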
Applying the following patch to the master branch allows openmpi to bootstrap under Flux without manually setting the environment variables:

```diff
diff --git a/src/shell/lua.d/openmpi.lua b/src/shell/lua.d/openmpi.lua
index ff833e411..38a1d4385 100644
--- a/src/shell/lua.d/openmpi.lua
+++ b/src/shell/lua.d/openmpi.lua
@@ -11,3 +11,5 @@
 local f = require 'flux'.new ()
 local rundir = f:getattr ('broker.rundir')
 shell.setenv ("OMPI_MCA_orte_tmpdir_base", rundir)
+shell.setenv ("OMPI_MCA_pmix", "flux")
+shell.setenv ("OMPI_MCA_schizo", "flux")
```
-
Most vendor-specific MPIs are hard to support and break frequently under Flux, so we should document best practices both for building MPIs that are maximally compatible with Flux and for running different MPI variants under Flux.