The SHA advance for the 3rd-party/prrte submodule done in PR #12101 pulled in a bug that broke support for PRTE_MCA_ras_base_launch_orted_on_hn when it is set to 1.
This parameter is important when:
- using a system managed with SLURM or Cray PALS (native launcher), and
- the vendor-supported method for acquiring RDMA credentials to use "high speed" networks relies on mechanisms within the SLURM/PALS launch procedure to provide the application processes with this information (as is the case for HPE SS11).
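For anyone reproducing this, here is a minimal sketch of how the parameter might be set through the environment before invoking the launcher. It assumes the parameter is picked up from an environment variable of the same name and that `mpirun` is on the `PATH`; the helper names `launch_env`, `launch_command`, and `run`, as well as the process count and executable path in the usage note, are hypothetical and only for illustration.

```python
import os
import subprocess


def launch_env(base_env=None):
    """Return a copy of the environment with the PRRTE parameter set to 1.

    Assumption: the parameter named in this issue is read from the
    environment under exactly this name.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PRTE_MCA_ras_base_launch_orted_on_hn"] = "1"
    return env


def launch_command(nprocs, executable):
    """Build an mpirun command line (illustrative arguments only)."""
    return ["mpirun", "-np", str(nprocs), executable]


def run(nprocs, executable):
    """Launch the job with the parameter enabled; raises if mpirun fails."""
    return subprocess.run(
        launch_command(nprocs, executable),
        env=launch_env(),
        check=True,
    )
```

Calling `run(4, "./a.out")` inside a SLURM or PALS allocation would then exercise the code path described above.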
The bug is supposed to be fixed in PRRTe master but not yet in release branches used by Open MPI.
This problem is in both main and 5.0.x at the moment.
I'm marking this as critical because, for sites using HPE SS11 that do not support PMIx in SLURM or PALS, there is no alternative to a prrte-based launch, so launching currently fails (or at least is not easy) on these platforms. I believe this is the case for ORNL systems.
Are you saying that advancing the submodule pointers does not fix the problem? Or are you just filing this as a reminder to update the pointers before release (which is planned anyway)?
The latter, so that we (Open MPI) don't forget to advance the SHAs to pull in the fix once you've committed it to the prrte release branches.