These environment variables turn off some collective optimizations that we have observed causing slowdowns. For more information on these environment variables, see [HPE's Cray MPICH documentation](https://cpe.ext.hpe.com/docs/mpt/mpich/intro_mpi_ucx.html).
4. For hybrid MPI/OpenMP codes, requesting more OpenMP threads per MPI task than you typically requested on Eagle. This may improve performance.
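As a rough illustration of item 4, the job-script sketch below shows where the threads-per-task choice is made in Slurm. It assumes a 104-core Kestrel CPU node and picks 26 MPI tasks per node with 4 OpenMP threads each purely as an example; the allocation handle, node count, and executable name are hypothetical placeholders. Try a few tasks-per-node and cpus-per-task combinations whose product equals the cores per node, and compare runtimes.

```shell
#!/bin/bash
#SBATCH --account=my_allocation   # hypothetical allocation handle
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=26      # MPI tasks per node (example value)
#SBATCH --cpus-per-task=4         # OpenMP threads per task; 26 x 4 = 104 cores (assumed node size)
#SBATCH --time=01:00:00

# Give each MPI task as many OpenMP threads as cores Slurm reserved for it.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
# Pin threads to cores and keep each task's threads close together.
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Recent Slurm releases no longer pass --cpus-per-task from sbatch to srun,
# so repeat it explicitly.
srun --cpus-per-task="$OMP_NUM_THREADS" ./my_hybrid_app   # hypothetical executable
```

Here `OMP_NUM_THREADS` is derived from `SLURM_CPUS_PER_TASK`, so you only need to change the two `#SBATCH` lines when trying a different layout.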
### MPI Stall Library
For calculations requesting more than ~10 nodes, you can use the Cray MPICH stall library. Congestion in MPI communication is a possible performance bottleneck on Kestrel for calculations using ~10 nodes or more, and this library can reduce the resulting slowdowns. To use the library, first make sure your code was compiled in one of the `PrgEnv-gnu`, `PrgEnv-cray`, or `PrgEnv-intel` programming environments. Then add the following lines to your sbatch script:
```
stall_path=/nopt/nrel/apps/cray-mpich-stall
export LD_LIBRARY_PATH=$stall_path/libs_mpich_nrel_{PRGENV-NAME}:$LD_LIBRARY_PATH
export MPICH_OFI_CQ_STALL=1
```
Replace `{PRGENV-NAME}` with `cray`, `intel`, or `gnu` to match the programming environment you compiled with. For example, if you compiled your code in the default `PrgEnv-gnu` environment, add the following lines:
```
stall_path=/nopt/nrel/apps/cray-mpich-stall
export LD_LIBRARY_PATH=$stall_path/libs_mpich_nrel_gnu:$LD_LIBRARY_PATH
export MPICH_OFI_CQ_STALL=1
```
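If you build the same code in more than one programming environment, you may prefer not to hard-code the environment name. The sketch below defines a hypothetical helper, `enable_mpich_stall`, that takes the name as an argument or falls back to `PE_ENV`. It assumes that the Cray programming environment sets `PE_ENV` (for example, to `GNU`) when a `PrgEnv-*` module is loaded; if your environment differs, pass the name explicitly.

```shell
#!/bin/bash
# Hypothetical helper: enable the Cray MPICH stall library for the
# programming environment named in $1, or in $PE_ENV if no argument is given.
# Assumption: loading a PrgEnv-* module sets PE_ENV (e.g. GNU, CRAY, INTEL).
enable_mpich_stall() {
    local stall_path=/nopt/nrel/apps/cray-mpich-stall
    local prgenv
    # Lower-case the name (GNU -> gnu) to match the library directory suffix.
    prgenv=$(printf '%s' "${1:-${PE_ENV:-}}" | tr '[:upper:]' '[:lower:]')

    case "$prgenv" in
        gnu|cray|intel) ;;
        *)
            echo "stall library: unsupported or unknown PrgEnv '${prgenv}'" >&2
            return 1
            ;;
    esac

    local libdir="${stall_path}/libs_mpich_nrel_${prgenv}"
    # Warn (but do not fail) if the directory is missing, e.g. off Kestrel.
    if [ ! -d "$libdir" ]; then
        echo "warning: ${libdir} not found" >&2
    fi

    # Prepend the library directory; avoid a trailing ':' when
    # LD_LIBRARY_PATH is empty, since an empty entry means the current directory.
    export LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export MPICH_OFI_CQ_STALL=1
}

# Usage in an sbatch script, after loading the PrgEnv you compiled with:
#   enable_mpich_stall          # uses $PE_ENV
#   enable_mpich_stall gnu      # or name the environment explicitly
```

Returning an error for an unrecognized environment is deliberate: silently loading the wrong library variant is harder to diagnose than a job that stops early.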
The default stall time of the MPI tasks is 12 microseconds; we recommend trying the default before adjusting it manually. To make the stall longer or shorter, use `export MPICH_OFI_CQ_STALL_USECS=[time in microseconds]`; for example, `export MPICH_OFI_CQ_STALL_USECS=6` sets a 6-microsecond stall. A stall time of 0 is equivalent to "regular" MPI. Increasing the stall time reduces congestion, but only up to a calculation-dependent "optimal" stall time. If you need assistance using the stall library, please email [email protected].
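Because the optimal stall time depends on the calculation, one way to look for it is to submit the same job with several stall times and compare runtimes. The sketch below is a dry run: it only prints one `sbatch` command per candidate value. It assumes a hypothetical job script, `job.sh`, that enables the stall library but does not set `MPICH_OFI_CQ_STALL_USECS` itself, so the value passed through `--export` takes effect.

```shell
#!/bin/bash
# Dry run: print one sbatch command per candidate stall time (microseconds).
# Assumption: the job script enables the stall library but leaves
# MPICH_OFI_CQ_STALL_USECS unset, so the exported value is used.
print_stall_sweep() {
    local job_script=$1
    shift
    local usecs
    for usecs in "$@"; do
        # --export=ALL keeps the submitting shell's environment and adds
        # the stall time for this particular run.
        echo sbatch --job-name="stall_${usecs}us" \
            --export="ALL,MPICH_OFI_CQ_STALL_USECS=${usecs}" "$job_script"
    done
}

# 0 behaves like regular MPI (a baseline); 12 is the default stall time.
print_stall_sweep job.sh 0 6 12 24
```

Once the printed commands look right, pipe the output to `bash` (or remove the `echo`) to submit them, then compare the runtimes of the resulting jobs.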