
Allow > 9999 MPI processes #1055

Merged
merged 2 commits into from Jan 28, 2020

Conversation

davegill
Contributor

@davegill davegill commented Jan 14, 2020

TYPE: enhancement

KEYWORDS: MPI, rsl

SOURCE: internal

DESCRIPTION OF CHANGES:

Problem:

  1. Previously, when the model requested more than 9999 MPI processes, the attempt
    to generate the file names for the out and error files for processes 10000 and up
    caused the model to stop ungracefully.

  2. Previously, when the model requested more than 10000 MPI processes, a maximum
    number of MPI slots was exceeded.

  3. If a user did run with a large number of MPI processes (< 9999, but still a large number),
    the default behavior was to create an out and error file for each process. While a user may
    override this default file-generation behavior, it was not easy to figure out how.

Solution:

  1. The file name is now a function of the total number of MPI processes that are
    requested. When requesting 1 through 9999 MPI processes (inclusive), the files are
    named as before: rsl.out.0000 through rsl.out.9999. When the total number of MPI
    processes > 9999 (for example, mpirun -np 10001 wrf.exe), then the file names
    become rsl.out.00000000 through rsl.out.00010000.

  2. A hard-coded maximum value of 10000 MPI processes in a header file was removed,
    and replaced with a more realistic value of 100001 (easier to search for than 100000).

  3. For users with very large MPI process counts, the WRF model has an existing ability
    to remove all standard out and error files except for the master process. That
    compile-time option is now easily accessible in the configure.defaults file. This mod
    is not a change in function, but an easy way to get access to this single-file option.

  4. This PR replaces "More than 10000 MPI tasks with WRF" #704.

LIST OF MODIFIED FILES:
modified: ../../arch/configure.defaults
modified: ../../external/RSL_LITE/c_code.c
modified: ../../external/RSL_LITE/module_dm.F
modified: ../../external/RSL_LITE/rsl_lite.h

TESTS CONDUCTED:

  1. To test the logic of a large number of MPI processes modifying the rsl file names,
    a temporary change was introduced in the code. Instead of the cutoff being 10000
    MPI processes, the value chosen was 3. For this temporary test, any MPI job with
    1 or 2 processes should then produce file names such as rsl.out.0000, and any MPI
    job with 3 or more MPI processes should produce a file name such as rsl.out.00000000.

Here is a small piece of the logic (the committed code of course uses 10000 for the
ORIG_RSL_CUTOFF instead of 3). This single line is required for the temporary test.

```
#define ORIG_RSL_CUTOFF 3

if ( *size < ORIG_RSL_CUTOFF )
{
  sprintf(filename,"rsl.out.%04d",*me) ;
}
else
{
  sprintf(filename,"rsl.out.%08d",*me) ;
}
```

Here are tests with the maximum MPI process counts set to 1, 2, and 3, respectively.

```
> rm rsl* ; mpirun -np 1 real.exe ; ls -ls rsl*
 starting wrf task            0  of            1
16 -rw-r--r--  1 gill  1500   6903 Jan 14 15:58 rsl.error.0000
40 -rw-r--r--  1 gill  1500  19854 Jan 14 15:58 rsl.out.0000
> rm rsl* ; mpirun -np 2 real.exe ; ls -ls rsl*
 starting wrf task            0  of            2
 starting wrf task            1  of            2
16 -rw-r--r--  1 gill  1500   6903 Jan 14 15:58 rsl.error.0000
16 -rw-r--r--  1 gill  1500   6699 Jan 14 15:58 rsl.error.0001
40 -rw-r--r--  1 gill  1500  19854 Jan 14 15:58 rsl.out.0000
40 -rw-r--r--  1 gill  1500  19650 Jan 14 15:58 rsl.out.0001
> rm rsl* ; mpirun -np 3 real.exe ; ls -ls rsl*
 starting wrf task            1  of            3
 starting wrf task            2  of            3
 starting wrf task            0  of            3
16 -rw-r--r--  1 gill  1500   6903 Jan 14 15:58 rsl.error.00000000
16 -rw-r--r--  1 gill  1500   6343 Jan 14 15:58 rsl.error.00000001
16 -rw-r--r--  1 gill  1500   6699 Jan 14 15:58 rsl.error.00000002
40 -rw-r--r--  1 gill  1500  19854 Jan 14 15:58 rsl.out.00000000
40 -rw-r--r--  1 gill  1500  19294 Jan 14 15:58 rsl.out.00000001
40 -rw-r--r--  1 gill  1500  19650 Jan 14 15:58 rsl.out.00000002
```

RELEASE NOTE: The WRF model is now able to use > 9999 MPI processes. Users will notice that the file names are modified for jobs requesting > 9999 MPI processes. For example, the command mpirun -np 10001 wrf.exe will produce file names rsl.out.00000000 through rsl.out.00010000. For convenience, to have output go only to the master rsl files, a commented-out compile-time flag is provided in the configure.wrf file, on the CFLAGS_LOCAL variable.

@davegill
Contributor Author

davegill commented Jan 15, 2020

@davegill
Problems reported by regression, all fixed with release-v4.1.4 PRs
output_2:2 = STATUS test_002m hwrf nmm_real 34 1NE - OK, fixed with HWRF PR
output_2:2 = STATUS test_002m hwrf nmm_real 34 2NE - OK, fixed with HWRF PR
output_9:9 = STATUS test_009o fire em_fire 33 01 - OK, fixed with fire pop PR

@kkeene44
Collaborator

@davegill
Is there an advantage to creating an error/out file per processor, even when not using an extremely large number of processors? I.e., is there a reason why it shouldn't be the default to always have only a master error/out file, instead of several?

@davegill
Contributor Author

@kkeene44 @mgduda

Is there an advantage to creating an error/out file per processor, even when not using an extremely large number of processors? i.e., is there a reason why it shouldn't be default to always only have a master error/out file, instead of several?

The only trouble is that our stderr / stdout setup is not as slick as MPAS's. If everything runs OK for MPAS, you get a single print-out file. If there are troubles, you only get the error messages from the problematic MPI processes.

For WRF, the option is either 1) all processes report, or 2) just the master reports. For operational codes that you know are always going to work, you can certainly select the build option of only getting a single file. For most development or research, when something goes wrong you would likely want to know if and where a CFL violation occurred (and that requires output from every process).

As for system impact, having a couple of hundred rsl files is tedious but not really a burden on the machine. However, having 50k rsl files would clog up the machine and absolutely impact performance.

This build option has always been in WRF; with the commented-out change to configure.defaults, I am just making it easier for a user to activate.

@davegill
Contributor Author

@kkeene44
Kelly,
Let me know if I have answered everything for you.

@kkeene44
Collaborator

@davegill
Yes, sorry. That answers my question. I just wanted to make sure there wasn't an easy way to do something that would potentially make more sense in the long run, but it sounds like it's not possible in WRF. Thanks!

@davegill davegill merged commit 115c946 into wrf-model:develop Jan 28, 2020
@cponder

cponder commented Apr 20, 2021

Will this go into the next WRF release?

@davegill
Contributor Author

@cponder

Will this go into the next WRF release?

Carl,
Yes, this is already in the code.
