More than 9999 MPI tasks with WRF #697
Comments
@thomasedds How would you like to generate the pull request for this, and walk the whole procedure through to completion?
On our local machine, 280 nodes at 36 MPI ranks/node yields 10080 cores. This gives a default WRF decomposition of cores into 105x96. A domain that has the y-dimension >= 1100 and the x-dimension >= 1000 would work. For the test with fewer than 10k cores, 275 nodes at 36 MPI ranks/node yields 9900 cores. This gives a default WRF decomposition of cores into 100x99.
Take a look at our pull requests to get a "feel" for what we would like to see. There is a template to follow that GitHub will suggest, and the same template is in the WRF/tools/commit_form.txt file. Take as your starting point (the base of your pull request) the top of the develop branch. The pull request would be from your fork / your branch to the wrf-model fork / develop branch.
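The decompositions quoted above can be sanity-checked with a few lines of Python. This is only a sketch: it assumes the default decomposition picks the factor pair of the rank count that is closest to square, which reproduces the 105x96 and 100x99 figures but is not taken from the WRF source. The function name `near_square_factors` is made up for illustration, and the pair is returned as (larger, smaller) without claiming which factor goes with x or y.

```python
import math


def near_square_factors(nranks):
    """Return the (larger, smaller) factor pair of nranks closest to a square.

    Assumption: this mimics a "closest to square" default decomposition.
    It is an illustrative helper, not the routine WRF actually uses.
    """
    if nranks < 1:
        raise ValueError("nranks must be a positive integer")
    # Walk down from floor(sqrt(nranks)); the first divisor found is the
    # smaller factor of the most nearly square pair.
    for small in range(math.isqrt(nranks), 0, -1):
        if nranks % small == 0:
            return nranks // small, small


if __name__ == "__main__":
    for nodes in (280, 275):
        nranks = nodes * 36  # 36 MPI ranks per node, as in the comment above
        print(nodes, nranks, near_square_factors(nranks))
```

Running it for 280 and 275 nodes gives the two rank counts on either side of the 10000 limit, which is what makes them a convenient pair of test cases.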
If I just try running WRF 3.8.1 with large process counts, I see the output-files being named
So for <10,000 it pads out to 4 digits, and it just uses as many digits as it needs after that, which I think satisfies requirement #1.
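The padding behaviour described here is what a printf-style "%04d" conversion does: the 4 is a minimum width, not a maximum. A small sketch using Python's printf-style formatting, which follows the same convention as C's sprintf for this case. The helper name `rank_suffix` is hypothetical, and no file-name prefix is assumed.

```python
def rank_suffix(rank, width=4):
    """Format an MPI rank as a zero-padded suffix of at least `width` digits.

    Hypothetical helper for illustration; `width` is a minimum field width,
    so ranks with more digits are printed in full rather than truncated.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    # "%0*d" takes the field width from the argument tuple, like C's printf.
    return "%0*d" % (width, rank)


if __name__ == "__main__":
    for rank in (7, 9999, 10000, 10079):
        print(rank, rank_suffix(rank))
```

Note that in C the destination buffer still has to be large enough for the extra digit; the format string alone does not guarantee that.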
@cponder We added PR #1055, based on these suggestions. To see the files that were changed:
There were a number of magic "10000" values floating around (and the associated number of allowable digits) that needed to be changed. Some compilers permitted exceeding these bounds, but that was not robust.
Currently, WRF only permits fewer than 10000 MPI tasks. As the grid size increases, more than 10000 cores could be necessary. Therefore, the value of RSL_MAXPROC in external/RSL_LITE/rsl_lite.h needs to be increased, e.g. to 100000.
Also, the sprintf statements in external/RSL_LITE/c_code.c need to be changed to permit "06d" integers.
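To see how the process-count bound and the format width relate, here is a rough sketch. It assumes ranks run from 0 to the bound minus 1 and that the bound is exclusive; the function names and the example bounds are illustrative, not values taken from the WRF source.

```python
def digits_needed(max_procs):
    """Number of decimal digits in the largest rank, max_procs - 1.

    Assumption: ranks are numbered 0 .. max_procs - 1.
    """
    if max_procs < 1:
        raise ValueError("max_procs must be a positive integer")
    return len(str(max_procs - 1))


def fits_format(max_procs, width):
    """True if every rank below max_procs fits in a zero-padded field of `width` digits."""
    return digits_needed(max_procs) <= width


if __name__ == "__main__":
    for bound in (10000, 10080, 100000, 1000000):
        print(bound, digits_needed(bound), fits_format(bound, 4), fits_format(bound, 6))
```

Under these assumptions a 4-digit field covers a bound of 10000 exactly, while a 6-digit field leaves room up to a bound of 1000000, so the two changes have to be made together.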