Release/P7b: add fix for omp reproducibility issue for P7b #689
@DeniseWorthen @jiandewang would you please run a short P7b test to confirm this fixes the reproducibility issue? Thanks
I tested 24hr for 20131001. I compiled and ran and then re-compiled and ran a second time. The coupler history file at the end of the 24hrs is identical for the two runs.
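A minimal sketch of the comparison step in that kind of test, assuming the two runs wrote the same output file into separate directories. The function name `same_output` and the paths are hypothetical; the thread does not say which tool or file name was actually used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare one output file from two runs, byte for byte.
# The thread does not name the coupler history file, so paths are placeholders.
same_output() {
  local file_a="$1" file_b="$2"
  # cmp -s prints nothing and returns 0 only when the files are byte-identical.
  if cmp -s "$file_a" "$file_b"; then
    echo "identical"
  else
    echo "different"
    return 1
  fi
}
```

Usage would look like `same_output run1/<history file> run2/<history file>`, with the two paths pointing at the end-of-run coupler history file from each run.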
Moorthi,
I assume this is the issue you mentioned. Thanks for fixing it.
On Tue, Jul 13, 2021 at 5:23 PM Denise Worthen wrote:
> I tested 24hr for 20131001. I compiled and ran and then re-compiled and
> ran a second time. The coupler history file at the end of the 24hrs is
> identical for the two runs.
--
*Fanglin Yang, Ph.D.*
*Chief, Model Physics Group*
*Modeling and Data Assimilation Branch*
*NOAA/NWS/NCEP Environmental Modeling Center*
*https://www.emc.ncep.noaa.gov/gmb/wx24fy/fyang/*
Denise, thanks for doing the testing.
My two runs just finished; they generated identical results.
Thanks Jiande for testing. Now the P7b branch is updated.
Yes.
Moorthi
--
Dr. Shrinivas Moorthi
Research Meteorologist
Modeling and Data Assimilation Branch
Environmental Modeling Center / National Centers for Environmental
Prediction
5830 University Research Court - (W/NP23), College Park MD 20740 USA
Tel: (301)683-3718
e-mail: ***@***.***
Phone: (301) 683-3718 Fax: (301) 683-3718
## DESCRIPTION OF CHANGES:
Cleaning up bugs in the machine files. The first bug prompted this PR, and the rest were found subsequently. The bugs (and their fixes) are as follows:

1) A space is missing after the `print_info_msg` and `print_err_msg_exit` function calls in the `file_location` functions. Inserting a space gets past this bug, but subsequent issues were found as described below.

   **For machine files that call the `print_info_msg` function in `file_location` (`cheyenne.sh`, `hera.sh`, `jet.sh`, and `orion.sh`):** Fixing this bug leads to other failures because when the "*" stanza is encountered in the `file_location` function, the `EXTRN_MDL_SYSBASEDIR_ICS|LBCS` variable gets set to the message that `file_location` returns. Since that message contains spaces, it leads to other failures in downstream scripts (the ex-scripts). Simply removing the printing of the message (thus causing `EXTRN_MDL_SYSBASEDIR_ICS|LBCS` to be set to a null string) fixes the failures, so this was the fix implemented. If desired, a message for an empty value of `EXTRN_MDL_SYSBASEDIR_ICS|LBCS` can be placed in another script (where those variables are used).

   **For machine files that use `print_err_msg_exit` in `file_location` (`stampede.sh` and `wcoss_dell_p3.sh`):** These should not exit if the file location is not available, since the experiment can still complete successfully. So just removing the `print_err_msg_exit` call should work (and make the behavior of these machine files consistent with the set above).

2) In all the machine files, the variable `FV3GFS_FILE_FMT_ICS` should be changed to `FV3GFS_FILE_FMT_LBCS` in the definition of `EXTRN_MDL_SYSBASEDIR_LBCS`. This was fixed in all the files.

3) In `stampede.sh`, a variable named `SYSBASEDIR_ICS` is defined. This is a typo. Modify to `EXTRN_MDL_SYSBASEDIR_ICS`.
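The failure mode in item 1 can be illustrated with a small sketch. It assumes the caller sets its variable through command substitution, which captures everything the function writes to stdout. The function names, model names, and path below are hypothetical stand-ins, not the actual machine-file code.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for a machine file's file_location function.

# Problem pattern: the fallback stanza prints a message to stdout, so the
# caller's command substitution stores the message as if it were a path.
file_location_noisy() {
  case "$1" in
    "FV3GFS") echo "/base/dir/for/FV3GFS" ;;
    *) echo "External model location not set for $1" ;;
  esac
}

# Pattern matching the fix described above: print nothing in the fallback
# stanza, so the caller's variable ends up as an empty string.
file_location_quiet() {
  case "$1" in
    "FV3GFS") echo "/base/dir/for/FV3GFS" ;;
    *) : ;;
  esac
}

noisy=$(file_location_noisy "HRRR")
quiet=$(file_location_quiet "HRRR")
echo "noisy=[$noisy]"   # prints: noisy=[External model location not set for HRRR]
echo "quiet=[$quiet]"   # prints: quiet=[]
```

A downstream script that treats the noisy value as a directory would break on the embedded spaces, which is consistent with the ex-script failures described above. Sending the message to stderr instead would also keep it out of the captured value, but that is an alternative and not what this PR did.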
## TESTS CONDUCTED:
Ran the WE2E test `grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_GSD_SAR` on:
* Hera -- successful
* Jet -- successful except for UPP tasks
* Cheyenne -- successful except for UPP tasks

The UPP task failures are new and being experienced by other PRs as well (e.g. #689). The original issue with machine files seems resolved.

## CONTRIBUTORS (optional):
@JeffBeck-NOAA encountered and reported the original error.
* Tweaks for running with containers on azure
* added config.sh for GST on azure
* added AWS to load_modules_run_task.sh
* working on bare metal now
* Changing to azure, aws, and singularity
* updates for singularity
* tweaks for running using singularity exec
* tweaks for running using singularity exec
* Converting to a single noaacloud type
* slight changes to config.sh for aws
* update machine file
* added missing slash to namelist
* changes for intel
* more cleanup
* cleaned up commented lines
PR Checklist
This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.
This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR
An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.
If new or updated input data is required by this PR, it is clearly stated in the text of the PR.
Instructions: All subsequent sections of text should be filled in as appropriate.
The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.
Description
This PR fixes the run-to-run reproducibility issue in the release/P7b branch.
Issue(s) addressed
Testing
How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)
Dependencies
If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).
Do PRs in upstream repositories need to be merged first?