Feature/post 2d decomp #1077

JesseMeng-NOAA
Contributor

@JesseMeng-NOAA JesseMeng-NOAA commented Mar 3, 2022

The community UPP is currently decomposed along latitude (the y-direction) only. Because the ufs-weather-model is moving to a 2D decomposition in both latitude and longitude (Y and X), one of the UPP refactoring tasks is to support 2D decomposition for the ufs-weather-model inline post.
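To make the difference between the two layouts concrete, here is a minimal Python sketch of a latitude-only (1D) split versus a 2D split of the same grid. The helper names `split_range` and `decompose`, the rank ordering, and the way remainder points are distributed are all assumptions made for illustration; they are not taken from the UPP or fv3atm code.

```python
def split_range(n, parts):
    """Split indices 0..n-1 into `parts` contiguous (start, stop) blocks (stop exclusive)."""
    base, extra = divmod(n, parts)
    blocks = []
    start = 0
    for p in range(parts):
        # The first `extra` blocks take one additional point (illustrative choice).
        size = base + (1 if p < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def decompose(nx, ny, px, py):
    """Map each rank to its ((i_start, i_stop), (j_start, j_stop)) subdomain.

    px and py are the number of tasks in x (longitude) and y (latitude).
    px == 1 reproduces a latitude-only (1D) decomposition.
    """
    x_blocks = split_range(nx, px)
    y_blocks = split_range(ny, py)
    layout = {}
    for jy, j_block in enumerate(y_blocks):
        for ix, i_block in enumerate(x_blocks):
            layout[jy * px + ix] = (i_block, j_block)  # hypothetical rank ordering
    return layout


# Same 8 x 6 grid on 4 tasks: latitude bands versus 2 x 2 tiles.
one_d = decompose(nx=8, ny=6, px=1, py=4)
two_d = decompose(nx=8, ny=6, px=2, py=2)
```

In the 1D case every task holds all longitudes for a band of latitudes; in the 2D case each task holds a rectangular tile, so any code that assumes a full row of longitudes is locally available has to be revisited.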

A ufs-weather-model feature branch, post_2d_decomp, has been created for this test, with inline post turned on in the control_2dwrtdecomp and regional_control_2dwrtdecomp tests:
https://github.com/JesseMeng-NOAA/ufs-weather-model/tree/feature/post_2d_decomp

An fv3atm feature branch, post_2d_decomp, has been created with modified I/O drivers that call upp/post_2d_decomp:
https://github.com/JesseMeng-NOAA/fv3atm/tree/feature/post_2d_decomp

The upp/post_2d_decomp branch can be found at
https://github.com/WenMeng-NOAA/UPP/tree/post_2d_decomp

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.

  • New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Description

To support ufs-weather-model inline post 2D decomposition functionality.

Issue(s) addressed

#1078

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3
  • opnReqTest for newly added/changed feature
  • CI

Dependencies

The upp/post_2d_decomp branch,
https://github.com/WenMeng-NOAA/UPP/tree/post_2d_decomp

@junwang-noaa
Collaborator

@JesseMeng-NOAA @WenMeng-NOAA Please let me know when the issues with inline vs standalone post for the global and regional test cases are resolved, so that we can start running RT.

@WenMeng-NOAA
Contributor

@junwang-noaa In my tests, the grib2 results are not consistent between RT control and control_2dwrtdecomp, or between RT regional_control and regional_control_2dwrtdecomp. You may want to run your own RT tests.

@JesseMeng-NOAA
Contributor Author

Trying to figure this out. This did not happen in my earlier tests. I am tracking all the recent changes.

@JesseMeng-NOAA
Contributor Author

@junwang-noaa @WenMeng-NOAA

https://github.com/JesseMeng-NOAA/ufs-weather-model/tree/feature/post_2d_decomp
The fv3 and upp submodules are updated. The regression test for global control_2dwrtdecomp has been run, and the netCDF and grib2 files were checked. The only thing that "looks" not OK is GFSPRS.GrbF24, in which 4 records are undefined in both the baseline and control_2dwrtdecomp. This branch is up to date with ufs-weather-model/develop and is ready for review and testing. I will keep working on regional, but I believe we can confirm global first.

baseline dir = /gpfs/dell2/stmp/Jesse.Meng/FV3_RT/REGRESSION_TEST/control
working dir = /gpfs/dell2/ptmp/Jesse.Meng/FV3_RT/rt_29954/control_2dwrtdecomp
Checking test 001 control_2dwrtdecomp results ....
Comparing sfcf000.nc .........OK
Comparing sfcf024.nc .........OK
Comparing atmf000.nc .........OK
Comparing atmf024.nc .........OK
Comparing GFSFLX.GrbF00 .........OK
Comparing GFSFLX.GrbF24 .........OK
Comparing GFSPRS.GrbF00 .........OK
Comparing GFSPRS.GrbF24 .........NOT OK

GFSPRS.GrbF24
897:52457348:MCDC:middle cloud layer:rpn_corr=-nan:rpn_rms=undefined
899:52533898:HCDC:high cloud layer:rpn_corr=-nan:rpn_rms=undefined
901:52622649:TCDC:entire atmosphere (considered as a single layer):rpn_corr=-nan:rpn_rms=undefined
903:52716460:HGT:cloud ceiling:rpn_corr=-nan:rpn_rms=undefined
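For anyone scanning longer RT logs, the comparison lines above can be filtered mechanically. The following is a small Python sketch that pulls out the files marked NOT OK; the function name `failed_files` and the regular expression are my own assumptions based only on the log format shown in this comment, not part of the RT scripts.

```python
import re

# Matches lines such as "Comparing GFSPRS.GrbF24 .........NOT OK".
# The pattern is inferred from the log excerpt in this thread (assumption).
COMPARE_LINE = re.compile(r"^\s*Comparing\s+(\S+)\s+\.+\s*(OK|NOT OK)\s*$")


def failed_files(log_text):
    """Return the file names whose comparison result is NOT OK, in log order."""
    failures = []
    for line in log_text.splitlines():
        match = COMPARE_LINE.match(line)
        if match and match.group(2) == "NOT OK":
            failures.append(match.group(1))
    return failures


sample_log = """\
Comparing atmf024.nc .........OK
Comparing GFSPRS.GrbF00 .........OK
Comparing GFSPRS.GrbF24 .........NOT OK
"""
print(failed_files(sample_log))  # prints ['GFSPRS.GrbF24']
```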

@junwang-noaa
Collaborator

@JesseMeng-NOAA From the message you showed, the RT failed. If GFSPRS.GrbF24 has an issue, please fix it. If the field has undefined values, then both 1D and 2D need to have the same results.
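As a rough illustration of that requirement, the Python sketch below treats two fields as matching only if they are undefined at exactly the same points and bit-for-bit equal everywhere else. Representing an undefined value as NaN is purely an assumption for this example, and `fields_match` is a hypothetical helper, not something in the RT or UPP code.

```python
import math


def fields_match(field_1d, field_2d):
    """Return True if both fields agree, including where they are undefined.

    Undefined points are represented as NaN here (illustrative assumption).
    """
    if len(field_1d) != len(field_2d):
        return False
    for a, b in zip(field_1d, field_2d):
        a_undef, b_undef = math.isnan(a), math.isnan(b)
        if a_undef != b_undef:
            return False  # undefined in one run but defined in the other
        if not a_undef and a != b:
            return False  # both defined, values differ
    return True


undef = float("nan")
print(fields_match([undef, undef], [undef, undef]))  # prints True
print(fields_match([1.5, undef], [1.5, 2.0]))        # prints False
```

A plain `==` comparison would report a mismatch for the first case, because NaN never compares equal to itself, which is why the undefined mask is checked separately.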

@JesseMeng-NOAA
Contributor Author

I checked the two files with wgrib and confirmed that the baseline was reproduced.

@junwang-noaa
Collaborator

The field cwatclm is different.

@JesseMeng-NOAA
Contributor Author

Fixed and regression test passed.
baseline dir = /gpfs/dell2/stmp/Jesse.Meng/FV3_RT/REGRESSION_TEST/control
working dir = /gpfs/dell2/ptmp/Jesse.Meng/FV3_RT/rt_21862/control_2dwrtdecomp
Checking test 001 control_2dwrtdecomp results ....
Comparing sfcf000.nc .........OK
Comparing sfcf024.nc .........OK
Comparing atmf000.nc .........OK
Comparing atmf024.nc .........OK
Comparing GFSFLX.GrbF00 .........OK
Comparing GFSFLX.GrbF24 .........OK
Comparing GFSPRS.GrbF00 .........OK
Comparing GFSPRS.GrbF24 .........OK

Branch updated
https://github.com/JesseMeng-NOAA/ufs-weather-model/tree/feature/post_2d_decomp

@junwang-noaa
Collaborator

@JesseMeng-NOAA Thanks for resolving the issue with control_2dwrtdecomp. How about the regional_control_2dwrtdecomp test, the regional control test with 2D post decomposition? Currently the inline post is turned off in this test. If turned on, it should generate the same POST results as the regional_control test. Thanks

@JesseMeng-NOAA
Contributor Author

Just to confirm: the work for global fv3 inline post is complete.
Regional is ongoing.

@JesseMeng-NOAA
Contributor Author

JesseMeng-NOAA commented Apr 1, 2022

@junwang-noaa @WenMeng-NOAA
Resolved. Both global and regional regression tests passed.
Branch updated
https://github.com/JesseMeng-NOAA/ufs-weather-model/tree/feature/post_2d_decomp

Fixed a bug in ufs-weather-model/FV3/io/post_regional.F90.
To run the regression tests, the new baseline (control and regional_control) can be generated either from my branch
or from the ufs-weather-model develop branch after manually copying in
/u/Jesse.Meng/noscrub/post_refactor_2022q2/ufs-weather-model.dev/FV3/io/post_regional.F90
Both approaches produce identical results.
All nc files and grib files are identical between the new baseline and the 2D decomposition runs.
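For reference, one simple way to spot-check "identical" outside the RT framework is a byte-level hash comparison, sketched below in Python. This is not how the RT scripts compare files; the helper names `sha256_of` and `differing_files` are hypothetical, and the directories and file names are whatever you pass in.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model output need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(baseline_dir, run_dir, names):
    """Return the names that are missing from either directory or differ byte-for-byte."""
    diffs = []
    for name in names:
        base = Path(baseline_dir) / name
        run = Path(run_dir) / name
        if not base.is_file() or not run.is_file():
            diffs.append(name)
        elif sha256_of(base) != sha256_of(run):
            diffs.append(name)
    return diffs
```

Note that a byte-level match is stricter than a data-level match: two files holding the same field values could still differ in metadata and be reported here as different.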

baseline dir = /gpfs/dell2/stmp/Jesse.Meng/FV3_RT/REGRESSION_TEST/control
working dir = /gpfs/dell2/ptmp/Jesse.Meng/FV3_RT/rt_24823/control_2dwrtdecomp
Checking test 001 control_2dwrtdecomp results ....
Comparing sfcf000.nc .........OK
Comparing sfcf024.nc .........OK
Comparing atmf000.nc .........OK
Comparing atmf024.nc .........OK
Comparing GFSFLX.GrbF00 .........OK
Comparing GFSFLX.GrbF24 .........OK
Comparing GFSPRS.GrbF00 .........OK
Comparing GFSPRS.GrbF24 .........OK

[0] The total amount of wall time = 150.152431
[0] The maximum resident set size (KB) = 461736

Test 001 control_2dwrtdecomp PASS

baseline dir = /gpfs/dell2/stmp/Jesse.Meng/FV3_RT/REGRESSION_TEST/fv3_regional_control
working dir = /gpfs/dell2/ptmp/Jesse.Meng/FV3_RT/rt_24823/regional_control_2dwrtdecomp
Checking test 002 regional_control_2dwrtdecomp results ....
Comparing dynf000.nc .........OK
Comparing dynf024.nc .........OK
Comparing phyf000.nc .........OK
Comparing phyf024.nc .........OK
Comparing PRSLEV.GrbF00 .........OK
Comparing PRSLEV.GrbF24 .........OK
Comparing NATLEV.GrbF00 .........OK
Comparing NATLEV.GrbF24 .........OK

[0] The total amount of wall time = 366.543488
[0] The maximum resident set size (KB) = 574436

Test 002 regional_control_2dwrtdecomp PASS

@JesseMeng-NOAA
Contributor Author

Sorry. I was trying to delete a draft comment and did not realize that hitting "Delete" would end up closing the PR.

@JesseMeng-NOAA
Contributor Author

This PR is closed. There have been many updates to upp and other submodules since this PR was opened, and it was never merged.
A new PR, #1211, has been opened to support the same features as this PR.
