Test/fix cycled mode in feature/coupled_sprint #427

Closed
KateFriedman-NOAA opened this issue Aug 23, 2021 · 15 comments

@KateFriedman-NOAA
Member

Began testing feature/coupled_sprint in cycled mode on supported platforms as part of pre-commit testing for PR #418. Will commit needed updates/fixes to feature/coupled_sprint.

Testing C192C96L127 on WCOSS-Dell, Hera, Orion.

@KateFriedman-NOAA KateFriedman-NOAA self-assigned this Aug 23, 2021
KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Aug 23, 2021
- duplicate case block from config.fcst for setting OCNRES based on model resolution
- change CASE to CASE_ENKF to use EnKF resolution for efcs jobs

Refs: NOAA-EMC#427
KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Aug 26, 2021
- change deterministic RESTART copy into loop over restart_interval list
  and correct script error at runtime that failed to create RESTART folder
- add missing IAU RESTART file copy to SEND block to support cycling

Refs: NOAA-EMC#427
@KateFriedman-NOAA
Member Author

@CoryMartin-NOAA I am testing cycled mode on Mars (C192C96L127) using the new feature/coupled_sprint branch, which is the develop branch plus changes to support the coupled system (coming in piece-by-piece from feature/coupled_crow). We haven't touched much outside of the forecast jobs yet, and this is my first try at cycled mode since we updated the forecast jobs. My eobs job is not running normally: it doesn't fail, but it also doesn't produce several needed output files (cnvstat.ensmean, oznstat.ensmean, radstat.ensmean), so the eupd job fails:

3.926 + mpirun -n 1 cfp /gpfs/dell3/stmp/Kate.Friedman/RUNDIRS/blah/2020090200/gdas/blah_gdaseupd_00.70417192/mp_untar.sh
tar: /gpfs/dell3/ptmp/Kate.Friedman/comrot/blah/enkfgdas.20200902/00/atmos/gdas.t00z.cnvstat.ensmean: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
tar: /gpfs/dell3/ptmp/Kate.Friedman/comrot/blah/enkfgdas.20200902/00/atmos/gdas.t00z.oznstat.ensmean: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
tar: /gpfs/dell3/ptmp/Kate.Friedman/comrot/blah/enkfgdas.20200902/00/atmos/gdas.t00z.radstat.ensmean: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now

log: /gpfs/dell3/ptmp/Kate.Friedman/comrot/blah/logs/2020090200/gdaseupd.log

I notice the following in the log (INTERPOLATION GRID NOT MONOTONIC IN SIMPIN1):

 4782  create_ges_tendencies: successfully complete
 4783  create_ges_derivatives: successfully complete
 4784   INTERPOLATION GRID NOT MONOTONIC IN SIMPIN1
 4785   i,x1grid,dx=           1  -4.65113432491888      -1.630027530344513E-002
 4786   i,x1grid,dx=           2  -4.66743460022233      -1.627868196438609E-002
 4787   i,x1grid,dx=           3  -4.68371328218671      -1.618311194521738E-002
 4788   i,x1grid,dx=           4  -4.69989639413193        3.12910006733703
 4789   i,x1grid,dx=           5  -1.57079632679490        3.12910006733703
 4790   i,x1grid,dx=           6   1.55830374054214      -1.618311194521738E-002

log: /gpfs/dell3/ptmp/Kate.Friedman/comrot/blah/logs/2020090200/gdaseobs.log
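
For reference, the x1grid values in that excerpt decrease for the first four points and then increase, which is the pattern a monotonicity check would reject. A minimal sketch of such a check, assuming one value per line on stdin. `is_monotonic` is a hypothetical helper for illustration and is not taken from the GSI source:

```shell
#!/bin/bash
# Hypothetical helper (not from GSI): succeeds only if the numbers read from
# stdin, one per line, are strictly increasing or strictly decreasing.
is_monotonic() {
  awk '
    BEGIN { sign = 0; bad = 0 }
    NR > 1 {
      d = $1 - prev
      cur = (d > 0) ? 1 : ((d < 0) ? -1 : 0)
      if (cur == 0) bad = 1                 # repeated value
      else if (sign == 0) sign = cur        # first step sets the direction
      else if (cur != sign) bad = 1         # direction changed
    }
    { prev = $1 }
    END { exit bad }
  '
}

# x1grid values copied from the gdaseobs.log excerpt above.
if printf '%s\n' -4.65113432491888 -4.66743460022233 -4.68371328218671 \
                 -4.69989639413193 -1.57079632679490 1.55830374054214 | is_monotonic; then
  echo "monotonic"
else
  echo "not monotonic"
fi
```

The direction change between the fourth and fifth values lines up with the jump in dx shown in the log.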

I ran the develop branch for the same date/resolution combo and its eobs job goes right into GENERAL_READ_GFSATM_NC at the same point:

 create_ges_tendencies: successfully complete
 create_ges_derivatives: successfully complete
GENERAL_READ_GFSATM_NC: read lonb,latb,levs=   384   192   127, scatter nlon,nlat=   384   192, hour=   3.0, idate=   18    9    1 2020 sigf03
 READ_GFS_NETCDF: l_cld_derived =  T
GENERAL_READ_GFSATM_NC: read lonb,latb,levs=   384   192   127, scatter nlon,nlat=   384   192, hour=   4.0, idate=   18    9    1 2020 sigf04
 READ_GFS_NETCDF: l_cld_derived =  T
GENERAL_READ_GFSATM_NC: read lonb,latb,levs=   384   192   127, scatter nlon,nlat=   384   192, hour=   5.0, idate=   18    9    1 2020 sigf05
 READ_GFS_NETCDF: l_cld_derived =  T

log: /gpfs/dell3/ptmp/Kate.Friedman/comrot/blahdev/logs/2020090200/gdaseobs.log

I'm not finding a reason for the difference between the two eobs runs, nor what goes wrong with generating the ensmean files. I notice that the subfolders of enkfgdas.20200901/18/atmos/gsidiags are empty, so I can see how that leads to issues later on. The namelists appear to be the same between the two runs.

Could there be an issue in the prior half cycle that would lead to that run behavior? Thanks for any help!

@RussTreadon-NOAA
Contributor

@RussTreadon-NOAA is investigating.

Looked at the blah and blahdev gdas.t18z.atmf006.nc files. ncdump -v grid_yt gdas.t18z.atmf006.nc shows that blah and blahdev have latitudes in different orders. blahdev latitude goes from positive to negative.

data:

 grid_yt = 89.6416480725934, 89.1774327980165, 88.7104760813539,
    88.2428999560018, 87.7750888743169, 87.3071640967019, 86.8391757792909,
    86.3711483847176, 85.9030952554449, 85.435024284034, 84.966940436571,

    -86.3711483847176, -86.8391757792909, -87.3071640967019,
    -87.7750888743169, -88.2428999560018, -88.7104760813539,
    -89.1774327980165, -89.6416480725934 ;
}

blah latitude goes from negative to positive.

 grid_yt = -89.6416480725934, -89.1774327980165, -88.7104760813539,
    -88.2428999560018, -87.7750888743169, -87.3071640967019,
    -86.8391757792909, -86.3711483847176, -85.9030952554449,

    84.498846993417, 84.966940436571, 85.435024284034, 85.9030952554449,
    86.3711483847176, 86.8391757792909, 87.3071640967019, 87.7750888743169,
    88.2428999560018, 88.7104760813539, 89.1774327980165, 89.6416480725934 ;
}

Same behavior found for lat. GSI extracts and uses grid_yt for grid latitudes.
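
A quick way to classify the ordering is to compare the first and last grid_yt values. A minimal sketch, where `lat_order` is a hypothetical helper and the values are copied from the two ncdump listings above:

```shell
#!/bin/bash
# Hypothetical helper: classify latitude ordering from the first and last
# grid_yt values (for example, taken from `ncdump -v grid_yt` output).
lat_order() {
  awk -v first="$1" -v last="$2" 'BEGIN {
    if (first + 0 > last + 0)      print "north_to_south"
    else if (first + 0 < last + 0) print "south_to_north"
    else                           print "undetermined"
  }'
}

lat_order 89.6416480725934 -89.6416480725934   # blahdev values
lat_order -89.6416480725934 89.6416480725934   # blah values
```

With these inputs the blahdev case is reported as north_to_south and the blah case as south_to_north.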

Why did the order of grid_yt change in blah?

@RussTreadon-NOAA
Contributor

blah uses a forecast model executable built from ufs-community/ufs-weather-model at 9bbb6d4

The change log for 9bbb6d4 reads

* adds option to use sea-ice albedo predicted by the ice model in atmospheric radiation calculation
* fixes some issue surface cycling with respect to fractional grid by fixing the land-sea-ice mask for the fractional grid
* restores the North-South flipping capability in the history files

The last bullet likely explains the change in the order of grid_yt in the blah gdas.t18z.atmf006.nc.

@RussTreadon-NOAA
Contributor

Variable write_nsflip controls the N/S flipping capability.

This variable is referenced in ufs_model.fd/FV3/io/module_wrt_grid_comp.F90. It seems write_nsflip can be externally set. If write_nsflip=.true. will the latitudes be written in the current GFS v16 order (positive to negative)?

@junwang-noaa
Contributor

@RussTreadon-NOAA that is correct. When we cleaned up the nemsio from the ufs-weather-model, the "write_nemsioflip" configuration variable was removed. @jswhit2 told us about the impact of this change on GSI, so the option of flipping the lat direction in the history files was added back in ufs-weather-model PR#723. The configuration variable is now "write_nsflip". When it is set to true, the latitudes will be written in the current GFS v16 order (positive to negative). Thanks

@RussTreadon-NOAA
Contributor

Thank you, @junwang-noaa, for confirming that write_nsflip is a configurable variable.

As a test, I modified a copy of the NOAA-EMC-GSI master to multiply by -1 the grid_yt values read from blah atmfXXX.nc files. With this change global_gsi.x ran to completion. This change is not proposed as a solution. It was simply a test demonstrating that the change in the order (i.e., sign) of grid_yt caused the GSI behavior Kate reported.
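
A sign flip works as a test here because the latitude values listed earlier in this thread are symmetric about the equator, so negating the list gives the same sequence as reversing it. A minimal sketch using a four-value subset of those grid_yt values. `negate` and `reverse` are hypothetical helpers for illustration, not GSI code:

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.
negate()  { awk '{ printf "%.13f\n", -$1 }'; }
reverse() { awk '{ a[NR] = $1 } END { for (i = NR; i >= 1; i--) printf "%.13f\n", a[i] }'; }

# Symmetric subset of the blahdev grid_yt values (north to south).
lats='89.6416480725934
89.1774327980165
-89.1774327980165
-89.6416480725934'

if [ "$(printf '%s\n' "$lats" | negate)" = "$(printf '%s\n' "$lats" | reverse)" ]; then
  echo "negating equals reversing for a symmetric latitude list"
fi
```

The sketch only touches the coordinate values, which is consistent with the change being a diagnostic and not a proposed fix.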

@RussTreadon-NOAA
Contributor

ufs_model.fd/FV3/io/module_wrt_grid_comp.F90 defaults write_nsflip to .false.

    call ESMF_ConfigGetAttribute(config=CF,value=write_nsflip,default=.false., &
                                  label='write_nsflip:',rc=rc)

Can we add write_nsflip to parsing_model_configure_FV3.sh with a default value of .true.? That is, revise to read

write_groups:            ${WRITE_GROUP:-1}
write_tasks_per_group:   ${WRTTASK_PER_GROUP:-24}
output_history:          ${OUTPUT_HISTORY:-".true."}
write_dopost:            ${WRITE_DOPOST:-".false."}
write_nsflip:            ${WRITE_NSFLIP:-".true."}
num_files:               ${NUM_FILES:-2}

If we prefer to retain a default value of .false. for write_nsflip in parsing_model_configure_FV3.sh, we can set WRITE_NSFLIP to .true. in config.base, as suggested below:

export QUILTING=".true."
export OUTPUT_GRID="gaussian_grid"
export OUTPUT_FILE="netcdf"
export WRITE_DOPOST=".true."
export WRITE_NSFLIP=".true."
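
Either way, the value written to the model configuration comes from the same `${WRITE_NSFLIP:-default}` expansion: an exported WRITE_NSFLIP wins, and the script default applies only when the variable is unset or empty. A minimal sketch, where `render_nsflip` is a hypothetical stand-in for the relevant line of parsing_model_configure_FV3.sh and uses a `.false.` script default:

```shell
#!/bin/bash
# Hypothetical stand-in (not the actual workflow script): prints the
# write_nsflip line using the script-side default of .false.
render_nsflip() {
  local value=${WRITE_NSFLIP:-".false."}
  printf 'write_nsflip: %s\n' "$value"
}

unset WRITE_NSFLIP
render_nsflip                  # script default applies

export WRITE_NSFLIP=".true."   # as config.base would export it
render_nsflip                  # exported value overrides the default
```

Keeping the script default at `.false.` and setting `.true.` in the config leaves the script in line with the model's own default while making the flip explicit per experiment.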

@KateFriedman-NOAA
Member Author

@junwang-noaa Thanks for confirming the change in the ufs-weather-model. @RussTreadon-NOAA Thank you for investigating this and finding the flip setting! I have added the recommended WRITE_NSFLIP variable to both parsing_model_configure_FV3.sh and config.base (see git diff below). I set the default to .false. in the script and have the config change it to .true.

[Kate.Friedman@m71a1 feature-coupled_sprint]$ git diff ush/parsing_model_configure_FV3.sh parm/config/config.base.emc.dyn
diff --git a/parm/config/config.base.emc.dyn b/parm/config/config.base.emc.dyn
index c1e239e6..2bc72a0a 100755
--- a/parm/config/config.base.emc.dyn
+++ b/parm/config/config.base.emc.dyn
@@ -196,6 +196,7 @@ export QUILTING=".true."
 export OUTPUT_GRID="gaussian_grid"
 export OUTPUT_FILE="netcdf"
 export WRITE_DOPOST=".true."
+export WRITE_NSFLIP=".true."

 # suffix options depending on file format
 if [ $OUTPUT_FILE = "netcdf" ]; then
diff --git a/ush/parsing_model_configure_FV3.sh b/ush/parsing_model_configure_FV3.sh
index 6060e3d8..6d628785 100755
--- a/ush/parsing_model_configure_FV3.sh
+++ b/ush/parsing_model_configure_FV3.sh
@@ -39,6 +39,7 @@ write_groups:            ${WRITE_GROUP:-1}
 write_tasks_per_group:   ${WRTTASK_PER_GROUP:-24}
 output_history:          ${OUTPUT_HISTORY:-".true."}
 write_dopost:            ${WRITE_DOPOST:-".false."}
+write_nsflip:            ${WRITE_NSFLIP:-".false."}
 num_files:               ${NUM_FILES:-2}
 filename_base:           'atm' 'sfc'
 output_grid:             $OUTPUT_GRID

I also updated config.base.nco.static to add export WRITE_NSFLIP=".true.".

I have started a fresh C192C96L127 test on Mars with WRITE_NSFLIP=.true.:

clone: /gpfs/dell2/emc/modeling/save/Kate.Friedman/git/global-workflow/feature-coupled_sprint
EXPDIR: /gpfs/dell2/emc/modeling/save/Kate.Friedman/expdir/fcscyc
COMROT: /gpfs/dell3/ptmp/Kate.Friedman/comrot/fcscyc

@KateFriedman-NOAA
Member Author

@RussTreadon-NOAA Running with WRITE_NSFLIP=.true. seems to have resolved the error, thanks again! I now have the previously missing cnvstat.ensmean, oznstat.ensmean, radstat.ensmean files generated by the eobs/ediag jobs. The gdas[gfs]anal and gdaseupd jobs are running now...waiting to see if they encounter other issues...

@KateFriedman-NOAA
Member Author

No further job-ending errors encountered in anal/eupd jobs in new fcscyc test. Need to scan logs for silent errors still.

The 18z half cycle is complete. The 00z gdas/enkf suites are complete. The 00z gfs suite is wrapping up still. The 06z cycle has started. Will let the test run a few more cycles to check stability and the verification pieces. So far cycled atmos-only mode is working in feature/coupled_sprint branch with minimal modifications. Hooray!

@KateFriedman-NOAA
Member Author

Made new branch off feature/coupled_sprint to hold cycled updates: https://github.com/KateFriedman-NOAA/global-workflow/tree/feature/coupled_sprint_cycling

KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Aug 27, 2021
- if-block that changed LATB_CASE to "190" when LATB_CASE is "192" is removed
- if-block was needed previously to handle missing fix file for this case
- new fix set includes symlinks to redirect fix files for certain resolutions
- LATB_CASE change created forecast output on the wrong grid dimensions; removing block resolves that

Refs: NOAA-EMC#427
KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Aug 27, 2021
- add WRITE_NSFLIP variable to parsing_model_configure_FV3.sh with default set to .false.
- add WRITE_NSFLIP variable to config.base and set to .true.
- running with WRITE_NSFLIP=.true. writes latitudes in current GFSv16 order (positive to negative)
- running with WRITE_NSFLIP=.false. writes latitudes negative to positive and causes issues with the GSI

Refs: NOAA-EMC#427
KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Aug 30, 2021
@KateFriedman-NOAA KateFriedman-NOAA added the "blocked" label (Issue is currently being blocked by another issue) Sep 7, 2021
@KateFriedman-NOAA
Member Author

Awaiting resolution of ufs-weather-model issue #785 before continuing testing with IAU on. Branch currently cycles without job failure with IAU off.

@KateFriedman-NOAA
Member Author

PR #785 in ufs-weather-model is now complete. Will look to test updated hash upon return from annual leave on 9/20.

@KateFriedman-NOAA KateFriedman-NOAA removed the "blocked" label (Issue is currently being blocked by another issue) Sep 10, 2021
@KateFriedman-NOAA
Member Author

Now awaiting feature/coupled_crow to be updated to (or past) ufs-community/ufs-weather-model@29fa453 (hash when IAU issue was resolved). Will retest cycled mode with IAU on at that point. This issue will be on hold until then.

KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Nov 9, 2021
- when stochastic physics is turned on the &nam_sfcperts namelist section
wasn't getting added to input.nml

Refs: NOAA-EMC#427
KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Nov 9, 2021
- a timestep of 720 on WCOSS-Dell for a C96 enkf fcst job failed
- reducing the timestep to 450 produced success

Refs: NOAA-EMC#427
KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Nov 10, 2021
- set restart_interval the same regardless of whether it's cold or warm
- use prior cold-start values all the time now
- deterministic forecast job restart_interval is now "3 6"
- ensemble forecast job restart_interval is now "3 -1"

Refs: NOAA-EMC#427
@WalterKolczynski-NOAA
Contributor

Confirmed to work now in feature/coupled_sprint

lgannoaa pushed a commit to lgannoaa/global-workflow that referenced this issue Feb 9, 2022
4 participants