Possible bug on GEFS fcst segment #3001
Comments
I just checked and this is definitely working correctly for GFS atm-only. Will try again with coupled, then GEFS.
Looks like the WW3 restart files are not being written to the correct directory. CC: @aerorahul
I've confirmed restart works correctly for S2S without waves. Will fix the wave restarts this week.
Fixes some issues that were preventing wave restarts from operating correctly. First, the wave restart files were not being correctly linked from `$DATA` to `$DATArestart`. The files are placed in the root of `$DATA` instead of in `${DATA}/WAVE_RESTART`, so now links for the individual files are created. Second, the incorrect filenames were being searched for and copied as part of a rerun. Filenames were geared towards multigrid waves, which use the grid names, but single-grid just uses `ww3`. Since multigrid waves are deprecated in the workflow and will soon be removed (NOAA-EMC#2637), these were updated to support only the single-grid option. These fixes allow forecast segments (and emergency restarts) to work correctly when waves are on. Resolves NOAA-EMC#3001
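The first fix can be sketched roughly as below. This is a hedged illustration, not the actual workflow code: the `ufs.cpld.ww3.r.*` filename pattern and the directory layout are assumptions for demonstration only.

```shell
#!/usr/bin/env bash
# Sketch: link single-grid WW3 restart files individually from the root of
# $DATA into ${DATArestart}/WAVE_RESTART, since the model writes them to the
# root of $DATA rather than to ${DATA}/WAVE_RESTART.
set -eu
DATA=$(mktemp -d)          # stand-in for the forecast run directory
DATArestart=$(mktemp -d)   # stand-in for the restart staging directory
mkdir -p "${DATArestart}/WAVE_RESTART"
# Fake restart file; the real filename pattern is an assumption here:
touch "${DATA}/ufs.cpld.ww3.r.2021-03-23-43200"
# Link each restart file individually instead of linking a directory:
for restart_file in "${DATA}"/*.ww3.r.*; do
  ln -sf "${restart_file}" "${DATArestart}/WAVE_RESTART/$(basename "${restart_file}")"
done
ls "${DATArestart}/WAVE_RESTART"
```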
- Add an elif for the prior wave restart filename for when the new filename is not available in the IC set. Refs NOAA-EMC#3001
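The elif fallback might look like the following. Both filename patterns are hypothetical placeholders for illustration; the actual names used by the workflow may differ.

```shell
#!/usr/bin/env bash
# Sketch: prefer the new single-grid restart filename, and fall back to the
# prior-style name when the IC set only contains the old one.
set -eu
ICSDIR=$(mktemp -d)                               # stand-in for the IC directory
touch "${ICSDIR}/20210323.120000.restart.ww3"     # only the prior-style name exists
new_name="${ICSDIR}/ufs.cpld.ww3.r.2021-03-23-43200"  # assumed new-style name
old_name="${ICSDIR}/20210323.120000.restart.ww3"      # assumed prior-style name
if [[ -f "${new_name}" ]]; then
  wave_restart="${new_name}"
elif [[ -f "${old_name}" ]]; then
  wave_restart="${old_name}"
else
  echo "FATAL: no wave restart file found in IC set" >&2
  exit 1
fi
echo "using wave restart: ${wave_restart}"
```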
What is wrong?
When running GEFS in segments, the forecast hours appear to overlap, as shown below:
[Wei.Huang@hfe03 2021032312]$ grep cfhour fcst_mem00*
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 12.0000000000000 cfhour=012
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 24.0000000000000 cfhour=024
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 36.0000000000000 cfhour=036
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 48.0000000000000 cfhour=048
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 54.0000000000000 cfhour=054
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 60.0000000000000 cfhour=060
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 66.0000000000000 cfhour=066
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 72.0000000000000 cfhour=072
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 78.0000000000000 cfhour=078
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 84.0000000000000 cfhour=084
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 90.0000000000000 cfhour=090
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 96.0000000000000 cfhour=096
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 102.000000000000 cfhour=102
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 108.000000000000 cfhour=108
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 114.000000000000 cfhour=114
fcst_mem000_seg1.log: 6: in wrt run, nfhour= 120.000000000000 cfhour=120
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 12.0000000000000 cfhour=012
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 24.0000000000000 cfhour=024
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 36.0000000000000 cfhour=036
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 48.0000000000000 cfhour=048
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 54.0000000000000 cfhour=054
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 60.0000000000000 cfhour=060
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 66.0000000000000 cfhour=066
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 72.0000000000000 cfhour=072
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 78.0000000000000 cfhour=078
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 84.0000000000000 cfhour=084
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 90.0000000000000 cfhour=090
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 96.0000000000000 cfhour=096
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 102.000000000000 cfhour=102
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 108.000000000000 cfhour=108
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 114.000000000000 cfhour=114
fcst_mem001_seg1.log: 6: in wrt run, nfhour= 120.000000000000 cfhour=120
For mem000, seg 0 forecasts from hour 00 to 48, then seg 1 runs from 12 to 120. Shouldn't seg 1 run from 48 to 120?
For mem001 and mem002, seg 0 runs from 00 to 120 and then seg 1 runs from 12 to 120, so seg 1 is not needed at all, right?
rocotostat shows this:
CYCLE TASK JOBID STATE EXIT STATUS TRIES DURATION
================================================================================================================================
202103231200 stage_ic 803100 SUCCEEDED 0 1 21.0
202103231200 wave_init 803099 SUCCEEDED 0 1 28.0
202103231200 prep_emissions 803098 SUCCEEDED 0 1 17.0
202103231200 fcst_mem000_seg0 803195 SUCCEEDED 0 1 1164.0
202103231200 fcst_mem000_seg1 803964 SUCCEEDED 0 1 2812.0
202103231200 fcst_mem001_seg0 803196 SUCCEEDED 0 1 2859.0
202103231200 fcst_mem001_seg1 805320 SUCCEEDED 0 1 2890.0
202103231200 fcst_mem002_seg0 803197 SUCCEEDED 0 1 2850.0
202103231200 fcst_mem002_seg1 805321 SUCCEEDED 0 1 2884.0
It seems mem000 used about 1/3 more CPU than necessary, while mem001 and mem002 doubled their CPU cost.
What should have happened?
We expect that for all members, in a two-segment forecast:
seg 0, fcst from 00 -> 48,
seg 1, fcst from 48 -> 120.
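The expected behavior is that each segment starts where the previous one ended. A minimal sketch, using the boundary hours from this case (the variable names are illustrative, not the workflow's actual configuration keys):

```shell
#!/usr/bin/env bash
# Print non-overlapping segment ranges from a list of boundary hours.
print_segments() {
  local bounds=(0 48 120)   # assumed segment boundaries in forecast hours
  local i
  for ((i = 0; i < ${#bounds[@]} - 1; i++)); do
    echo "seg ${i}: fcst from ${bounds[i]} -> ${bounds[i+1]}"
  done
}
print_segments
# seg 0: fcst from 0 -> 48
# seg 1: fcst from 48 -> 120
```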
What machines are impacted?
All or N/A
What global-workflow hash are you using?
The test uses EPIC's fork of global-workflow, which points to the current develop.
Steps to reproduce
To reproduce on Hera:
HPC_ACCOUNT=epic
pslot=c48gefs
RUNTESTS=/scratch1/NCEPDEV/stmp2/$USER/GEFSTESTS
./workflow/create_experiment.py \
    --yaml ci/cases/pr/C48_S2SWA_gefs.yaml
Additional information
COMROOT and EXPDIR on Hera at:
[Wei.Huang@hfe03 GEFSTESTS]$ pwd
/scratch1/NCEPDEV/stmp2/Wei.Huang/GEFSTESTS
[Wei.Huang@hfe03 GEFSTESTS]$ ls -l
total 8
drwxr-sr-x 3 Wei.Huang stmp 4096 Oct 10 22:56 COMROOT
drwxr-sr-x 3 Wei.Huang stmp 4096 Oct 10 22:56 EXPDIR
[Wei.Huang@hfe03 GEFSTESTS]$ ls -l *
COMROOT:
total 4
drwxr-sr-x 4 Wei.Huang stmp 4096 Oct 10 22:57 c48gefs
EXPDIR:
total 4
drwxr-sr-x 3 Wei.Huang stmp 4096 Oct 11 14:10 c48gefs
Do you have a proposed solution?
No