-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[develop] Add GDAS and GEFS datasets to the workflow #526
[develop] Add GDAS and GEFS datasets to the workflow #526
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EdwardSnyder-NOAA Thanks for addressing the changes from the last review. I left just a couple of additional comments/concerns below.
sed "s|f{fcst_hr:03d}|f`printf %03d $fcst_hr`|g" | | ||
sed "s|f{fcst_hr:02d}|f`printf %02d $fcst_hr`|g" ) | ||
echo $fn_mod | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use of sed
will likely break functionality on a mac. Not sure if that matters here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function will go away with my latest changes. But I do have a sed command later in the script, which I changed to 'awk' instead.
awk -F "'" "{ print $ $name }" ) | ||
fn_list+=( "$filename" ) | ||
done | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it's easier to get the cycle-specific filenames that were downloaded from the summary file? They're already in a bash array, IIR.
This solution imposes a specified arbitrary order on the items that are expected in the data_locations.yml file that was not a requirement before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to using the filenames found in the summary file. Let me know what you think.
external_model="GFS" | ||
input_type="gaussian_netcdf" | ||
fn_atm="${EXTRN_MDL_FNS[0]}" | ||
;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but there are 2 entries present at the file at that link for fn_atm_nemsio
and fn_sfc_nemsio
to support nemsio input files from GDAS.
Those entries are also present in the case statement below, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EdwardSnyder-NOAA This looks great! Thanks for your mods!
I think I am fine with merging if you want to just open an Issue in our RRFS project board for supporting the nemsio GDAS files as I commented below. We can handle that with a much smaller PR later.
external_model="GFS" | ||
input_type="gaussian_netcdf" | ||
fn_atm="${EXTRN_MDL_FNS[0]}" | ||
;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, sorry, I should have been more explicit. In the RRFS_dev1 version, under GDAS on line 337, there are these entries:
"GDASENKF")
tracers_input="[\"spfh\",\"clwmr\",\"o3mr\",\"icmr\",\"rwmr\",\"snmr\",\"grle\"]"
tracers="[\"sphum\",\"liq_wat\",\"o3mr\",\"ice_wat\",\"rainwat\",\"snowwat\",\"graupel\"]"
external_model="GFS"
input_type="gaussian_netcdf"
fn_atm_nemsio="${EXTRN_MDL_FNS[0]}"
fn_sfc_nemsio="${EXTRN_MDL_FNS[1]}"
;;
In your addition of GDAS here, you've added a single fn_atm
entry where the RRFS version has two lines fn_atm_nemsio
and fn_sfc_nemsio
. This means we can't use nemsio files for GDAS, we'd have to use netcdf. We need an option to use nemsio.
@EdwardSnyder-NOAA I am working through a code review and it looks good so far, but I wanted to bring this up early: does it make sense to add workflow end-to-end tests (or modify current ones) to include these new input data types? We could stage some static data for testing on platforms that do not have HPSS access. Having config files for these WE2E tests would have the added benefit of being a convenient example case for anyone wanting to try out this capability. |
@mkavulich - I like that idea a lot. We can modify one of the current WE2E tests. Would adding the GDAS and GEFS WE2E tests be included with this PR or could we create a GitHub issue for it and tackle it later? |
@EdwardSnyder-NOAA I think it's okay to start a new issue and address this later. It's probably about time we started a conversation about paring down the number of existing tests again, because we are starting to proliferate again and there's a lot of consolidating we could do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Eddy, sorry to take so long to get you a review here. I have not yet finished my review but at least one fix needs to be made. In addition, your code is based off a quite old commit (bf07542, from mid-December), can you please update your branch to the latest develop
branch? Thanks.
@@ -136,6 +140,13 @@ if [ $SYMLINK_FIX_FILES = "TRUE" ]; then | |||
additional_flags="$additional_flags \ | |||
--symlink" | |||
fi | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic doesn't appear to work correctly for ensemble cases. When I ran a test ensemble case (community_ensemble_008mems), the script fails with the following error:
Traceback (most recent call last):
File "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py", line 1020, in <module>
main(sys.argv[1:])
File "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py", line 808, in main
unavailable = get_requested_files(
File "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py", line 366, in get_requested_files
target_path = fill_template(cla.output_path, cla.cycle_date, mem=mem)
File "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py", line 226, in fill_template
return template_str.format(**format_values)
ValueError: Unknown format code 'd' for object of type 'str'
+ print_err_msg_exit 'Call to retrieve_data.py failed with a non-zero exit status.
The command was:
python3 -u /mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py --debug --anl_or_fcst anl --config /mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/parm/data_locations.yml --cycle_date 2019070100 --data_stores disk disk --external_model FV3GFS --fcst_hrs 0 --ics_or_lbcs ICS --output_path /mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/expt_dirs/community_ensemble_008mems/2019070100/FV3GFS/for_ICS/mem{mem:03d} --summary_file extrn_mdl_var_defns.sh --file_type nemsio --input_file_path /mnt/lfs4/BMC/wrfruc/UFS_SRW_App/develop/input_model_data/FV3GFS/nemsio/2019070100 --symlink --members 1 8
'
FATAL ERROR:
ERROR:
From script: "exregional_get_extrn_mdl_files.sh"
Full path to script: "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/scripts/exregional_get_extrn_mdl_files.sh"
Call to retrieve_data.py failed with a non-zero exit status.
Full experiment on Jet here: /mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/expt_dirs/community_ensemble_008mems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Mike, no worries! I updated my fork branch to the latest develop
branch. I also found a bug in the code that was causing this error (line 347 needed members
to be defined as cla.members
. It should run through now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it appears to be working now!
…2022040400_ensemble_2mems and get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2mems.
@EdwardSnyder-NOAA I have added the two new WE2E tests to the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the last round of changes, everything looks good, approving now.
Just a note: the GEFS and GDAS data currently only works for ensemble cases, but it's not noted explicitly in the documentation. It seems like it should be easy enough to make this data available for deterministic cases as well, but I concede that's beyond the scope of this PR. I will open an issue discussing whether we should add this capability, or just update the docs to clarify that GDAS and GEFS data is for ensemble cases only.
@@ -136,6 +140,13 @@ if [ $SYMLINK_FIX_FILES = "TRUE" ]; then | |||
additional_flags="$additional_flags \ | |||
--symlink" | |||
fi | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it appears to be working now!
Machine: hera |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EdwardSnyder-NOAA These changes look good to me! The non-Hera Jenkins tests have passed successfully and the GHA Hera tests have passed as well. I will now approve and merge this work.
DESCRIPTION OF CHANGES:
This PR adds the ensemble (GDAS and GEFS) datasets as options for the SRW App and is part of a bigger effort to merge in RRFS capabilities into the develop branch.
Currently, the workflow ensemble tasks all use the same initial conditions. Because of the addition of the ensemble datasets, each ensemble task can now have its own initial conditions with the global ensemble number mapping to the SRW App ensemble number (i.e. GDAS mem002 corresponds to SRW App mem002).
The appropriate logic was added to these files so that the GEFS and GDAS datasets can run with the workflow without breaking existing functionality. A number of changes were made in the
retrieve_data.py
script to handle the--members
argument and GEFS filename list. A GEFS function was added to theexregional_get_extrn_mdl_files.sh
file to merge the GEFS files and update the summaryextrn_mdl_var_defns.sh
script.Testing was done on Jet and ran through experiments that used GEFS, GDAS, and FV3GFS as the initial conditions.
Type of change
TESTS CONDUCTED:
DEPENDENCIES:
DOCUMENTATION:
Updated the ConfigWorkflow and InputOutputFiles rst files to reflect the addition of GEFS and GDAS as ICS/LBCS datasets.
ISSUE:
This resolves issue: #485 - Start ensemble members from global ensemble members
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):