Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop] Add GDAS and GEFS datasets to the workflow #526

Merged

Conversation

EdwardSnyder-NOAA
Copy link
Collaborator

@EdwardSnyder-NOAA EdwardSnyder-NOAA commented Dec 23, 2022

DESCRIPTION OF CHANGES:

This PR adds the ensemble (GDAS and GEFS) datasets as options for the SRW App and is part of a bigger effort to merge in RRFS capabilities into the develop branch.

Currently, the workflow ensemble tasks all use the same initial conditions. Because of the addition of the ensemble datasets, each ensemble task can now have its own initial conditions with the global ensemble number mapping to the SRW App ensemble number (i.e. GDAS mem002 corresponds to SRW App mem002).

The appropriate logic was added to these files so that the GEFS and GDAS datasets can run with the workflow without breaking existing functionality. A number of changes were made in the retrieve_data.py script to handle the --members argument and GEFS filename list. A GEFS function was added to the exregional_get_extrn_mdl_files.sh file to merge the GEFS files and update the summary extrn_mdl_var_defns.sh script.

Testing was done on Jet and ran through experiments that used GEFS, GDAS, and FV3GFS as the initial conditions.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

DOCUMENTATION:

Updated the ConfigWorkflow and InputOutputFiles rst files to reflect the addition of GEFS and GDAS as ICS/LBCS datasets.

ISSUE:

This resolves issue: #485 - Start ensemble members from global ensemble members

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

Copy link
Collaborator

@christinaholtNOAA christinaholtNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdwardSnyder-NOAA Thanks for addressing the changes from the last review. I left just a couple of additional comments/concerns below.

sed "s|f{fcst_hr:03d}|f`printf %03d $fcst_hr`|g" |
sed "s|f{fcst_hr:02d}|f`printf %02d $fcst_hr`|g" )
echo $fn_mod
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use of sed will likely break functionality on a mac. Not sure if that matters here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That function will go away with my latest changes. But I do have a sed command later in the script, which I changed to 'awk' instead.

awk -F "'" "{ print $ $name }" )
fn_list+=( "$filename" )
done
done
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it's easier to get the cycle-specific filenames that were downloaded from the summary file? They're already in a bash array, IIR.

This solution imposes a specified arbitrary order on the items that are expected in the data_locations.yml file that was not a requirement before.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to using the filenames found in the summary file. Let me know what you think.

external_model="GFS"
input_type="gaussian_netcdf"
fn_atm="${EXTRN_MDL_FNS[0]}"
;;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but there are 2 entries present at the file at that link for fn_atm_nemsio and fn_sfc_nemsio to support nemsio input files from GDAS.

Those entries are also present in the case statement below, too.

Copy link
Collaborator

@christinaholtNOAA christinaholtNOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdwardSnyder-NOAA This looks great! Thanks for your mods!

I think I am fine with merging if you want to just open an Issue in our RRFS project board for supporting the nemsio GDAS files as I commented below. We can handle that with a much smaller PR later.

external_model="GFS"
input_type="gaussian_netcdf"
fn_atm="${EXTRN_MDL_FNS[0]}"
;;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, sorry, I should have been more explicit. In the RRFS_dev1 version, under GDAS on line 337, there are these entries:

"GDASENKF")
  tracers_input="[\"spfh\",\"clwmr\",\"o3mr\",\"icmr\",\"rwmr\",\"snmr\",\"grle\"]"
  tracers="[\"sphum\",\"liq_wat\",\"o3mr\",\"ice_wat\",\"rainwat\",\"snowwat\",\"graupel\"]"
  external_model="GFS"
  input_type="gaussian_netcdf"
  fn_atm_nemsio="${EXTRN_MDL_FNS[0]}"
  fn_sfc_nemsio="${EXTRN_MDL_FNS[1]}"
  ;;

In your addition of GDAS here, you've added a single fn_atm entry where the RRFS version has two lines fn_atm_nemsio and fn_sfc_nemsio. This means we can't use nemsio files for GDAS, we'd have to use netcdf. We need an option to use nemsio.

@mkavulich
Copy link
Collaborator

@EdwardSnyder-NOAA I am working through a code review and it looks good so far, but I wanted to bring this up early: does it make sense to add workflow end-to-end tests (or modify current ones) to include these new input data types? We could stage some static data for testing on platforms that do not have HPSS access.

Having config files for these WE2E tests would have the added benefit of being a convenient example case for anyone wanting to try out this capability.

@EdwardSnyder-NOAA
Copy link
Collaborator Author

@EdwardSnyder-NOAA I am working through a code review and it looks good so far, but I wanted to bring this up early: does it make sense to add workflow end-to-end tests (or modify current ones) to include these new input data types? We could stage some static data for testing on platforms that do not have HPSS access.

Having config files for these WE2E tests would have the added benefit of being a convenient example case for anyone wanting to try out this capability.

@mkavulich - I like that idea a lot. We can modify one of the current WE2E tests. Would adding the GDAS and GEFS WE2E tests be included with this PR or could we create a GitHub issue for it and tackle it later?

@mkavulich
Copy link
Collaborator

@EdwardSnyder-NOAA I think it's okay to start a new issue and address this later. It's probably about time we started a conversation about paring down the number of existing tests again, because we are starting to proliferate again and there's a lot of consolidating we could do.

Copy link
Collaborator

@mkavulich mkavulich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Eddy, sorry to take so long to get you a review here. I have not yet finished my review but at least one fix needs to be made. In addition, your code is based off a quite old commit (bf07542, from mid-December), can you please update your branch to the latest develop branch? Thanks.

@@ -136,6 +140,13 @@ if [ $SYMLINK_FIX_FILES = "TRUE" ]; then
additional_flags="$additional_flags \
--symlink"
fi

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic doesn't appear to work correctly for ensemble cases. When I ran a test ensemble case (community_ensemble_008mems), the script fails with the following error:

Traceback (most recent call last):
  File "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py", line 1020, in <module>
    main(sys.argv[1:])
  File "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py", line 808, in main
    unavailable = get_requested_files(
  File "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py", line 366, in get_requested_files
    target_path = fill_template(cla.output_path, cla.cycle_date, mem=mem)
  File "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py", line 226, in fill_template
    return template_str.format(**format_values)
ValueError: Unknown format code 'd' for object of type 'str'
+ print_err_msg_exit 'Call to retrieve_data.py failed with a non-zero exit status.

The command was:

python3 -u /mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/ush/retrieve_data.py   --debug   --anl_or_fcst anl   --config /mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/parm/data_locations.yml   --cycle_date 2019070100   --data_stores disk disk   --external_model FV3GFS   --fcst_hrs 0   --ics_or_lbcs ICS   --output_path /mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/expt_dirs/community_ensemble_008mems/2019070100/FV3GFS/for_ICS/mem{mem:03d}   --summary_file extrn_mdl_var_defns.sh      --file_type nemsio   --input_file_path /mnt/lfs4/BMC/wrfruc/UFS_SRW_App/develop/input_model_data/FV3GFS/nemsio/2019070100   --symlink   --members 1 8
'
FATAL ERROR:
ERROR:
  From script:  "exregional_get_extrn_mdl_files.sh"
  Full path to script:  "/mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/ufs-srweather-app/scripts/exregional_get_extrn_mdl_files.sh"
Call to retrieve_data.py failed with a non-zero exit status.

Full experiment on Jet here: /mnt/lfs4/HFIP/dtc-hurr/kavulich/workdir/PR_526/expt_dirs/community_ensemble_008mems

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Mike, no worries! I updated my fork branch to the latest develop branch. I also found a bug in the code that was causing this error (line 347 needed members to be defined as cla.members. It should run through now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it appears to be working now!

…2022040400_ensemble_2mems and get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2mems.
@MichaelLueken
Copy link
Collaborator

@EdwardSnyder-NOAA I have added the two new WE2E tests to the PR.

Copy link
Collaborator

@mkavulich mkavulich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the last round of changes, everything looks good, approving now.

Just a note: the GEFS and GDAS data currently only works for ensemble cases, but it's not noted explicitly in the documentation. It seems like it should be easy enough to make this data available for deterministic cases as well, but I concede that's beyond the scope of this PR. I will open an issue discussing whether we should add this capability, or just update the docs to clarify that GDAS and GEFS data is for ensemble cases only.

@@ -136,6 +140,13 @@ if [ $SYMLINK_FIX_FILES = "TRUE" ]; then
additional_flags="$additional_flags \
--symlink"
fi

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it appears to be working now!

@MichaelLueken MichaelLueken added ci-hera-intel-WE Kicks off automated workflow test on hera with intel run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests labels Feb 10, 2023
@venitahagerty venitahagerty removed the ci-hera-intel-WE Kicks off automated workflow test on hera with intel label Feb 10, 2023
@venitahagerty
Copy link
Collaborator

venitahagerty commented Feb 10, 2023

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1176561214/20230210195018/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-hera-intel-WE
Experiment Succeeded on hera: community_ensemble_2mems_stoch
Experiment Succeeded on hera: pregen_grid_orog_sfc_climo
Experiment Succeeded on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
Experiment Succeeded on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on hera: MET_ensemble_verification
All experiments completed

Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdwardSnyder-NOAA These changes look good to me! The non-Hera Jenkins tests have passed successfully and the GHA Hera tests have passed as well. I will now approve and merge this work.

@MichaelLueken MichaelLueken linked an issue Feb 10, 2023 that may be closed by this pull request
@MichaelLueken MichaelLueken merged commit a1076e1 into ufs-community:develop Feb 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Start ensemble members from global ensemble members
5 participants