[develop] Add RAP/GFS bufr and GSI-fix file retrieval options #535

Merged
MichaelLueken merged 35 commits into ufs-community:develop from the feature/add_rrfs_obs branch on Feb 10, 2023

Collaborator

@ulmononian ulmononian commented Jan 12, 2023

DESCRIPTION OF CHANGES:

This PR is part of a series that will facilitate integration of RRFS features into srw-dev (e.g.: #526 by @EdwardSnyder-NOAA ).

RRFS requires certain observation files and data products to be retrieved and staged before the data processing and assimilation tasks run. This work makes small changes to the retrieve_data.py script to allow retrieval/staging of local (machine-hosted; e.g., Jet, Hera, noaacloud) files and of general remotely located files (beyond nomads/aws), primarily by relaxing existing parse_args function arguments and introducing new ones.

Minor updates to data_locations.yaml are included here (more will follow), including the addition of RAP bufr files and GSI-fix files from NOAA AWS S3 buckets. FV3GFS and RAP bufr data stream locations on Jet and Hera have been added to the respective ush/machine YAML files.

Type of change

  • [x] New feature (non-breaking change which adds functionality)

TESTS CONDUCTED:

Currently, the functionality has only been tested through a direct call of the retrieve_data.py script on Jet and Hera (i.e., "python retrieve_data.py --external_model RAP --data_stores local --cycle 2023011100 --config $path_to_srw_top_level/ush/machine/jet.yaml --anl_or_fcst obs --output_path $arbitrary_staging_area --symlink"). However, once the RRFS observation retrieval/staging/processing JJOBS and exregional scripts are developed, this retrieval functionality will be added to the workflow.

Compiled/built srw-dev on Jet/Hera with changes from this fork w/out issue.
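
For reference, the direct call described above can also be assembled programmatically. This is only a minimal sketch: the flag names are copied from the command quoted in this description, while `build_retrieve_cmd` and both paths are hypothetical placeholders rather than anything in the repository.

```python
import shlex


def build_retrieve_cmd(srw_top, staging_dir, cycle="2023011100"):
    """Return the argument list for the direct retrieve_data.py call quoted above.

    The function name and its parameters are hypothetical; only the flags
    come from the command in this description.
    """
    return [
        "python", "retrieve_data.py",
        "--external_model", "RAP",
        "--data_stores", "local",
        "--cycle", cycle,
        "--config", f"{srw_top}/ush/machine/jet.yaml",
        "--anl_or_fcst", "obs",
        "--output_path", staging_dir,
        "--symlink",
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute real locations before running anything.
    cmd = build_retrieve_cmd("/path/to/ufs-srweather-app", "/path/to/staging")
    print(shlex.join(cmd))
```

Later in this conversation `--anl_or_fcst` is renamed to `--file_set`, so a call against the merged code would need the new flag name.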

DEPENDENCIES:

None

ISSUE:

#484

CHECKLIST

  • [x] My code follows the style guidelines in the Contributor's Guide
  • [x] I have performed a self-review of my own code using the Code Reviewer's Guide
  • [x] I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • [x] My changes do not require updates to the documentation (explain).
  • [x] My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • [x] enhancement

CONTRIBUTORS (optional):

Bruce Kropp (@BruceKropp-Raytheon)

ulmononian and others added 14 commits January 9, 2023 23:21
…s to jet machine file. Add in Bruce's retrieve_data.py updates to accommodate local file staging. Add JJOB and exregional jobs for observation data staging (preliminary).
…isk, machine specific) file paths should be located in the respective machine file rather than data_locations.yaml, so this is a start.
@ulmononian
Collaborator Author

@christinaholtNOAA i went through and addressed your recent comments. thanks so much for looking again. i apologize for the discordance between my remote and local (i can't figure out why it is acting this way).

@mkavulich
Collaborator

@ulmononian If @MichaelLueken hasn't reached out to you yet he will shortly. We had to rewrite history to remove a commit from the repository last week, so that is likely why you are seeing unexpected behavior with your fork. It should be a simple fix but we will need to fix your fork to remove this commit as well.

Collaborator

@christinaholtNOAA christinaholtNOAA left a comment

@ulmononian This looks good to me! Thanks for the iterations. :)

Edit: Make sure it passes the tests though.

@ulmononian
Collaborator Author

@christinaholtNOAA thanks for the final look!! may i ask which tests you are referring to (a set of WE2E's or something else)?

@MichaelLueken
Collaborator

@ulmononian It looks like both of the unit tests have failed for this PR.

If you merge the latest develop into your feature/add_rrfs_obs branch, the Python unittests should pass.

Looking at the Python Functional Tests log, I'm seeing the following error for several of these tests:

python3 -m unittest: error: the following arguments are required: --file_set

It looks like you will need to make modifications to ush/test_retrieve_data.py to replace --anl_or_fcst with --file_set. I suspect that this will allow the Python Functional Tests to pass.

Both of these tests will need to pass before this work can be merged into develop. I think this is what @christinaholtNOAA meant by making sure that the changes in this PR pass the tests.
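
To illustrate the failure mode, here is a minimal sketch of how argparse produces that message when a caller still passes the old flag name. It assumes nothing about the real parser beyond a required `--file_set` option; `RaisingParser`, `make_parser`, and `try_parse` are hypothetical names used only for this example.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so the message can be inspected."""

    def error(self, message):
        raise ValueError(message)


def make_parser():
    # Hypothetical stand-in for the real parser: only the renamed flag is modeled.
    parser = RaisingParser(prog="retrieve_data.py")
    parser.add_argument("--file_set", required=True)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or the argparse error message as a string."""
    try:
        return make_parser().parse_args(argv)
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    # A caller still using the old flag name never supplies the new required flag.
    print(try_parse(["--anl_or_fcst", "anl"]))
    # A caller updated to the new name parses cleanly.
    print(try_parse(["--file_set", "anl"]))
```

The missing-required-argument check fires before argparse complains about the unrecognized old flag, which is why the log mentions only `--file_set`.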

Change "--anl_or_fcst"  to "--file_set" to match update in retrieve_data.py
@ulmononian
Collaborator Author

ulmononian commented Feb 10, 2023

thanks so much for clarifying (and identifying a root cause of the unit test failures) @MichaelLueken :) i merged in develop and updated test_retrieve_data.py to reflect the arg. name change from --anl_or_fcst to --file_set.

@MichaelLueken MichaelLueken added the ci-hera-intel-WE (Kicks off automated workflow test on hera with intel) and run_we2e_coverage_tests (Run the coverage set of SRW end-to-end tests) labels Feb 10, 2023
@venitahagerty venitahagerty removed the ci-hera-intel-WE (Kicks off automated workflow test on hera with intel) label Feb 10, 2023
@venitahagerty
Collaborator

venitahagerty commented Feb 10, 2023

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1195772843/20230210155017/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-hera-intel-WE
Experiment Failed on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
2023-02-10 16:12:13 +0000 :: hfe06 :: Task get_extrn_ics, jobid=41872471, in state DEAD (FAILED), ran for 12.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
2023-02-10 16:12:13 +0000 :: hfe06 :: Task get_extrn_lbcs, jobid=41872472, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: community_ensemble_2mems_stoch
2023-02-10 16:12:12 +0000 :: hfe05 :: Task get_extrn_ics, jobid=41872468, in state DEAD (FAILED), ran for 12.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: community_ensemble_2mems_stoch
2023-02-10 16:12:12 +0000 :: hfe05 :: Task get_extrn_lbcs, jobid=41872469, in state DEAD (FAILED), ran for 12.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: MET_ensemble_verification
2023-02-10 16:12:11 +0000 :: hfe04 :: Task get_extrn_ics, jobid=41872474, in state DEAD (FAILED), ran for 11.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: MET_ensemble_verification
2023-02-10 16:12:11 +0000 :: hfe04 :: Task get_extrn_lbcs, jobid=41872475, in state DEAD (FAILED), ran for 11.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
2023-02-10 16:12:10 +0000 :: hfe09 :: Task get_extrn_ics, jobid=41872461, in state DEAD (FAILED), ran for 14.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
2023-02-10 16:12:10 +0000 :: hfe09 :: Task get_extrn_lbcs, jobid=41872462, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
2023-02-10 16:12:12 +0000 :: hfe04 :: Task get_extrn_ics, jobid=41872450, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
2023-02-10 16:12:12 +0000 :: hfe04 :: Task get_extrn_lbcs, jobid=41872451, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
2023-02-10 16:12:14 +0000 :: hfe03 :: Task get_extrn_ics, jobid=41872455, in state DEAD (FAILED), ran for 12.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
2023-02-10 16:12:14 +0000 :: hfe03 :: Task get_extrn_lbcs, jobid=41872456, in state DEAD (FAILED), ran for 11.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
2023-02-10 16:12:06 +0000 :: hfe06 :: Task get_extrn_ics, jobid=41872477, in state DEAD (FAILED), ran for 12.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
2023-02-10 16:12:06 +0000 :: hfe06 :: Task get_extrn_lbcs, jobid=41872478, in state DEAD (FAILED), ran for 11.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
2023-02-10 16:12:13 +0000 :: hfe02 :: Task get_extrn_ics, jobid=41872458, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
2023-02-10 16:12:13 +0000 :: hfe02 :: Task get_extrn_lbcs, jobid=41872459, in state DEAD (FAILED), ran for 12.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: pregen_grid_orog_sfc_climo
2023-02-10 16:12:14 +0000 :: hfe07 :: Task get_extrn_ics, jobid=41872452, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: pregen_grid_orog_sfc_climo
2023-02-10 16:12:14 +0000 :: hfe07 :: Task get_extrn_lbcs, jobid=41872453, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2023-02-10 16:12:11 +0000 :: hfe04 :: Task get_extrn_ics, jobid=41872464, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2023-02-10 16:12:11 +0000 :: hfe04 :: Task get_extrn_lbcs, jobid=41872465, in state DEAD (FAILED), ran for 12.0 seconds, exit status=256, try=1 (of 1)
All experiments completed

@ulmononian
Collaborator Author

ulmononian commented Feb 10, 2023

for the test "grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16" , i see this call in the log "/scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1195772843/20230210155017/expt_dirs/../nco_dirs/output/20190701/get_extrn_ics_2019070100.id_1676045187.log": python3 -u /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1195772843/20230210155017/ufs-srweather-app/ush/retrieve_data.py --debug --anl_or_fcst anl --config /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1195772843/20230210155017/ufs-srweather-app/parm/data_locations.yml --cycle_date 2019070100 --data_stores disk disk --external_model FV3GFS --fcst_hrs 0 --ics_or_lbcs ICS --output_path /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1195772843/20230210155017/expt_dirs/../nco_dirs/ext/grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2.20190701/00 --summary_file rrfs.t00z.FV3GFS.ICS.extrn_mdl_var_defns.sh --file_type nemsio --input_file_path /scratch2/BMC/det/UFS_SRW_App/develop/input_model_data/FV3GFS/nemsio/2019070100 --symlink

i am not sure why --anl_or_fcst is still showing up there as an argument since it's been removed from retrieve_data.py (and these updates are reflected in /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1195772843/20230210155017/ufs-srweather-app/ush/retrieve_data.py).

@mkavulich i grepped "anl_or_fcst" at the top of ufs-srweather-app: looks like i need to update the arg. name in /scripts/exregional_get_extrn_mdl_files.sh
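
A search like the grep above can be sketched in Python for anyone who wants to script the check. This is only an illustration: `find_stale_flag` and the list of file suffixes are assumptions, not part of the repository.

```python
from pathlib import Path


def find_stale_flag(root, needle="anl_or_fcst", suffixes=(".py", ".sh", ".yaml", ".yml")):
    """Yield (path, line_number, line) for every line under root containing needle."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file; skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, lineno, line.strip()


if __name__ == "__main__":
    # Run from the top of the checkout to list remaining uses of the old name.
    for path, lineno, line in find_stale_flag("."):
        print(f"{path}:{lineno}: {line}")
```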

@ulmononian
Collaborator Author

ulmononian commented Feb 10, 2023

@MichaelLueken @venitahagerty necessary updates made to /scripts/exregional_get_extrn_mdl_files.sh. sorry about that. hopefully next set of WE2E's pass. @MichaelLueken would you mind re-adding the ci-hera-intel-WE label?

@MichaelLueken MichaelLueken added the ci-hera-intel-WE (Kicks off automated workflow test on hera with intel) label Feb 10, 2023
@venitahagerty venitahagerty removed the ci-hera-intel-WE (Kicks off automated workflow test on hera with intel) label Feb 10, 2023
@venitahagerty
Collaborator

venitahagerty commented Feb 10, 2023

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1195772843/20230210172018/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-hera-intel-WE
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on hera: MET_ensemble_verification
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
2023-02-10 17:44:08 +0000 :: hfe12 :: Task get_extrn_lbcs, jobid=41874336, in state DEAD (FAILED), ran for 14.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: pregen_grid_orog_sfc_climo
2023-02-10 17:44:11 +0000 :: hfe03 :: Task get_extrn_lbcs, jobid=41874372, in state DEAD (FAILED), ran for 11.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
2023-02-10 17:44:05 +0000 :: hfe09 :: Task get_extrn_lbcs, jobid=41874322, in state DEAD (FAILED), ran for 19.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
2023-02-10 17:44:05 +0000 :: hfe06 :: Task get_extrn_lbcs, jobid=41874327, in state DEAD (FAILED), ran for 16.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
2023-02-10 17:44:15 +0000 :: hfe12 :: Task get_extrn_lbcs, jobid=41874333, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: MET_ensemble_verification
2023-02-10 17:44:11 +0000 :: hfe10 :: Task get_extrn_lbcs, jobid=41874374, in state DEAD (FAILED), ran for 11.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
2023-02-10 17:44:12 +0000 :: hfe05 :: Task get_extrn_lbcs, jobid=41874339, in state DEAD (FAILED), ran for 14.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
2023-02-10 17:44:08 +0000 :: hfe12 :: Task get_extrn_lbcs, jobid=41874365, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
2023-02-10 17:44:09 +0000 :: hfe12 :: Task get_extrn_lbcs, jobid=41874343, in state DEAD (FAILED), ran for 13.0 seconds, exit status=256, try=1 (of 1)
Experiment Failed on hera: community_ensemble_2mems_stoch
2023-02-10 17:44:08 +0000 :: hfe04 :: Task get_extrn_lbcs, jobid=41874330, in state DEAD (FAILED), ran for 16.0 seconds, exit status=256, try=1 (of 1)
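
Reports in this format can be summarized by experiment to see which tasks are still failing between runs. The following is a rough sketch that assumes each "Experiment Failed" line is followed by one task line, as in the report above; `failed_tasks` and both regular expressions are hypothetical helpers, not part of the CI tooling.

```python
import re
from collections import defaultdict

# Patterns assumed from the report layout shown above.
EXPT_RE = re.compile(r"^Experiment Failed on (\S+): (\S+)$")
TASK_RE = re.compile(r"Task (\w+), jobid=(\d+), in state (\w+)")


def failed_tasks(log_text):
    """Map each failed experiment name to the set of task names reported as failed."""
    failures = defaultdict(set)
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        expt = EXPT_RE.match(line)
        if expt:
            current = expt.group(2)
            continue
        task = TASK_RE.search(line)
        if task and current is not None:
            failures[current].add(task.group(1))
            current = None  # each failure header is paired with one task line
    return dict(failures)


if __name__ == "__main__":
    sample = (
        "Experiment Failed on hera: pregen_grid_orog_sfc_climo\n"
        "2023-02-10 17:44:11 +0000 :: hfe03 :: Task get_extrn_lbcs, jobid=41874372, "
        "in state DEAD (FAILED), ran for 11.0 seconds, exit status=256, try=1 (of 1)\n"
    )
    for name, tasks in failed_tasks(sample).items():
        print(name, sorted(tasks))
```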

@ulmononian
Collaborator Author

@MichaelLueken @venitahagerty looks like an unbound variable issue with --file_set. not sure if it was due to a typo (now fixed) or not, but that is the only update i pushed for now.

@MichaelLueken MichaelLueken added the ci-hera-intel-WE (Kicks off automated workflow test on hera with intel) label Feb 10, 2023
@venitahagerty venitahagerty removed the ci-hera-intel-WE (Kicks off automated workflow test on hera with intel) label Feb 10, 2023
@venitahagerty
Collaborator

venitahagerty commented Feb 10, 2023

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1195772843/20230210182016/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-hera-intel-WE
Experiment Succeeded on hera: community_ensemble_2mems_stoch
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: MET_ensemble_verification
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
Experiment Succeeded on hera: community_ensemble_2mems_stoch
Experiment Succeeded on hera: pregen_grid_orog_sfc_climo
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot
Experiment Succeeded on hera: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on hera: MET_ensemble_verification
All experiments completed

Collaborator

@MichaelLueken MichaelLueken left a comment

@ulmononian These changes look good to me! Thanks for iterating through the GHA Hera WE2E test errors! The non-Hera Jenkins tests have all completed successfully, as have the Hera GHA tests. I will now approve these changes and move forward with merging this work.

@MichaelLueken MichaelLueken merged commit 034d873 into ufs-community:develop Feb 10, 2023
@ulmononian ulmononian deleted the feature/add_rrfs_obs branch February 12, 2023 05:16
Labels
enhancement (New feature or request)
run_we2e_coverage_tests (Run the coverage set of SRW end-to-end tests)
Development

Successfully merging this pull request may close these issues.

Data retrieval for observations
5 participants