Add initial version of ccpp_track_variables.py #419

Merged on May 11, 2022 (34 commits)

Conversation

mkavulich
Collaborator

@mkavulich mkavulich commented Nov 19, 2021

Initial version of script for tracking which variables are modified by which schemes in a given suite.

The script takes the following arguments as input:

  • A suite definition file
  • A path to scheme metadata files
  • A CCPP config file
  • A variable name

The script output changes based on the input:

  • If the given variable is found in any of the schemes for the given SDF, the script outputs a list of schemes (in their calling order, including duplicates) along with the variable's intent in each scheme
  • If the given variable is not found, but a partial match is made to any of the found variables, the script outputs a list of schemes that contain variables with partial matches (along with those partial-match variables)
  • If the given variable is not found and there are no partial matches either, the script exits with an error saying so
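
The three outcomes above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the names track_variable and report, the scheme_vars mapping, and the sample data are assumptions made for the example. The intents for canopy_water_amount follow the example output shown in this PR; the latent-heat entry and its intent are invented.

```python
def track_variable(call_tree, scheme_vars, target):
    """Return (exact, partial) matches for `target`.

    call_tree   -- scheme names in calling order, duplicates included
    scheme_vars -- hypothetical mapping: scheme name -> {standard_name: intent}
    """
    exact = []    # (scheme, intent) tuples, in calling order
    partial = {}  # scheme -> standard names that merely contain `target`
    for scheme in call_tree:
        variables = scheme_vars.get(scheme, {})
        if target in variables:
            exact.append((scheme, variables[target]))
        else:
            hits = [name for name in variables if target in name]
            if hits:
                partial[scheme] = hits
    return exact, partial


def report(call_tree, scheme_vars, target):
    """Format the result for the three cases: exact, partial, not found."""
    exact, partial = track_variable(call_tree, scheme_vars, target)
    if exact:
        return [f"{scheme} (intent {intent})" for scheme, intent in exact]
    if partial:
        return [f"In {scheme} found variable(s) {names}"
                for scheme, names in partial.items()]
    raise LookupError(f"Variable {target} not found")


# Illustrative data only (assumption, not parsed from real metadata)
call_tree = ["GFS_phys_time_vary_init", "noahmpdrv_run", "noahmpdrv_run"]
scheme_vars = {
    "GFS_phys_time_vary_init": {"canopy_water_amount": "in"},
    "noahmpdrv_run": {
        "canopy_water_amount": "inout",
        "soil_upward_latent_heat_flux": "out",  # invented intent
    },
}
```

Calling report(call_tree, scheme_vars, "canopy_water_amount") lists each scheme once per call, report(..., "latent_heat") falls through to the partial matches, and report(..., "volcano") raises an error.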

This is a brand new script, so it does not change any existing interfaces. It does add a new variable and method to the Suite class: call_tree is a list of schemes in order (including duplicates) for the given Suite. The method make_call_tree is used to populate the call_tree list; it is not called by the write() method, so it must be called separately if desired.
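
To illustrate what "a list of schemes in order (including duplicates)" means, here is a hedged sketch that flattens a small suite definition into a call tree. It is not the Suite class implementation: the standalone function, the example XML, and the scheme names are assumptions, and it ignores details such as the per-phase suffixes (for example _run) that appear in the real output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified suite definition file
EXAMPLE_SDF = """
<suite name="demo" version="1">
  <group name="physics">
    <subcycle loop="1">
      <scheme>scheme_a</scheme>
    </subcycle>
    <subcycle loop="2">
      <scheme>scheme_b</scheme>
      <scheme>scheme_c</scheme>
    </subcycle>
  </group>
</suite>
"""


def make_call_tree(sdf_text):
    """Return scheme names in calling order, repeating subcycled schemes."""
    root = ET.fromstring(sdf_text.strip())
    call_tree = []
    for group in root.findall("group"):
        for subcycle in group.findall("subcycle"):
            loop = int(subcycle.get("loop", "1"))
            schemes = [s.text.strip() for s in subcycle.findall("scheme")]
            # A subcycle with loop=N calls its schemes N times, in order
            call_tree.extend(schemes * loop)
    return call_tree
```

With the example above, the second subcycle contributes scheme_b and scheme_c twice each, so they show up as duplicates in the resulting list.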

Please provide feedback on the format and structure of the program!

I started with roughly the ccpp_prebuild script as a guide, but I am not tied to any particular style or philosophy here.

Testing:
test removed: None
unit tests: None
system tests: None
manual testing: Ran script with a variety of inputs, gives expected output. Examples below:

Running examples

No new python packages or other prerequisites are needed, so you can run on any platform that CCPP can already run on. It is also not necessary to build any code or run prebuild scripts; all you need is the ccpp-framework and ccpp-physics repositories, a model config file (this requirement can hopefully be removed in the future), and the XML files for the suites you wish to analyze.

Here is an example setup based on the ufs-weather-model on Hera:

git clone --recurse-submodules https://github.com/ufs-community/ufs-weather-model
cd ufs-weather-model/FV3/ccpp/framework/
git remote add mkavulich git@github.com:mkavulich/ccpp-framework
git fetch mkavulich
git checkout feature/track_variables_through_suite
cd ..

From here, you can run the following example commands:

$ framework/scripts/ccpp_track_variables.py --help
usage: ccpp_track_variables.py [-h] -s SDF -m METADATA_PATH -c CONFIG -v VARIABLE [--debug]

optional arguments:
  -h, --help            show this help message and exit
  -s SDF, --sdf SDF     suite definition file to parse
  -m METADATA_PATH, --metadata_path METADATA_PATH
                        path to CCPP scheme metadata files
  -c CONFIG, --config CONFIG
                        path to CCPP prebuild configuration file
  -v VARIABLE, --variable VARIABLE
                        variable to track through CCPP suite
  --debug               enable debugging output

Successful output: prints list of schemes that use the specified variable, along with the variable's intent

$ framework/scripts/ccpp_track_variables.py --config=config/ccpp_prebuild_config.py -s=suites/suite_FV3_GFS_v16_noahmp.xml -v canopy_water_amount -m ./physics/physics/
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_gas_optics_rrtmgp
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_arry
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_1scl
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_2str
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_nstr
For suite suites/suite_FV3_GFS_v16_noahmp.xml, the following schemes (in order) use the variable canopy_water_amount:
GFS_phys_time_vary_init (intent in)
noahmpdrv_run (intent inout)
noahmpdrv_run (intent inout)

Unknown variable: script exits with descriptive error message

$ framework/scripts/ccpp_track_variables.py --config=config/ccpp_prebuild_config.py -s=suites/suite_FV3_GFS_v16_noahmp.xml -v volcano -m ./physics/physics/
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_gas_optics_rrtmgp
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_arry
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_1scl
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_2str
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_nstr
ERROR: Variable volcano not found in any suites for sdf suites/suite_FV3_GFS_v16_noahmp.xml

Partial match for variable: outputs list of partial matches for each scheme

$ framework/scripts/ccpp_track_variables.py --config=config/ccpp_prebuild_config.py -s=suites/suite_FV3_GFS_v16_noahmp.xml -v latent_heat -m ./physics/physics/
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_gas_optics_rrtmgp
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_arry
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_1scl
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_2str
WARNING: Encountered closing statement "end type" without type name; assume type_name is ty_optical_props_nstr
ERROR: Variable latent_heat not found in any suites for sdf suites/suite_FV3_GFS_v16_noahmp.xml

Did find partial matches that may be of interest:

In GFS_suite_interstitial_2_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c']
In GFS_surface_generic_post_run found variable(s) ['surface_upward_potential_latent_heat_flux', 'soil_upward_latent_heat_flux', 'canopy_upward_latent_heat_flux', 'snow_deposition_sublimation_upward_latent_heat_flux', 'snow_freezing_rain_upward_latent_heat_flux', 'cumulative_soil_upward_latent_heat_flux_multiplied_by_timestep', 'cumulative_canopy_upward_latent_heat_flu_multiplied_by_timestep', 'cumulative_snow_deposition_sublimation_upward_latent_heat_flux_multiplied_by_timestep', 'cumulative_snow_freezing_rain_upward_latent_heat_flux_multiplied_by_timestep', 'cumulative_surface_upward_potential_latent_heat_flux_multiplied_by_timestep', 'multiplicative_tuning_parameter_for_reduced_latent_heat_flux_due_to_canopy_heat_storage']
In GFS_surface_composites_pre_run found variable(s) ['surface_upward_potential_latent_heat_flux_over_ice']
In GFS_surface_composites_post_run found variable(s) ['surface_upward_potential_latent_heat_flux', 'surface_upward_potential_latent_heat_flux_over_water', 'surface_upward_potential_latent_heat_flux_over_land', 'surface_upward_potential_latent_heat_flux_over_ice', 'kinematic_surface_upward_latent_heat_flux_over_water', 'kinematic_surface_upward_latent_heat_flux_over_land', 'kinematic_surface_upward_latent_heat_flux_over_ice', 'multiplicative_tuning_parameter_for_reduced_latent_heat_flux_due_to_canopy_heat_storage']
In sfc_nst_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c', 'latent_heat_of_fusion_of_water_at_0c', 'kinematic_surface_upward_latent_heat_flux_over_water', 'surface_upward_potential_latent_heat_flux_over_water']
In noahmpdrv_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c', 'latent_heat_of_fusion_of_water_at_0c', 'kinematic_surface_upward_latent_heat_flux_over_land', 'surface_upward_potential_latent_heat_flux_over_land', 'soil_upward_latent_heat_flux', 'canopy_upward_latent_heat_flux', 'snow_deposition_sublimation_upward_latent_heat_flux', 'snow_freezing_rain_upward_latent_heat_flux']
In sfc_sice_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c', 'surface_upward_potential_latent_heat_flux_over_ice', 'kinematic_surface_upward_latent_heat_flux_over_ice', 'kinematic_surface_upward_latent_heat_flux_over_water']
In GFS_PBL_generic_post_run found variable(s) ['instantaneous_surface_upward_latent_heat_flux', 'cumulative_surface_upward_latent_heat_flux_for_coupling_multiplied_by_timestep', 'surface_upward_latent_heat_flux_for_coupling', 'cumulative_surface_upward_latent_heat_flux_for_diag_multiplied_by_timestep', 'instantaneous_surface_upward_latent_heat_flux_for_diag', 'latent_heat_of_vaporization_of_water_at_0c', 'surface_upward_latent_heat_flux_from_coupled_process', 'kinematic_surface_upward_latent_heat_flux_over_water']
In satmedmfvdifq_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c', 'latent_heat_of_fusion_of_water_at_0c', 'instantaneous_surface_upward_latent_heat_flux']
In samfdeepcnv_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c']
In samfshalcnv_run found variable(s) ['latent_heat_of_vaporization_of_water_at_0c']

…ree for that Suite:

in other words, a list of schemes in the order that they are called, including duplicates
and subcycle loops
… of schemes for a given SDF, and a dictionary that associates those schemes with their corresponding .meta files. Last step is to simply parse those .meta files for their variables!
…tadataTable objects for each scheme's .meta file; these objects can then be parsed to get all the information we need for the final step!
…subroutine name, and intent. If partial match, output list of partial matches
…le, variable name, and directory with metadata files, and outputs a list of schemes that use the given variable, along with their intent. Will convert into a more "graph-like" output later, along with other cleanup that is needed before review/testing by the group.
…ive a graphical representation of the calling tree

Also some more cleanup:
 - Change default logging level to WARNING to get rid of unnecessary log messages (debug level remains unchanged)
 - Change more strings to f strings
 - Remove more leftover debug printouts
 - Raise exception if create_metadata_filename_dict fails (esp for no .meta files found)
 - Remove check_var function (might re-introduce in the future)
@ligiabernardet
Collaborator

ligiabernardet commented Nov 19, 2021

Thank you for this PR. This is a very helpful new addition to the CCPP. Going forward, we would like to have a visualization capability but having the script is the foundational step.
Do we have any mechanism to make sure ccpp_track_variables does not break when new development is submitted?

@mkavulich
Collaborator Author

@ligiabernardet I haven't included any unit or regression tests yet, but I can make that a priority before opening this PR for review. I'll need to familiarize myself with the existing testing framework first.

@christinaholtNOAA christinaholtNOAA left a comment

I've heard an awful lot about this tool. Thought I'd pop in and see what it was all about. :)

if not success:
logging.error(f'Parsing suite definition file {sdf} failed.')
success = False
return (success, suite)

In this type of error handling situation, it could be useful to use a try/except block to get more information about how the SDF parsing failed instead of only reporting that it didn't work.

This is also a nice example of how one who isn't regularly checking the value of success in the caller of this function could just power through the rest of main with an un-parsable Suite object.
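
A rough sketch of the try/except approach suggested here, for comparison with the success-flag pattern. It is only an illustration under assumptions: SuiteParseError and parse_sdf are hypothetical names, and the real Suite class does considerably more than read XML.

```python
import xml.etree.ElementTree as ET


class SuiteParseError(Exception):
    """Hypothetical exception type for failed SDF parsing."""


def parse_sdf(sdf_path):
    """Parse an SDF, raising an informative error instead of returning a flag."""
    try:
        return ET.parse(sdf_path)
    except (OSError, ET.ParseError) as err:
        # Keep the underlying cause so the user sees *why* parsing failed
        raise SuiteParseError(
            f"Parsing suite definition file {sdf_path} failed: {err}"
        ) from err
```

Because the failure is raised rather than returned, a caller that forgets to check cannot carry on with an unusable Suite object; the script stops at the point of failure with the original cause attached.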

Collaborator Author

This is a limitation of how errors are handled in the existing object definitions; since they handle errors through a "success" flag I don't want to try to mess with that, especially since this code will be superseded in the (hopefully) near-future and need updating regardless.

raise Exception('Call to import_config failed.')

# Variables defined by the host model
(success, _, _) = gather_variable_definitions(config['variable_definition_files'], config['typedefs_new_metadata'])

A comment about this magic could be helpful. I assume given its name, it will update the config data structure in place, but it's not obvious. You are also opting (I'm assuming intentionally) to not store the output, which throws a wrench in the obvious nature of this call.

Collaborator Author

I'll be honest, I'm not 100% sure exactly what this step is doing, but it is necessary for converting metadata from an old format. I've added a comment trying to be as informative as possible.

@mkavulich mkavulich marked this pull request as ready for review February 24, 2022 06:14
@mkavulich
Collaborator Author

After addressing some initial concerns (still a few to go) and updating for the most recent commits on main, this PR is ready for a proper review.

There are a few known issues/limitations right now. I think they can be addressed later on, but I figured I'd point them out off the bat:

  • The extraneous statements of WARNING: Encountered closing statement "end type" without type name come from the metadata parser, and I believe are due to problems with the metadata itself.
  • The script can only be run from above the top-level framework directory in the ufs-weather-model examples as shown above; I believe this is a limitation specific to the ufs-weather-model ccpp_prebuild_config.py file and hopefully can be eliminated in the future when the config file is no longer needed.

@mkavulich
Collaborator Author

@christinaholtNOAA I believe I have resolved or responded to all your comments. Please let me know if you have any further comments or questions.

@christinaholtNOAA christinaholtNOAA left a comment

This looks pretty nice. Just one remaining thing I'm really not sure about.

raise Exception('Call to import_config failed.')

# Variables defined by the host model; this call is necessary because it converts some old
# metadata formats so they can be used later in the script

I looked at gather_variable_definitions and it seems to be a function that doesn't act on any input data structures in place, or have any side effects. Perhaps you are calling this only for the data checking, but I really don't think that this call is doing anything to metadata that is used downstream, so the comment seems misleading. To have it convert metadata, I think you'd need to save its outputs in a data structure. Perhaps update one of the config entries? I'm not sure how that would work since I haven't dug into the details of the objects being acted on here.

Collaborator

@climbfuji climbfuji left a comment

Nice job, I've got a few questions and suggestions ...

with that scheme"""

metadata_dict = {}
scheme_filenames=glob.glob(os.path.join(metapath, "*.meta"))
Collaborator

Is there a reason why you don't get those from ccpp_prebuild_config.py?

Collaborator

I don't really like using glob, and also the schemes may not all sit in the same directory. For example, the gsl folks have a chemistry fork where the chemistry metadata is in a separate subdirectory chemistry in the ccpp-physics repo.

Collaborator Author

Regarding getting the metadata files from ccpp_prebuild_config.py, I was originally hoping I could eliminate the dependency on that file, and allow users to specify whichever metadata path(s) they would like.

Re: glob, is the issue the use of glob specifically, or that the current design does not allow users to specify multiple directories? If it is the former, I will have to re-think how the metadata files are specified, but if it is the latter, I can easily put in a loop to allow multiple directories to be specified on the command line. Let me know if you think the second approach is acceptable.

Collaborator

Re. 1: that's ok, even if the path(s) contain schemes that are not in use by this model (i.e. not in the prebuild config), the suites won't have those schemes.

Re. 2: There is a general aversion to glob, more so for cmake than for other tools, but nonetheless. If it can be avoided, fine, but if you need it then that's ok for me. Yes, you will need a capability to have one or more metadata paths if you don't use the prebuild config. Note though that you put the burden on the user to know where all the files are located that may be used in the suite.
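
One way the "one or more metadata paths" idea could look, as a sketch only: the helper names build_parser and collect_meta_files are hypothetical, and accepting several values for -m via nargs="+" is an assumption about the command-line design rather than what the script does.

```python
import argparse
import glob
import os


def build_parser():
    """Command-line parser accepting one or more metadata directories."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-m", "--metadata_path", nargs="+", required=True,
        help="one or more paths to CCPP scheme metadata files",
    )
    return parser


def collect_meta_files(metadata_paths):
    """Gather *.meta files from every directory, failing loudly on empty ones."""
    meta_files = []
    for path in metadata_paths:
        found = sorted(glob.glob(os.path.join(path, "*.meta")))
        if not found:
            raise FileNotFoundError(f"No .meta files found in {path}")
        meta_files.extend(found)
    return meta_files
```

A user with metadata in a separate subdirectory could then pass both directories after a single -m flag. Whether an empty directory should be a hard error or just a warning is a design choice this sketch does not settle.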

success = False
return success

# Call tree of all schemes in SDF (with duplicates and subcycles)
Collaborator

The first 20 lines are identical with the first twenty lines of the parse routine. Can this be combined to avoid code duplication?

Collaborator

Maybe have an optional (or mandatory) argument create_call_tree for the parse routine, that switches between what is done in the second half of the parse routine?

Collaborator Author

This is a good idea. I initially had the naive idea to try to not modify any existing routines so that I wouldn't have to run as many tests, but it makes more sense the way you suggested. I have implemented this change, let me know how it looks.

Collaborator

This looks great, thanks for making the change.
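
The pattern discussed in this thread, an optional argument that switches the second half of parse while leaving existing calls untouched, might look roughly like the toy class below. Everything here is an assumption for illustration: the constructor input, the attribute all_schemes, and the argument name make_call_tree are not taken from the real Suite class.

```python
class Suite:
    """Toy stand-in for the real Suite class; only the control flow matters."""

    def __init__(self, subcycles):
        # subcycles: list of (loop_count, [scheme names]); hypothetical input
        self._subcycles = subcycles
        self.call_tree = []
        self.all_schemes = []

    def parse(self, make_call_tree=False):
        """Shared setup first, then one of two second halves."""
        # Shared first half: validate input (stands in for the common code)
        if not self._subcycles:
            return False

        if make_call_tree:
            # New behavior: ordered list with duplicates and subcycle repeats
            for loop, schemes in self._subcycles:
                self.call_tree.extend(schemes * loop)
            return True

        # Default behavior, unchanged for existing callers: unique schemes
        for _, schemes in self._subcycles:
            for scheme in schemes:
                if scheme not in self.all_schemes:
                    self.all_schemes.append(scheme)
        return True
```

Since make_call_tree defaults to False, existing calls to parse() behave as before, and only callers that opt in get the call tree.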

…hod of Suite class with an optional argument to avoid duplicating existing code while not affecting existing calls to parse method.
@climbfuji
Collaborator

My main comments have been addressed, so I pass the baton to @gold2718 ...

Collaborator

@gold2718 gold2718 left a comment

I have a couple of questions that would help me understand the code but I don't see anything that should hold up getting it in. Sorry this took so long.

the name of the scheme and the intent of the variable within that scheme"""

# Create a list of tuples that will hold the in/out information for each scheme
var_graph=[]
Collaborator

Is there a pattern where you use spaces around the = symbol and where you omit them?

Collaborator Author

Just regular sloppiness :) I've standardized this to use spaces except when in keyword arguments; I believe that's consistent with PEP8


# Loop through call tree, find matching filename for scheme via dictionary schemes_in_files,
# then parse that metadata file to find variable info
partial_matches = {}
Collaborator

Do you have examples of partial matches? How does tracking them help?

Collaborator Author

This case is for when a user inputs something like "latent_heat" as their variable, which matches multiple standard names (e.g. latent_heat_of_vaporization_of_water_at_0c, surface_upward_potential_latent_heat_flux, etc.). This makes it easier for users who might not know the exact standard name of the variable they are looking for, or who want to find, for example, every variable containing the word "temperature".

The final section in my PR message ("Partial match for variable") gives an example of this.

@grantfirl
Collaborator

@mkavulich Would you please update this to the latest main in anticipation of merging?

@grantfirl grantfirl merged commit 1968d57 into NCAR:main May 11, 2022