-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add initial version of ccpp_track_variables.py #419
Add initial version of ccpp_track_variables.py #419
Conversation
…ing variable against standard names
…sh out argument parsing routine more
…ree for that Suite: in other words, a list of schemes in the order that they are called, including duplicates and subcycle loops
…etadata converted properly (?)
… of schemes for a given SDF, and a dictionary that associates those schemes with their corresponding .meta files. Last step is to simply parse those .meta files for their variables!
…tadataTable objects for each scheme's .meta file; these objects can then be parsed to get all the information we need for the final step!
…subroutine name, and intent. If partial match, output list of partial matches
…le, variable name, and directory with metadata files, and outputs a list of schemes that use the given variable, along with their intent. Will convert into a more "graph-like" output later, along with other cleanup that is needed before review/testing by the group.
…ive a graphical representation of the calling tree Also some more cleanup: - Change default logging level to WARNING to get rid of unnecessary log messages (debug level remains unchanged) - Change more strings to f strings - Remove more leftover debug printouts - Raise exception if create_metadata_filename_dict fails (esp for no .meta files found) - Remove check_var function (might re-introduce in the future)
…ve duplicate calls
Thank you for this PR. This is a very helpful new addition to the CCPP. Going forward, we would like to have a visualization capability but having the script is the foundational step. |
@ligiabernardet I haven't included any unit or regression tests yet, but I can make that a priority before opening this PR for review. I'll need to familiarize myself with the existing testing framework first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've heard an awful lot about this tool. Thought I'd pop in and see what it was all about. :)
scripts/ccpp_track_variables.py
Outdated
if not success: | ||
logging.error(f'Parsing suite definition file {sdf} failed.') | ||
success = False | ||
return (success, suite) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this type of error handling situation, it could be useful to use a try/except
block to get more information about how the SDF parsing failed instead of only reporting that it didn't work.
This is also a nice example of how one who isn't regularly checking the value of success in the caller of this function could just power through the rest of main
with an un-parsable Suite
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a limitation of how errors are handled in the existing object definitions; since they handle errors through a "success" flag I don't want to try to mess with that, especially since this code will be superseded in the (hopefully) near-future and need updating regardless.
scripts/ccpp_track_variables.py
Outdated
raise Exception('Call to import_config failed.') | ||
|
||
# Variables defined by the host model | ||
(success, _, _) = gather_variable_definitions(config['variable_definition_files'], config['typedefs_new_metadata']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment about this magic could be helpful. I assume given its name, it will update the config
data structure in place, but it's not obvious. You are also opting (I'm assuming intentionally) to not store the output, which throws a wrench in the obvious nature of this call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be honest, I'm not 100% sure exactly what this step is doing, but it is necessary for converting metadata from an old format. I've added a comment trying to be as informative as possible.
…ctures from capgen
…ppropriate locations
… and intent(inout)
After addressing some initial concerns (still a few to go) and updating for the most recent commits on There are a few known issues/limitations right now, I think they can be addressed later on but I figured I'd point them out off the bat:
|
…from add_argument calls
… remove re-assignment of arguments and just manage them with "args" object
…" function from ccpp_prebuild
@christinaholtNOAA I believe I have resolved or responded to all your comments, please let me know if you have any further comments or questions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty nice. Just one remaining thing I'm really not sure about.
raise Exception('Call to import_config failed.') | ||
|
||
# Variables defined by the host model; this call is necessary because it converts some old | ||
# metadata formats so they can be used later in the script |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at gather_variable_definitions
and it seems to be a function that doesn't act on any input data structures in place, or have any side effects. Perhaps you are calling this only for the data checking, but I really don't think that this call is doing anything to metadata that is used downstream, so the comment seems misleading. To have it convert metadata, I think you'd need to save its outputs in a data structure. Perhaps update one of the config
entries? I'm not sure how that would work since I haven't dug into the details of the objects being acted on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job, I've got a few questions and suggestions ...
scripts/ccpp_track_variables.py
Outdated
with that scheme""" | ||
|
||
metadata_dict = {} | ||
scheme_filenames=glob.glob(os.path.join(metapath, "*.meta")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you don't get those from ccpp_prebuild_config.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like using glob
, and also the schemes may not all sit in the same directory. For example, the gsl folks have a chemistry fork where the chemistry metadata is in a separate subdirectory chemistry
in the ccpp-physics repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding getting the metadata files from ccpp_prebuild_config.py, I was originally hoping I could eliminate the dependency on that file, and allow users to specify whichever metadata path(s) they would like.
Re: glob
, is the issue of using glob
specifically or that the current design does not allow users to specify multiple directories? If it is the former I will have to re-think how the metadata files are specified, but if it is the latter I can easily put in a loop to allow multiple directories to be specified on the command line. Let me know if you think the second approach is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re. 1: that's ok, even if the path(s) contain schemes that are not in use by this model (i.e. not in the prebuild config), the suites won't have those schemes.
Re. 2. There is a general aversion to glob, more so for cmake
than for other tools, but nonetheless. If it can be avoided, fine, but if you need it than that's ok for me. Yes, you will need a capability to have one or more metadata paths if you don't use the prebuild config. Note though that you put the burden on the user to know where all the files are located that may be used in the suite.
scripts/mkstatic.py
Outdated
success = False | ||
return success | ||
|
||
# Call tree of all schemes in SDF (with duplicates and subcycles) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first 20 lines are identical with the first twenty lines of the parse
routine. Can this be combined to avoid code duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe have an optional (or mandatory) argument create_call_tree
for the parse
routine, that switches between what is done in the second half of the parse
routine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea. I initially had the naive idea to try to not modify any existing routines so that I wouldn't have to run as many tests, but it makes more sense the way you suggested. I have implemented this change, let me know how it looks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, thanks for making the change.
…hod of Suite class with an optional argument to avoid duplicating existing code while not affecting existing calls to parse method.
My main comments have been addressed, so I pass the baton to @gold2718 ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of questions that would help me understand the code but I don't see anything that should hold up getting it in. Sorry this took so long.
scripts/ccpp_track_variables.py
Outdated
the name of the scheme and the intent of the variable within that scheme""" | ||
|
||
# Create a list of tuples that will hold the in/out information for each scheme | ||
var_graph=[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a pattern where you use spaces around the =
symbol and where you omit them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just regular sloppiness :) I've standardized this to use spaces except when in keyword arguments; I believe that's consistent with PEP8
|
||
# Loop through call tree, find matching filename for scheme via dictionary schemes_in_files, | ||
# then parse that metadata file to find variable info | ||
partial_matches = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have examples of partial matches? How does tracking them help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is for when a user inputs something like "latent_heat" as their variable, which matches multiple standard names (e.g. latent_heat_of_vaporization_of_water_at_0c
, surface_upward_potential_latent_heat_flux
, etc.). This makes it easier for users who might not know the exact standard name of the variable they are looking for, or for something like, for example, any variable containing the word "temperature".
The final section in my PR message ("Partial match for variable") gives an example of this.
@mkavulich Would you please update this to the latest main in anticipation of merging? |
Initial version of script for tracking which variables are modified by which schemes in a given suite.
This script takes in as input the following arguments:
The script output changes based on the input:
This is a brand new script, so it does not change any existing interfaces. It does add a new variable and method to the Suite class: call_tree is a list of schemes in order (including duplicates) for the given Suite. The method make_call_tree is used to populate the call_tree list; it is not called by the write() method so must be called separately if desired.
Please provide feedback on the format and structure of the program!
I started with roughly the ccpp_prebuild script as a guide, but I am not tied to any particular style or philosophy here.
Testing:
test removed: None
unit tests: None
system tests: None
manual testing: Ran script with a variety of inputs, gives expected output. Examples below:
Running examples
No new python packages or other prerequisites are needed, so you can run on any platform that CCPP can already run on. It is also not necessary to build any code or run prebuild scripts; all you need is the ccpp-framework and ccpp-physics repositories, a model config file (this requirement can hopefully be removed in the future), and the xmls for the suites you wish to analyze.
Here is an example setup based on the ufs-weather-model on Hera:
From here, you can run the following example commands:
Successful output: prints list of schemes that use the specified variable, along with the variable's intent
Unknown variable: script exits with descriptive error message
Partial match for variable: outputs list of partial matches for each scheme