Add set_attr config file options to override the metadata read from input gridded data files. #1020
Comments
On 5/17/2019, Tracy pointed out that this functionality would be useful for AF T&E projects.
This really is a way of overriding the metadata for gridded input files. And this can be useful for tools which compute .stat output files as well as those that don't, like pcp_combine or regrid_data_plane. Instead of defining how the output columns should be set, just define the metadata, as shown below. This mirrors the "attr" dictionary used for python embedding. And these settings can be parsed into the VarInfoBase class so they apply to all derived gridded data file types:
set_attrs = { is_u_wind = boolean;
Only the settings the user would like to override need to be defined. The boolean flags at the end are needed to enable python embedding to verify vector winds, as requested by NRL via met-help:
Need to specifically test to make sure that these changes enable python-embedding to compute VL1L2 and VCNT output. Will need to add logic somewhere to handle...
On 5/29/2020, NRL wrote met-help with a question about setting the OBS_UNITS column for Point-Stat output: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=95402 Adding the "set_attrs" functionality would enable them to do so since there are no unit strings prescribed for point observations.
Add METplus issue to test that something like this works with these changes:
FCST_VAR1_OPTIONS = set_attrs={units='K'; valid=...;}
…). The existing changes included *'s and don't actually match the name of the NetCDF variable. (#1389)
In the METplus meeting on 6/25/20, @CPKalb mentioned the need to use this with pcp_combine. Add a unit test which runs pcp_combine with set_attrs.set_valid on the -derive option to report the output valid time as the middle of the averaging period rather than the end, which is the default behavior.
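To illustrate the timestamp such a unit test would expect, the midpoint of an averaging period can be computed as in the minimal Python sketch below. This is not MET code: the function name `midpoint_valid_time`, the example dates, and the printed time format are assumptions made only for illustration.

```python
from datetime import datetime


def midpoint_valid_time(start: datetime, end: datetime) -> datetime:
    """Return the time halfway between the start and end of an averaging period.

    Hypothetical helper, not part of MET or METplus.
    """
    if end < start:
        raise ValueError("end must not precede start")
    # Dividing a timedelta by 2 gives half of the period length.
    return start + (end - start) / 2


# Hypothetical 24-hour averaging period.
period_start = datetime(2020, 6, 25, 0)
period_end = datetime(2020, 6, 26, 0)

mid = midpoint_valid_time(period_start, period_end)
print(mid.strftime("%Y%m%d_%H%M%S"))  # prints 20200625_120000
```

A test along these lines would pass the midpoint string to set_valid and then check that the output file reports it instead of the end of the period.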
* Per #1020, update the VarInfo class hierarchy to store the contents of the set_attrs dictionary config entries. Still need to add code to parse those entries and update the tests and documentation.
* Per #1020, update the VarInfo class to parse and store the set_attrs dictionary members.
* Per #1020, update point_stat and ensemble_stat to write the obs_info units to the OBS_UNITS output column instead of writing a constant NA string.
* Per #1020, adds attr access functions to the VarInfo class. VarInfo::name_attr() for example returns SetAttrsName, if set, and simply Name otherwise.
* Per #1020, this is work in progress. Swapping out calls to shc set_fcst_var, set_obs_var, set_fcst_level, set_obs_level, set_fcst_units, and set_obs_units with calls to the corresponding attrs access function. Still many more updates to the code required.
* Per #1020, add support for set_attrs.set_accum to override the accumulation interval read from the data.
* Per #1020, add a set_attrs() utility function to update the metadata in a DataPlane object using the contents of a set_attr Dictionary stored in a VarInfo object.
* Per #1020, add grid.nxy() utility function to return Nx*Ny instead of always having to type out grid.nx()*grid.ny().
* Per #1020, switch VarInfo::SetAttrsGrid from a ConcatString to a Grid object. And parse that grid in the set_dict() function. The huge drawback here is that the grid may be specified as a named grid or using a grid specification string... but it cannot be defined as the path to a gridded data file. The library dependency logic is just too complex, and I'd need to move a lot of code around to make this work. For now, just leave that feature out.
* No real change, just adding a blank line that was missing.
* Per #1020, add Met2dDataFile::set_grid() function to override the grid spec parsed from the file. Also, update the process_data_plane() function to handle updating the metadata and grid definitions.
* Per #1020, had to add -lvx_color to 5 Makefile.am files to get them linking again.
* Per #1020, work on log messages to make the set_attrs parsing more clear.
* Per #1020, update logic for how to parse set_attrs entries. For example, parse them as set_attrs.set_name instead of set_name from the set_attrs dictionary. That makes setting up the config file more flexible.
* Per #1020, update Grid-Stat to write NetCDF output using name_attr, level_attr, and units_attr instead of name(), level_name(), and units(). This change should be made everywhere NetCDF output files are written.
* Per #1020, update the parsing logic to check for embedded whitespace in name, level, and units.
* Per #1020, removed unused SetAttrsEnsemble option.
* Per #1020, add new test in unit_grid_stat.xml to exercise the set_attrs functionality.
* Per #1020, update data/config/README with information about the set_attrs dictionary.
* Per #1020, in Ensemble-Stat, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().
* Per #1020, in Grid-Diag, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().
* Deleting stale, commented out, development code.
* Per #1020, in MODE and MTD replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().
* Per #1020, in PCP-Combine, Shift-Data-Plane, and Regrid-Data-Plane, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().
* Per #1020, in TCRMW and Series-Analysis, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().
* Per #1020, in Wavelet-Stat, replace calls to name(), level_name(), and units() with calls to name_attr(), level_attr(), and units_attr().
* src/tools/core/wavelet_stat/wavelet_stat.cc
* src/tools/tc_utils/tc_rmw/tc_rmw.cc
* Adding a couple of spaces to unit_grid_stat.xml to be safe.
* Per #1020, update call to set_obs_units() in point-stat to avoid a warning about setting a header column to a null string.
* Per #1020, changing the order of parsing to see if this fixes the runtime issues I found on dakota.
* Per #1020, so this represents a course correction. Rather than grouping all of these metadata modifiers into a set_attrs dictionary, I'm now parsing them all as individual entries. See #1020 issue comments for more details.
* Per #1020, respond to PR comments by updating Grid-Stat to include the user-specified long_name string in the NetCDF matched pairs file from Grid-Stat.
Co-authored-by: John Halley Gotway <[email protected]>
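The accessor behavior described in these commit messages (name_attr() returns the user-specified override if set, and the value read from the file otherwise) can be sketched as follows. This is a hypothetical Python stand-in for the C++ VarInfo class, not the MET implementation: the class name `VarInfoSketch`, its field names, and the choice to raise an error on embedded whitespace are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VarInfoSketch:
    """Hypothetical model of the override logic only; not the MET VarInfo class."""

    # Metadata as read from the gridded input file.
    name: str
    level: str
    units: str
    # Optional user overrides, analogous to the set_attrs entries.
    set_name: Optional[str] = None
    set_level: Optional[str] = None
    set_units: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: reject overrides containing embedded whitespace.
        # The commit message only says the parser "checks" for it.
        for label in ("set_name", "set_level", "set_units"):
            value = getattr(self, label)
            if value is not None and any(ch.isspace() for ch in value):
                raise ValueError(f"{label} contains whitespace: {value!r}")

    def name_attr(self) -> str:
        """Return the override if set, otherwise the name from the file."""
        return self.set_name if self.set_name else self.name

    def level_attr(self) -> str:
        return self.set_level if self.set_level else self.level

    def units_attr(self) -> str:
        return self.set_units if self.set_units else self.units


# Only the name is overridden; level and units fall back to the file metadata.
info = VarInfoSketch(name="APCP", level="A24", units="kg/m^2", set_name="APCP_24")
print(info.name_attr(), info.level_attr(), info.units_attr())  # prints APCP_24 A24 kg/m^2
```

Routing every output writer through the `*_attr()` accessors, rather than the raw fields, is what lets a single override apply consistently across tools.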
This idea came up during a met_help exchange with Ying Lin at NCEP:
https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=85872
Here's the idea:
There is currently no way to override what Point-Stat writes to the FCST_VAR output column. There actually is logic for doing this in STAT-Analysis. You can use the "-set_hdr" option to explicitly specify the contents of the output header columns. For example, if you run STAT-Analysis to aggregate data where VX_MASK = EAST and VX_MASK = WEST, the output header column will, by default, be written as a concatenation of the unique input strings: VX_MASK = EAST,WEST. But you can manually override that by setting -set_hdr VX_MASK CONUS. And then EAST,WEST will be replaced by CONUS.
Perhaps we could add a similar config file option for Point-Stat, Grid-Stat, Ensemble-Stat, Wavelet-Stat, MODE, and MTD to do what STAT-Analysis is already doing:
set_hdr_column = [ "FCST_VAR" ];
set_hdr_value = [ "APCP_24" ];
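The header logic described above, where the default is a concatenation of the unique input strings and an explicit override replaces it, can be sketched in Python as below. This is an illustration and not STAT-Analysis code: the function `resolve_header`, the dictionary form of the override, and the first-seen ordering of unique values are assumptions.

```python
from typing import Dict, Iterable, Optional


def resolve_header(
    column: str,
    input_values: Iterable[str],
    set_hdr: Optional[Dict[str, str]] = None,
) -> str:
    """Return the string to write for one output header column.

    Hypothetical helper: `set_hdr` maps column names to override strings.
    """
    # An explicit override wins over anything found in the input.
    if set_hdr and column in set_hdr:
        return set_hdr[column]
    # Default: comma-join the unique input strings.
    # Assumption: first-seen order is kept (dict preserves insertion order).
    unique = list(dict.fromkeys(input_values))
    return ",".join(unique)


masks = ["EAST", "WEST", "EAST"]
print(resolve_header("VX_MASK", masks))                        # prints EAST,WEST
print(resolve_header("VX_MASK", masks, {"VX_MASK": "CONUS"}))  # prints CONUS
```

The proposed set_hdr_column and set_hdr_value arrays could be paired into such a mapping, for example with `dict(zip(set_hdr_column, set_hdr_value))`.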
Data is messy and this would give us an option for cleaning it up. [MET-1020] created by johnhg