For the STAT-Analysis "filter" job type apply the -set_hdr job commands to the -dump_row output file. #1129

JohnHalleyGotway · 2019-05-22T18:11:00Z

This issue arose when Jonathan ran SWPC data through MET, which produced very long strings in some of the MET output columns. Those strings were too long for METviewer and caused the loader to fail. To patch the data, Jonathan used sed to shorten the strings. We tried running a STAT-Analysis filter job with the -set_hdr option to do this instead, but it didn't work: the -set_hdr option currently only applies to the output of the aggregate and aggregate_stat job types.

This task is to enable the -set_hdr option for the filter job type when creating the -dump_row output. BUT it should only apply to the -dump_row output from filter, and not the -dump_row output of the other job types!

JohnHalleyGotway · 2019-05-22T18:48:17Z

Also consider supporting the use of regular expressions for the -set_hdr option throughout stat-analysis.

Currently, the set_hdr value is applied to all of the output for that column, which makes the "-by" option less useful: every -by group ends up with the same value in that column.

For example, if your data has FCST_VAR values of TMP and UGRD, you could define:

-job aggregate -line_type SL1L2 -by FCST_VAR
-set_hdr FCST_VAR TEMPERATURE 'T.'
-set_hdr FCST_VAR UWIND 'UG.'

Each -set_hdr option would then only be applied when the current string matches the regular expression listed. We'd still need to support the old logic when no regular expression is specified, so the default could be the regular expression '.*', which matches any string.
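A rough Python sketch of the proposed matching logic, just to make the semantics concrete. Everything here is hypothetical: SetHdr and apply_set_hdr are not part of MET, and two details are assumptions this issue doesn't settle: a pattern counts as matching if re.search() finds it anywhere in the current value, and the first matching rule for a column wins. The default pattern '.*' matches any string, which reproduces the current behavior.

```python
import re
from typing import NamedTuple


class SetHdr(NamedTuple):
    """One hypothetical -set_hdr rule: COLUMN VALUE [REGEX]."""
    column: str
    value: str
    pattern: str = ".*"  # default matches any string, i.e. today's behavior


def apply_set_hdr(row, rules):
    """Return a copy of `row` (column name -> value) with matching rules applied.

    Assumptions: a rule matches when re.search() finds its pattern anywhere in
    the original value, and only the first matching rule per column is used.
    """
    out = dict(row)
    done = set()
    for rule in rules:
        if rule.column in done or rule.column not in row:
            continue
        # Always test the original value, not one rewritten by an earlier rule.
        if re.search(rule.pattern, row[rule.column]) is not None:
            out[rule.column] = rule.value
            done.add(rule.column)
    return out


rules = [
    SetHdr("FCST_VAR", "TEMPERATURE", "T."),
    SetHdr("FCST_VAR", "UWIND", "UG."),
]
for var in ["TMP", "UGRD", "RH"]:
    print(var, "->", apply_set_hdr({"FCST_VAR": var}, rules)["FCST_VAR"])
```

With -by FCST_VAR, each group carries a single FCST_VAR value, so under these assumptions the TMP group would be relabeled TEMPERATURE and the UGRD group UWIND, while a group such as RH would keep its original value.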

JohnHalleyGotway · 2019-05-22T22:42:12Z

Here's an email to Mallory Row describing the potential for related development:

Mallory,

When used for the aggregate or aggregate_stat job types, the intended purpose of the -dump_row option is for users to be able to see the actual input lines that were used when processing each job. So it's meant as a sanity check on the filtering logic the user defined.

But for the filter job (which writes its output to the -dump_row file), the intention is a little different. For filter, stat_analysis is a fancy form of "grep", enabling the user to slice and dice their data however they'd like.

Earlier today, we talked about enhancing the filter job type to support the "-set_hdr" option. For example, we have some data with a very long FCST_UNITS string and want to reset it to a shorter string. So we'd like to run a job like this:
stat_analysis -lookin stat_data -job filter -set_hdr FCST_UNITS TEC -dump_row short_units.txt

But this is not currently supported. Here's the GitHub issue for this:
#1129

If -set_hdr is used, STAT-Analysis would actually need to parse the input lines, update the strings, and write them back out. Since we'd be parsing the data anyway, we could also consider updating the version number before writing it to the output. For example, that step would add the FCST_UNITS and OBS_UNITS columns to the output of the filter job.

It seems to me that using "-dump_row" in both contexts is confusing. Instead, perhaps we should require that the "filter" job use the "-out_stat" job command option to specify its output file?

Would that be a useful solution? Of course, it would only address .stat output files, since there is no "filter" job for MODE or MTD output data.
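The parse/update/write step described in the email above might look roughly like this sketch. It is hypothetical (set_hdr_on_lines is not a MET function) and assumes that .stat lines are whitespace-delimited, that header lines start with the VERSION column, and that output fields are joined with single spaces rather than aligned. The version-number update is not shown.

```python
def set_hdr_on_lines(lines, column, new_value):
    """Overwrite `column` with `new_value` on every data line.

    Hypothetical sketch, assuming whitespace-delimited lines and header
    lines whose first token is "VERSION".
    """
    out = []
    index = None  # position of `column`, learned from the most recent header
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "VERSION":
            index = fields.index(column) if column in fields else None
            out.append(" ".join(fields))
            continue
        if index is not None and index < len(fields):
            fields[index] = new_value
        out.append(" ".join(fields))
    return out


# Toy, abbreviated lines for illustration only (not a real .stat header).
toy = [
    "VERSION MODEL FCST_UNITS LINE_TYPE",
    "V9.0 GFS some_very_long_units_string SL1L2",
]
for line in set_hdr_on_lines(toy, "FCST_UNITS", "TEC"):
    print(line)
```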

JohnHalleyGotway · 2019-05-23T19:33:16Z

Mallory confirms that this functionality would be useful. So the changes would be this:

(1) The -dump_row option remains as-is: whatever .stat lines are read as input are written to the -dump_row output file. When writing the first line of output, if the first line read is a header line, write it to the output file. All subsequent header lines are ignored.

(2) For the filter job, make the -out_stat job command option required. Regardless of the version of the .stat lines read as input, the output will be written in the format of the current version.
Should we buffer all of the lines in one ASCII table in memory to get the columns to line up? Maybe that's overkill and isn't worth the extra memory consumption.
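A hypothetical Python sketch of both points. dump_rows follows rule (1): a header is kept only if it is the very first line read, and every later header is dropped. align_columns shows why lining up the columns means buffering: a column's width is only known after every value in it has been seen, so all rows must be held in memory before the first one is written. Both functions, and the assumption that header lines start with the VERSION column, are illustrative rather than MET's implementation.

```python
def dump_rows(lines):
    """Rule (1): keep a header only if it is the first line read.

    Assumption: a header line is one whose first token is "VERSION".
    """
    out = []
    for i, line in enumerate(lines):
        is_header = line.split()[:1] == ["VERSION"]
        if is_header and i > 0:
            continue  # ignore every header after the first line
        out.append(line)
    return out


def align_columns(rows):
    """Pad whitespace-delimited rows so their columns line up.

    All rows are buffered first because each column's width is the
    length of its longest value, which is unknown until every row is read.
    """
    split_rows = [row.split() for row in rows]
    ncols = max((len(r) for r in split_rows), default=0)
    widths = [0] * ncols
    for r in split_rows:
        for j, value in enumerate(r):
            widths[j] = max(widths[j], len(value))
    return [
        " ".join(value.ljust(widths[j]) for j, value in enumerate(r)).rstrip()
        for r in split_rows
    ]


rows = dump_rows(["VERSION MODEL", "V9.0 GFS", "VERSION MODEL", "V9.0 NAM"])
for line in align_columns(rows):
    print(line)
```

A streaming alternative would write each line as soon as it is read, using fixed minimum column widths. That keeps memory flat, but an unusually long value (like the FCST_UNITS strings that started this issue) would push the later columns on that line out of alignment.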

TaraJensen · 2019-06-11T22:12:41Z

Charge 277047

JohnHalleyGotway added the type: enhancement and component: application code labels May 22, 2019

JohnHalleyGotway added this to the MET 9.0 milestone May 22, 2019

TaraJensen mentioned this issue Jun 21, 2019 in "Add set_attr config file options to override the metadata read from input gridded data files." #1020 (closed)

JohnHalleyGotway modified the milestones: MET 9.0, MET Future Versions Mar 15, 2020

JohnHalleyGotway modified the milestones: MET Future Versions, MET 9.1 Mar 23, 2020

JohnHalleyGotway modified the milestones: MET 9.1, MET 10.0 Jun 8, 2020

JohnHalleyGotway added the alert: NEED CYCLE ASSIGNMENT label Sep 10, 2020

JohnHalleyGotway added the priority: medium label and removed the priority: high and alert: NEED CYCLE ASSIGNMENT labels Nov 5, 2020

JohnHalleyGotway modified the milestones: MET 10.0.0, MET 10.1.0 May 10, 2021

JohnHalleyGotway removed the component: application code label Jun 10, 2021

JohnHalleyGotway modified the milestones: MET 10.1.0, Consider for Next Release Mar 11, 2022
