
Merge remote-tracking branch 'origin/develop' into feature_2673_sonarqube_beta3
Howard Soh committed Feb 6, 2024
2 parents a736073 + 45274a1 commit d9b81e8
Showing 12 changed files with 152 additions and 63 deletions.
67 changes: 41 additions & 26 deletions docs/Users_Guide/config_options.rst
@@ -3813,13 +3813,22 @@ Where "job_name" is set to one of the following:

* "filter"

- To filter out the STAT or TCMPR lines matching the job filtering
- criteria specified below and using the optional arguments below.
+ To filter out the STAT lines matching the job filtering criteria
+ specified below and using the optional arguments below.
The output STAT lines are written to the file specified using the
"-dump_row" argument.

Required Args: -dump_row

+ |
+ Optional Args:
+
+ .. code-block:: none
+
+    -set_hdr column_name value
+       May be used multiple times to override data written to the
+       output dump_row file.
+ |
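
For illustration, a filter job combining these arguments might look like the following sketch, where the variable name, header value, and output path are hypothetical:

.. code-block:: none

   -job filter -line_type MPR -fcst_var TMP \
   -set_hdr DESC FILTERED_TMP \
   -dump_row filtered_mpr.stat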
* "summary"

@@ -3848,8 +3857,8 @@ Where "job_name" is set to one of the following:

* Format the -column option as LINE_TYPE:COLUMN.

|
|
Use the -derive job command option to automatically derive
statistics on the fly from input contingency tables and partial
sums.
@@ -3875,10 +3884,14 @@ Where "job_name" is set to one of the following:

.. code-block:: none
-    -by column_name to specify case information
-    -out_alpha to override default alpha value of 0.05
-    -derive to derive statistics on the fly
-    -column_union to summarize multiple columns
+    -by column_name
+       To specify case information.
+    -out_alpha
+       To override the default alpha value.
+    -derive
+       To derive statistics on the fly.
+    -column_union
+       To summarize multiple columns.
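
For illustration, a summary job using several of these options might look like the following sketch (the line type, column, and case columns are hypothetical choices):

.. code-block:: none

   -job summary -line_type CNT -column RMSE \
   -by MODEL,VX_MASK -out_alpha 0.10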
* "aggregate"

@@ -3895,8 +3908,8 @@ Where "job_name" is set to one of the following:
ISC, ECNT, RPS, RHIST, PHIST, RELP, SSVAR
Required Args: -line_type
|

|
* "aggregate_stat"

@@ -3930,8 +3943,8 @@ Where "job_name" is set to one of the following:
.. code-block:: none
-out_thresh or -out_fcst_thresh and -out_obs_thresh
When -out_line_type FHO, CTC, CTS, MCTC, MCTS,
PCT, PSTD, PJC, PRC
Additional Optional Args for -line_type MPR:

@@ -3944,14 +3957,14 @@ Where "job_name" is set to one of the following:
-out_obs_wind_thresh
-out_wind_logic
When -out_line_type WDIR
Additional Optional Arg for:

.. code-block:: none
-line_type ORANK -out_line_type PHIST, SSVAR ...
-out_bin_size
Additional Optional Args for:

.. code-block:: none
@@ -3960,14 +3973,14 @@ Where "job_name" is set to one of the following:
-out_eclv_points
* "ss_index"

The skill score index job can be configured to compute a weighted
average of skill scores derived from a configurable set of
variables, levels, lead times, and statistics. The skill score
index is computed using two models, a forecast model and a
reference model. For each statistic in the index, a skill score
is computed as:

SS = 1 - (S[model]*S[model])/(S[reference]*S[reference])

Where S is the statistic.
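
For example, with hypothetical values, if the forecast model's RMSE is 2.0 and the reference model's RMSE is 4.0 for one term of the index, that term's skill score would be:

.. code-block:: none

   SS = 1 - (2.0*2.0)/(4.0*4.0) = 1 - 4/16 = 0.75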
@@ -4178,17 +4191,19 @@ Where "job_name" is set to one of the following:
"-rank_corr_flag value"
"-vif_flag value"
For aggregate and aggregate_stat job types:

.. code-block:: none

- "-out_stat path" to write a .stat output file for the job
-    including the .stat header columns. Multiple
-    values for each header column are written as
-    a comma-separated list.
- "-set_hdr col_name value" may be used multiple times to explicitly
-    specify what should be written to the header
-    columns of the output .stat file.
+ -out_stat path
+    To write a .stat output file for aggregate and aggregate_stat jobs
+    including the .stat header columns. Multiple input values for each
+    header column are written to the output as a comma-separated list
+    of unique values.
+ -set_hdr col_name value
+    May be used multiple times to explicitly specify what should be
+    written to the header columns of the output .stat file for
+    aggregate and aggregate_stat jobs or the output dump_row file
+    for filter jobs.
When using the "-by" job command option, you may reference those columns
in the "-set_hdr" job command options. For example, when computing statistics
14 changes: 7 additions & 7 deletions docs/Users_Guide/stat-analysis.rst
@@ -604,7 +604,7 @@ The Stat-Analysis tool supports several additional job command options which may
This job command option is extremely useful. It can be used multiple times to specify a list of STAT header column names. When reading each input line, the Stat-Analysis tool concatenates together the entries in the specified columns and keeps track of the unique cases. It applies the logic defined for that job to each unique subset of data. For example, if your output was run over many different model names and masking regions, specify **-by MODEL,VX_MASK** to get output for each unique combination rather than having to run many very similar jobs.
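
For illustration, a job using this option might look like the following sketch (the line type and case columns are hypothetical choices):

.. code-block:: none

   -job aggregate -line_type CTC -by MODEL,VX_MASK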

.. code-block:: none
-column_min col_name value
-column_max col_name value
-column_eq col_name value
@@ -615,30 +615,30 @@ This job command option is extremely useful. It can be used multiple times to sp
The column filtering options may be used when the **-line_type** has been set to a single value. These options take two arguments: the name of the data column to be used, followed by a value, string, or threshold to be applied. If multiple column_min/max/eq/thresh/str options are listed, the job will be performed on their intersection. Each input line is only retained if its value meets the numeric filtering criteria defined, matches one of the strings defined by the **-column_str** option, or does not match any of the strings defined by the **-column_str_exc** option. Multiple filtering strings may be listed using commas. Defining thresholds in MET is described in :numref:`config_options`.
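
For illustration, a filter job using these options might look like the following sketch (the column names, values, and output path are hypothetical):

.. code-block:: none

   -job filter -line_type CNT -column_min RMSE 0 -column_max RMSE 5 \
   -column_str VX_MASK FULL -dump_row cnt_rmse_subset.stat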

.. code-block:: none
-dump_row file
Each analysis job is performed over a subset of the input data. Filtering the input data down to a desired subset is often an iterative process. The **-dump_row** option may be used for each job to specify the name of an output file to which the exact subset of data used for that job will be written. When initially constructing Stat-Analysis jobs, users are strongly encouraged to use this option and to check the file's contents to ensure that the analysis was actually done over the intended subset.
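
For illustration, any job type can write its input subset this way (the output path is hypothetical):

.. code-block:: none

   -job summary -line_type CNT -column ME \
   -dump_row summary_me_subset.stat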

.. code-block:: none
-out_line_type name
This option specifies the desired output line type(s) for the **aggregate_stat** job type.

.. code-block:: none
-out_stat file
-set_hdr col_name string
The Stat-Analysis tool writes its output to either the log file or the file specified using the **-out** command line option. However, the **aggregate** and **aggregate_stat** jobs create STAT output lines, and the standard output they write lacks the full set of STAT header columns. The **-out_stat** job command option may be used for these jobs to specify the name of an output file to which full STAT output lines should be written. When the **-out_stat** job command option is used for **aggregate** and **aggregate_stat** jobs, the output is sent to the **-out_stat** file instead of the log or **-out** file.

- Jobs will often combine output with multiple entries in the header columns. For example, a job may aggregate output with three different values in the **VX_MASK** column, such as "mask1", "mask2", and "mask3". The output **VX_MASK** column will contain the unique values encountered concatenated together with commas: "mask1,mask2,mask3". Alternatively, the **-set_hdr** option may be used to specify what should be written to the output header columns, such as "-set_hdr VX_MASK all_three_masks".
+ Jobs will often combine output with multiple entries in the header columns. For example, a job may aggregate output with three different values in the **VX_MASK** column, such as "mask1", "mask2", and "mask3". The output **VX_MASK** column will contain the unique values encountered concatenated together with commas: "mask1,mask2,mask3". Alternatively, the **-set_hdr** option may be used to specify what should be written to the output header columns, such as "-set_hdr VX_MASK all_three_masks". When **-set_hdr** is specified for **filter** jobs, it controls what is written to the **-dump_row** output file.

When using the "-out_stat" option to create a .stat output file and stratifying results using one or more "-by" job command options, those columns may be referenced in the "-set_hdr" option. When using multiple "-by" options, use "CASE" to reference the full case information string:

.. code-block:: none
-job aggregate_stat -line_type MPR -out_line_type CNT -by FCST_VAR,OBS_SID \
-set_hdr VX_MASK OBS_SID -set_hdr DESC CASE
@@ -662,7 +662,7 @@ When processing input MPR lines, these options may be used to define a masking g
When processing input MPR lines, these options are used to define the forecast, observation, or both thresholds to be applied when computing statistics. For categorical output line types (FHO, CTC, CTS, MCTC, MCTS) these define the categorical thresholds. For continuous output line types (SL1L2, SAL1L2, CNT), these define the continuous filtering thresholds and **-out_cnt_logic** defines how the forecast and observed logic should be combined.
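
For illustration, an MPR aggregation job applying a categorical threshold might look like the following sketch (the threshold value is hypothetical):

.. code-block:: none

   -job aggregate_stat -line_type MPR -out_line_type CTS \
   -out_thresh gt273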

.. code-block:: none
-out_fcst_wind_thresh thresh
-out_obs_wind_thresh thresh
-out_wind_thresh thresh
5 changes: 4 additions & 1 deletion internal/test_unit/R_test/test_util.R
@@ -383,7 +383,10 @@ compareStatLty = function(stat1, stat2, lty, verb=0, strict=0){
# compare the information in the header columns
for(intCol in 2:21){
listMatch = apply(data.frame(dfV1[,intCol], dfV2[,intCol]), 1,
-                    function(a){ a[1] == a[2] });
+                    function(a){
+                      same = (a[1] == a[2]) | (is.na(a[1]) & is.na(a[2]));
+                      same[is.na(same)] = FALSE;
+                      return(same); });
intNumDiff = sum( !listMatch[ !is.na(listMatch) ] );
if( 0 < intNumDiff ){
if( 1 <= verb ){
19 changes: 19 additions & 0 deletions internal/test_unit/xml/unit_python.xml
@@ -115,6 +115,25 @@
</output>
</test>

+  <!-- Invokes Python script that reads in NUMPY text data (convert data value 0.0 to missing value) -->
+  <test name="python_numpy_plot_data_plane_missing">
+    <exec>&MET_BIN;/plot_data_plane</exec>
+    <env>
+      <pair><name>MET_PYTHON_EXE</name> <value>&MET_PYTHON_EXE;</value></pair>
+    </env>
+    <param> \
+      PYTHON_NUMPY \
+      &OUTPUT_DIR;/python/letter_numpy_0_to_missing.ps \
+      'name = "&MET_BASE;/python/examples/read_ascii_numpy.py &DATA_DIR_PYTHON;/letter.txt LETTER 0.0";' \
+      -plot_range 0.0 255.0 \
+      -title "Python enabled numpy plot_data_plane" \
+      -v 1
+    </param>
+    <output>
+      <ps>&OUTPUT_DIR;/python/letter_numpy_0_to_missing.ps</ps>
+    </output>
+  </test>

<test name="python_numpy_plot_data_plane_file_type">
<exec>&MET_BIN;/plot_data_plane</exec>
<param> \
1 change: 1 addition & 0 deletions internal/test_unit/xml/unit_stat_analysis_ps.xml
@@ -76,6 +76,7 @@
-job filter -line_type MPR -fcst_var TMP -fcst_lev Z2 -vx_mask DTC165 \
-column_str OBS_SID KDLN,KDHT,KDEN,KDLS,KDMA,KDMN,KDVT,KDEW \
-column_str_exc OBS_SID KDLN,KDHT \
+   -set_hdr DESC FILTER_OBS_SID \
-dump_row &OUTPUT_DIR;/stat_analysis_ps/POINT_STAT_FILTER_OBS_SID.stat \
-v 1
</param>
35 changes: 25 additions & 10 deletions scripts/python/examples/read_ascii_numpy.py
@@ -1,5 +1,6 @@
import os
import sys
+import numpy
from met.dataplane import dataplane

###########################################
@@ -67,33 +68,47 @@ def set_dataplane_attrs():
## load the data into the numpy array
##

-if len(sys.argv) != 3:
-    dataplane.quit("read_ascii_numpy.py -> Must specify exactly one input file and a name for the data.")
+SCRIPT_NAME = "read_ascii_numpy.py ->"
+if len(sys.argv) < 3:
+    dataplane.quit(f"{SCRIPT_NAME} Must specify exactly one input file and a name for the data.")
+elif len(sys.argv) > 4:
+    dataplane.quit(f"{SCRIPT_NAME} Unsupported extra arguments [{sys.argv[4:]}]")

# Read the input file as the first argument
input_file = os.path.expandvars(sys.argv[1])
data_name = sys.argv[2]

try:
-    log("Input File:\t" + repr(input_file))
-    log("Data Name:\t" + repr(data_name))
+    user_fill_value = None
+    try:
+        if len(sys.argv) > 3:
+            user_fill_value = float(sys.argv[3])
+    except:
+        log(f"{SCRIPT_NAME} Ignored argument {sys.argv[3]}")
+        pass
+
+    log(f"{SCRIPT_NAME} Input File:\t{repr(input_file)}")
+    log(f"{SCRIPT_NAME} Data Name:\t{repr(data_name)}")
if os.path.exists(input_file):
# read_2d_text_input() reads n by m text data and returns 2D numpy array
met_data = dataplane.read_2d_text_input(input_file)
if met_data is None:
-            dataplane.quit(f" Fail to build met_data from {input_file}")
+            dataplane.quit(f"{SCRIPT_NAME} Failed to build met_data from {input_file}")
else:
-            log("Data Shape:\t" + repr(met_data.shape))
-            log("Data Type:\t" + repr(met_data.dtype))
+            log(f"{SCRIPT_NAME} Data Shape:\t{repr(met_data.shape)}")
+            log(f"{SCRIPT_NAME} Data Type:\t{repr(met_data.dtype)}")
+            if user_fill_value is not None:
+                met_data = numpy.ma.masked_values(met_data, user_fill_value)
+                log(f"{SCRIPT_NAME} Python Type:\t{type(met_data)}")
else:
-        dataplane.quit(f"input {input_file} does exist!!!")
+        dataplane.quit(f"{SCRIPT_NAME} input {input_file} does not exist!!!")
except:
import traceback
traceback.print_exc()
-    dataplane.quit(f"Unknown error with {sys.argv[0]}: ")
+    dataplane.quit(f"{SCRIPT_NAME} Unknown error with {sys.argv[0]}: ")

attrs = set_dataplane_attrs()
-log("Attributes:\t" + repr(attrs))
+log(f"{SCRIPT_NAME} Attributes:\t{repr(attrs)}")

# Sets fill_value if it exists at the dataplane
#attrs['fill_value'] = 255 # for letter.txt
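
As a minimal sketch of what the new fill-value handling above does (assuming only numpy; the data values here are made up):

.. code-block:: python

   import numpy as np

   # 2D data where 0.0 should be treated as missing
   met_data = np.array([[0.0, 1.5], [2.0, 0.0]])

   # masked_values() masks every element (approximately) equal to the
   # given value and returns a numpy masked array
   masked = np.ma.masked_values(met_data, 0.0)
   print(masked.count())        # 2 unmasked values remain
   print(masked.filled(-9999))  # masked entries are replaced by -9999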
2 changes: 2 additions & 0 deletions scripts/python/met/dataplane.py
@@ -123,6 +123,8 @@ def read_dataplane_json_numpy(tmp_filename):
numpy_dump_name = met_base_tools.get_numpy_filename(tmp_filename)
met_dp_data = np.load(numpy_dump_name)
met_info['met_data'] = met_dp_data
+        if numpy_dump_name != tmp_filename:
+            met_base_tools.remove_temp_file(numpy_dump_name)
return met_info

@staticmethod
8 changes: 4 additions & 4 deletions scripts/python/met/logger.py
@@ -18,7 +18,7 @@ def append_error_prompt(msg):
return f'{logger.ERROR_P}: {msg}'

@staticmethod
-    def error_messageg(msg):
+    def error_message(msg):
msgs = msg if isinstance(msg, list) else [msg]
msgs.insert(0, '')
msgs.append('')
@@ -62,7 +62,7 @@ def get_met_fill_value(self):
return met_base.MET_FILL_VALUE

def error_msg(self, msg):
-        logger.error_messageg(msg)
+        logger.error_message(msg)

def get_prompt(self):
return met_base_tools.get_prompt()
@@ -116,7 +116,7 @@ def convert_to_array(ndarray_data):
for byte_data in ndarray_data:
array_data.append(byte_data.decode("utf-8").rstrip())
elif isinstance(ndarray_data, (np.ma.MaskedArray, np.ma.core.MaskedArray)):
-        array_data = np.ma.getdata(ndarray_data, subok=False).tolist()
+        array_data = ndarray_data.filled(fill_value=-9999).tolist()
elif isinstance(ndarray_data, np.ndarray):
array_data = ndarray_data.tolist()
else:
@@ -126,7 +126,7 @@ def convert_to_array(ndarray_data):
@staticmethod
def convert_to_ndarray(array_data):
if isinstance(array_data, (np.ma.MaskedArray, np.ma.core.MaskedArray)):
-        ndarray_data = np.ma.getdata(array_data, subok=False)
+        ndarray_data = array_data.filled(fill_value=-9999)
elif isinstance(array_data, np.ndarray):
ndarray_data = array_data
else:
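
Why filled() replaces getdata() here, as a small sketch (assuming only numpy; the values are made up):

.. code-block:: python

   import numpy as np

   arr = np.ma.masked_values([1.0, -5.0, 3.0], -5.0)

   # np.ma.getdata() returns the raw buffer, so a masked slot keeps
   # whatever value happens to be stored there (-5.0 in this case)
   print(np.ma.getdata(arr, subok=False))

   # filled() rewrites masked slots with an explicit missing-data value
   print(arr.filled(fill_value=-9999))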
4 changes: 2 additions & 2 deletions scripts/python/met/point.py
@@ -338,7 +338,7 @@ def read_point_data_json_numpy(self, tmp_filename):
self.obs_qty = point_array_list[11]

if numpy_dump_name != tmp_filename:
-            os.remove(numpy_dump_name)
+            met_base_tools.remove_temp_file(numpy_dump_name)

def write_point_data(self, tmp_filename):
if met_base_tools.use_netcdf_format():
@@ -797,7 +797,7 @@ def convert_point_data(point_data, check_all_records=False, input_type='csv'):
csv_point_data.check_csv_point_data(check_all_records)
tmp_point_data = csv_point_data.get_point_data()
else:
-        met_base.error_messageg(f'convert_point_data(() Not supported input type: {input_type}')
+        met_base.error_message(f'convert_point_data() Unsupported input type: {input_type}')
return tmp_point_data

def get_empty_point_obs():