diff --git a/docs/qcforward/grid_statistics.inc b/docs/qcforward/grid_statistics.inc
index 87c2a072..65e4c6e0 100644
--- a/docs/qcforward/grid_statistics.inc
+++ b/docs/qcforward/grid_statistics.inc
@@ -8,6 +8,7 @@ This method checks if property statistics from 3D grids are within user specifie
 thresholds. If worse than a given set of limits, either a warning is given
 or a full stop of the workflow is forced.
+Both discrete and continuous properties are supported.
 
 Signature
 ^^^^^^^^^
@@ -31,10 +32,13 @@ actions
     property
       Name of property (either a property icon in RMS, or a file name)
 
-    calculation
-      Name of statistical value to check (optional). Default option is "Avg" for continous properties,
-      while other valid options are "Min, Max and Stddev". Default option for discrete properties is "Percent".
+    codename
+      The discrete property code name to check the value for (optional).
+
+      .. note:: A codename is only needed for discrete properties.
+
+    calculation
+      Name of statistical value to check (optional). Default option is "Avg",
+      while other valid options for continuous properties are "Min", "Max" and "Stddev".
 
     selectors
       A dictionary of conditions to extract statistics from. e.g. a specific zone
       and/or region (optional).
@@ -55,11 +59,15 @@ actions
       For example ``[0.05, 0.35]`` will give a warning if the statistic is < than 0.05
       and > than 0.35.
 
+      .. note:: For discrete properties the statistical value will be reported in fractions.
+
     warn_outside
       Same as warn_outside key above, but instead defines when to give a warning (optional).
 
     description
       A string to describe each action (optional).
+
+
 Optional fields
@@ -85,8 +93,8 @@ nametag
 
 Examples
 ~~~~~~~~
 
-Example when executed inside RMS (basic):
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Example when executed inside RMS (continuous properties - basic):
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. code-block:: python
@@ -122,9 +130,47 @@ Example when executed inside RMS (basic):
 
     check()
 
+Example when executed inside RMS (discrete properties - basic):
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: python
+
+    from fmu.tools import qcforward as qcf
+
+    # Check average grid statistics for porosity and permeability
+
+    GRIDNAME = "SimGrid"
+    REPORT = "somefile.csv"
+    ACTIONS = [
+        {
+            "property": "Facies",
+            "codename": "Sand",
+            "selectors": {"Zone": "Top_Zone"},
+            "stop_outside": [0.4, 0.8],
+        },
+        {
+            "property": "Facies",
+            "codename": "Sand",
+            "selectors": {"Zone": "Mid_Zone"},
+            "stop_outside": [0.2, 0.5],
+        },
+    ]
+
+    def check():
+
+        usedata = {
+            "grid": GRIDNAME,
+            "actions": ACTIONS,
+            "report": REPORT,
+        }
+
+        qcf.grid_statistics(usedata, project=project)
+
+    if __name__ == "__main__":
+        check()
 
-Example when executed inside RMS (more settings):
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Example when executed inside RMS (continuous properties - more settings):
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. code-block:: python
@@ -149,7 +195,7 @@ Example when executed inside RMS (more settings):
         actions.append(
             {
                 "property": "PORO",
-                "selectors": {"ZONE": zone},
+                "selectors": {"Zone": zone},
                 "filters": {"FLUID": {"include": ["Gas", "Oil"]}},
                 "stop_outside": limits,
             },
@@ -157,7 +203,6 @@
     usedata = {
         "nametag": "MYDATA1",
-        "path": PATH,
         "grid": GRIDNAME,
         "report": REPORT,
         "actions": actions,
@@ -233,13 +278,12 @@ The YAML file may in case look like:
         warn_outside: [0.18, 0.25]
       - property: PORO
         selectors:
-          ZONE: Top_Zone
+          Zone: Top_Zone
         filters:
           REGION:
             exclude: ["Surroundings"]
         stop_outside: [0, 1]
         warn_outside: [0.18, 0.25]
-    path: ../input/qc_files/
     report: somefile.csv
     nametag: QC_PORO
     verbosity: info
diff --git a/docs/qcproperties.rst b/docs/qcproperties.rst
index 25adea74..3bbae190 100644
--- a/docs/qcproperties.rst
+++ b/docs/qcproperties.rst
@@ -4,9 +4,11 @@ The qcproperties class
 The ``qcproperties`` class provides a set of methods for extracting property
 statistics from 3D Grids, Raw and Blocked wells.
+Statistics can be extracted for both continuous and discrete properties. Depending on the
+property type, different statistics are calculated. The property type is auto-detected.
+
 If several methods of statistics extraction has been run within the instance,
-a merged dataframe is available through the 'dataframe' property or the 'dataframe_disc'
-property for continous properties and discrete properties respectively.
+a merged dataframe is available through the 'dataframe' property.
 
 The methods for statistics extraction can be run individually, or a
 yaml-configuration file can be used to enable an automatic run of the methods.
@@ -30,7 +32,9 @@ Arguments for the methods are similar and described in section below.
 * ``from_yaml``: Use a yaml-configuration file to enable an automatic run of the
   methods above.
 
-All methods returns a PropStat instance (see decription further down).
+All methods return a Pandas DataFrame for the run in question. If several methods of
+statistics extraction have been run within the instance, a merged dataframe is available
+through the 'dataframe' property.
 
 .. seealso:: The `Using yaml input for auto execution` section for description of how to
              use a yaml-configuration file to run the different methods automatically.
 
@@ -43,20 +47,11 @@ have been run within the QCProperties instance.
 
 dataframe
   A merged dataframe with statistical data for **continous** properties from all
-  runs of statistics extractions within the instance .
-
-dataframe_disc
-  A merged dataframe with statistical data for **discrete** propertiesfrom all
-  runs of statistics extractions within the instance .
+  runs of statistics extractions within the instance.
 
 to_csv
-  Used to write the dataframes with statistics to a csv-file. Takes two arguments:
-
+  Used to write the dataframe with statistics to a csv-file. Takes one argument:
 
   ``csvfile``: String with desired filename (required).
 
-  ``disc``: Bool that controls which dataframe to write (optional). If True the
-  dataframe with discrete properties is written else the dataframe with continous
-  properties is written. Default is False.
 
 Arguments
@@ -67,8 +62,6 @@ different for the three methods, and for the two run environments (inside/outsid
 
 **Input arguments:**
 
 * ``data``: The input data as a Python dictionary (required). See valid keys below.
-* ``reuse``: Bool to define if XTGeo data should be reused in the instance (optional).
-  Default is True. Turning this off will impact the performance.
 * ``project``: Required for usage inside RMS
 
@@ -76,8 +69,8 @@ different for the three methods, and for the two run environments (inside/outsid
 
 Method specific fields:
 
 grid
-  Name of grid icon in RMS, or name of grid file if run outside RMS. Required with the
-  ``get_grid_statistics`` method.
+  Name of grid icon in RMS, or name of grid file if run outside RMS. Required with the
+  ``get_grid_statistics`` method.
 
 wells
   Required with the ``get_well_statistics`` and the ``get_bwell_statistics`` methods.
 
@@ -87,7 +80,7 @@ Method specific fields:
 
   **get_well_statistics**:
 
-  ``names``: List of wellnames (optional). Default is all wells.
+  ``names``: List of wellnames (optional). Default is all wells.
 
   ``logrun``: Name of logrun.
 
   ``trajectory``: Name of trajectory.
 
@@ -140,24 +133,27 @@ Common fields:
 
     values, as it is the name that are used to group the data.
 
 filters
-  Dictionary with additional filter (optional).
-
-  Only discrete parameters are supported. A selector can be input as a filter, this will
-  override any existing filters specified directly on the selector.
+  Dictionary with additional filters (optional).
+
+  The key is the name (or path) to the filter parameter / log, and the
-  value is a dictionary with one of two options:
+  value is a dictionary with options:
 
-  ``include``: List of values to include (optional)
+  ``include``: List of values to include for discrete parameters
 
-  ``exclude``: List of values to exclude (optional)
+  ``exclude``: List of values to exclude for discrete parameters
 
-  ``pfile``: Name (or path) to file containing the parameter e.g. INIT file (optional)
+  ``range``: List with two entries, defining minimum and maximum values to use for continuous parameters
+
+  ``pfile``: Name (or path) to file containing the parameter e.g. INIT file
+
+  .. note:: If a selector or property is input as a filter, this will override any existing filters
+            specified directly on the selector/property.
 
  .. seealso:: Option ``"multiple_filters"`` below which can be used to extract statistics
               multiple times with different filters.
 
 multiple_filters
-  Option for extract statistics multiple times with different filters (optional).
+  Option that can be used to extract statistics multiple times with different filters (optional).
  The input is a dictionariy where the keys are the "name" (ID string) for the
  dataset, and the value is the dictionary of filters (Same format as ``filters`` above)
 
@@ -184,51 +180,17 @@ Common fields:
 
   * For **grid statistics** default is the `gridname`
   * For **blocked wells statistics** default is the `name of the blocked wells object` if inside
-    RMS and `blocked_wells` if outside
+    RMS and `bwells` if outside
   * For **well statistics** default is `wells`
 
 name
   ID string for the dataset (optional). Recommended, if not given it will be set
   equal to the source string.
 
-csvfile
-  Path to output csvfile (optional). A csv-file will only be written, if argument is provided.
-
 verbosity
   Level of output while running None, "info" or "debug", default is None. (optional)
 
-The returned PropStat instance
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-All methods above returns a PropStat instance, with different properties:
-
-dataframe
-  The dataframe with statistical data for continous properties.
-
-dataframe_disc
-  The dataframe with statistical data for discrete properties.
-
-property_dataframe
-  The full dataframe for the properties which is used as input to the statistical
-  aggregations. Note: If filters are used as input, this dataframe will be the filtered.
-
-get_value
-  Method to retrive a statistical value from either of the two the property statistics
-  dataframes (dependent on the property type, discrete vs continous)
-
-  Arguments are:
-
-  ``prop``: String whith the property name (Required)
-
-  ``conditions``: A dictionary with selector conditions to look up value for,
-  e.g {"REGION": "EAST", "ZONE": "TOP_ZONE"}. If no conditions are given, the
-  value for the total will be returned.
-
-  ``calculation``: String with name of column to retrieve value from. "Avg" is the
-  default for continous properties, "Percent" for discrete.
-
-  ``codename``: Codename to select for discrete properties (Required if dicrete property)
-
 Examples
 ^^^^^^^^^
 
@@ -236,7 +198,7 @@ Examples
 
 get_grid_statistics examples
 """"""""""""""""""""""""""""""""
 
-**Example in RMS (basic):**
+**Example in RMS (continuous properties - basic):**
 
 Example extracting statistics for porosity and permeability for each zone and facies.
 Result is written to csv.
 
@@ -246,35 +208,36 @@ Result is written to csv.
 
     from fmu.tools import QCProperties
 
     GRID = "GeoGrid"
-    PROPERTIES = ["PORO", "PERM"]
-    SELECTORS = ["ZONE", "FACIES"]
+    PROPERTIES = ["Poro", "Perm"]
+    SELECTORS = ["Zone", "Facies"]
     REPORT = "../output/qc/somefile.csv"
 
-    usedata = {
-        "properties": PROPERTIES,
-        "selectors": SELECTORS,
-        "grid": GRID,
-        "csvfile": REPORT,
-    }
-
-    def check():
+    def extract_statistics():
 
         qcp = QCProperties()
+
+        usedata = {
+            "properties": PROPERTIES,
+            "selectors": SELECTORS,
+            "grid": GRID,
+            "verbosity": 1,
+        }
 
         qcp.get_grid_statistics(data=usedata, project=project)
+        qcp.to_csv(REPORT)
 
     if __name__ == "__main__":
-        check()
+        extract_statistics()
+        print("Done")
 
-**Example in RMS (more settings):**
+**Example in RMS (continuous properties - more settings):**
 
-Example extracting statistics for porosity and facies for each region. Filters
+Example extracting statistics for porosity per region. Filters
 are used to extract statistics for HC zone and Water zone separately.
 Statistics will be combined for regions with code values 2 and 3.
-Both properties are weighted on a Total_Bulk parameter.
+The porosity property is weighted on a Total_Bulk parameter.
+Result is written to csv.
 
-The result is written out in two csv-files, one with statistics of percentages for
-the discrete facies parameter, and one with regular statistics for the continous porosity parameter.
 
 .. code-block:: python
@@ -283,7 +246,6 @@
 
     GRID = "GeoGrid"
     PROPERTIES = {
         "PORO": {"name": "PHIT", "weight": "Total_Bulk"},
-        "FACIES": {"name": "Facies", "weight": "Total_Bulk"},
     }
     SELECTORS = {
         "REGION": {
@@ -292,8 +254,7 @@
             "codes": {2: "NS", 3: "NS",},
         }
     }
-    REPORT_CONT = "../output/qc/continous_stats.csv"
-    REPORT_DISC = "../output/qc/discrete_stats.csv"
+    REPORT = "../output/qc/continuous_stats.csv"
 
     FLUID_FILTERS = {
         "HC_zone": {"Fluid": {"include": ["oil", "gas"]}},
@@ -309,20 +270,56 @@
             "selectors": SELECTORS,
             "grid": GRID,
             "multiple_filters": FLUID_FILTERS,
+            "verbosity": 1,
         }
 
         qcp.get_grid_statistics(data=usedata, project=project)
-
-        qcp.to_csv(REPORT_CONT)
-        qcp.to_csv(REPORT_DISC, disc=True)
+        qcp.to_csv(REPORT)
 
     if __name__ == "__main__":
         extract_statistics()
+        print("Done")
 
 .. note:: The code is executed twice, filtering on the HC-zone first then the water-zone
           in a second run. Alternatively the fluid parameter could have been used as a
           selector, for extracting statistics in one run.
 
+**Example in RMS (discrete properties):**
+
+Example extracting statistics for a discrete facies parameter for each region.
+The facies parameter is weighted on a Total_Bulk parameter.
+
+The result is written out to csv.
+
+.. code-block:: python
+
+    from fmu.tools import QCProperties
+
+    GRID = "GeoGrid"
+    PROPERTIES = {
+        "FACIES": {"name": "Facies", "weight": "Total_Bulk"},
+    }
+    SELECTORS = ["Regions"]
+
+    REPORT = "../output/qc/discrete_stats.csv"
+
+    def extract_statistics():
+
+        qcp = QCProperties()
+
+        usedata = {
+            "properties": PROPERTIES,
+            "selectors": SELECTORS,
+            "grid": GRID,
+            "verbosity": 1,
+        }
+
+        qcp.get_grid_statistics(data=usedata, project=project)
+        qcp.to_csv(REPORT)
+
+    if __name__ == "__main__":
+        extract_statistics()
+        print("Done")
 
 **Example when executed from files:**
 
@@ -344,21 +341,59 @@ the discrete facies parameter, and one with regular statistics for the continous
 
     }
     REPORT = "../output/qc/somefile.csv"
 
-    usedata = {
-        "properties": PROPERTIES,
-        "selectors": SELECTORS,
-        "path": PATH,
-        "grid": GRID,
-        "name": "MYDATA",
+    def extract_statistics():
+
+        qcp = QCProperties()
+
+        usedata = {
+            "properties": PROPERTIES,
+            "selectors": SELECTORS,
+            "path": PATH,
+            "grid": GRID,
+            "name": "MYDATA",
+        }
+
+        qcp.get_grid_statistics(data=usedata)
+        qcp.to_csv(REPORT)
+
+    if __name__ == "__main__":
+        extract_statistics()
+
+**Example when executed from file using Eclipse INIT-file as input:**
+
+.. code-block:: python
+
+    from fmu.tools import QCProperties
+
+    PATH = "../input/qc/"
+    GRID = "ECLIPSE.EGRID"
+    PROPERTIES = {"PERMX": {"name": "PERMX", "pfile": "ECLIPSE.INIT"}}
+    SELECTORS = {
+        "FIPNUM": {
+            "name": "FIPNUM",
+            "pfile": "ECLIPSE.INIT"
+        },
     }
+    REPORT = "../output/qc/somefile.csv"
 
-    def check():
+    def extract_statistics():
 
         qcp = QCProperties()
+
+        usedata = {
+            "properties": PROPERTIES,
+            "selectors": SELECTORS,
+            "path": PATH,
+            "grid": GRID,
+            "name": "from_eclipse",
+        }
+
         qcp.get_grid_statistics(data=usedata)
+        qcp.to_csv(REPORT)
 
     if __name__ == "__main__":
-        check()
+        extract_statistics()
+
 
 get_well_statistics examples
 """"""""""""""""""""""""""""""""
 
@@ -367,7 +402,8 @@ get_well_statistics examples
 
 **Example in RMS:**
 
 Example extracting statistics for permeability for each zone and facies.
-All wells starting with 34_10-A or 34_10-B will be included in statistics.
+All wells starting with 33_10 and all 34_11 wells containing "A" will be included in statistics.
+Note the use of Python regular expressions!
 Result is written to csv.
 
 .. code-block:: python
@@ -375,7 +411,7 @@ Result is written to csv.
 
     from fmu.tools import QCProperties
 
     WELLS = {
-        "names": ["34_10-A.*$", "34_10-B.*$"],
+        "names": ["33_10.*", "34_11-.*A.*"],
         "logrun": "log",
         "trajectory": "Drilled trajectory",
     }
 
@@ -383,22 +419,22 @@ Result is written to csv.
 
     SELECTORS = ["Zonelog", "Facies_log"]
     REPORT = "../output/qc/somefile.csv"
 
-    usedata = {
-        "properties": PROPERTIES,
-        "selectors": SELECTORS,
-        "wells": WELLS,
-        "csvfile": REPORT,
-    }
-
-    def check():
+    def extract_statistics():
 
         qcp = QCProperties()
 
-        qcp.get_well_statistics(data=usedata, project=project)
+        usedata = {
+            "properties": PROPERTIES,
+            "selectors": SELECTORS,
+            "wells": WELLS,
+        }
+
+        qcp.get_well_statistics(data=usedata, project=project)
         qcp.to_csv(REPORT)
 
     if __name__ == "__main__":
-        check()
+        extract_statistics()
+        print("Done")
 
 **Example when executed from files:**
 
@@ -412,34 +448,36 @@
     from fmu.tools import QCProperties
 
-    WELLS = ["34_10-A.*$"]
+    WELLS = ["34_10-A.*"]
     PATH = "../input/qc/"
     PROPERTIES = ["Phit", "Klogh"]
     SELECTORS = ["Zonelog", "Facies_log"]
     REPORT = "../output/qc/somefile.csv"
 
-    usedata = {
-        "properties": PROPERTIES,
-        "selectors": SELECTORS,
-        "wells": WELLS,
-        "path": PATH,
-        "name": "A-wells",
-    }
-
-    def check():
+    def extract_statistics():
 
         qcp = QCProperties()
+
+        usedata = {
+            "properties": PROPERTIES,
+            "selectors": SELECTORS,
+            "wells": WELLS,
+            "path": PATH,
+            "name": "A-wells",
+        }
+
         qcp.get_well_statistics(data=usedata)
 
         usedata2 = usedata.copy()
-        usedata2["wells"] = ["34_10-B.*$"]
+        usedata2["wells"] = ["34_10-B.*"]
         usedata2["name"] = "B-wells"
+
         qcp.get_grid_statistics(data=usedata2, project=project)
         qcp.to_csv(REPORT)
 
     if __name__ == "__main__":
-        check()
+        extract_statistics()
 
 get_bwell_statistics examples
 """"""""""""""""""""""""""""""""
 
@@ -462,22 +500,23 @@
 
     SELECTORS = ["Zonelog", "Facies_log"]
     REPORT = "../output/qc/somefile.csv"
 
-    usedata = {
-        "properties": PROPERTIES,
-        "selectors": SELECTORS,
-        "wells": WELLS,
-        "csvfile": REPORT,
-    }
-
     def extract_statistics():
 
         qcp = QCProperties()
 
-        qcp.get_bwell_statistics(data=usedata, project=project)
+        usedata = {
+            "properties": PROPERTIES,
+            "selectors": SELECTORS,
+            "wells": WELLS,
+            "csvfile": REPORT,
+        }
+
+        qcp.get_bwell_statistics(data=usedata, project=project)
         qcp.to_csv(REPORT)
 
     if __name__ == "__main__":
         extract_statistics()
+        print("Done")
 
 **Example when executed from files:**
 
@@ -523,28 +562,26 @@ Several steps have been to ensure consistency between the sources, making the re
 
     from fmu.tools import QCProperties
 
-    REPORT = "somefile.csv"
+    REPORT = "../output/qc/somefile.csv"
 
     GEOGRIDDATA = {
         "properties": ["Poro", "Perm"],
         "selectors": {"ZONE": {"name":"Zone"}},
-        "grid": "Geogrid",
+        "grid": "GeoGrid",
     }
 
     SIMGRIDDATA = {
-        "properties": {"Poro":{"name":"PORO"}, "Perm":{"name":"PERMX"}},
+        "properties": {"Poro": {"name":"PORO"}, "Perm": {"name":"PERMX"}},
         "selectors": {"ZONE": {"name":"Zone"}},
-        "grid": "Simgrid",
+        "grid": "SimGrid",
     }
 
     BWDATA = {
-        "properties": ["Poro", "Perm"],
-        "selectors": {"ZONE": {"name":"Zonelog", "codes":{1:"UpperReek", 2:"MidReek", 3:"LowerReek"}, "exclude":["Above_TopUpperReek", "Below_BaseLowerReek"]}},
+        "properties": {"Poro": {"name": "Phit"}, "Perm": {"name": "Klogh"}},
+        "selectors": {"ZONE": {"name": "Zonelog", "codes": {1:"UpperReek", 2:"MidReek", 3:"LowerReek"}, "exclude": ["Above_TopUpperReek", "Below_BaseLowerReek"]}},
         "wells": {"bwname": "BW", "grid": "Geogrid"},
     }
 
-    WDATA = {
-        "properties": ["Poro"],
-        "selectors": {"ZONE": {"name":"Zonelog", "codes":{1:"UpperReek", 2:"MidReek", 3:"LowerReek"}, "exclude":["Above_TopUpperReek", "Below_BaseLowerReek"]}},
-        "wells": {"logrun": "log", "trajectory": "Drilled trajectory"},
-    }
+
+    WDATA = BWDATA.copy()
+    WDATA["wells"] = {"logrun": "log", "trajectory": "Drilled trajectory"}
 
     def extract_statistics():
 
@@ -560,6 +597,7 @@ Several steps have been to ensure consistency between the sources, making the re
 
     if __name__ == "__main__":
         extract_statistics()
 
+.. seealso:: The section below for example of using the same configuration but with yaml-input.
 
 Using yaml input for auto execution
@@ -578,7 +616,7 @@ element.
 
 * ``wells``: the get_well_statistics method are executed on elements in this level
 
-* ``blocked_wells``: the get_bwell_statistics method are executed on elements in this level
+* ``blockedwells``: the get_bwell_statistics method are executed on elements in this level
 
 Example in RMS with setting from a YAML file:
 
@@ -595,13 +633,13 @@
 dataframe are written to csv.
     YAML_PATH = "../input/qc/somefile.yml"
     REPORT = "../output/qc/somefile.csv"
 
-    def check():
+    def extract_statistics():
         qcp = QCProperties()
         qcp.from_yaml(YAML_PATH, project=project)
         qcp.to_csv(REPORT)
 
     if __name__ == "__main__":
-        check()
+        extract_statistics()
 
 The YAML file may in case look like:
 
@@ -611,59 +649,62 @@
 
     grid:
       - grid: GeoGrid
         properties:
-          PORO:
-            name: PHIT
-          PERM:
-            name: KLOGH
+          - Poro
+          - Perm
         selectors:
           ZONE:
             name: Zone
-          FACIES:
-            name: Facies
       - grid: SimGrid
         properties:
-          PORO:
+          Poro:
             name: PORO
-          PERM:
+          Perm:
             name: PERMX
         selectors:
           ZONE:
             name: Zone
-          FACIES:
-            name: Facies
-
+
     wells:
       - wells:
           logrun: log
-          names: [34_10-A.*$]
           trajectory: Drilled trajectory
         properties:
-          PORO:
+          Poro:
             name: Phit
-          PERM:
+          Perm:
             name: Klogh
         selectors:
           ZONE:
             name: Zonelog
-          FACIES:
-            name: Facies_log
+            codes:
+              1: UpperReek
+              2: MidReek
+              3: LowerReek
+            exclude:
+              - Above_TopUpperReek
+              - Below_BaseLowerReek
 
     blockedwells:
       - wells:
           grid: GeoGrid
-          names: [34_10-A.*$]
           bwname: BW
         properties:
-          PORO:
+          Poro:
             name: Phit
-          PERM:
+          Perm:
             name: Klogh
         selectors:
           ZONE:
             name: Zonelog
-          FACIES:
-            name: Facies_log
+            codes:
+              1: UpperReek
+              2: MidReek
+              3: LowerReek
+            exclude:
+              - Above_TopUpperReek
+              - Below_BaseLowerReek
+
 
 Additional Notes
@@ -674,9 +715,6 @@ Advice on performance
 
 There are several settings that has an influence perfomance:
 
-* Keep the option ``reuse = True`` to avoid reloading data to XTGeo if it is previously used,
-  e.g. extracting statistics from the same grid but with different filters.
-
 * Filters can be used to remove unnecessary data, this will limit the input data before
   statistics is calculated and will speed up execution.
diff --git a/src/fmu/tools/qcforward/_grid_statistics.py b/src/fmu/tools/qcforward/_grid_statistics.py
index 641be800..4196a6e3 100644
--- a/src/fmu/tools/qcforward/_grid_statistics.py
+++ b/src/fmu/tools/qcforward/_grid_statistics.py
@@ -2,11 +2,11 @@
 This private module in qcforward is used for grid statistics
 """
 from typing import Union
-import sys
 import collections
 from pathlib import Path
 import json
 from jsonschema import validate
+import pandas as pd
 
 import fmu.tools
 from fmu.tools.qcproperties.qcproperties import QCProperties
@@ -37,7 +37,7 @@
     def run(
         self,
         data: dict,
         project: Union[object, str] = None,
-    ):
+    ) -> None:
         """Main routine for evaulating if statistics from 3D grids
         is within user specified thresholds.
@@ -63,33 +63,35 @@
         self.ldata = _LocalData()
         self.ldata.parse_data(data)
 
+        dfr = self.check_gridstatistics(project, data)
+        QCC.print_debug(f"Results: \n{dfr}")
+
+        self.evaluate_qcreport(dfr, "grid statistics")
+
+    def check_gridstatistics(self, project, data):
+        """
+        Extract statistics per action and check if property value is
+        within user specified limits.
+
+        Returns a dataframe with results
+        """
+
         qcp = QCProperties()
         results = []
 
         QCC.print_info("Checking status for items in actions...")
         for action in self.ldata.actions:
-            # extract parameters from actions and compute statistics
+            # extract parameters from actions
             data_upd = self._extract_parameters_from_action(data, action)
-            stat = qcp.get_grid_statistics(
-                project=project, data=data_upd, reuse=True, qcdata=self.gdata
-            )
-            selectors = (
-                list(action.get("selectors").values())
-                if "selectors" in action
-                else None
-            )
-            # Extract mean value if calculation is not given
-            if "calculation" in action:
-                calculation = action["calculation"]
-            else:
-                calculation = "Avg" if action.get("codename") is None else "Percent"
+
+            selectors, calculation = self._get_selectors_and_calculation(action)
+
+            # Create dataframe with statistics
+            dframe = qcp.get_grid_statistics(project=project, data=data_upd)
 
             # Get value from statistics for given property and selectors
-            value = stat.get_value(
-                action["property"],
-                conditions=action.get("selectors"),
-                calculation=calculation,
-                codename=action.get("codename"),
+            value = self._get_statistical_value(
+                dframe, action["property"], calculation, selectors
             )
 
             status = "OK"
@@ -101,7 +103,7 @@
 
             result = collections.OrderedDict()
             result["PROPERTY"] = action["property"]
-            result["SELECTORS"] = f"{selectors}"
+            result["SELECTORS"] = f"{list(selectors.values())}"
             result["FILTERS"] = "yes" if "filters" in action else "no"
             result["CALCULATION"] = calculation
             result["VALUE"] = value
@@ -118,22 +120,10 @@
             results, reportfile=self.ldata.reportfile, nametag=self.ldata.nametag
         )
 
-        if len(dfr[dfr["STATUS"] == "WARN"]) > 0:
-            print(dfr[dfr["STATUS"] == "WARN"])
-
-        if len(dfr[dfr["STATUS"] == "STOP"]) > 0:
-            print(dfr[dfr["STATUS"] == "STOP"], file=sys.stderr)
-            msg = "One or more actions has status = STOP"
-            QCC.force_stop(msg)
-
-        print(
-            "\n== QC forward check {} ({}) finished ==".format(
-                self.__class__.__name__, self.ldata.nametag
-            )
-        )
+        return dfr
 
     @staticmethod
-    def _validate_input(data):
+    def _validate_input(data: dict):
         """Validate data against JSON schemas"""
 
         # TODO: complete JSON files
@@ -149,9 +139,12 @@ def _validate_input(data):
 
         validate(instance=data, schema=schema)
 
-    def _extract_parameters_from_action(self, data, action):
-        # Function to extract property and selector data from actions
-        # and convert to desired input format for QCProperties
+    @staticmethod
+    def _extract_parameters_from_action(data: dict, action: dict) -> dict:
+        """
+        Extract property and selector data from actions
+        and convert to desired input format for QCProperties
+        """
         data = data.copy()
 
         properties = {}
@@ -175,3 +168,39 @@
         data["filters"] = filters
 
         return data
+
+    @staticmethod
+    def _get_selectors_and_calculation(action: dict) -> tuple:
+        """
+        Get selectors and the selected calculation from the action.
+        If a discrete property has been input it is added to the selectors.
+        If calculation is not specified a default is set.
+        """
+
+        selectors = action.get("selectors", {})
+        if "codename" in action:
+            selectors.update({action["property"]: action["codename"]})
+        calculation = "Avg" if "calculation" not in action else action["calculation"]
+        return selectors, calculation
+
+    @staticmethod
+    def _get_statistical_value(
+        dframe: pd.DataFrame,
+        prop: str,
+        calculation: str,
+        selectors: dict = None,
+    ) -> float:
+        """
+        Retrieve statistical value from the property statistics dataframe
+        """
+
+        dframe = dframe[dframe["PROPERTY"] == prop].copy()
+
+        if selectors is not None:
+            for selector, value in selectors.items():
+                dframe = dframe[dframe[selector] == value]
+
+        if len(dframe.index) > 1:
+            print(dframe)
+            raise Exception("Ambiguous result, multiple rows meet conditions")
+
+        return dframe.iloc[0][calculation]
diff --git a/src/fmu/tools/qcproperties/_aggregate_df.py b/src/fmu/tools/qcproperties/_aggregate_df.py
new file mode 100644
index 00000000..4d4aa2bc
--- /dev/null
+++ b/src/fmu/tools/qcproperties/_aggregate_df.py
@@ -0,0 +1,203 @@
+"""Module containing the PropertyAggregation class."""
+import pandas as pd
+import numpy as np
+
+from fmu.tools._common import _QCCommon
+
+from fmu.tools.qcproperties._utils import list_combinations
+
+QCC = _QCCommon()
+
+
+class PropertyAggregation:
+    """
+    Class for extracting statistics from a property dataframe.
+    Statistics for multiple properties can be calculated simultaneously. The
+    aggregation methods and statistics are based on the property type.
+    Selectors can be used to extract statistics per value in discrete properties.
+    """
+
+    def __init__(
+        self,
+        props2df,
+    ):
+
+        """Initiate instance"""
+        self._property_dataframe = props2df.dataframe  # dataframe with properties
+        self._controls = props2df.aggregation_controls
+        self._controls["property_type"] = props2df.property_type
+        self._dataframe = pd.DataFrame()  # dataframe with statistics
+
+        QCC.verbosity = self._controls["verbosity"]
+
+        # Create list with groups for pandas groupby
+        selector_combo_list = (
+            list_combinations(self._controls["selectors"])
+            if self._controls["selector_combos"]
+            else [self._controls["selectors"]]
+        )
+
+        # Compute dataframe with statistics
+        QCC.print_info(
+            f"Calculating statistics for {self._controls['property_type']} properties"
+        )
+        self._dataframe = (
+            self._calculate_continuous_statistics(selector_combo_list)
+            if self._controls["property_type"] == "CONT"
+            else self._calculate_discrete_fractions(selector_combo_list)
+        )
+
+    # ==================================================================================
+    # Class properties
+    # ==================================================================================
+
+    @property
+    def dataframe(self):
+        """Returns the dataframe with property statistics."""
+        return self._dataframe
+
+    @property
+    def controls(self):
+        """Attribute with data used for statistics aggregation."""
+        return self._controls
+
+    # ==================================================================================
+    # Hidden class methods
+    # ==================================================================================
+
+    def _disc_aggregations(self):
+        """Statistical aggregations to extract from discrete data"""
+        return [
+            ("Count", "count"),
+            (
+                "Sum_Weight",
+                lambda x: np.sum(x)
+                if x.name in list(self._controls["weights"].values())
+                else np.nan,
+            ),
+        ]
+
+    def _cont_aggregations(self, dframe=None):
+        """Statistical aggregations to extract from continuous data"""
+        return [
+            ("Avg", np.mean),
+            ("Stddev", np.std),
+            ("P10", lambda x: np.nanpercentile(x, q=10)),
+            ("P90", lambda x: np.nanpercentile(x, q=90)),
+            ("Min", np.min),
+            ("Max", np.max),
+            (
+                "Avg_Weighted",
+                lambda x: np.average(
+                    x.dropna(),
+                    weights=dframe.loc[
+                        x.dropna().index, self._controls["weights"][x.name]
+                    ],
+                )
+                if x.name in self._controls["weights"]
+                else np.nan,
+            ),
+            ("Count", "count"),
+        ]
+
+    def _calculate_continuous_statistics(self, selector_combo_list):
+        """
+        Calculate statistics for continuous properties.
+        Returns a pandas dataframe.
+        """
+
+        # Extract statistics for combinations of selectors
+        dfs = []
+        groups = []
+        for combo in selector_combo_list:
+            group = self._property_dataframe.dropna(subset=combo).groupby(combo)
+            groups.append(group)
+
+            df_group = (
+                group[self._controls["properties"]]
+                .agg(self._cont_aggregations(dframe=self._property_dataframe))
+                .stack(0)
+                .rename_axis(combo + ["PROPERTY"])
+                .reset_index()
+            )
+            dfs.append(df_group)
+
+        # Extract statistics for the total
+        group_total = self._property_dataframe.dropna(
+            subset=self._controls["selectors"]
+        ).groupby(lambda x: "Total")
+        df_group = (
+            group_total[self._controls["properties"]]
+            .agg(self._cont_aggregations(dframe=self._property_dataframe))
+            .stack(0)
+            .reset_index(level=0, drop=True)
+            .rename_axis(["PROPERTY"])
+            .reset_index()
+        )
+        dfs.append(df_group)
+        dframe = pd.concat(dfs)
+
+        # Empty values in selectors are filled with "Total"
+        dframe[self._controls["selectors"]] = dframe[
+            self._controls["selectors"]
+        ].fillna("Total")
+
+        return dframe
+
+    def _calculate_discrete_fractions(self, selector_combo_list):
+        """
+        Calculate fraction statistics for discrete properties. A weighted
+        fraction is calculated for each property where a weight is specified.
+        Returns a pandas dataframe.
+ """ + + # Extract statistics for combinations of selectors + combo_list = selector_combo_list + dfs = [] + for prop in self._controls["properties"]: + if prop not in self._controls["selectors"]: + combo_list = [x + [prop] for x in selector_combo_list] + combo_list.append([prop]) + self._controls["selectors"].append(prop) + + select = ( + self._controls["weights"][prop] + if prop in self._controls["weights"] + else prop + ) + + for combo in combo_list: + df_prop = self._property_dataframe.dropna(subset=combo).copy() + df_group = ( + df_prop.groupby(combo)[select] + .agg(self._disc_aggregations()) + .reset_index() + .assign(PROPERTY=prop) + ) + + for col, name in { + "Avg_Weighted": "Sum_Weight", + "Avg": "Count", + }.items(): + df_group[f"Total_{name}"] = ( + df_group.groupby([x for x in combo if x != prop])[ + name + ].transform(lambda x: x.sum()) + if combo != [prop] + else df_group[name].sum() + ) + df_group[col] = df_group[name] / df_group[f"Total_{name}"] + + df_group = df_group.drop( + columns=["Total_Sum_Weight", "Total_Count", "Sum_Weight"] + ) + dfs.append(df_group) + + dframe = pd.concat(dfs) + + # Empty values in selectors is filled with "Total" + dframe[self._controls["selectors"]] = dframe[ + self._controls["selectors"] + ].fillna("Total") + + return dframe diff --git a/src/fmu/tools/qcproperties/_combine_propstats.py b/src/fmu/tools/qcproperties/_combine_propstats.py deleted file mode 100644 index 9480bea1..00000000 --- a/src/fmu/tools/qcproperties/_combine_propstats.py +++ /dev/null @@ -1,74 +0,0 @@ -import pandas as pd -from fmu.tools._common import _QCCommon - -QCC = _QCCommon() - - -def combine_property_statistics( - propstats: list, discrete=False, verbosity=0 -) -> pd.DataFrame: - """ - Combine property dataframes from each PropStat() instance in one dataframe - """ - QCC.verbosity = verbosity - - dfs = [] - _check_for_duplicate_names(propstats) - all_selectors = _check_consistency_in_selectors(propstats) - - for pstat in propstats: - dframe = 
pstat.dataframe if not discrete else pstat.dataframe_disc - dframe["ID"] = pstat.name - dfs.append(dframe) - - dframe = pd.concat(dfs) - # fill NaN with "Total" for PropStat()'s with missing selectors - dframe[all_selectors] = dframe[all_selectors].fillna("Total") - - return dframe - - -def _check_for_duplicate_names(propstats): - """ - Check if PropStat() instances have similar names, adjust - names by adding a number to get them unique. - """ - names = [] - - for pstat in propstats: - pstat.name = pstat.name if pstat.name is not None else pstat.source - - if pstat.name in names: - count = len([x for x in names if x.startswith(pstat.name)]) - newname = f"{pstat.name}_{count+1}" - QCC.print_info( - f"Name {pstat.name} already in use, changing name to {newname}" - ) - pstat.name = newname - names.append(pstat.name) - - -def _check_consistency_in_selectors(propstats): - """ - Check if all PropStat() instances have the same selectors, - give warning if not. - - TO-DO: Add check to see if selectors have same codenames - - Same selectors and codenames are needed in order to compare - data from the different instances in e.g. 
WebViz - """ - ps_selectors = [] - - for pstat in propstats: - ps_selectors.append(list(pstat.pdata.selectors.keys())) - - # create list of all unique selectors from the PropStat() instances - all_selectors = list(set(sum(ps_selectors, []))) - - if not all(len(value) == len(all_selectors) for value in ps_selectors): - QCC.give_warn("Not all propstat elements have equal selectors") - for name, sel in zip([pstat.name for pstat in propstats], ps_selectors): - QCC.print_info(f"name = {name}, selectors = {sel}") - - return all_selectors diff --git a/src/fmu/tools/qcproperties/_config_parser.py b/src/fmu/tools/qcproperties/_config_parser.py new file mode 100644 index 00000000..d323b9af --- /dev/null +++ b/src/fmu/tools/qcproperties/_config_parser.py @@ -0,0 +1,179 @@ +""" Private class in qcproperties """ + +from fmu.tools._common import _QCCommon + +QCC = _QCCommon() + + +class ConfigParser: + """ + Class for parsing and preparing the input data for extracting statistics + with QCProperties. The input data is formatted and grouped into relevant + attributes based on where it will be utilized. + """ + + def __init__( + self, + data: dict, + ): + + QCC.verbosity = data.get("verbosity", 0) + + if "csvfile" in data: + raise KeyError( + "Use of 'csvfile' keyword in data is deprecated. " + "To output a csv-file use the to_csv method on " + "the QCProperties() instance instead!" 
+            )
+
+        self._aggregation_controls = dict(
+            properties=[],
+            selectors=[],
+            weights={},
+        )
+
+        self._prop2df_controls = dict(
+            unique_parameters=[],
+            properties_input_names=[],
+            selectors_input_names=[],
+            filters={},
+            name_mapping={},
+            usercodes={},
+        )
+
+        self._data_loading_input = dict(pfiles={}, pdates={})
+
+        # set data loading input
+        for item in ["grid", "wells", "bwells", "path", "verbosity"]:
+            if item in data:
+                self._data_loading_input[item] = data[item]
+
+        # set aggregation controls and parameter data
+        self._parse_properties(data["properties"])
+        self._parse_selectors(data.get("selectors", {}))
+        self._parse_filters(data.get("filters", {}))
+        self._aggregation_controls["selector_combos"] = data.get(
+            "selector_combos", True
+        )
+        self._aggregation_controls["output_percentage"] = data.get(
+            "output_percentage", False
+        )
+        self._aggregation_controls["verbosity"] = QCC.verbosity
+
+    # ==================================================================================
+    # Properties
+    # ==================================================================================
+
+    @property
+    def data_loading_input(self) -> dict:
+        """Attribute to use for loading data to XTGeo"""
+        return self._data_loading_input
+
+    @property
+    def aggregation_controls(self) -> dict:
+        """Attribute to use for statistics aggregation"""
+        return self._aggregation_controls
+
+    @property
+    def prop2df_controls(self) -> dict:
+        """Attribute to use for creating dataframe from properties"""
+        return self._prop2df_controls
+
+    # ==================================================================================
+    # Hidden class methods
+    # ==================================================================================
+
+    def _parse_properties(self, properties):
+        """Add property data to relevant attributes"""
+
+        if isinstance(properties, list):
+            properties = {param: {"name": param} for param in properties}
+
+        for column_name, values in properties.items():
+            name = values["name"]
+            self._aggregation_controls["properties"].append(column_name)
+            self._prop2df_controls["properties_input_names"].append(name)
+            self._prop2df_controls["name_mapping"][name] = column_name
+            self._add_to_parameters(name)
+
+            if "weight" in values:
+                self._aggregation_controls["weights"][column_name] = values["weight"]
+                self._add_to_parameters(values["weight"])
+            if "range" in values:
+                self._prop2df_controls["filters"][name] = {"range": values["range"]}
+            if "pfile" in values:
+                self._data_loading_input["pfiles"][name] = values["pfile"]
+
+        QCC.print_debug(f"properties: {properties}")
+
+    def _parse_selectors(self, selectors):
+        """Add selector data to relevant attributes"""
+
+        if isinstance(selectors, list):
+            selectors = {param: {"name": param} for param in selectors}
+
+        for column_name, values in selectors.items():
+            name = values["name"]
+            self._aggregation_controls["selectors"].append(column_name)
+            self._prop2df_controls["selectors_input_names"].append(name)
+            self._prop2df_controls["name_mapping"][name] = column_name
+            self._add_to_parameters(name)
+
+            if "include" in values and "exclude" in values:
+                raise ValueError("can't both include and exclude values in filtering")
+            if "include" in values:
+                self._prop2df_controls["filters"][name] = {"include": values["include"]}
+            if "exclude" in values:
+                self._prop2df_controls["filters"][name] = {"exclude": values["exclude"]}
+            if "codes" in values:
+                self._prop2df_controls["usercodes"][name] = values["codes"]
+            if "pfile" in values:
+                self._data_loading_input["pfiles"][name] = values["pfile"]
+
+        QCC.print_debug(f"selectors: {selectors}")
+
+    def _parse_filters(self, filters):
+        """Add additional filters to relevant attributes"""
+        for name, values in filters.items():
+            self._add_to_parameters(name)
+
+            if "pfile" in values:
+                self._data_loading_input["pfiles"][name] = values["pfile"]
+
+            # support using a selector prop as filter. If the selector
+            # has filters specified in its values, they will be ignored
+            if name in self._prop2df_controls["filters"]:
+                QCC.give_warn(
+                    f"Filters for {name} found both in 'filters' and 'selectors' "
+                    "or 'properties'. The filter on the selector/property is ignored."
+                )
+            self._prop2df_controls["filters"][name] = values
+
+        # Filter format check
+        for values in self._prop2df_controls["filters"].values():
+            if "include" in values:
+                if isinstance(values["include"], str):
+                    values["include"] = [values["include"]]
+                if not all(isinstance(item, str) for item in values["include"]):
+                    values["include"] = [str(item) for item in values["include"]]
+
+            if "exclude" in values:
+                if isinstance(values["exclude"], str):
+                    values["exclude"] = [values["exclude"]]
+                if not all(isinstance(item, str) for item in values["exclude"]):
+                    values["exclude"] = [str(item) for item in values["exclude"]]
+
+            if "range" in values:
+                if not (
+                    isinstance(values["range"], list) and len(values["range"]) == 2
+                ):
+                    raise TypeError(
+                        "Filter range must be input as list with two values"
+                    )
+
+        QCC.print_debug(f"Filters: {self._prop2df_controls['filters']}")
+
+    def _add_to_parameters(self, param):
+        """Add parameter to list of unique parameters"""
+        if param not in self._prop2df_controls["unique_parameters"]:
+            self._prop2df_controls["unique_parameters"].append(param)
diff --git a/src/fmu/tools/qcproperties/_grid2df.py b/src/fmu/tools/qcproperties/_grid2df.py
new file mode 100644
index 00000000..01ab0488
--- /dev/null
+++ b/src/fmu/tools/qcproperties/_grid2df.py
@@ -0,0 +1,154 @@
+from typing import Optional
+import pandas as pd
+from fmu.tools.qcdata import QCData
+from fmu.tools._common import _QCCommon
+from fmu.tools.qcproperties._config_parser import ConfigParser
+from fmu.tools.qcproperties._utils import filter_df
+
+QCC = _QCCommon()
+
+
+class GridProps2df:
+    """
+    Class responsible for generating a property dataframe from grid properties, and
+    providing control arguments for the statistics extraction with PropertyAggregation()
+    """
+
+    def __init__(self, project: Optional[object], data: dict, xtgdata: QCData):
+
+        """Initialize instance"""
+        QCC.verbosity = data.get("verbosity", 0)
+
+        self._xtgdata = xtgdata  # A QCData instance used for dataloading to XTGeo
+        self._property_type = None
+        self._dataframe = pd.DataFrame()  # dataframe with property data
+
+        self._data_input_preparations(project, data)
+
+        # Create dataframe from grid properties
+        self._create_df_from_grid_props()
+
+    # ==================================================================================
+    # Class properties
+    # ==================================================================================
+
+    @property
+    def dataframe(self) -> pd.DataFrame:
+        """Dataframe with property data."""
+        return self._dataframe
+
+    @property
+    def property_type(self) -> str:
+        """Property type (continuous/discrete)"""
+        return self._property_type
+
+    @property
+    def aggregation_controls(self) -> dict:
+        """Attribute to use for statistics aggregation"""
+        return self._aggregation_controls
+
+    # ==================================================================================
+    # Hidden class methods
+    # ==================================================================================
+
+    def _data_input_preparations(self, project: Optional[object], data: dict):
+        """
+        Prepare the input parameter data for usage within QCProperties().
+        Parameters are loaded to XTGeo and property types are checked.
+ """ + data = data.copy() + controllers = ConfigParser(data) + + self._aggregation_controls = controllers.aggregation_controls + self._controls = controllers.prop2df_controls + + xtg_input = controllers.data_loading_input + + # set gridprops argument format for dataloading with QCData() + if not xtg_input["pfiles"]: + xtg_input["gridprops"] = self._controls["unique_parameters"] + else: + xtg_input["gridprops"] = [ + [param, xtg_input["pfiles"][param]] + if param in xtg_input["pfiles"] + else param + for param in self._controls["unique_parameters"] + ] + + # Load data to XTGeo + self._xtgdata.parse( + project=project, + data=xtg_input, + reuse=True, + ) + # Load data to XTGeo + self._check_and_set_property_type() + + def _create_df_from_grid_props(self): + """ + Extract a combined property dataframe for the input properties. + Values for discrete logs will be replaced by their codename. + """ + QCC.print_info("Creating property dataframe from grid properties") + + self._dataframe = self._xtgdata.gridprops.get_dataframe().copy().dropna() + + # replace codes values in dataframe with code names + self._codes_to_codenames() + + # Filter property dataframe + if self._controls["filters"]: + self._dataframe = filter_df(self._dataframe, self._controls["filters"]) + + # rename columns in dataframe + self.dataframe.rename(columns=self._controls["name_mapping"], inplace=True) + + def _check_and_set_property_type(self): + """ + Use XTGeo to check that selectors are discrete, and also + check if input properties are continous or discrete. + Raise errors if not desired format. 
+ """ + # check that all selectors are discrete + selectors = self._controls["selectors_input_names"] + xtgprops = [ + self._xtgdata.gridprops.get_prop_by_name(prop) for prop in selectors + ] + if not all(prop.isdiscrete for prop in xtgprops): + raise ValueError("Only discrete properties can be used as selectors") + + # check that all properties defined are of the same type + properties = self._controls["properties_input_names"] + xtgprops = [ + self._xtgdata.gridprops.get_prop_by_name(prop) for prop in properties + ] + if any(prop.isdiscrete for prop in xtgprops) and not all( + prop.isdiscrete for prop in xtgprops + ): + raise TypeError( + "Properties of different types (continuous/discrete) " + "defined in the input." + ) + + # Set attribute used to control aggregation method + discrete = xtgprops[0].isdiscrete + QCC.print_debug( + f"{'Discrete' if discrete else 'Continous'} properties in input" + ) + self._property_type = "DISC" if discrete else "CONT" + + def _codes_to_codenames(self): + """Replace codes in dicrete parameters with codenames""" + for param in self._controls["unique_parameters"]: + xtg_prop = self._xtgdata.gridprops.get_prop_by_name(param) + + if xtg_prop.isdiscrete: + codes = xtg_prop.codes.copy() + usercodes = self._controls["usercodes"].copy() + + # Update code names if user input + if usercodes and param in usercodes: + codes.update(usercodes[param]) + + # replace codes values in dataframe with code names + self._dataframe[param] = self._dataframe[param].map(codes.get) diff --git a/src/fmu/tools/qcproperties/_propstat.py b/src/fmu/tools/qcproperties/_propstat.py deleted file mode 100644 index 2a40421a..00000000 --- a/src/fmu/tools/qcproperties/_propstat.py +++ /dev/null @@ -1,601 +0,0 @@ -"""Module containing .... 
""" -from pathlib import Path -import pandas as pd -import numpy as np - -from fmu.tools._common import _QCCommon -from fmu.tools.qcdata import QCData - -from fmu.tools.qcproperties._propstat_parameter_data import PropStatParameterData - -# from fmu.tools.qcproperties._property_dataframe import create_property_dataframe -from fmu.tools.qcproperties._utils import list_combinations, filter_df - -QCC = _QCCommon() - - -class PropStat: - """ - Class for extracting property statistics from Grids, Raw and Blocked wells. - - Statistics for multiple properties can be calculated simultaneosly, and - selectors can be used to extract statistics per value in discrete - properties/logs. Filters can be used to remove unwanted data from the datasets. - - Args: - parameter_data (obj): An instance of PropStatParameterData() containing - parameters e.g. properties, selectors and filters. - xtgeo_data (obj): An instance of QCData() containing XTGeo objects - data (dict): The input data as a Python dictionary (see description of - valid argument keys in documentation) - """ - - CALCULATIONS = [ - "Avg", - "Stddev", - "Min", - "Max", - "P10", - "P90", - "Avg_weighted", - "Percent", - "Percent_weighted", - ] - - def __init__( - self, - parameter_data: PropStatParameterData, - xtgeo_data: QCData, - data: dict, - ): - - """Initiate instance""" - QCC.verbosity = data.get("verbosity", 0) - - self._pdata = parameter_data - self._xtgdata = xtgeo_data - self._data = data - self._dtype = data.get("dtype", None) - self._name = data.get("name", None) - self._selector_combos = data.get("selector_combos", True) - self._codes = {} # codenames for disc parameters - self._prop_df_full = pd.DataFrame() # dataframe containing all parameter values - self._prop_df = pd.DataFrame() # filtered dataframe as input to calculations - self._dataframe = pd.DataFrame() # dataframe with statistics - self._dataframe_disc = pd.DataFrame() # dataframe with percentages - - self._set_source() - self._set_wells() - 
self._check_properties() - - # Get dataframe from the XTGeo objects - self._prop_df_full = ( - self._create_prop_df_from_grid_props() - if self._dtype == "grid" - else self._create_prop_df_from_wells() - ) - # Get dataframe from the XTGeo objects - self.extract_statistics() - - if "csvfile" in data: - self.to_csv(data["csvfile"]) - - # ================================================================================== - # Class properties - # ================================================================================== - - @property - def dataframe(self): - """Returns the dataframe with continous property statistics.""" - return self._dataframe - - @property - def property_dataframe(self): - """Returns the Pandas dataframe object used as input to statistics.""" - return self._prop_df - - @property - def dataframe_disc(self): - """Returns the dataframe with dicrete property statistics.""" - return self._dataframe_disc - - @property - def pdata(self): - """Returns the PropStatParameterData instance.""" - return self._pdata - - @pdata.setter - def pdata(self, newdata): - """Update the PropStatParameterData instance.""" - self._pdata = newdata - - @property - def xtgdata(self): - """Returns available and reusable XTGeo data.""" - return self._xtgdata - - @xtgdata.setter - def xtgdata(self, newdata): - """Update available XTGeo data.""" - self._xtgdata = newdata - - @property - def name(self): - """Returns name used as ID column in dataframe.""" - return self._name - - @name.setter - def name(self, newname): - """Set name used as ID column in dataframe.""" - self._name = newname - - @property - def source(self): - """Returns the source string.""" - return self._source - - @property - def codes(self): - """Returns the codenames used for discrete properties.""" - return self._codes - - # ================================================================================== - # Hidden class methods - # 
================================================================================== - - def _set_source(self): - """Set source attribute""" - if "source" in self._data: - self._source = self._data["source"] - else: - self._source = self._dtype - if self._dtype == "grid": - self._source = Path(self._data["grid"]).stem - elif self._dtype == "bwells" and self._data["project"] is not None: - self._source = self._data["bwells"].get("bwname", "BW") - - QCC.print_info(f"Source is set to: '{self._source}'") - - def _set_wells(self): - """Set wells attribute""" - if self._dtype != "grid": - self._wells = ( - self._xtgdata.wells.wells - if self._dtype == "wells" - else self._xtgdata.bwells.wells - ) - self._validate_wells() - else: - self._wells = None - - def _check_properties(self): - """Group properties into continous and discrete""" - self._disc_props = [] - self._cont_props = [] - - for prop, values in self.pdata.properties.items(): - if self._dtype == "grid": - xtg_prop = self._xtgdata.gridprops.get_prop_by_name(values["name"]) - if xtg_prop.isdiscrete: - self._disc_props.append(prop) - if values["name"] not in self.pdata.disc_params: - self.pdata.disc_params.append(values["name"]) - else: - self._cont_props.append(prop) - else: - if self._wells[0].isdiscrete(values["name"]): - self._disc_props.append(prop) - if values["name"] not in self.pdata.disc_params: - self.pdata.disc_params.append(values["name"]) - else: - self._cont_props.append(prop) - - def _codes_to_codenames(self, dframe): - """Replace codes in dicrete parameters with codenames""" - for param in self.pdata.disc_params: - if self._dtype == "grid": - xtg_prop = self._xtgdata.gridprops.get_prop_by_name(param) - - if not xtg_prop.isdiscrete: - raise RuntimeError( - "A selector parameter needs to be discrete: " - f"{param} parameter is not!" 
- ) - codes = xtg_prop.codes.copy() - else: - codes = self._wells[0].get_logrecord(param).copy() - - if self.pdata.codenames is not None and param in self.pdata.codenames: - codes.update(self.pdata.codenames[param]) - self._codes[param] = codes - - # replace codes values in dataframe with code names - dframe[param] = dframe[param].map(codes.get) - return dframe - - def _create_prop_df_from_grid_props(self): - """ - Extract a combined property dataframe for the input properties. - Values for discrete logs will be replaced by their codename. - """ - QCC.print_info("Creating property dataframe from grid properties") - # check that all properties defined are present as xtgeo properties - for prop in self.pdata.params: - if prop not in self._xtgdata.gridprops.names: - print(self._xtgdata.gridprops.names) - raise ValueError(f"Property name {prop} not found in xtg_props") - - dframe = self._xtgdata.gridprops.dataframe().copy().dropna() - - # replace codes values in dataframe with code names - return self._codes_to_codenames(dframe) - - def _validate_wells(self): - removed_wells = [] - for xtg_well in self._wells: - # skip well if discrete parameters are missing - if not all(log in xtg_well.lognames for log in self.pdata.disc_params): - QCC.print_info( - f"Skipping {xtg_well.name} some dicrete logs are missing" - ) - removed_wells.append(xtg_well) - continue - for log in self.pdata.disc_params: - if log in xtg_well.lognames: - if not xtg_well.isdiscrete(log): - raise ValueError( - "Selector and Filter logs needs to be discrete: " - f"{log} is not!" - ) - self._wells = [ - xtg_well for xtg_well in self._wells if xtg_well not in removed_wells - ] - - def _create_prop_df_from_wells(self): - """ - Create a combined property dataframe for the input wells. - Values for discrete logs will be replaced by their codename. 
- """ - QCC.print_info("Creating property dataframe from well logs") - # Loop through XTGeo wells and combine into one dataframe - dfs = [] - for xtg_well in self._wells: - # extract dataframe for well - df_well = xtg_well.dataframe.copy() - df_well["WELL"] = xtg_well.name - dfs.append(df_well) - - dframe = pd.concat(dfs) - - # To avoid bias in statistics, drop duplicates to remove - # cells penetrated by multiple wells. - dframe = dframe.drop_duplicates( - subset=[x for x in dframe.columns if x != "WELL"] - ) - dframe = dframe[self.pdata.params] - # replace codes values in dataframe with code names - return self._codes_to_codenames(dframe) - - def _update_filters(self, filters, reuse=True): - """ - Change filters and update the unfiltered property dataframe if - a new filter parameter is introduced. - """ - - pdata_upd = PropStatParameterData( - properties=self.pdata.properties, - selectors=self.pdata.selectors, - filters=filters, - ) - - # extract new property dataframe if new parameters are added to filters - if set(pdata_upd.params) == set(self.pdata.params): - self.pdata = pdata_upd - return - else: - self.pdata = pdata_upd - if self._dtype == "grid": - self._data["gridprops"] = self.pdata.params - - self.xtgdata.parse( - project=self._data["project"], - data=self._data, - reuse=reuse, - wells_settings=None - if self._dtype == "grid" - else { - "lognames": self.pdata.params, - }, - ) - self._prop_df_full = ( - self._create_prop_df_from_grid_props() - if self._dtype == "grid" - else self._create_prop_df_from_wells() - ) - - def _rename_prop_df_columns(self): - """ - Rename the columns of the property dataframe. - From 'name' value to key value in properties and selectors. 
- """ - - rename_dict = { - values["name"]: name for name, values in self.pdata.properties.items() - } - rename_dict.update( - {values["name"]: name for name, values in self.pdata.selectors.items()} - ) - - return self.property_dataframe.rename(columns=rename_dict, inplace=True) - - def _aggregations(self, dframe=None, discrete=False): - """Statistical aggregations to extract from the data""" - - return ( - [ - ("Avg", np.mean), - ("Stddev", np.std), - ("P10", lambda x: np.nanpercentile(x, q=10)), - ("P90", lambda x: np.nanpercentile(x, q=90)), - ("Min", np.min), - ("Max", np.max), - ( - "Avg_Weighted", - lambda x: np.average( - x.dropna(), - weights=dframe.loc[ - x.dropna().index, self.pdata.weights[x.name] - ], - ) - if x.name in self.pdata.weights - else np.nan, - ), - ] - if not discrete - else [ - ("Count", "count"), - ( - "Sum_Weight", - lambda x: np.sum(x) - if x.name in list(self.pdata.weights.values()) - else np.nan, - ), - ] - ) - - def _calculate_statistics(self, selector_combo_list, selectors): - """ - Calculate statistics for continous properties. - Returns a pandas dataframe. 
- """ - dframe = self.property_dataframe.copy() - - # Extract statistics for combinations of selectors - dfs = [] - groups = [] - for combo in selector_combo_list: - group = dframe.dropna(subset=combo).groupby(combo) - groups.append(group) - - df_group = ( - group[self._cont_props] - .agg(self._aggregations(dframe=dframe)) - .stack(0) - .rename_axis(combo + ["PROPERTY"]) - .reset_index() - ) - dfs.append(df_group) - - # Extract statistics for the total - group_total = dframe.dropna(subset=selectors).groupby(lambda x: "Total") - df_group = ( - group_total[self._cont_props] - .agg(self._aggregations(dframe=dframe)) - .stack(0) - .reset_index(level=0, drop=True) - .rename_axis(["PROPERTY"]) - .reset_index() - ) - dfs.append(df_group) - dframe = pd.concat(dfs) - - # empty values in selectors is filled with "Total" - dframe[selectors] = dframe[selectors].fillna("Total") - - # return dataframe with specified columns order - cols_first = ["PROPERTY"] + selectors - dframe = dframe[cols_first + [x for x in dframe.columns if x not in cols_first]] - - dframe["SOURCE"] = self._source - dframe["ID"] = self._name - return dframe - - def _calculate_percentages(self, selector_combo_list, selectors): - """ - Calculate statistics for discrete properties. A Weighted Percent - is calculated for each property where a weight is specified - Returns a pandas dataframe. 
- """ - dframe = self.property_dataframe.copy() - - combo_list = selector_combo_list - selectors = selectors.copy() - - dfs = [] - for prop in self._disc_props: - if prop not in selectors: - combo_list = [x + [prop] for x in selector_combo_list] - combo_list.append([prop]) - selectors.append(prop) - - select = self.pdata.weights[prop] if prop in self.pdata.weights else prop - - for combo in combo_list: - df_prop = dframe.dropna(subset=combo).copy() - df_group = ( - df_prop.groupby(combo)[select] - .agg(self._aggregations(discrete=True)) - .reset_index() - .assign(PROPERTY=prop) - ) - - for col, name in { - "Percent_weighted": "Sum_Weight", - "Percent": "Count", - }.items(): - df_group[f"Total_{name}"] = ( - df_group.groupby([x for x in combo if x != prop])[ - name - ].transform(lambda x: x.sum()) - if combo != [prop] - else df_group[name].sum() - ) - df_group[col] = (df_group[name] / df_group[f"Total_{name}"]) * 100 - - df_group = df_group.drop( - columns=["Total_Sum_Weight", "Total_Count", "Sum_Weight"] - ) - dfs.append(df_group) - - dframe = pd.concat(dfs) - - # empty values in selectors is filled with "Total" - dframe[selectors] = dframe[selectors].fillna("Total") - - # return dataframe with specified columns order - cols_first = ["PROPERTY"] + selectors - dframe = dframe[cols_first + [x for x in dframe.columns if x not in cols_first]] - - dframe["SOURCE"] = self._source - dframe["ID"] = self._name - return dframe - - def _group_data_and_aggregate(self): - """ - Calculate statistics for properties for a given set - of combinations of discrete selector properties. - Returns a pandas dataframe. 
- """ - selectors = list(self.pdata.selectors.keys()) - - if selectors: - if self._selector_combos: - selector_combo_list = list_combinations(selectors) - else: - selector_combo_list = selectors if len(selectors) == 1 else [selectors] - QCC.print_info( - f"Extracting statistics for selector group in: {selector_combo_list}" - ) - else: - selector_combo_list = [] - QCC.print_info("No selectors, extracting statistics for the total") - - if self._cont_props: - QCC.print_info("Calculating statistics for continous properties...") - self._dataframe = self._calculate_statistics(selector_combo_list, selectors) - - if self._disc_props: - QCC.print_info("Calculating percentages for discrete properties...") - self._dataframe_disc = self._calculate_percentages( - selector_combo_list, selectors - ) - - # ================================================================================== - # Public class methods - # ================================================================================== - - def extract_statistics( - self, filters: dict = None, reuse: bool = True - ) -> pd.DataFrame: - """Filter the property dataframe and calculate statistics.""" - - if filters: - self._update_filters(filters, reuse) - - # Filter full property dataframe and rename column headers - self._prop_df = ( - filter_df(self._prop_df_full, self.pdata.filters) - if self.pdata.filters - else self._prop_df_full - ) - self._rename_prop_df_columns() - - # Generate dataframes with statistics - self._group_data_and_aggregate() - - def to_csv(self, csvfile: str = "../../share/results/tables/propstats.csv"): - """ Write the property statistics dataframe to csv """ - self.dataframe.to_csv(csvfile, index=False) - - def get_value( - self, prop, calculation: str = None, conditions: dict = None, codename=None - ) -> float: - """ - Retrive statistical value from either of the two the property statistics - dataframes (dependent on the property type, discrete vs continous). 
- - Args: - prop (str): name of property - conditions (dict): A dictionary with selector conditions to look up - value for, e.g {"REGION": "EAST", "ZONE": "TOP_ZONE"}. If no - conditions are given, the value for the total will be returned. - calculation (str): Name of column to retrieve value from. "Avg" is the - default for continous properties, "Percent" for discrete. - codename (str): Codename to select for discrete properties - """ - - conditions = conditions if conditions is not None else {} - disc_prop = True if prop in self._disc_props else False - - if disc_prop: - if codename is not None: - conditions[prop] = codename - else: - raise ValueError( - "A 'codename' argument is needed for discrete properties" - ) - - if calculation is None: - calculation = "Avg" if prop in self._cont_props else "Percent" - - if calculation not in PropStat.CALCULATIONS: - raise KeyError( - f"{calculation} is not a valid calculation. " - f"Valid calculations are: {', '.join(PropStat.CALCULATIONS)}" - ) - - dframe = self.dataframe if not disc_prop else self.dataframe_disc - dframe = dframe[dframe["PROPERTY"] == prop].copy() - - selectors = list(self.pdata.selectors.keys()) - if disc_prop and prop not in selectors: - selectors = selectors.append(prop) if selectors else [prop] - - if selectors: - if not all(x in selectors for x in conditions): - raise ValueError("One or more condition properties are not a selector") - - missing_selectors = [x for x in selectors if x not in conditions] - - # Raise exception if selectors are missing in conditions and - # self._selector_combos=False as the result will be unambigous. 
- # If selector_combos=True use value "Total" for missing selectors - if not self._selector_combos and missing_selectors: - raise ValueError("All selectors needs to be defined in conditions") - - for selector in missing_selectors: - conditions[selector] = "Total" - - for selector, value in conditions.items(): - if value not in dframe[selector].unique(): - raise ValueError( - f"{value} not found in column {selector} " - f"Valid options are {dframe[selector].unique()}" - ) - dframe = dframe[dframe[selector] == value] - - if len(dframe.index) > 1: - print(dframe) - raise Exception("Ambiguous result, multiple rows meet conditions") - - return dframe.iloc[0][calculation] diff --git a/src/fmu/tools/qcproperties/_propstat_parameter_data.py b/src/fmu/tools/qcproperties/_propstat_parameter_data.py deleted file mode 100644 index bda545da..00000000 --- a/src/fmu/tools/qcproperties/_propstat_parameter_data.py +++ /dev/null @@ -1,202 +0,0 @@ -""" Private class in qcproperties """ - -from typing import Union -from fmu.tools._common import _QCCommon - -QCC = _QCCommon() - - -class PropStatParameterData: - """Class for preparing the input parameter data for use with a PropStat() - instance. Initializing this class will combine and group the input data - into different class attributes. - - Args: - properties (dict or list): - Properties to compute statistics for. Can be given as list or as dictionary. - If dictionary the key will be the column name in the output dataframe, and - the value will be a dictionary with valid options: - "name" (str or path): the actual name (or path) of the parameter / log. - "weight" (str or path): a weight parameter (name or path if outside RMS) - - selectors (dict or list): - Selectors are discrete properties/logs e.g. Zone. that are used to extract - statistics for groups of the data. Can be given as list or as dictionary. 
- If dictionary the key will be the column name in the output dataframe, and - the value will be a dictionary with valid options: - "name" (str or path): the actual name (or path) of the property / log. - "include" or "exclude" (list): list of values to include/exclude - "codes" (dict): a dictionary of codenames to update existing codenames. - - filters (dict): - Additional filters, only discrete parameters are supported. - The key is the name (or path) to the filter parameter / log, and the - value is a dictionary with valid options: - "include" or "exclude" (list): list of values to include/exclude - - Example:: - - properties = {"PORO": {"name": "PHIT", "weight": "Total_Bulk"}}, - selectors = { - "ZONE": { - "name": "Regions", - "exclude": ["Surroundings"], - "codes": {1: "East", 2: "North"}, - } - }, - filters = {"Fluid": {"include": ["oil", "gas"]}} - """ - - def __init__( - self, - properties: Union[dict, list], - selectors: Union[dict, list] = None, - filters: dict = None, - verbosity: int = None, - ): - - self._params = [] - self._disc_params = [] - self._filters = {} - self._codenames = {} - self._properties = {} - self._selectors = {} - self._weights = {} - - QCC.verbosity = verbosity - - # adjust format of properties and selectors if input as list - self._properties, self._selectors = self._input_conversion( - properties, selectors - ) - - # combine data and set different instance attributes - self._combine_data(filters) - - @property - def properties(self): - """Attribute containing all properties""" - return self._properties - - @property - def selectors(self): - """Attribute containing all selector properties""" - return self._selectors - - @property - def params(self): - """Data attribute containing all unique parameters collected from the input""" - return self._params - - @property - def disc_params(self): - """Discrete Parameters attribute""" - return self._disc_params - - @disc_params.setter - def disc_params(self, newdata): - """Update the 
discrete parameter list.""" - self._disc_params = newdata - - @property - def filters(self): - """Filter attribute""" - return self._filters - - @property - def codenames(self): - """Codenames attribute used to update codenames for dicrete parameters""" - return self._codenames - - @property - def weights(self): - """Weight attribute""" - return self._weights - - # ================================================================================== - # Hidden class methods - # ================================================================================== - - @staticmethod - def _input_conversion(properties, selectors): - """ - Check if property and selector data are given as list and - return desired input dict format for _PropStatParameterData - """ - properties_dict = {} - selectors_dict = {} - - if isinstance(properties, list): - for prop in properties: - properties_dict[prop] = {"name": prop} - properties = properties_dict - - if isinstance(selectors, list): - for selctor in selectors: - selectors_dict[selctor] = {"name": selctor} - selectors = selectors_dict - - return properties, selectors - - def _add_properties_data(self): - """ Add properties data to relevant attributes """ - for prop, values in self._properties.items(): - self._params.append(values["name"]) - - if "weight" in values: - self._weights[prop] = values["weight"] - if values["weight"] not in self._params: - self._params.append(values["weight"]) - - def _add_selector_data(self): - """ Add selector data to relevant attributes """ - for values in self._selectors.values(): - prop = values["name"] - if prop not in self._params: - self._params.append(prop) - if prop not in self._disc_params: - self._disc_params.append(prop) - - if "include" in values and "exclude" in values: - raise ValueError("can't both include and exclude values in filtering") - - if "include" in values: - self._filters[prop] = {"include": values.get("include")} - if "exclude" in values: - self._filters[prop] = {"exclude": 
values.get("exclude")} - - if "codes" in values: - self._codenames[prop] = values["codes"] - - def _add_filters(self, filters): - """ Add additional filters to relevant attributes """ - for prop, values in filters.items(): - if prop not in self._params: - self._params.append(prop) - self._filters[prop] = values - self._disc_params.append(prop) - - # support using a selector prop as filter. If the selctor - # has filters specified in its values, they will be ignored - if any(x["name"] == prop for x in self._selectors.values()): - if prop in self._filters: - QCC.give_warn( - f"Filters for {prop} found both in 'filters' and 'selectors'. " - "The filter defined on the selector is ignored." - ) - self._filters[prop] = values - - def _combine_data(self, filters): - """ create combined lists of all data sources""" - - self._add_properties_data() - - if self._selectors: - self._add_selector_data() - - if filters is not None: - self._add_filters(filters) - - QCC.print_debug(f"All Properties: {self.properties}") - QCC.print_debug(f"All Selectors: {self.selectors}") - QCC.print_debug(f"All Filters: {self.filters}") diff --git a/src/fmu/tools/qcproperties/_utils.py b/src/fmu/tools/qcproperties/_utils.py index 074461a5..84c08834 100644 --- a/src/fmu/tools/qcproperties/_utils.py +++ b/src/fmu/tools/qcproperties/_utils.py @@ -4,15 +4,12 @@ def filter_df(dframe, filters): - """Filter dataframe """ + """Filter dataframe""" dframe = dframe.copy() for prop, filt in filters.items(): - if filt.get("include"): - if isinstance(filt["include"], str): - filt["include"] = [filt["include"]] + if "include" in filt: if all(x in dframe[prop].unique() for x in filt["include"]): - dframe = dframe[dframe[prop].isin(filt["include"])] else: raise ValueError( @@ -20,9 +17,7 @@ def filter_df(dframe, filters): f"does not exist in dataframe column {prop} " f"Available values are: {dframe[prop].unique()}" ) - if filt.get("exclude"): - if isinstance(filt["exclude"], str): - filt["exclude"] = 
[filt["exclude"]] + if "exclude" in filt: if all(x in dframe[prop].unique() for x in filt["exclude"]): dframe = dframe[~dframe[prop].isin(filt["exclude"])] else: @@ -31,6 +26,9 @@ f"does not exist in dataframe column {prop} " f"Available values are: {dframe[prop].unique()}" ) + if "range" in filt: + low_value, high_value = filt["range"] + dframe = dframe[(dframe[prop] >= low_value) & (dframe[prop] <= high_value)] if dframe.empty: raise Exception("Empty dataframe - no data left after filtering") diff --git a/src/fmu/tools/qcproperties/_well2df.py b/src/fmu/tools/qcproperties/_well2df.py new file mode 100644 index 00000000..ec5ecc97 --- /dev/null +++ b/src/fmu/tools/qcproperties/_well2df.py @@ -0,0 +1,191 @@ +from typing import Optional +import pandas as pd + +from fmu.tools.qcdata import QCData +from fmu.tools._common import _QCCommon +from fmu.tools.qcproperties._config_parser import ConfigParser +from fmu.tools.qcproperties._utils import filter_df + +QCC = _QCCommon() + + +class WellLogs2df: + """ + Class responsible for generating a property dataframe from well logs, and + providing control arguments for the statistics extraction using PropertyAggregation() + """ + + def __init__( + self, + project: Optional[object], + data: dict, + xtgdata: QCData, + blockedwells: bool = False, + ): + + """Initiate instance""" + QCC.verbosity = data.get("verbosity", 0) + + self._xtgdata = xtgdata # A QCData instance used for dataloading to XTGeo + self._wells = [] + self._property_type = None + self._dataframe = pd.DataFrame() # dataframe with property log data + + self._data_input_preparations(project, data, blockedwells) + + # Get dataframe from the XTGeo objects + self._create_df_from_wells() + + # ================================================================================== + # Class properties + # ================================================================================== + + @property + def dataframe(self) -> pd.DataFrame: + 
"""Dataframe with property statistics.""" + return self._dataframe + + @property + def property_type(self) -> str: + """Property type (continous/discrete)""" + return self._property_type + + @property + def aggregation_controls(self) -> dict: + """Attribute to use for statistics aggregation""" + return self._aggregation_controls + + # ================================================================================== + # Hidden class methods + # ================================================================================== + + def _data_input_preparations( + self, project: Optional[object], data: dict, blockedwells: bool + ): + """ + Prepare the input parameter data for usage within QCProperties(). + Parameters are loaded to XTGeo and property types are checked. + """ + data = data.copy() + + if blockedwells: + data["bwells"] = data.pop("wells") + + controllers = ConfigParser(data) + + self._aggregation_controls = controllers.aggregation_controls + self._controls = controllers.prop2df_controls + + # Load data to XTGeo + self._xtgdata.parse( + project=project, + data=controllers.data_loading_input, + reuse=True, + wells_settings={"lognames": self._controls["unique_parameters"]}, + ) + + self._set_wells(blockedwells) + + # Check which property type is input + self._check_logs_and_set_property_type() + + def _set_wells(self, blockedwells: bool): + """Set wells attribute""" + self._wells = ( + self._xtgdata.wells.wells + if not blockedwells + else self._xtgdata.bwells.wells + ) + self._validate_wells() + + def _validate_wells(self): + """Remove wells where selector logs are missing""" + selectors = self._controls["selectors_input_names"] + removed_wells = [] + for xtg_well in self._wells: + # skip well if selector logs are missing + if not all(log in xtg_well.lognames for log in selectors): + QCC.print_info( + f"Skipping {xtg_well.name} some selector logs are missing" + ) + removed_wells.append(xtg_well) + continue + self._wells = [ + xtg_well for xtg_well in 
self._wells if xtg_well not in removed_wells + ] + + def _check_logs_and_set_property_type(self): + """ + Use XTGeo to check that selectors are discrete, and also + check if input properties are continuous or discrete. + Raise errors if not in the desired format. + """ + # check that all selectors are discrete + selectors = self._controls["selectors_input_names"] + if not all(self._wells[0].isdiscrete(log) for log in selectors): + raise ValueError("Only discrete logs can be used as selectors") + + # check that all properties defined are of the same type + properties = self._controls["properties_input_names"] + if any(self._wells[0].isdiscrete(log) for log in properties) and not all( + self._wells[0].isdiscrete(log) for log in properties + ): + raise TypeError( + "Properties of different types (continuous/discrete) " + "defined in the input." + ) + + # Set attribute used to control aggregation method + discrete = self._wells[0].isdiscrete(properties[0]) + QCC.print_debug( + f"{'Discrete' if discrete else 'Continuous'} properties in input" + ) + self._property_type = "DISC" if discrete else "CONT" + + def _codes_to_codenames(self): + """Replace codes in discrete parameters with codenames""" + for param in self._controls["unique_parameters"]: + + if self._wells[0].isdiscrete(param): + codes = self._wells[0].get_logrecord(param).copy() + usercodes = self._controls["usercodes"].copy() + + # Update code names if user input + if usercodes and param in usercodes: + codes.update(usercodes[param]) + + # replace codes values in dataframe with code names + self._dataframe[param] = self._dataframe[param].map(codes.get) + + def _create_df_from_wells(self): + """ + Create a combined property dataframe for the input wells. + Values for discrete logs will be replaced by their codename. 
+ """ + QCC.print_info("Creating property dataframe from well logs") + # Loop through XTGeo wells and combine into one dataframe + dfs = [] + for xtg_well in self._wells: + # extract dataframe for well + df_well = xtg_well.dataframe.copy() + df_well["WELL"] = xtg_well.name + dfs.append(df_well) + + dframe = pd.concat(dfs) + + # To avoid bias in statistics, drop duplicates to remove + # cells penetrated by multiple wells. + dframe = dframe.drop_duplicates( + subset=[x for x in dframe.columns if x != "WELL"] + ) + self._dataframe = dframe[self._controls["unique_parameters"]].copy() + + # replace codes values in dataframe with code names + self._codes_to_codenames() + + # Filter property dataframe + if self._controls["filters"]: + self._dataframe = filter_df(self._dataframe, self._controls["filters"]) + + # rename columns in dataframe + self.dataframe.rename(columns=self._controls["name_mapping"], inplace=True) diff --git a/src/fmu/tools/qcproperties/qcproperties.py b/src/fmu/tools/qcproperties/qcproperties.py index cdfda4da..408ffa0c 100644 --- a/src/fmu/tools/qcproperties/qcproperties.py +++ b/src/fmu/tools/qcproperties/qcproperties.py @@ -1,14 +1,15 @@ """The qcproperties module""" - +from pathlib import Path +from typing import Optional import pandas as pd import yaml from fmu.tools._common import _QCCommon from fmu.tools.qcdata import QCData -from fmu.tools.qcproperties._combine_propstats import combine_property_statistics -from fmu.tools.qcproperties._propstat_parameter_data import PropStatParameterData -from fmu.tools.qcproperties._propstat import PropStat +from fmu.tools.qcproperties._grid2df import GridProps2df +from fmu.tools.qcproperties._well2df import WellLogs2df +from fmu.tools.qcproperties._aggregate_df import PropertyAggregation QCC = _QCCommon() @@ -18,6 +19,9 @@ class QCProperties: The QCProperties class consists of a set of methods for extracting property statistics from 3D Grids, Raw and Blocked wells. 
+ Statistics can be collected from either discrete or continuous properties. + Depending on the property type, different statistics are collected. + + The methods for statistics extraction can be run individually, or a yaml-configuration file can be used to enable an automatic run of the methods. See the method 'from_yaml'. @@ -30,224 +34,206 @@ class QCProperties: XTGeo is being utilized to get a dataframe from the input parameter data. XTGeo data is reused in the instance to increase performance. - - Methods for extracting statistics from 3D Grids, Raw and Blocked wells: - - Args: - data (dict): The input data as a Python dictionary (see description of - valid argument keys in documentation) - reuse (bool or list): If True, then grid and gridprops will be reused - as default. Alternatively it can be a list for more - fine grained control, e.g. ["grid", "gridprops", "wells"] - project (obj or str): For usage inside RMS - - Returns: - A PropStat() instance - """ def __init__(self): - self._propstats = [] # list of PropStat() instances - self._dataframe = pd.DataFrame() # merged dataframe with continous stats - self._dataframe_disc = pd.DataFrame() # merged dataframe with discrete stats self._xtgdata = QCData() # QCData instance, general XTGeo data + self._dfs = [] # list of dataframes with aggregated statistics + self._selectors_all = [] + self._proptypes_all = [] + self._ids = [] + self._dataframe = pd.DataFrame() # merged dataframe with statistics # Properties: # ================================================================================== @property def dataframe(self): - """A merged dataframe from all the PropStat() instances""" - self._dataframe = self._create_dataframe(self._dataframe) + """Dataframe with statistics""" + self._dataframe = self._create_or_return_dataframe() return self._dataframe - @property - def dataframe_disc(self): - """A merged dataframe from all the PropStat() instances""" - self._dataframe_disc = self._create_dataframe( 
self._dataframe_disc, discrete=True - ) - return self._dataframe_disc - - @property - def xtgdata(self): - """The QCData instance""" - return self._xtgdata - # Hidden methods: # ================================================================================== - def _input_preparations(self, project, data, reuse, dtype, qcdata=None): - """ - Prepare the input parameter data for use with a PropStat() instance. - Parameters are loaded to XTGeo and can be reused in the instance. - """ - - data = data.copy() - data["dtype"] = dtype - data["project"] = project - if dtype == "bwells": - data["bwells"] = data.pop("wells") - - pdata = PropStatParameterData( - properties=data["properties"], - selectors=data.get("selectors", {}), - filters=data.get("filters", None), - verbosity=data.get("verbosity", 0), - ) - - if dtype == "grid": - pfiles = {} - for elem in ["properties", "selectors", "filters"]: - if elem in data and isinstance(data[elem], dict): - for values in data[elem].values(): - if "pfile" in values: - pfiles[values["name"]] = values["pfile"] - - data["gridprops"] = ( - [ - [param, pfiles[param]] if param in pfiles else ["unknown", param] - for param in pdata.params - ] - if project is None - else pdata.params - ) - - if qcdata is not None: - self._xtgdata = qcdata - - self._xtgdata.parse( - project=data["project"], - data=data, - reuse=reuse, - wells_settings=None - if dtype == "grid" - else { - "lognames": pdata.params, - }, - ) - - return pdata, data - - def _dataload_and_calculation(self, project, data, reuse, dtype, qcdata=None): - """ Load data to XTGeo and xtract statistics. 
Can be """ - # create PropStatParameterData() instance and load parameters to xtgeo - pdata, data = self._input_preparations(project, data, reuse, dtype, qcdata) - - QCC.print_info("Extracting property statistics...") - # compute statistics - propstat = PropStat(parameter_data=pdata, xtgeo_data=self._xtgdata, data=data) - - self._propstats.append(propstat) - return propstat - - def _extract_statistics(self, project, data, reuse, dtype, qcdata): - """ - Single statistics extraction, or multiple if multiple filters are defined. - All PropStat() instances will be appended to the self._propstats list and - are used to create a merged dataframe for the instance. - - Returns: A single PropStat() instance or a list of PropStat() intances if - multiple filters are used. - """ - QCC.verbosity = data.get("verbosity", 0) - - if "multiple_filters" in data: - propstats = [] - for name, filters in data["multiple_filters"].items(): - QCC.print_info( - f"Starting run with name '{name}', " f"using filters {filters}" - ) - usedata = data.copy() - usedata["filters"] = filters - usedata["name"] = name - pstat = self._dataload_and_calculation( - project, data=usedata, reuse=True, dtype=dtype, qcdata=qcdata - ) - propstats.append(pstat) - return propstats - else: - return self._dataload_and_calculation(project, data, reuse, dtype, qcdata) - - def _initiate_from_config(self, cfg, project=None, reuse=False): - """ Run methods for statistics extraction based on entries in yaml-config""" - + def _initiate_from_config(self, cfg: str, project: Optional[object]): + """Run methods for statistics extraction based on entries in yaml-config""" with open(cfg, "r") as stream: data = yaml.safe_load(stream) if "grid" in data: for item in data["grid"]: - self.get_grid_statistics(data=item, project=project, reuse=reuse) + self.get_grid_statistics(data=item, project=project) if "wells" in data: for item in data["wells"]: - self.get_well_statistics(data=item, project=project, reuse=reuse) + 
self.get_well_statistics(data=item, project=project) if "blockedwells" in data: for item in data["blockedwells"]: - self.get_bwell_statistics(data=item, project=project, reuse=reuse) + self.get_bwell_statistics(data=item, project=project) - def _create_dataframe(self, dframe, discrete=False): + def _create_or_return_dataframe(self): """ - Combine dataframe from all PropStat() instances. Update dataframe if - out of sync with self._propstats + Combine dataframes from all runs within the instance. + Only update the dataframe if more runs have been added within + the instance, else return the previous dataframe. """ - if (self._propstats and dframe.empty) or ( - len(self._propstats) != len(dframe["ID"].unique()) - ): - dframe = combine_property_statistics( - self._propstats, discrete=discrete, verbosity=QCC.verbosity - ) + dframe = self._dataframe + dframes = self._dfs + + if dframe.empty or len(dframes) > len(dframe["ID"].unique()): + QCC.print_debug("Updating combined dataframe") + self._warn_if_different_property_types() + dframe = pd.concat(dframes) + + # fill NaN with "Total" for dataframes with missing selectors + dframe[self._selectors_all] = dframe[self._selectors_all].fillna("Total") + + # Specify column order in statistics dataframe + cols_first = ["PROPERTY"] + self._selectors_all + dframe = dframe[ + cols_first + [x for x in dframe.columns if x not in cols_first] + ] return dframe + def _warn_if_different_property_types(self): + """Give warning if dataframes have different property types""" + if not all(ptype == self._proptypes_all[0] for ptype in self._proptypes_all): + QCC.give_warn( + "Merging statistics dataframes from different property types " + "(continuous/discrete). Is this intentional?" + ) + + def _adjust_id_if_duplicate(self, run_id: str) -> str: + """ + Check for duplicate run ids; make them unique + by appending a number. 
+ """ + check_id = run_id + count = 0 + while check_id in self._ids: + check_id = f"{run_id}({count+1})" + count += 1 + return check_id + + def _set_dataframe_id_and_class_attributes( + self, statistics: PropertyAggregation, source: str, run_id: str + ): + """ + Set source and id column of statistics datframe, and different + class attributes. + """ + run_id = self._adjust_id_if_duplicate(run_id) + # set id and source columns in statistics dataframe + statistics.dataframe["ID"] = run_id + statistics.dataframe["SOURCE"] = source + + self._ids.append(run_id) + self._dfs.append(statistics.dataframe) + + for selector in statistics.controls["selectors"]: + if selector not in self._selectors_all: + self._selectors_all.append(selector) + + self._proptypes_all.append(statistics.controls["property_type"]) + + # pylint: disable = no-self-argument, not-callable + def _check_multiple_filters(method): + """Decorator function for extracting statistics with different filters""" + + def wrapper(self, **kwargs): + if "multiple_filters" in kwargs["data"]: + for name, filters in kwargs["data"]["multiple_filters"].items(): + kwargs["data"].update(filters=filters, name=name) + method(self, **kwargs) + return self.dataframe + return method(self, **kwargs) + + return wrapper + + @_check_multiple_filters + def _extract_statistics( + self, dtype: str, data: dict, project: Optional[object], source: str + ): + """Create dataframe from properties and extract statistics""" + QCC.verbosity = data.get("verbosity", 0) + QCC.print_info("Starting run...") + + # Create Property dataframe from input (using XTGeo) + property_data = ( + GridProps2df(project=project, data=data, xtgdata=self._xtgdata) + if dtype == "grid" + else WellLogs2df( + project=project, + data=data, + xtgdata=self._xtgdata, + blockedwells=dtype == "bwells", + ) + ) + + # Compute statistics + stats = PropertyAggregation(property_data) + + self._set_dataframe_id_and_class_attributes( + stats, + source=source, + 
run_id=data.get("name", source), + ) + + return stats.dataframe + # QC methods: # ================================================================================== def get_grid_statistics( self, data: dict, - project: object = None, - reuse: bool = False, - qcdata: QCData = None, - ) -> PropStat: + project: Optional[object] = None, + ) -> pd.DataFrame: """Extract property statistics from 3D Grid""" return self._extract_statistics( - project, data, reuse, dtype="grid", qcdata=qcdata + dtype="grid", + data=data, + project=project, + source=data.get("source", Path(data["grid"]).stem), ) def get_well_statistics( self, data: dict, - project: object = None, - reuse: bool = False, - qcdata: QCData = None, - ) -> PropStat: - """Extract property statistics from wells """ + project: Optional[object] = None, + ) -> pd.DataFrame: + """Extract property statistics from wells""" return self._extract_statistics( - project, data, reuse, dtype="wells", qcdata=qcdata + dtype="wells", + data=data, + project=project, + source=data.get("source", "wells"), ) def get_bwell_statistics( self, data: dict, - project: object = None, - reuse: bool = False, - qcdata: QCData = None, - ) -> PropStat: - """Extract property statistics from blocked wells """ + project: Optional[object] = None, + ) -> pd.DataFrame: + """Extract property statistics from blocked wells""" return self._extract_statistics( - project, data, reuse, dtype="bwells", qcdata=qcdata + dtype="bwells", + data=data, + project=project, + source=data.get( + "source", + "bwells" if project is None else data["wells"].get("bwname", "BW"), + ), ) - def from_yaml(self, cfg: str, project: object = None, reuse: bool = False): - """ Use yaml-configuration file to run the statistics extractions methods.""" - self._initiate_from_config(cfg, project, reuse) - - def to_csv(self, csvfile: str, disc: bool = False): - """ Write combined dataframe to csv """ - dframe = self.dataframe if not disc else self.dataframe_disc - dframe.to_csv(csvfile, 
index=False) + def from_yaml(self, cfg: str, project: Optional[object] = None): + """Use yaml-configuration file to run the statistics extractions methods""" + self._initiate_from_config(cfg, project) - QCC.print_info(f"Dataframe with {'discrete' if disc else 'continous'} ") - QCC.print_info(f"property statistics written to {csvfile}") + def to_csv(self, csvfile: str): + """Write combined dataframe to csv""" + self.dataframe.to_csv(csvfile, index=False) + QCC.print_info(f"Dataframe with statistics written to {csvfile}") diff --git a/tests/qcforward/test_qcforward_grid_statistics.py b/tests/qcforward/test_grid_statistics.py similarity index 62% rename from tests/qcforward/test_qcforward_grid_statistics.py rename to tests/qcforward/test_grid_statistics.py index 9e3e4391..946a4feb 100644 --- a/tests/qcforward/test_qcforward_grid_statistics.py +++ b/tests/qcforward/test_grid_statistics.py @@ -5,16 +5,23 @@ from fmu.tools import qcforward as qcf -PATH = abspath("../xtgeo-testdata/3dgrids/reek/") -GRID = "reek_sim_grid.roff" - -REPORT = abspath("/tmp/somefile.csv") SOMEYAML = abspath("/tmp/somefile.yml") +REPORT = abspath("/tmp/somefile.csv") -def test_simple_action(): +@pytest.fixture(name="data") +def fixture_data(): + return { + "path": abspath("../xtgeo-testdata/3dgrids/reek/"), + "grid": "reek_sim_grid.roff", + "report": REPORT, + "verbosity": 1, + } + - actions = [ +def test_simple_action(data): + + data["actions"] = [ { "property": "reek_sim_poro.roff", "warn_outside": [0.18, 0.25], @@ -23,13 +30,6 @@ def test_simple_action(): }, ] - data = { - "nametag": "MYDATA1", - "path": PATH, - "grid": GRID, - "report": REPORT, - "actions": actions, - } qcjob = qcf.GridStatistics() qcjob.run(data) @@ -43,9 +43,9 @@ def test_simple_action(): pathlib.Path(REPORT).unlink() -def test_action_with_disc_and_cont_props(): +def test_action_with_disc_and_cont_props(data): - actions = [ + data["actions"] = [ { "property": "reek_sim_poro.roff", "warn_outside": [0.18, 0.25], @@ -55,36 
+55,37 @@ def test_action_with_disc_and_cont_props():
         {
             "property": "reek_sim_facies2.roff",
             "codename": "SHALE",
-            "warn_outside": [30, 70],
-            "stop_outside": [0, 100],
+            "warn_outside": [0.3, 0.7],
+            "stop_outside": [0, 1],
             "description": "test2",
         },
+        {
+            "property": "reek_sim_facies2.roff",
+            "codename": "SHALE",
+            "selectors": {"reek_sim_zone.roff": "Below_Top_reek"},
+            "warn_outside": [0.3, 0.7],
+            "stop_outside": [0, 1],
+            "description": "test3",
+        },
     ]

-    data = {
-        "nametag": "MYDATA1",
-        "path": PATH,
-        "grid": GRID,
-        "report": REPORT,
-        "actions": actions,
-    }
     qcjob = qcf.GridStatistics()
     qcjob.run(data)

     dfr = pd.read_csv(REPORT)
-    assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["STATUS"] == "WARN"
-    assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["VALUE"] == pytest.approx(
-        0.1677, 0.001
-    )
-    assert dfr.loc[dfr["CALCULATION"] == "Percent"].iloc[0]["VALUE"] == pytest.approx(
-        58.50, abs=0.01
-    )
+    assert dfr.loc[dfr["PROPERTY"] == "reek_sim_poro.roff"].iloc[0]["STATUS"] == "WARN"
+    assert dfr.loc[dfr["PROPERTY"] == "reek_sim_poro.roff"].iloc[0][
+        "VALUE"
+    ] == pytest.approx(0.1677, 0.001)
+    assert dfr.loc[
+        (dfr["PROPERTY"] == "reek_sim_facies2.roff") & (dfr["DESCRIPTION"] == "test2")
+    ].iloc[0]["VALUE"] == pytest.approx(0.585, abs=0.001)

     pathlib.Path(REPORT).unlink()


-def test_multiple_actions():
+def test_multiple_actions(data):

     zones_stop = [
         ["Below_Top_reek", [0.1, 0.3]],
@@ -92,20 +93,13 @@
         ["Below_Low_reek", [0.1, 0.20]],
     ]

-    data = {
-        "nametag": "MYDATA1",
-        "path": PATH,
-        "grid": GRID,
-        "report": REPORT + "1",
-    }
-
     actions = []
-    for zs in zones_stop:
+    for zstop in zones_stop:
         actions.append(
             {
                 "property": "reek_sim_poro.roff",
-                "selectors": {"reek_sim_zone.roff": zs[0]},
-                "stop_outside": zs[1],
+                "selectors": {"reek_sim_zone.roff": zstop[0]},
+                "stop_outside": zstop[1],
             }
         )

@@ -113,14 +107,14 @@
     qcjob = qcf.GridStatistics()
     qcjob.run(data)

-    dfr = pd.read_csv(REPORT + "1")
+    dfr = pd.read_csv(REPORT)
     print(dfr)

-    pathlib.Path(REPORT + "1").unlink()
+    pathlib.Path(REPORT).unlink()


-def test_action_with_selectors():
+def test_action_with_selectors(data):

-    actions = [
+    data["actions"] = [
         {
             "property": "reek_sim_poro.roff",
             "selectors": {"reek_sim_zone.roff": "Below_Mid_reek"},
@@ -129,29 +123,22 @@
         },
     ]

-    data = {
-        "nametag": "MYDATA1",
-        "path": PATH,
-        "grid": GRID,
-        "report": REPORT + "2",
-        "actions": actions,
-    }
     qcjob = qcf.GridStatistics()
     qcjob.run(data)

-    dfr = pd.read_csv(REPORT + "2")
+    dfr = pd.read_csv(REPORT)
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["STATUS"] == "WARN"
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["VALUE"] == pytest.approx(
         0.1606, 0.001
     )

-    pathlib.Path(REPORT + "2").unlink()
+    pathlib.Path(REPORT).unlink()


-def test_action_with_filters():
+def test_action_with_filters(data):

-    actions = [
+    data["actions"] = [
         {
             "property": "reek_sim_poro.roff",
             "filters": {
@@ -165,29 +152,22 @@
         },
     ]

-    data = {
-        "nametag": "MYDATA1",
-        "path": PATH,
-        "grid": GRID,
-        "report": REPORT + "3",
-        "actions": actions,
-    }
     qcjob = qcf.GridStatistics()
     qcjob.run(data)

-    dfr = pd.read_csv(REPORT + "3")
+    dfr = pd.read_csv(REPORT)
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["STATUS"] == "OK"
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["VALUE"] == pytest.approx(
         0.2384, 0.001
     )

-    pathlib.Path(REPORT + "3").unlink()
+    pathlib.Path(REPORT).unlink()


-def test_action_with_filters_and_selectors():
+def test_action_with_filters_and_selectors(data):

-    actions = [
+    data["actions"] = [
         {
             "property": "reek_sim_poro.roff",
             "selectors": {"reek_sim_zone.roff": "Below_Mid_reek"},
@@ -201,29 +181,22 @@
         },
     ]

-    data = {
-        "nametag": "MYDATA1",
-        "path": PATH,
-        "grid": GRID,
-        "report": REPORT + "4",
-        "actions": actions,
-    }
     qcjob = qcf.GridStatistics()
     qcjob.run(data)

-    dfr = pd.read_csv(REPORT + "4")
+    dfr = pd.read_csv(REPORT)
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["STATUS"] == "OK"
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["VALUE"] == pytest.approx(
         0.2384, 0.001
     )

-    pathlib.Path(REPORT + "4").unlink()
+    pathlib.Path(REPORT).unlink()


-def test_actions_shall_stop():
+def test_actions_shall_stop(data):

-    actions = [
+    data["actions"] = [
         {
             "property": "reek_sim_poro.roff",
             "warn_outside": [0.17, 0.4],
@@ -231,40 +204,28 @@
         }
     ]

-    data = {
-        "nametag": "MYDATA1",
-        "path": PATH,
-        "grid": GRID,
-        "actions": actions,
-    }
     qcjob = qcf.GridStatistics()
     with pytest.raises(SystemExit):
         qcjob.run(data)


-def test_actions_shall_stop_no_warnlimits():
+def test_actions_shall_stop_no_warnlimits(data):

-    actions = [
+    data["actions"] = [
         {
             "property": "reek_sim_poro.roff",
             "stop_outside": [0.20, 1],
         }
     ]

-    data = {
-        "nametag": "MYDATA1",
-        "path": PATH,
-        "grid": GRID,
-        "actions": actions,
-    }
     qcjob = qcf.GridStatistics()
     with pytest.raises(SystemExit):
         qcjob.run(data)


-def test_actions_with_selectors():
+def test_actions_with_selectors(data):

-    actions = [
+    data["actions"] = [
         {
             "property": "reek_sim_poro.roff",
             "selectors": {
@@ -276,29 +237,23 @@
             "calculation": "Avg",
         },
     ]
-    data = {
-        "nametag": "MYDATA1",
-        "path": PATH,
-        "grid": GRID,
-        "report": REPORT + "5",
-        "actions": actions,
-    }
+
     qcjob = qcf.GridStatistics()
     qcjob.run(data)

-    dfr = pd.read_csv(REPORT + "5")
+    dfr = pd.read_csv(REPORT)
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["STATUS"] == "WARN"
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["VALUE"] == pytest.approx(
         0.3117, 0.001
     )

-    pathlib.Path(REPORT + "5").unlink()
+    pathlib.Path(REPORT).unlink()


-def test_yaml_dump():
+def test_yaml_dump(data):

-    actions = [
+    data["actions"] = [
         {
             "property": "reek_sim_poro.roff",
             "warn_outside": [0.18, 0.25],
@@ -306,26 +261,19 @@
         },
     ]

-    data = {
-        "nametag": "MYDATA1",
-        "path": PATH,
-        "grid": GRID,
-        "report": REPORT + "6",
-        "actions": actions,
-        "dump_yaml": SOMEYAML,
-    }
+    data["dump_yaml"] = SOMEYAML

     qcjob = qcf.GridStatistics()
     qcjob.run(data)

     # now read the dump file:
     qcjob.run(data=SOMEYAML)

-    dfr = pd.read_csv(REPORT + "6")
+    dfr = pd.read_csv(REPORT)
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["STATUS"] == "WARN"
     assert dfr.loc[dfr["CALCULATION"] == "Avg"].iloc[0]["VALUE"] == pytest.approx(
         0.1677, 0.001
     )

-    pathlib.Path(REPORT + "6").unlink()
+    pathlib.Path(REPORT).unlink()
     pathlib.Path(SOMEYAML).unlink()
diff --git a/tests/qcproperties/conftest.py b/tests/qcproperties/conftest.py
new file mode 100644
index 00000000..e1533d32
--- /dev/null
+++ b/tests/qcproperties/conftest.py
@@ -0,0 +1,51 @@
+from os.path import abspath
+import pytest
+
+
+@pytest.fixture()
+def data_grid():
+    return {
+        "path": abspath("../xtgeo-testdata/3dgrids/reek/"),
+        "grid": "reek_sim_grid.roff",
+        "properties": {
+            "PORO": {"name": "reek_sim_poro.roff"},
+            "PERM": {"name": "reek_sim_permx.roff"},
+        },
+        "selectors": {
+            "ZONE": {"name": "reek_sim_zone.roff"},
+            "FACIES": {"name": "reek_sim_facies2.roff"},
+        },
+        "verbosity": 1,
+    }
+
+
+@pytest.fixture()
+def data_wells():
+    return {
+        "path": abspath("../xtgeo-testdata/wells/reek/1/"),
+        "wells": ["OP_*.w"],
+        "properties": {
+            "PORO": {"name": "Poro"},
+            "PERM": {"name": "Perm"},
+        },
+        "selectors": {
+            "ZONE": {"name": "Zonelog"},
+            "FACIES": {"name": "Facies"},
+        },
+        "verbosity": 1,
+    }
+
+
+@pytest.fixture()
+def data_bwells():
+    return {
+        "path": abspath("../xtgeo-testdata/wells/reek/1/"),
+        "wells": ["OP_1.bw"],
+        "properties": {
+            "PORO": {"name": "Poro"},
+        },
+        "selectors": {
+            "FACIES": {"name": "Facies"},
+        },
+        "verbosity": 1,
+    }
diff --git a/tests/data/propstatistics/propstat.yml b/tests/qcproperties/data/propstat.yml
similarity index 97%
rename from tests/data/propstatistics/propstat.yml
rename to tests/qcproperties/data/propstat.yml
index 2dccf401..92ac647b 100644
--- a/tests/data/propstatistics/propstat.yml
+++ b/tests/qcproperties/data/propstat.yml
@@ -17,12 +17,12 @@ common_well_data: &common_well_data
     name: Perm

 common_bwell_data: &common_bwell_data
-  path: tests/data/propstatistics
+  path: ../xtgeo-testdata/wells/reek/1/
   wells: [OP_1.bw]
   properties:
     PORO:
       name: Poro
-
+
 grid:
 - <<: *common_grid_data
   selectors:
diff --git a/tests/qcproperties/test_qcproperties.py b/tests/qcproperties/test_qcproperties.py
new file mode 100644
index 00000000..5d818fab
--- /dev/null
+++ b/tests/qcproperties/test_qcproperties.py
@@ -0,0 +1,379 @@
+from pathlib import Path
+import pytest
+from fmu.tools.qcdata import QCData
+from fmu.tools.qcproperties.qcproperties import QCProperties
+from fmu.tools.qcproperties._grid2df import GridProps2df
+from fmu.tools.qcproperties._well2df import WellLogs2df
+
+
+class TestProperties2df:
+    """Tests related to generation of dataframe from properties"""
+
+    def test_wells(self, data_wells):
+        """Test creating property dataframe from wells"""
+        pdf = WellLogs2df(data=data_wells, project=None, xtgdata=QCData())
+        assert pdf.dataframe["PORO"].mean() == pytest.approx(0.1539, abs=0.001)
+        assert pdf.dataframe["PORO"].max() == pytest.approx(0.3661, abs=0.001)
+        assert set(pdf.dataframe.columns) == set(["PORO", "PERM", "ZONE", "FACIES"])
+
+    def test_blockedwells(self, data_bwells):
+        """Test creating property dataframe from blocked wells"""
+        pdf = WellLogs2df(
+            data=data_bwells, project=None, xtgdata=QCData(), blockedwells=True
+        )
+        assert pdf.dataframe["PORO"].mean() == pytest.approx(0.1709, abs=0.001)
+        assert pdf.dataframe["PORO"].max() == pytest.approx(0.3640, abs=0.001)
+        assert set(pdf.dataframe.columns) == set(["PORO", "FACIES"])
+
+    def test_gridprops(self, data_grid):
+        """Test creating property dataframe from grid properties"""
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+        assert pdf.dataframe["PORO"].mean() == pytest.approx(0.1677, abs=0.001)
+        assert pdf.dataframe["PORO"].max() == pytest.approx(0.3613, abs=0.001)
+        assert set(pdf.dataframe.columns) == set(["PORO", "PERM", "ZONE", "FACIES"])
+
+    def test_props_and_selectors_as_list(self, data_grid):
+        """Test properties and selectors given as lists"""
+        data_grid["properties"] = ["reek_sim_poro.roff", "reek_sim_permx.roff"]
+        data_grid["selectors"] = ["reek_sim_zone.roff", "reek_sim_facies2.roff"]
+
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+        assert pdf.dataframe["reek_sim_poro.roff"].mean() == pytest.approx(
+            0.1677, abs=0.001
+        )
+        assert pdf.dataframe["reek_sim_poro.roff"].max() == pytest.approx(
+            0.3613, abs=0.001
+        )
+        assert set(pdf.dataframe.columns) == set(
+            [
+                "reek_sim_poro.roff",
+                "reek_sim_permx.roff",
+                "reek_sim_zone.roff",
+                "reek_sim_facies2.roff",
+            ]
+        )
+
+    def test_filters(self, data_grid):
+        """Test filters as argument"""
+        data_grid["filters"] = {
+            "reek_sim_facies2.roff": {
+                "include": ["FINESAND", "COARSESAND"],
+            }
+        }
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+
+        assert ["FINESAND", "COARSESAND"] == list(pdf.dataframe["FACIES"].unique())
+        assert pdf.dataframe["PORO"].mean() == pytest.approx(0.2374, abs=0.001)
+
+        data_grid["filters"] = {
+            "reek_sim_facies2.roff": {
+                "exclude": "FINESAND",
+            }
+        }
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+
+        assert "FINESAND" not in list(pdf.dataframe["FACIES"].unique())
+
+        data_grid["filters"] = {
+            "reek_sim_poro.roff": {
+                "range": [0.15, 0.25],
+            }
+        }
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+        assert pdf.dataframe["PORO"].mean() == pytest.approx(0.2027, abs=0.001)
+        assert pdf.dataframe["PORO"].min() > 0.15
+        assert pdf.dataframe["PORO"].max() < 0.25
+
+    def test_selector_filters(self, data_grid):
+        """Test filters on selector"""
+        data_grid["selectors"] = {
+            "FACIES": {"name": "reek_sim_facies2.roff", "include": "FINESAND"},
+        }
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+
+        assert ["FINESAND"] == list(pdf.dataframe["FACIES"].unique())
+
+        # test exclude values using list
+        data_grid["selectors"] = {
+            "FACIES": {
+                "name": "reek_sim_facies2.roff",
+                "exclude": ["FINESAND", "SHALE"],
+            },
+        }
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+
+        assert "FINESAND" not in list(pdf.dataframe["FACIES"].unique())
+        assert "SHALE" not in list(pdf.dataframe["FACIES"].unique())
+
+    def test_filters_and_selector_filters(self, data_grid):
+        """
+        Test filters on both selector and as separate argument
+        Wanted behaviour is to ignore the filter on the selector
+        """
+        data_grid["selectors"] = {
+            "FACIES": {"name": "reek_sim_facies2.roff", "exclude": "FINESAND"},
+        }
+        data_grid["filters"] = {
+            "reek_sim_facies2.roff": {
+                "include": ["FINESAND", "COARSESAND"],
+            }
+        }
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+
+        assert ["FINESAND", "COARSESAND"] == list(pdf.dataframe["FACIES"].unique())
+        assert pdf.dataframe["PORO"].mean() == pytest.approx(0.2374, abs=0.001)
+
+    def test_filters_and_property_filters(self, data_grid):
+        """
+        Test filters on both properties and as separate argument.
+        Wanted behaviour is to ignore the filter on the property
+        """
+        data_grid["properties"] = {
+            "PORO": {"name": "reek_sim_poro.roff", "range": [0.2, 0.4]},
+        }
+        data_grid["filters"] = {
+            "reek_sim_poro.roff": {
+                "range": [0.15, 0.25],
+            }
+        }
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+
+        assert pdf.dataframe["PORO"].mean() == pytest.approx(0.2027, abs=0.001)
+        assert pdf.dataframe["PORO"].min() > 0.15
+        assert pdf.dataframe["PORO"].max() < 0.25
+
+    def test_codenames(self, data_grid):
+        """Test modifying codenames on selectors"""
+
+        data_grid["selectors"] = {
+            "ZONE": {"name": "reek_sim_zone.roff", "codes": {1: "TOP", 2: "MID"}},
+            "FACIES": {
+                "name": "reek_sim_facies2.roff",
+                "codes": {1: "SAND", 2: "SAND"},
+            },
+        }
+
+        pdf = GridProps2df(data=data_grid, project=None, xtgdata=QCData())
+
+        assert set(["TOP", "MID", "Below_Low_reek"]) == {
+            x for x in list(pdf.dataframe["ZONE"].unique()) if x is not None
+        }
+
+        assert set(["SAND", "SHALE"]) == {
+            x for x in list(pdf.dataframe["FACIES"].unique()) if x is not None
+        }
+
+
+class TestStatistics:
+    """Tests for extracting statistics with QCProperties"""
+
+    def test_gridprops(self, data_grid):
+        """Test extracting statistics from grid properties"""
+
+        qcp = QCProperties()
+        qcp.get_grid_statistics(data_grid)
+
+        assert set(qcp.dataframe["PROPERTY"].unique()) == set(["PORO", "PERM"])
+
+        row = qcp.dataframe[
+            (qcp.dataframe["ZONE"] == "Total")
+            & (qcp.dataframe["FACIES"] == "Total")
+            & (qcp.dataframe["PROPERTY"] == "PORO")
+        ]
+        assert row["Avg"].values == pytest.approx(0.1677, abs=0.001)
+        assert row["Max"].values == pytest.approx(0.3613, abs=0.001)
+
+    def test_wells(self, data_wells):
+        """Test extracting statistics from well logs"""
+        qcp = QCProperties()
+        qcp.get_well_statistics(data_wells)
+
+        assert set(qcp.dataframe["PROPERTY"].unique()) == set(["PORO", "PERM"])
+        assert set(qcp.dataframe["ZONE"].unique()) == set(
+            [
+                "Above_TopUpperReek",
+                "Below_TopLowerReek",
+                "Below_TopMidReek",
+                "Below_TopUpperReek",
+                "Below_BaseLowerReek",
+                "Total",
+            ]
+        )
+
+        row = qcp.dataframe[
+            (qcp.dataframe["ZONE"] == "Total")
+            & (qcp.dataframe["FACIES"] == "Total")
+            & (qcp.dataframe["PROPERTY"] == "PORO")
+        ]
+        assert row["Avg"].values == pytest.approx(0.1539, abs=0.001)
+        assert row["Max"].values == pytest.approx(0.3661, abs=0.001)
+
+    def test_blockedwells(self, data_bwells):
+        """Test extracting statistics from blocked well logs"""
+        qcp = QCProperties()
+        qcp.get_bwell_statistics(data_bwells)
+
+        assert list(qcp.dataframe["PROPERTY"].unique()) == ["PORO"]
+
+        row = qcp.dataframe[
+            (qcp.dataframe["FACIES"] == "Total") & (qcp.dataframe["PROPERTY"] == "PORO")
+        ]
+        assert row["Avg"].values == pytest.approx(0.1709, abs=0.001)
+        assert row["Max"].values == pytest.approx(0.3640, abs=0.001)
+
+    def test_continuous_properties(self, data_grid):
+        """Test extracting statistics on continuous properties"""
+        qcp = QCProperties()
+        qcp.get_grid_statistics(data_grid)
+
+        assert set(qcp.dataframe.columns) == set(
+            [
+                "Avg_Weighted",
+                "Avg",
+                "Count",
+                "FACIES",
+                "Max",
+                "Min",
+                "P10",
+                "P90",
+                "PROPERTY",
+                "Stddev",
+                "ZONE",
+                "SOURCE",
+                "ID",
+            ]
+        )
+        assert qcp._proptypes_all[0] == "CONT"
+
+    def test_discrete_properties(self, data_grid):
+        """Test extracting statistics on discrete properties"""
+        data_grid["properties"] = {
+            "FACIES": {"name": "reek_sim_facies2.roff"},
+        }
+        data_grid["selectors"] = {
+            "ZONE": {"name": "reek_sim_zone.roff"},
+        }
+
+        qcp = QCProperties()
+        qcp.get_grid_statistics(data_grid)
+
+        assert set(qcp.dataframe.columns) == set(
+            [
+                "Avg_Weighted",
+                "Avg",
+                "Count",
+                "FACIES",
+                "PROPERTY",
+                "ZONE",
+                "SOURCE",
+                "ID",
+            ]
+        )
+        assert qcp._proptypes_all[0] == "DISC"
+        assert list(qcp.dataframe["PROPERTY"].unique()) == ["FACIES"]
+        assert set(qcp.dataframe["FACIES"].unique()) == set(
+            ["FINESAND", "COARSESAND", "SHALE"]
+        )
+        row = qcp.dataframe[
+            (qcp.dataframe["ZONE"] == "Total") & (qcp.dataframe["FACIES"] == "FINESAND")
+        ]
+        assert row["Avg"].values == pytest.approx(0.4024, abs=0.001)
+
+    def test_set_id(self, data_grid):
+        """Test setting an ID for the statistics"""
+        data_grid["name"] = "Test_case"
+
+        qcp = QCProperties()
+        qcp.get_grid_statistics(data_grid)
+        assert ["Test_case"] == list(qcp.dataframe["ID"].unique())
+
+        qcp.get_grid_statistics(data_grid)
+        assert ["Test_case", "Test_case(1)"] == qcp.dataframe["ID"].unique().tolist()
+
+    def test_no_selectors(self, data_grid):
+        """Test running without selectors"""
+        data_grid.pop("selectors", None)
+
+        qcp = QCProperties()
+        qcp.get_grid_statistics(data_grid)
+
+        assert len(qcp.dataframe) == 2
+        assert qcp.dataframe[qcp.dataframe["PROPERTY"] == "PORO"][
+            "Avg"
+        ].values == pytest.approx(0.1677, abs=0.001)
+
+    def test_no_selector_combos(self, data_grid):
+        """Test running without selector_combos"""
+        data_grid["selector_combos"] = False
+
+        qcp = QCProperties()
+        qcp.get_grid_statistics(data_grid)
+
+        assert ["Total"] == list(
+            qcp.dataframe[qcp.dataframe["ZONE"] == "Total"]["FACIES"].unique()
+        )
+
+    def test_multiple_filters(self, data_grid):
+        """Test running two statistics extractions using multiple_filters"""
+        data_grid.pop("selectors", None)
+        data_grid["multiple_filters"] = {
+            "test1": {
+                "reek_sim_facies2.roff": {
+                    "include": ["SHALE"],
+                }
+            },
+            "test2": {
+                "reek_sim_facies2.roff": {
+                    "exclude": ["SHALE"],
+                }
+            },
+        }
+        qcp = QCProperties()
+        qcp.get_grid_statistics(data_grid)
+
+        assert set(["test1", "test2"]) == set(qcp.dataframe["ID"].unique())
+        assert qcp.dataframe[
+            (qcp.dataframe["PROPERTY"] == "PORO") & (qcp.dataframe["ID"] == "test1")
+        ]["Avg"].values == pytest.approx(0.1183, abs=0.001)
+
+    def test_read_eclipse_init(self, data_grid):
+        """Test reading property from INIT-file"""
+        data_grid["grid"] = "REEK.EGRID"
+        data_grid["properties"] = {
+            "PORO": {"name": "PORO", "pfile": "REEK.INIT"},
+            "PERM": {"name": "PERMX", "pfile": "REEK.INIT"},
+        }
+        data_grid["selectors"] = {
+            "REGION": {"name": "FIPNUM", "pfile": "REEK.INIT"},
+        }
+
+        qcp = QCProperties()
+        qcp.get_grid_statistics(data_grid)
+
+        assert ["REEK"] == list(qcp.dataframe["ID"].unique())
+        assert qcp.dataframe[
+            (qcp.dataframe["PROPERTY"] == "PORO") & (qcp.dataframe["REGION"] == "2")
+        ]["Avg"].values == pytest.approx(0.1661, abs=0.001)
+
+
+class TestStatisticsMultipleSources:
+    """Tests for extracting statistics from different sources"""
+
+    def test_auto_combination(self, data_grid, data_wells, data_bwells):
+        """Test combining statistics from multiple sources"""
+        qcp = QCProperties()
+
+        qcp.get_grid_statistics(data_grid)
+        assert len(qcp.dataframe["ID"].unique()) == 1
+
+        qcp.get_well_statistics(data_wells)
+        assert len(qcp.dataframe["ID"].unique()) == 2
+
+        qcp.get_bwell_statistics(data_bwells)
+        assert len(qcp.dataframe["ID"].unique()) == 3
+
+    def test_from_yaml(self):
+        """Test extracting statistics from a yaml-file"""
+        qcp = QCProperties()
+        yaml_input = Path(__file__).parent / "data/propstat.yml"
+        qcp.from_yaml(yaml_input)
diff --git a/tests/qcproperties/test_qcproperties_grid.py b/tests/qcproperties/test_qcproperties_grid.py
deleted file mode 100644
index 79f971d3..00000000
--- a/tests/qcproperties/test_qcproperties_grid.py
+++ /dev/null
@@ -1,255 +0,0 @@
-from os.path import abspath
-import pytest
-
-from fmu.tools.qcproperties.qcproperties import QCProperties
-
-PATH = abspath("../xtgeo-testdata/3dgrids/reek/")
-GRID = "reek_sim_grid.roff"
-PROPERTIES = {
-    "PORO": {"name": "reek_sim_poro.roff"},
-    "PERM": {"name": "reek_sim_permx.roff"},
-}
-SELECTORS = {
-    "ZONE": {"name": "reek_sim_zone.roff"},
-    "FACIES": {"name": "reek_sim_facies2.roff"},
-}
-
-data_orig = {
-    "path": PATH,
-    "grid": GRID,
-    "properties": PROPERTIES,
-    "selectors": SELECTORS,
-    "verbosity": 1,
-}
-
-
-def test_full_dataframe():
-    data = data_orig.copy()
-
-    qcp = QCProperties()
-    stat = qcp.get_grid_statistics(data)
-
-    assert stat.property_dataframe["PORO"].mean() == pytest.approx(0.1677, abs=0.001)
-    assert stat.property_dataframe["PORO"].max() == pytest.approx(0.3613, abs=0.001)
-    assert set(stat.property_dataframe.columns) == set(
-        ["PORO", "PERM", "ZONE", "FACIES"]
-    )
-
-
-def test_no_selectors():
-    data = data_orig.copy()
-    data.pop("selectors", None)
-
-    qcp = QCProperties()
-    stat = qcp.get_grid_statistics(data)
-    assert set(stat.property_dataframe.columns) == set(["PORO", "PERM"])
-
-
-def test_statistics():
-    data = data_orig.copy()
-    data["name"] = "Test_case"
-
-    qcp = QCProperties()
-    stat = qcp.get_grid_statistics(data)
-
-    assert set(stat.dataframe.columns) == set(
-        [
-            "Avg_Weighted",
-            "Avg",
-            "FACIES",
-            "Max",
-            "Min",
-            "P10",
-            "P90",
-            "PROPERTY",
-            "Stddev",
-            "ZONE",
-            "SOURCE",
-            "ID",
-        ]
-    )
-    assert list(stat.dataframe["ID"].unique())[0] == data["name"]
-    assert set(stat.dataframe["PROPERTY"].unique()) == set(["PORO", "PERM"])
-    assert stat.dataframe[stat.dataframe["PROPERTY"] == "PORO"][
-        "Avg"
-    ].max() == pytest.approx(0.3138, abs=0.001)
-
-    row = stat.dataframe[
-        (stat.dataframe["ZONE"] == "Total")
-        & (stat.dataframe["FACIES"] == "Total")
-        & (stat.dataframe["PROPERTY"] == "PORO")
-    ]
-    assert row["Avg"].values == pytest.approx(0.1677, abs=0.001)
-
-
-def test_statistics_no_combos():
-    data = data_orig.copy()
-    data["selector_combos"] = False
-
-    qcp = QCProperties()
-    stat = qcp.get_grid_statistics(data)
-
-    assert ["Total"] == list(
-        stat.dataframe[stat.dataframe["ZONE"] == "Total"]["FACIES"].unique()
-    )
-
-
-def test_codenames():
-    data = data_orig.copy()
-
-    qcp = QCProperties()
-    stat_no_code = qcp.get_grid_statistics(data)
-
-    data["selectors"] = {
-        "ZONE": {"name": "reek_sim_zone.roff", "codes": {1: "TOP", 2: "MID"}},
-        "FACIES": {
-            "name": "reek_sim_facies2.roff",
-        },
-    }
-
-    stat = qcp.get_grid_statistics(data, reuse=True)
-
-    assert set(["TOP", "MID", "Below_Low_reek", "Total"]) == {
-        x for x in list(stat.dataframe["ZONE"].unique()) if x is not None
-    }
-    assert set(["Below_Top_reek", "Below_Mid_reek", "Below_Low_reek", "Total"]) == {
-        x for x in list(stat_no_code.dataframe["ZONE"].unique()) if x is not None
-    }
-
-
-def test_extract_statistics_update_filter_parameter():
-    """Test changing filters after initialization"""
-    data = data_orig.copy()
-    data["selectors"] = ["reek_sim_zone.roff"]
-
-    qcp = QCProperties()
-    stat = qcp.get_grid_statistics(data)
-
-    assert stat.property_dataframe["PORO"].mean() == pytest.approx(0.1677, abs=0.001)
-    assert set(stat.property_dataframe.columns) == set(
-        [
-            "PORO",
-            "PERM",
-            "reek_sim_zone.roff",
-        ]
-    )
-    stat.extract_statistics(
-        filters={
-            "reek_sim_facies2.roff": {
-                "include": ["FINESAND", "COARSESAND"],
-            }
-        },
-    )
-
-    assert set(stat.property_dataframe.columns) == set(
-        [
-            "PORO",
-            "PERM",
-            "reek_sim_facies2.roff",
-            "reek_sim_zone.roff",
-        ]
-    )
-    assert ["FINESAND", "COARSESAND"] == list(
-        stat.property_dataframe["reek_sim_facies2.roff"].unique()
-    )
-    assert stat.property_dataframe["PORO"].mean() == pytest.approx(0.2374, abs=0.001)
-
-
-def test_extract_statistics_update_filter_values():
-    """Test changing filters after initialization"""
-    data = data_orig.copy()
-    data["selectors"] = {
-        "ZONE": {"name": "reek_sim_zone.roff", "exclude": ["Below_Top_reek"]},
-        "FACIES": {
-            "name": "reek_sim_facies2.roff",
-            "include": ["FINESAND", "COARSESAND"],
-        },
-    }
-    data["filters"] = {
-        "reek_sim_facies2.roff": {
-            "include": ["FINESAND", "COARSESAND"],
-        }
-    }
-
-    qcp = QCProperties()
-    stat = qcp.get_grid_statistics(data)
-
-    assert "Below_Top_reek" not in list(stat.property_dataframe["ZONE"].unique())
-    assert ["FINESAND", "COARSESAND"] == list(
-        stat.property_dataframe["FACIES"].unique()
-    )
-    assert stat.property_dataframe["PORO"].mean() == pytest.approx(0.2390, abs=0.001)
-
-    stat.extract_statistics(
-        filters={
-            "reek_sim_facies2.roff": {
-                "include": ["SHALE"],
-            }
-        }
-    )
-    assert "Below_Top_reek" not in list(stat.property_dataframe["ZONE"].unique())
-    assert ["SHALE"] == list(stat.property_dataframe["FACIES"].unique())
-    assert stat.property_dataframe["PORO"].mean() == pytest.approx(0.1155, abs=0.001)
-
-
-def test_get_value():
-    data = data_orig.copy()
-
-    qcp = QCProperties()
-    stat = qcp.get_grid_statistics(data)
-
-    assert stat.get_value("PORO") == pytest.approx(0.1677, abs=0.001)
-    assert stat.get_value("PORO", calculation="Max") == pytest.approx(0.3613, abs=0.001)
-
-    conditions = {"ZONE": "Below_Top_reek", "FACIES": "COARSESAND"}
-    assert stat.get_value("PORO", conditions=conditions) == pytest.approx(
-        0.3117, abs=0.001
-    )
-    conditions = {"ZONE": "Below_Top_reek"}
-    assert stat.get_value("PORO", conditions=conditions) == pytest.approx(
-        0.1595, abs=0.001
-    )
-
-
-def test_multiple_filters():
-    data = data_orig.copy()
-    data.pop("selectors", None)
-    data["multiple_filters"] = {
-        "test1": {
-            "reek_sim_facies2.roff": {
-                "include": ["SHALE"],
-            }
-        },
-        "test2": {
-            "reek_sim_facies2.roff": {
-                "exclude": ["SHALE"],
-            }
-        },
-    }
-    qcp = QCProperties()
-    qcp.get_grid_statistics(data)
-
-    assert set(["test1", "test2"]) == set(qcp.dataframe["ID"].unique())
-    assert qcp.dataframe[
-        (qcp.dataframe["PROPERTY"] == "PORO") & (qcp.dataframe["ID"] == "test1")
-    ]["Avg"].values == pytest.approx(0.1183, abs=0.001)
-
-
-def test_read_eclipse():
-    data = data_orig.copy()
-    data["grid"] = "REEK.EGRID"
-    data["properties"] = {
-        "PORO": {"name": "PORO", "pfile": "REEK.INIT"},
-        "PERM": {"name": "PERMX", "pfile": "REEK.INIT"},
-    }
-    data["selectors"] = {
-        "REGION": {"name": "FIPNUM", "pfile": "REEK.INIT"},
-    }
-
-    qcp = QCProperties()
-    qcp.get_grid_statistics(data)
-
-    assert set(["REEK"]) == set(qcp.dataframe["ID"].unique())
-    assert qcp.dataframe[
-        (qcp.dataframe["PROPERTY"] == "PORO") & (qcp.dataframe["REGION"] == "2")
-    ]["Avg"].values == pytest.approx(0.1661, abs=0.001)
diff --git a/tests/qcproperties/test_qcproperties_propstatset.py b/tests/qcproperties/test_qcproperties_propstatset.py
deleted file mode 100644
index c9a800f1..00000000
--- a/tests/qcproperties/test_qcproperties_propstatset.py
+++ /dev/null
@@ -1,47 +0,0 @@
-from os.path import abspath
-import yaml
-
-from fmu.tools.qcproperties.qcproperties import QCProperties
-
-
-cfg_path = abspath("tests/data/propstatistics/propstat.yml")
-
-with open(cfg_path, "r") as stream:
-    cfg = yaml.safe_load(stream)
-
-GRIDDATA = cfg["grid"][0]
-GRIDDATA["path"] = abspath("../xtgeo-testdata/3dgrids/reek/")
-
-WELLDATA = cfg["wells"][0]
-WELLDATA["path"] = abspath("../xtgeo-testdata/wells/reek/1/")
-
-BWELLDATA = cfg["blockedwells"][0]
-BWELLDATA["path"] = abspath("../xtgeo-testdata/wells/reek/1/")
-
-
-def test_propstatset():
-
-    qcp = QCProperties()
-
-    qcp.get_grid_statistics(GRIDDATA)
-    qcp.get_well_statistics(WELLDATA)
-    qcp.get_bwell_statistics(BWELLDATA)
-
-    assert len(qcp.dataframe["ID"].unique()) == 3
-
-
-def test_propstatset_auto_combination():
-
-    qcp = QCProperties()
-
-    qcp.get_grid_statistics(GRIDDATA)
-
-    assert len(qcp.dataframe["ID"].unique()) == 1
-
-    qcp.get_well_statistics(WELLDATA)
-
-    assert len(qcp.dataframe["ID"].unique()) == 2
-
-    qcp.get_bwell_statistics(BWELLDATA, reuse=True)
-
-    assert len(qcp.dataframe["ID"].unique()) == 3
diff --git a/tests/qcproperties/test_qcproperties_wells.py b/tests/qcproperties/test_qcproperties_wells.py
deleted file mode 100644
index 925d4ec2..00000000
--- a/tests/qcproperties/test_qcproperties_wells.py
+++ /dev/null
@@ -1,180 +0,0 @@
-# -*- coding: utf-8 -*-
-"""Test code for RMS volumetrics parsing"""
-
-from os.path import abspath
-import pytest
-
-from fmu.tools.qcproperties.qcproperties import QCProperties
-
-PATH = abspath("../xtgeo-testdata/wells/reek/1/")
-WELLS = ["OP_*.w"]
-BWELLS = ["OP_1.bw"]
-PROPERTIES = {
-    "PORO": {"name": "Poro"},
-    "PERM": {"name": "Perm"},
-}
-SELECTORS = {
-    "ZONE": {"name": "Zonelog"},
-    "FACIES": {"name": "Facies"},
-}
-
-data_orig_wells = {
-    "verbosity": 1,
-    "path": PATH,
-    "wells": WELLS,
-    "properties": PROPERTIES,
-    "selectors": SELECTORS,
-}
-
-data_orig_bwells = {
-    "verbosity": 1,
-    "path": PATH,
-    "wells": BWELLS,
-    "properties": {
-        "PORO": {"name": "Poro"},
-    },
-    "selectors": {
-        "FACIES": {"name": "Facies"},
-    },
-}
-
-
-def test_full_dataframe_wells():
-    data = data_orig_wells.copy()
-
-    qcp = QCProperties()
-    stat = qcp.get_well_statistics(data)
-
-    assert set(stat.property_dataframe.columns) == set(
-        ["ZONE", "PERM", "PORO", "FACIES"]
-    )
-    assert stat.property_dataframe["PORO"].mean() == pytest.approx(0.1534, abs=0.001)
-
-
-def test_filters_wells():
-    data = data_orig_wells.copy()
-    data["selectors"] = {
-        "ZONE": {
-            "name": "Zonelog",
-            "exclude": [
-                "Below_TopMidReek",
-                "Below_TopLowerReek",
-                "Below_BaseLowerReek",
-            ],
-        },
-        "FACIES": {"name": "Facies", "include": ["Crevasse", "Channel"]},
-    }
-    qcp = QCProperties()
-    stat = qcp.get_well_statistics(data)
-
-    assert set(["Crevasse", "Channel", "Total"]) == set(
-        stat.dataframe["FACIES"].unique()
-    )
-    assert set(stat.dataframe["ZONE"].unique()) == set(
-        ["Above_TopUpperReek", "Below_TopUpperReek", "Total"]
-    )
-
-
-def test_statistics_wells():
-    data = data_orig_wells.copy()
-    data["name"] = "Raw_Logs"
-
-    qcp = QCProperties()
-    stat = qcp.get_well_statistics(data)
-    stat = qcp.get_well_statistics(data)
-    assert set(stat.dataframe.columns) == set(
-        [
-            "Avg_Weighted",
-            "Avg",
-            "FACIES",
-            "Max",
-            "Min",
-            "P10",
-            "P90",
-            "PROPERTY",
-            "Stddev",
-            "ZONE",
-            "SOURCE",
-            "ID",
-        ]
-    )
-    assert list(stat.dataframe["ID"].unique())[0] == "Raw_Logs"
-    assert set(stat.dataframe["PROPERTY"].unique()) == set(["PORO", "PERM"])
-    assert stat.dataframe[stat.dataframe["PROPERTY"] == "PORO"][
-        "Avg"
-    ].max() == pytest.approx(0.3059, abs=0.001)
-    assert set(stat.dataframe["ZONE"].unique()) == set(
-        [
-            "Above_TopUpperReek",
-            "Below_TopLowerReek",
-            "Below_TopMidReek",
-            "Below_TopUpperReek",
-            "Below_BaseLowerReek",
-            "Total",
-        ]
-    )
-
-    row = stat.dataframe[
-        (stat.dataframe["ZONE"] == "Total")
-        & (stat.dataframe["FACIES"] == "Total")
-        & (stat.dataframe["PROPERTY"] == "PORO")
-    ]
-    assert row["Avg"].values == pytest.approx(0.1539, abs=0.001)
-
-
-def test_full_dataframe_bwells():
-    data = data_orig_bwells.copy()
-    data["wells"] = BWELLS
-
-    qcp = QCProperties()
-    stat = qcp.get_bwell_statistics(data, reuse=True)
-
-    assert set(stat.property_dataframe.columns) == set(["PORO", "FACIES"])
-    assert stat.property_dataframe["PORO"].mean() == pytest.approx(0.1709, abs=0.001)
-
-
-def test_filters_bwells():
-    data = data_orig_bwells.copy()
-    data["wells"] = BWELLS
-    data["selectors"] = {
-        "FACIES": {"name": "Facies", "include": "Channel"},
-    }
-    qcp = QCProperties()
-    stat = qcp.get_bwell_statistics(data, reuse=True)
-
-    assert set(["Channel", "Total"]) == set(stat.dataframe["FACIES"].unique())
-
-
-def test_statistics_bwells():
-    data = data_orig_bwells.copy()
-    data["wells"] = BWELLS
-    data["name"] = "Blocked_Logs"
-
-    qcp = QCProperties()
-    stat = qcp.get_bwell_statistics(data, reuse=True)
-
-    assert set(stat.dataframe.columns) == set(
-        [
-            "Avg",
-            "Avg_Weighted",
-            "FACIES",
-            "Max",
-            "Min",
-            "P10",
-            "P90",
-            "PROPERTY",
-            "Stddev",
-            "SOURCE",
-            "ID",
-        ]
-    )
-    assert list(stat.dataframe["ID"].unique())[0] == "Blocked_Logs"
-    assert set(stat.dataframe["PROPERTY"].unique()) == set(["PORO"])
-    assert stat.dataframe[stat.dataframe["PROPERTY"] == "PORO"][
-        "Avg"
-    ].max() == pytest.approx(0.2678, abs=0.001)
-
-    row = stat.dataframe[
-        (stat.dataframe["FACIES"] == "Total") & (stat.dataframe["PROPERTY"] == "PORO")
-    ]
-    assert row["Avg"].values == pytest.approx(0.1709, abs=0.001)