diff --git a/doc/esmvalcore/preprocessor.rst b/doc/esmvalcore/preprocessor.rst
index 82bb54833a..d1c6487b60 100644
--- a/doc/esmvalcore/preprocessor.rst
+++ b/doc/esmvalcore/preprocessor.rst
@@ -22,7 +22,7 @@ following the default order in which they are applied:
Overview
========

-..
+..
   ESMValTool is a modular ``Python 3.6+`` software package possessing
   capabilities of executing a large number of diagnostic routines that can be
   written in a number of programming languages (Python, NCL, R, Julia). The
   modular nature
@@ -180,7 +180,7 @@ extract the levels and vertically regrid onto the vertical levels of
The extrapolation mode is controlled by the `extrapolation_mode` keyword.
For the interpolation schemes available in Iris, the
extrapolation_mode keyword must be one of:
-
+
* ``extrapolate``: the extrapolation points will be calculated by extending
  the gradient of the closest two points;
* ``error``: a ``ValueError`` exception will be raised, notifying an
@@ -290,7 +290,7 @@ available for the developer to use them as they need to. The `fx_files`
attribute of the big `variable` nested dictionary that gets passed to the
diagnostic is, in turn, a dictionary on its own, and members of it can be
accessed in the diagnostic through a simple loop over the ``config`` diagnostic
-variable items e.g.:
+variable items e.g.:

.. code-block::

@@ -309,14 +309,14 @@ numbers of missing data from dataset to dataset may introduce biases and
artificially assign more weight to the datasets that have less missing data.
This is handled in ESMValTool via the missing values masks: two types of such
masks are available, one for the multimodel case and another for the
-single model case.
+single model case.

The multimodel missing values mask (``mask_fillvalues``) is a preprocessor step
that usually comes after all the single-model steps (regridding, area selection
etc) have been performed; in a nutshell, it combines missing values masks from
individual models into a multimodel missing values mask; the individual model
masks are built according to common criteria: the user chooses a time window in
-which missing data points are counted, and if the number of missing data points
+which missing data points are counted, and if the number of missing data points
relative to the number of total data points in a window is less than a chosen
fractional threshold, the window is discarded i.e. all the points in the window
are masked (set to missing).
@@ -383,7 +383,7 @@ conceptually a very similar process to interpolation (in fact, the regridder
engine uses interpolation and extrapolation, with various schemes). The primary
difference is that interpolation is based on sample data points, while
regridding is based on the horizontal grid of another cube (the reference
-grid).
+grid).

The underlying regridding mechanism in ESMValTool uses the
`cube.regrid() `_
@@ -391,7 +391,7 @@ from Iris.

The use of the horizontal regridding functionality is flexible depending on
what type of reference grid and what interpolation scheme is preferred. Below
-we show a few examples.
+we show a few examples.

Regridding on a reference dataset grid
--------------------------------------
@@ -425,7 +425,7 @@ cell specification is oftentimes used when operating on localized data.
      scheme: nearest

In this case the ``NearestNeighbour`` interpolation scheme is used (see below
-for scheme definitions).
+for scheme definitions).
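+
+As a sketch of an alternative choice (the grid and scheme here are
+illustrative, not a recommendation), an area-weighted regrid onto a standard
+``2.5x2.5`` cell specification would read:
+
+.. code-block:: yaml
+
+    regrid:
+      target_grid: 2.5x2.5
+      scheme: area_weighted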
When using a ``MxN`` type of grid it is possible to offset the grid cell
centrepoints using the ``lat_offset`` and ``lon_offset`` arguments:
@@ -507,7 +507,7 @@ to ``multi_model_statistics``.
Multimodel statistics in ESMValTool are computed along the time axis, and as
such, can be computed across a common overlap in time (by specifying
``span: overlap`` argument) or across the full length in time of each model (by
-specifying ``span: full`` argument).
+specifying ``span: full`` argument).

Restrictive computation is also available by excluding any set of models that
the user does not want to include in the statistics (by setting ``exclude:
@@ -537,7 +537,7 @@ see also :func:`esmvalcore.preprocessor.multi_model_statistics`.
   for different run scenarios, but as a rule of thumb, for the multimodel
   preprocessor, the expected maximum memory intake could be approximated as
   the number of datasets multiplied by the average size in memory for one
-   dataset.
+   dataset.

.. _time operations:

@@ -545,16 +545,39 @@ Time manipulation
=================

The ``_time.py`` module contains the following preprocessor functions:
-* ``extract_time``: Extract a time range from an Iris ``cube``.
-* ``extract_season``: Extract only the times that occur within a specific
-  season.
-* ``extract_month``: Extract only the times that occur within a specific month.
-* ``time_average``: Take the weighted average over the time dimension.
-* ``seasonal_mean``: Produces a mean for each season (DJF, MAM, JJA, SON)
-* ``annual_mean``: Produces an annual or decadal mean.
-* ``regrid_time``: Aligns the time axis of each dataset to have common time
+* extract_time_: Extract a time range from a cube.
+* extract_season_: Extract only the times that occur within a specific season.
+* extract_month_: Extract only the times that occur within a specific month.
+* daily_statistics_: Compute statistics for each day.
+* monthly_statistics_: Compute statistics for each month.
+* seasonal_statistics_: Compute statistics for each season.
+* annual_statistics_: Compute statistics for each year.
+* decadal_statistics_: Compute statistics for each decade.
+* climate_statistics_: Compute statistics for the full period.
+* anomalies_: Compute anomalies.
+* regrid_time_: Aligns the time axis of each dataset to have common time
  points and calendars.

+Statistics functions are applied, by default, in the order in which they
+appear in the list above. For example, the following configuration applied to
+hourly data will retrieve, for each season, the minimum over the full period
+of the monthly mean of the daily maximum of a given variable.
+
+.. code-block:: yaml
+
+    daily_statistics:
+      operator: max
+
+    monthly_statistics:
+      operator: mean
+
+    climate_statistics:
+      operator: min
+      period: season
+
+
+.. _extract_time:
+
``extract_time``
----------------
@@ -575,6 +598,8 @@ will not be accepted.

See also :func:`esmvalcore.preprocessor.extract_time`.

+.. _extract_season:
+
``extract_season``
------------------
@@ -592,6 +617,8 @@ the seasonal_mean function, below.

See also :func:`esmvalcore.preprocessor.extract_season`.

+.. _extract_month:
+
``extract_month``
-----------------
@@ -601,37 +628,155 @@ between 1 and 12 as the named month string will not be accepted.

See also :func:`esmvalcore.preprocessor.extract_month`.
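+
+For instance, a preprocessor entry keeping only June data (an illustrative
+sketch; the argument name follows the function signature) reads:
+
+.. code-block:: yaml
+
+    extract_month:
+      month: 6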
-.. _time_average:
+.. _daily_statistics:

-``time_average``
-----------------
+``daily_statistics``
+--------------------

-This function takes the weighted average over the time dimension. This
-function requires no arguments and removes the time dimension of the cube.
+This function produces statistics for each day in the dataset.

-See also :func:`esmvalcore.preprocessor.time_average`.
+Parameters:
+    * operator: operation to apply. Accepted values are 'mean',
+      'median', 'std_dev', 'min' and 'max'. Default is 'mean'

-``seasonal_mean``
------------------
+See also :func:`esmvalcore.preprocessor.daily_statistics`.
+
+.. _monthly_statistics:
+
+``monthly_statistics``
+----------------------

-This function produces a seasonal mean for each season (DJF, MAM, JJA, SON).
-Note that this function will not check for missing time points. For instance,
-if you are looking at the DJF field, but your datasets starts on January 1st,
-the first DJF field will only contain data from January and February.
+This function produces statistics for each month in the dataset.
+
+Parameters:
+    * operator: operation to apply. Accepted values are 'mean',
+      'median', 'std_dev', 'min' and 'max'. Default is 'mean'
+
+See also :func:`esmvalcore.preprocessor.monthly_statistics`.
+
+.. _seasonal_statistics:
+
+``seasonal_statistics``
+-----------------------
+
+This function produces statistics for each season (DJF, MAM, JJA, SON) in the
+dataset. Note that this function will not check for missing time points.
+For instance, if you are looking at the DJF field, but your dataset
+starts on January 1st, the first DJF field will only contain data
+from January and February.

We recommend using extract_time to start the dataset from the following
December and remove such biased initial datapoints.

+Parameters:
+    * operator: operation to apply. Accepted values are 'mean',
+      'median', 'std_dev', 'min' and 'max'. Default is 'mean'
+
-See also :func:`esmvalcore.preprocessor.seasonal_mean`.
+See also :func:`esmvalcore.preprocessor.seasonal_statistics`.

-``annual_mean``
----------------
+.. _annual_statistics:
+
+``annual_statistics``
+---------------------
+
+This function produces statistics for each year.
+
+Parameters:
+    * operator: operation to apply. Accepted values are 'mean',
+      'median', 'std_dev', 'min' and 'max'. Default is 'mean'
+
+See also :func:`esmvalcore.preprocessor.annual_statistics`.
+
+.. _decadal_statistics:
+
+``decadal_statistics``
+----------------------
+
+This function produces statistics for each decade.
+
+Parameters:
+    * operator: operation to apply. Accepted values are 'mean',
+      'median', 'std_dev', 'min' and 'max'. Default is 'mean'
+
+See also :func:`esmvalcore.preprocessor.decadal_statistics`.
+
+.. _climate_statistics:
+
+``climate_statistics``
+----------------------
+
+This function produces statistics for the whole dataset. It can produce scalars
+(if the full period is chosen) or daily, monthly or seasonal statistics.
+
+Parameters:
+    * operator: operation to apply. Accepted values are 'mean', 'median', 'std_dev',
+      'min' and 'max'. Default is 'mean'
+
+    * period: define the granularity of the statistics: get values for the
+      full period, for each season, month, or day of year.
+      Available periods: 'full', 'season', 'seasonal', 'monthly', 'month',
+      'mon', 'daily', 'day'. Default is 'full'
+
+Examples:
+    * Monthly climatology:
+
+      .. code-block:: yaml
+
+          climate_statistics:
+            operator: mean
+            period: month
+
+    * Daily maximum for the full period:
+
+      .. code-block:: yaml
+
+          climate_statistics:
+            operator: max
+            period: day
+
+    * Minimum value in the period:
+
+      .. code-block:: yaml
+
+          climate_statistics:
+            operator: min
+            period: full
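+
+    * Seasonal standard deviation across the available years (an illustrative
+      combination of the two parameters documented above):
+
+      .. code-block:: yaml
+
+          climate_statistics:
+            operator: std_dev
+            period: season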
+
+See also :func:`esmvalcore.preprocessor.climate_statistics`.
+
+.. _anomalies:
+
+``anomalies``
+-------------
+
+This function computes the anomalies for the whole dataset. It can compute
+anomalies from the full, seasonal, monthly and daily climatologies.
+
+Parameters:
+    * period: define the granularity of the climatology to use:
+      full period, seasonal, monthly or daily.
+      Available periods: 'full', 'season', 'seasonal', 'monthly', 'month',
+      'mon', 'daily', 'day'. Default is 'full'
+
+Examples:
+
+    * Anomalies from the monthly climatology:
+
+      .. code-block:: yaml
+
+          anomalies:
+            period: month
+
+    * Anomalies from the full period climatology:
+
+      .. code-block:: yaml
+
+          anomalies:
+
+See also :func:`esmvalcore.preprocessor.anomalies`.

-This function produces an annual or a decadal mean. The only argument is the
-decadal boolean switch. When this switch is set to true, this function
-will output the decadal averages.

-See also :func:`esmvalcore.preprocessor.annual_mean`.
+.. _regrid_time:

``regrid_time``
---------------
@@ -654,7 +799,7 @@ Area manipulation
=================
The ``_area.py`` module contains the following preprocessor functions:

* ``extract_region``: Extract a region from a cube based on ``lat/lon``
-  corners.
+  corners.
* ``zonal_means``: Calculates the zonal or meridional means.
* ``area_statistics``: Calculates the average value over a region.
* ``extract_named_regions``: Extract a specific region from the region
@@ -756,7 +901,7 @@ This function calculates the volume-weighted average across three dimensions,
but maintains the time dimension.

This function takes the argument: ``operator``, which defines the operation to
-apply over the volume.
+apply over the volume.

No depth coordinate is required as this is determined by Iris. This
function works best when the ``fx_files`` provide the cell volume.
diff --git a/doc/esmvalcore/recipe.rst b/doc/esmvalcore/recipe.rst
index 2bfa58e161..a2a6aa47e8 100644
--- a/doc/esmvalcore/recipe.rst
+++ b/doc/esmvalcore/recipe.rst
@@ -84,7 +84,7 @@ data specifications:
- ensemble member (key ``ensemble``, value e.g. ``r1i1p1``, ``r1i1p1f1``)
- time range (e.g. key-value ``start_year: 1982``, ``end_year: 1990``)
- model grid (native grid ``grid: gn`` or regridded grid ``grid: gr``, for
-  CMIP6 data only).
+  CMIP6 data only).

For example, a datasets section could be:
@@ -128,7 +128,8 @@ arguments):
      regrid:
        target_grid: 1x1
        scheme: linear
-      time_average:
+      climate_statistics:
+        operator: mean
      multi_model_statistics:
        span: overlap
        statistics: [mean ]
@@ -189,7 +190,7 @@ key are provided, then the key in the datasets section will take precedence
over the keys in variables section. For many recipes it makes more sense to
define the ``start_year`` and ``end_year`` items in the variable section,
because the diagnostic script assumes that all the data has the same time
-range.
+range.

Note that the path to the script provided in the `script` option should be
either the absolute path to the script, or the path relative to the
@@ -273,7 +274,7 @@ This way the user may test a new diagnostic thoroughly before committing to the
GitHub repository and including it in the ESMValTool diagnostics library.

Re-using parameters from one ``script`` to another
---------------------------------------------------
+--------------------------------------------------

Due to ``yaml`` features it is possible to recycle entire diagnostics sections
for use with other diagnostics.
Here is an example: diff --git a/esmvalcore/preprocessor/__init__.py b/esmvalcore/preprocessor/__init__.py index a04ec87fb0..ac95186e6d 100644 --- a/esmvalcore/preprocessor/__init__.py +++ b/esmvalcore/preprocessor/__init__.py @@ -20,8 +20,11 @@ from ._reformat import (cmor_check_data, cmor_check_metadata, fix_data, fix_file, fix_metadata) from ._regrid import extract_levels, regrid -from ._time import (annual_mean, extract_month, extract_season, extract_time, - regrid_time, seasonal_mean, time_average) +from ._time import (extract_month, extract_season, extract_time, regrid_time, + daily_statistics, monthly_statistics, seasonal_statistics, + annual_statistics, decadal_statistics, + climate_statistics, anomalies) + from ._units import convert_units from ._volume import (volume_statistics, depth_integration, extract_trajectory, extract_transect, extract_volume) @@ -78,9 +81,13 @@ # 'annual_cycle': annual_cycle, # 'diurnal_cycle': diurnal_cycle, 'zonal_means', - 'annual_mean', - 'seasonal_mean', - 'time_average', + 'daily_statistics', + 'monthly_statistics', + 'seasonal_statistics', + 'annual_statistics', + 'decadal_statistics', + 'climate_statistics', + 'anomalies', 'regrid_time', 'cmor_check_data', 'convert_units', diff --git a/esmvalcore/preprocessor/_area.py b/esmvalcore/preprocessor/_area.py index 9b5cf74bbe..627ef1e888 100644 --- a/esmvalcore/preprocessor/_area.py +++ b/esmvalcore/preprocessor/_area.py @@ -8,20 +8,11 @@ import iris from dask import array as da +from ._shared import guess_bounds, get_iris_analysis_operation logger = logging.getLogger(__name__) -# guess bounds tool -def _guess_bounds(cube, coords): - """Guess bounds of a cube, or not.""" - # check for bounds just in case - for coord in coords: - if not cube.coord(coord).has_bounds(): - cube.coord(coord).guess_bounds() - return cube - - # slice cube over a restricted area (box) def extract_region(cube, start_longitude, end_longitude, start_latitude, end_latitude): @@ -72,37 +63,6 @@ def extract_region(cube, start_longitude, end_longitude, start_latitude, return cube.copy(data) -def get_iris_analysis_operation(operator): - """ - Determine the iris analysis operator from a string. - - Map string to functional operator. - - Parameters - ---------- - operator: str - A named operator. - - Returns - ------- - function: A function from iris.analysis - - Raises - ------ - ValueError - operator not in allowed operators list. - allowed operators: mean, median, std_dev, variance, min, max - """ - operators = ['mean', 'median', 'std_dev', 'variance', 'min', 'max'] - operator = operator.lower() - if operator not in operators: - raise ValueError("operator {} not recognised. " - "Accepted values are: {}." - "".format(operator, ', '.join(operators))) - operation = getattr(iris.analysis, operator.upper()) - return operation - - def zonal_means(cube, coordinate, mean_type): """ Get zonal means. @@ -230,7 +190,7 @@ def area_statistics(cube, operator, fx_files=None): coord_names = ['longitude', 'latitude'] if grid_areas is None or not grid_areas.any(): - cube = _guess_bounds(cube, coord_names) + cube = guess_bounds(cube, coord_names) grid_areas = iris.analysis.cartography.area_weights(cube) logger.info('Calculated grid area shape: %s', grid_areas.shape) diff --git a/esmvalcore/preprocessor/_shared.py b/esmvalcore/preprocessor/_shared.py new file mode 100644 index 0000000000..385881ae16 --- /dev/null +++ b/esmvalcore/preprocessor/_shared.py @@ -0,0 +1,69 @@ +""" +Shared functions for preprocessor. 
+
+Utility functions that can be used for multiple preprocessor steps
+"""
+import logging
+
+import iris
+import iris.analysis
+
+logger = logging.getLogger(__name__)
+
+
+# guess bounds tool
+def guess_bounds(cube, coords):
+    """Guess bounds of a cube, or not."""
+    # check for bounds just in case
+    for coord in coords:
+        if not cube.coord(coord).has_bounds():
+            cube.coord(coord).guess_bounds()
+    return cube
+
+
+def get_iris_analysis_operation(operator):
+    """
+    Determine the iris analysis operator from a string.
+
+    Map string to functional operator.
+
+    Parameters
+    ----------
+    operator: str
+        A named operator.
+
+    Returns
+    -------
+    function: A function from iris.analysis
+
+    Raises
+    ------
+    ValueError
+        operator not in allowed operators list.
+        allowed operators: mean, median, std_dev, variance, min, max
+    """
+    operators = ['mean', 'median', 'std_dev', 'variance', 'min', 'max']
+    operator = operator.lower()
+    if operator not in operators:
+        raise ValueError("operator {} not recognised. "
+                         "Accepted values are: {}."
+                         "".format(operator, ', '.join(operators)))
+    operation = getattr(iris.analysis, operator.upper())
+    return operation
+
+
+def operator_accept_weights(operator):
+    """
+    Check if an operator supports weights.
+
+    Parameters
+    ----------
+    operator: str
+        A named operator.
+
+    Returns
+    -------
+    bool: True if the operator supports weights, False otherwise
+
+    """
+    return operator.lower() in ('mean',)
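+
+
+# Illustrative usage (a sketch, for the reader's orientation):
+#   operator_accept_weights('mean')  ->  True
+#   operator_accept_weights('MAX')   ->  False  (the check is case-insensitive)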
diff --git a/esmvalcore/preprocessor/_time.py b/esmvalcore/preprocessor/_time.py
index 8937fcf84b..529c044759 100644
--- a/esmvalcore/preprocessor/_time.py
+++ b/esmvalcore/preprocessor/_time.py
@@ -9,8 +9,12 @@
import cf_units
import iris
+import iris.util
import iris.coord_categorisation
import numpy as np
+import dask.array as da
+
+from ._shared import get_iris_analysis_operation, operator_accept_weights

logger = logging.getLogger(__name__)
@@ -26,7 +30,7 @@
    filterwarnings(
        'ignore',
        "Collapsing a non-contiguous coordinate. "
-        f"Metadata may not be fully descriptive for '{_coord}'.",
+        "Metadata may not be fully descriptive for '{0}'.".format(_coord),
        category=UserWarning,
        module='iris',
    )
@@ -176,53 +180,96 @@ def get_time_weights(cube):
    return time_weights

-def time_average(cube):
+def daily_statistics(cube, operator='mean'):
+    """
+    Compute daily statistics.
+
+    Chunks time in daily periods and computes statistics over them.
+
+    Parameters
+    ----------
+    cube: iris.cube.Cube
+        input cube.
+
+    operator: str, optional
+        Select operator to apply.
+        Available operators: 'mean', 'median', 'std_dev', 'min', 'max'
+
+    Returns
+    -------
+    iris.cube.Cube
+        Daily statistics cube
+    """
+    if not cube.coords('day_of_year'):
+        iris.coord_categorisation.add_day_of_year(cube, 'time')
+    if not cube.coords('year'):
+        iris.coord_categorisation.add_year(cube, 'time')
+
+    operator = get_iris_analysis_operation(operator)
+    cube = cube.aggregated_by(['day_of_year', 'year'], operator)
+    return cube
+
+
+def monthly_statistics(cube, operator='mean'):
    """
-    Compute time average.
+    Compute monthly statistics.

-    Get the time average over the entire cube. The average is weighted by the
-    bounds of the time coordinate.
+    Chunks time in monthly periods and computes statistics over them.

    Parameters
    ----------
    cube: iris.cube.Cube
        input cube.

+    operator: str, optional
+        Select operator to apply.
+        Available operators: 'mean', 'median', 'std_dev', 'min', 'max'
+
    Returns
    -------
    iris.cube.Cube
-        time averaged cube.
+        Monthly statistics cube
    """
-    time_weights = get_time_weights(cube)
+    if not cube.coords('month_number'):
+        iris.coord_categorisation.add_month_number(cube, 'time')
+    if not cube.coords('year'):
+        iris.coord_categorisation.add_year(cube, 'time')

-    return cube.collapsed('time', iris.analysis.MEAN, weights=time_weights)
+    operator = get_iris_analysis_operation(operator)
+    cube = cube.aggregated_by(['month_number', 'year'], operator)
+    return cube

-# get the seasonal mean
-def seasonal_mean(cube):
+def seasonal_statistics(cube, operator='mean'):
    """
-    Compute seasonal means with MEAN.
+    Compute seasonal statistics.

-    Chunks time in 3-month periods and computes means over them;
+    Chunks time in 3-month periods and computes statistics over them.

    Parameters
    ----------
    cube: iris.cube.Cube
        input cube.

+    operator: str, optional
+        Select operator to apply.
+        Available operators: 'mean', 'median', 'std_dev', 'min', 'max'
+
    Returns
    -------
    iris.cube.Cube
-        Seasonal mean cube
+        Seasonal statistics cube
    """
    if not cube.coords('clim_season'):
        iris.coord_categorisation.add_season(cube, 'time', name='clim_season')
    if not cube.coords('season_year'):
-        iris.coord_categorisation.add_season_year(cube,
-                                                  'time',
-                                                  name='season_year')
+        iris.coord_categorisation.add_season_year(
+            cube, 'time', name='season_year')
+
+    operator = get_iris_analysis_operation(operator)
+
    cube = cube.aggregated_by(['clim_season', 'season_year'],
-                              iris.analysis.MEAN)
+                              operator)

    # CMOR Units are days so we are safe to operate on days
    # Ranging on [90, 92] days makes this calendar-independent
@@ -246,6 +293,188 @@ def spans_three_months(time):
    return cube.extract(three_months_bound)

+def annual_statistics(cube, operator='mean'):
+    """
+    Compute annual statistics.
+
+    Note that this function does not weight the annual mean if
+    uneven time periods are present. I.e., all data inside the year
+    are treated equally.
+
+    Parameters
+    ----------
+    cube: iris.cube.Cube
+        input cube.
+
+    operator: str, optional
+        Select operator to apply.
+        Available operators: 'mean', 'median', 'std_dev', 'min', 'max'
+
+    Returns
+    -------
+    iris.cube.Cube
+        Annual statistics cube
+    """
+    # TODO: Add weighting in time dimension. See iris issue 3290
+    # https://github.com/SciTools/iris/issues/3290
+
+    operator = get_iris_analysis_operation(operator)
+
+    if not cube.coords('year'):
+        iris.coord_categorisation.add_year(cube, 'time')
+    return cube.aggregated_by('year', operator)
+
+
+def decadal_statistics(cube, operator='mean'):
+    """
+    Compute decadal statistics.
+
+    Note that this function does not weight the decadal mean if
+    uneven time periods are present. I.e., all data inside the decade
+    are treated equally.
+
+    Parameters
+    ----------
+    cube: iris.cube.Cube
+        input cube.
+
+    operator: str, optional
+        Select operator to apply.
+        Available operators: 'mean', 'median', 'std_dev', 'min', 'max'
+
+    Returns
+    -------
+    iris.cube.Cube
+        Decadal statistics cube
+    """
+    # TODO: Add weighting in time dimension. See iris issue 3290
+    # https://github.com/SciTools/iris/issues/3290
+
+    operator = get_iris_analysis_operation(operator)
+
+    if not cube.coords('decade'):
+        def get_decade(coord, value):
+            """Callback function to get decades from cube."""
+            date = coord.units.num2date(value)
+            return date.year - date.year % 10
+
+        iris.coord_categorisation.add_categorised_coord(
+            cube, 'decade', 'time', get_decade)
+
+    return cube.aggregated_by('decade', operator)
+
+
+def climate_statistics(cube, operator='mean', period='full'):
+    """
+    Compute climate statistics with the specified granularity.
+
+    Computes statistics for the whole dataset. It is possible to get them for
+    the full period or with the data grouped by day, month or season.
+
+    Parameters
+    ----------
+    cube: iris.cube.Cube
+        input cube.
+
+    operator: str, optional
+        Select operator to apply.
+        Available operators: 'mean', 'median', 'std_dev', 'min', 'max'
+
+    period: str, optional
+        Period to compute the statistic over.
+        Available periods: 'full', 'season', 'seasonal', 'monthly', 'month',
+        'mon', 'daily', 'day'
+
+    Returns
+    -------
+    iris.cube.Cube
+        Climate statistics cube
+    """
+    period = period.lower()
+
+    if period in ('full', ):
+        operator_method = get_iris_analysis_operation(operator)
+        if operator_accept_weights(operator):
+            time_weights = get_time_weights(cube)
+            cube = cube.collapsed(
+                'time', operator_method, weights=time_weights
+            )
+        else:
+            cube = cube.collapsed('time', operator_method)
+        return cube
+
+    clim_coord = _get_period_coord(cube, period)
+    operator = get_iris_analysis_operation(operator)
+    cube = cube.aggregated_by(clim_coord, operator)
+    cube.remove_coord('time')
+    iris.util.promote_aux_coord_to_dim_coord(cube, clim_coord.name())
+    return cube
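+
+# Illustrative note (example values, not part of the public docs): with
+# period='month' the result keeps a 'month_number' dimension holding one
+# entry per calendar month, whereas period='full' collapses the time
+# dimension to a single climatological value, weighted by the time bounds
+# whenever the operator supports weights.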
+
+
+def anomalies(cube, period='full'):
+    """
+    Compute anomalies using a mean with the specified granularity.
+
+    Computes anomalies based on daily, monthly, seasonal or yearly means for
+    the full available period.
+
+    Parameters
+    ----------
+    cube: iris.cube.Cube
+        input cube.
+
+    period: str, optional
+        Period to compute the climatology over.
+        Available periods: 'full', 'season', 'seasonal', 'monthly', 'month',
+        'mon', 'daily', 'day'
+
+    Returns
+    -------
+    iris.cube.Cube
+        Anomalies cube
+    """
+    reference = climate_statistics(cube, period=period)
+    if period in ['full']:
+        return cube - reference
+
+    cube_coord = _get_period_coord(cube, period)
+    ref_coord = _get_period_coord(reference, period)
+
+    data = cube.core_data()
+    cube_time = cube.coord('time')
+    ref = {}
+    for ref_slice in reference.slices_over(ref_coord):
+        ref[ref_slice.coord(ref_coord).points[0]] = da.ravel(
+            ref_slice.core_data())
+    cube_coord_dim = cube.coord_dims(cube_coord)[0]
+    for i in range(cube_time.shape[0]):
+        time = cube_time.points[i]
+        indexes = cube_time.points == time
+        indexes = iris.util.broadcast_to_shape(
+            indexes, data.shape, (cube_coord_dim, )
+        )
+        data[indexes] = data[indexes] - ref[cube_coord.points[i]]
+
+    cube = cube.copy(data)
+    return cube
+
+
+def _get_period_coord(cube, period):
+    if period in ['daily', 'day']:
+        if not cube.coords('day_of_year'):
+            iris.coord_categorisation.add_day_of_year(cube, 'time')
+        return cube.coord('day_of_year')
+    elif period in ['monthly', 'month', 'mon']:
+        if not cube.coords('month_number'):
+            iris.coord_categorisation.add_month_number(cube, 'time')
+        return cube.coord('month_number')
+    elif period in ['seasonal', 'season']:
+        if not cube.coords('season_number'):
+            iris.coord_categorisation.add_season_number(cube, 'time')
+        return cube.coord('season_number')
+    raise ValueError('Period %s not supported' % period)
+
+
def regrid_time(cube, frequency):
    """
    Align time axis for cubes so they can be subtracted.
@@ -328,46 +557,3 @@ def regrid_time(cube, frequency):
                                                name='day_of_year')

    return cube
-
-
-def annual_mean(cube, decadal=False):
-    """
-    Compute annual or decadal means.
-
-    Note that this function does not weight the annual or decadal mean if
-    uneven time periods are present. Ie, all data inside the year/decade
-    are treated equally.
-
-    Parameters
-    ----------
-    cube: iris.cube.Cube
-        input cube.
-    decadal: bool
-        Annual average (:obj:`True`) or decadal average (:obj:`False`)
-
-    Returns
-    -------
-    iris.cube.Cube
-        Annual mean cube
-    """
-    # time_weights = get_time_weights(cube)
-
-    # TODO: Add weighting in time dimension. See iris issue 3290
-    # https://github.com/SciTools/iris/issues/3290
-
-    if decadal:
-        if not cube.coords('decade'):
-
-            def get_decade(coord, value):
-                """Callback function to get decades from cube."""
-                date = coord.units.num2date(value)
-                return date.year - date.year % 10
-
-            iris.coord_categorisation.add_categorised_coord(
-                cube, 'decade', 'time', get_decade)
-
-        return cube.aggregated_by('decade', iris.analysis.MEAN)
-
-    if not cube.coords('year'):
-        iris.coord_categorisation.add_year(cube, 'time')
-    return cube.aggregated_by('year', iris.analysis.MEAN)
diff --git a/tests/unit/preprocessor/_time/test_time.py b/tests/unit/preprocessor/_time/test_time.py
index 6c0beffcea..66685ed9d9 100644
--- a/tests/unit/preprocessor/_time/test_time.py
+++ b/tests/unit/preprocessor/_time/test_time.py
@@ -1,20 +1,25 @@
"""Unit tests for the :func:`esmvalcore.preprocessor._time` module."""
import unittest

+import pytest
+import tests
+
+import numpy as np
+from numpy.testing import assert_array_equal
+from cf_units import Unit
+
import iris
import iris.coord_categorisation
import iris.coords
-import numpy as np
-import pytest
-from cf_units import Unit
from iris.cube import Cube
-from numpy.testing import assert_array_equal

-import tests
-from esmvalcore.preprocessor._time import (annual_mean, extract_month,
-                                           extract_season, extract_time,
-                                           regrid_time, time_average)
+from esmvalcore.preprocessor._time import (
+    extract_month, extract_season, extract_time,
+    regrid_time,
+    decadal_statistics, annual_statistics, seasonal_statistics,
+    monthly_statistics, daily_statistics,
+    climate_statistics, anomalies
+)

def _create_sample_cube():
@@ -48,11 +53,17 @@ def setUp(self):

    def test_get_january(self):
        """Test january extraction"""
        sliced = extract_month(self.cube, 1)
-        print(sliced)
        assert_array_equal(
            np.array([1, 1]),
            sliced.coord('month_number').points)

+    def test_bad_month_raises(self):
+        """Test that an invalid month raises a ValueError."""
+        with self.assertRaises(ValueError):
+            extract_month(self.cube, 13)
+        with self.assertRaises(ValueError):
+            extract_month(self.cube, -1)
+

class TestTimeSlice(tests.Test):
    """Tests for extract_time."""
@@ -64,7 +75,6 @@ def setUp(self):
    def test_extract_time(self):
        """Test extract_time."""
        sliced = extract_time(self.cube, 1950, 1, 1, 1950, 12, 31)
-        print(sliced)
        assert_array_equal(
            np.arange(1, 13, 1),
            sliced.coord('month_number').points)
@@ -79,15 +89,12 @@ def test_extract_time_one_time(self):
        cube = _create_sample_cube()
        cube = cube.collapsed('time', iris.analysis.MEAN)
        sliced = extract_time(cube, 1950, 1, 1, 1952, 12, 31)
-        print(sliced)
        assert_array_equal(np.array([360.]), sliced.coord('time').points)

    def test_extract_time_no_time(self):
        """Test extract_time with no time step."""
        cube = _create_sample_cube()[0]
        sliced = extract_time(cube, 1950, 1, 1, 1950, 12, 31)
-        print('sliced', sliced, sliced.shape)
-        print('cube', cube, cube.shape)
        assert cube == sliced

@@ -101,7 +108,6 @@ def setUp(self):
    def test_get_djf(self):
        """Test function for winter"""
        sliced = extract_season(self.cube, 'djf')
-        print(sliced)
        assert_array_equal(
            np.array([1, 2, 12, 1, 2, 12]),
            sliced.coord('month_number').points)

    def test_get_djf_caps(self):
        """Test function works when season specified in caps"""
        sliced = extract_season(self.cube, 'DJF')
-        print(sliced)
        assert_array_equal(
            np.array([1, 2, 12, 1, 2, 12]),
            sliced.coord('month_number').points)

    def test_get_mam(self):
        """Test function for spring"""
        sliced = extract_season(self.cube, 'mam')
-        print(sliced)
        assert_array_equal(
            np.array([3, 4, 5, 3, 4, 5]),
            sliced.coord('month_number').points)
@@ -125,7 +129,6 @@
    def test_get_jja(self):
        """Test function for summer"""
        sliced = extract_season(self.cube, 'jja')
-        print(sliced)
        assert_array_equal(
            np.array([6, 7, 8, 6, 7, 8]),
            sliced.coord('month_number').points)
@@ -133,63 +136,304 @@
    def test_get_son(self):
        """Test function for autumn"""
        sliced = extract_season(self.cube, 'son')
-        print(sliced)
        assert_array_equal(
            np.array([9, 10, 11, 9, 10, 11]),
            sliced.coord('month_number').points)

-class TestTimeAverage(tests.Test):
-    """Test class for the :func:`esmvalcore.preprocessor._time_pp` module"""
+class TestClimatology(tests.Test):
+    """Test class for :func:`esmvalcore.preprocessor._time.climate_statistics`"""

-    def test_time_average(self):
-        """Test for time average of a 1D field."""
-        data = np.ones((3))
-        times = np.array([15., 45., 75.])
-        bounds = np.array([[0., 30.], [30., 60.], [60., 90.]])
+    @staticmethod
+    def _create_cube(data, times, bounds):
        time = iris.coords.DimCoord(
            times,
            bounds=bounds,
            standard_name='time',
            units=Unit('days since 1950-01-01', calendar='gregorian'))
        cube = iris.cube.Cube(data, dim_coords_and_dims=[(time, 0)])
+        return cube

-        result = time_average(cube)
+    def test_time_mean(self):
+        """Test for time average of a 1D field."""
+        data = np.ones((3))
+        times = np.array([15., 45., 75.])
+        bounds = np.array([[0., 30.], [30., 60.], [60., 90.]])
+        cube = self._create_cube(data, times, bounds)
+
+        result = climate_statistics(cube, operator='mean')
        expected = np.array([1.])
        assert_array_equal(result.data, expected)

-    def test_time_average_uneven(self):
+    def test_time_mean_uneven(self):
        """Test for time average of a 1D field with uneven time boundaries."""
        data = np.array([1., 5.])
        times = np.array([5., 25.])
        bounds = np.array([[0., 1.], [1., 4.]])
-        time = iris.coords.DimCoord(
-            times,
-            bounds=bounds,
-            standard_name='time',
-            units=Unit('days since 1950-01-01', calendar='gregorian'))
-        cube = iris.cube.Cube(data, dim_coords_and_dims=[(time, 0)])
+        cube = self._create_cube(data, times, bounds)

-        result = time_average(cube)
+        result = climate_statistics(cube, operator='mean')
        expected = np.array([4.])
        assert_array_equal(result.data, expected)

-    def test_time_average_365_day(self):
+    def test_time_mean_365_day(self):
        """Test for time avg of a realistic time axis and 365 day calendar"""
        data = np.ones((6, ))
        times = np.array([15, 45, 74, 105, 135, 166])
        bounds = np.array([[0, 31], [31, 59], [59, 90], [90, 120], [120, 151],
                           [151, 181]])
+        cube = self._create_cube(data, times, bounds)
+
+        result = climate_statistics(cube, operator='mean')
+        expected = np.array([1.])
+        assert_array_equal(result.data, expected)
+
+    def test_season_climatology(self):
+        """Test the seasonal climatology of a 365-day calendar time axis."""
+        data = np.ones((6, ))
+        times = np.array([15, 45, 74, 105, 135, 166])
+        bounds = np.array([[0, 31], [31, 59], [59, 90], [90, 120], [120, 151],
+                           [151, 181]])
+        cube = self._create_cube(data, times, bounds)
+
+        result = climate_statistics(cube, operator='mean', period='season')
+        expected = np.array([1., 1., 1.])
+        assert_array_equal(result.data, expected)
+
+    def test_monthly(self):
+        """Test the monthly climatology of a 365-day calendar time axis."""
+        data = np.ones((6, ))
+        times = np.array([15, 45, 74, 105, 135, 166])
+        bounds = np.array([[0, 31], [31, 59], [59, 90], [90, 120], [120, 151],
+                           [151, 181]])
+        cube = self._create_cube(data, times, bounds)
+
+        result = climate_statistics(cube, operator='mean', period='mon')
+        expected = np.ones((6, ))
+        assert_array_equal(result.data, expected)
+
+    def test_day(self):
+        """Test the daily climatology of a realistic time axis."""
+        data = np.ones((6, ))
+        times = np.array([0.5, 1.5, 2.5, 365.5, 366.5, 367.5])
+        bounds = np.array([[0, 1], [1, 2], [2, 3],
+                           [365, 366], [366, 367], [367, 368]])
+        cube = self._create_cube(data, times, bounds)
+
+        result = climate_statistics(cube, operator='mean', period='day')
+        expected = np.array([1, 1, 1])
+        assert_array_equal(result.data, expected)
+
+    def test_period_not_supported(self):
+        """Test that an unsupported period raises a ValueError."""
+        data = np.ones((6, ))
+        times = np.array([15, 45, 74, 105, 135, 166])
+        bounds = np.array([[0, 31], [31, 59], [59, 90], [90, 120], [120, 151],
+                           [151, 181]])
+        cube = self._create_cube(data, times, bounds)
+
+        with self.assertRaises(ValueError):
+            climate_statistics(cube, operator='mean', period='bad')
+
+    def test_time_max(self):
+        """Test for time max of a 1D field."""
+        data = np.arange((3))
+        times = np.array([15., 45., 75.])
+        bounds = np.array([[0., 30.], [30., 60.], [60., 90.]])
+        cube = self._create_cube(data, times, bounds)
+
+        result = climate_statistics(cube, operator='max')
+        expected = np.array([2.])
+        assert_array_equal(result.data, expected)
+
+    def test_time_min(self):
+        """Test for time min of a 1D field."""
+        data = np.arange((3))
+        times = np.array([15., 45., 75.])
+        bounds = np.array([[0., 30.], [30., 60.], [60., 90.]])
+        cube = self._create_cube(data, times, bounds)
+
+        result = climate_statistics(cube, operator='min')
+        expected = np.array([0.])
+        assert_array_equal(result.data, expected)
+
+    def test_time_median(self):
+        """Test for time median of a 1D field."""
+        data = np.arange((3))
+        times = np.array([15., 45., 75.])
+        bounds = np.array([[0., 30.], [30., 60.], [60., 90.]])
+        cube = self._create_cube(data, times, bounds)
+
+        result = climate_statistics(cube, operator='median')
+        expected = np.array([1.])
+        assert_array_equal(result.data, expected)
+
+
+class TestSeasonalStatistics(tests.Test):
+    """Test :func:`esmvalcore.preprocessor._time.seasonal_statistics`"""
+
+    @staticmethod
+    def _create_cube(data, times):
+        time = iris.coords.DimCoord(
+            times,
+            standard_name='time',
+            units=Unit('days since 1950-01-01', calendar='360_day'))
+        time.guess_bounds()
+        cube = iris.cube.Cube(data, dim_coords_and_dims=[(time, 0)])
+        return cube
+
+    def test_season_mean(self):
+        """Test for season average of a 1D field."""
+        data = np.arange(12)
+        times = np.arange(15, 360, 30)
+        cube = self._create_cube(data, times)
+
+        result = seasonal_statistics(cube, 'mean')
+        expected = np.array([3., 6., 9.])
+        assert_array_equal(result.data, expected)
+
+    def test_season_median(self):
+        """Test for season median of a 1D field."""
+        data = np.arange(12)
+        times = np.arange(15, 360, 30)
+        cube = self._create_cube(data, times)
+
+        result = seasonal_statistics(cube, 'median')
+        expected = np.array([3., 6., 9.])
+        assert_array_equal(result.data, expected)
+
+    def test_season_min(self):
+        """Test for season min of a 1D field."""
+        data = np.arange(12)
+        times = np.arange(15, 360, 30)
+        cube = self._create_cube(data, times)
+
+        result = seasonal_statistics(cube, 'min')
+        expected = np.array([2., 5., 8.])
+        assert_array_equal(result.data, expected)
+
+    def test_season_max(self):
+        """Test for season max of a 1D field."""
+        data = np.arange(12)
+        times = np.arange(15, 360, 30)
+        cube = self._create_cube(data, times)
+
+        result = seasonal_statistics(cube, 'max')
+        expected = np.array([4., 7., 10.])
+        assert_array_equal(result.data, expected)
+
+
+class TestMonthlyStatistics(tests.Test):
+    """Test :func:`esmvalcore.preprocessor._time.monthly_statistics`"""
+
+    @staticmethod
+    def _create_cube(data, times):
+        time = iris.coords.DimCoord(
+            times,
+            standard_name='time',
+            units=Unit('days since 1950-01-01', calendar='360_day'))
+        time.guess_bounds()
+        cube = iris.cube.Cube(data, dim_coords_and_dims=[(time, 0)])
+        return cube
+
+    def test_mean(self):
+        """Test average of a 1D field."""
+        data = np.arange(24)
+        times = np.arange(7, 360, 15)
+        cube = self._create_cube(data, times)
+
+        result = monthly_statistics(cube, 'mean')
+        expected = np.array([
+            0.5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5, 14.5,
+            16.5, 18.5, 20.5, 22.5
+        ])
+        assert_array_equal(result.data, expected)
+
+    def test_median(self):
+        """Test median of a 1D field."""
+        data = np.arange(24)
+        times = np.arange(7, 360, 15)
+        cube = self._create_cube(data, times)
+
+        result = monthly_statistics(cube, 'median')
+        expected = np.array([
+            0.5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5, 14.5,
+            16.5, 18.5, 20.5, 22.5
+        ])
+        assert_array_equal(result.data, expected)
+
+    def test_min(self):
+        """Test min of a 1D field."""
+        data = np.arange(24)
+        times = np.arange(7, 360, 15)
+        cube = self._create_cube(data, times)
+
+        result = monthly_statistics(cube, 'min')
+        expected = np.array([0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22])
+        assert_array_equal(result.data, expected)
+
+    def test_max(self):
+        """Test max of a 1D field."""
+        data = np.arange(24)
+        times = np.arange(7, 360, 15)
+        cube = self._create_cube(data, times)
+
+        result = monthly_statistics(cube, 'max')
+        expected = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23])
+        assert_array_equal(result.data, expected)
+
+
+class TestDailyStatistics(tests.Test):
+    """Test :func:`esmvalcore.preprocessor._time.daily_statistics`"""
+
+    @staticmethod
+    def _create_cube(data, times):
+        time = iris.coords.DimCoord(
+            times,
+            standard_name='time',
+            units=Unit('hours since 1950-01-01', calendar='360_day'))
+        time.guess_bounds()
+        cube = iris.cube.Cube(data, dim_coords_and_dims=[(time, 0)])
+        return cube
+
+    def test_mean(self):
+        """Test average of a 1D field."""
+        data = np.arange(8)
+        times = np.arange(0, 48, 6)
+        cube = self._create_cube(data, times)
+
+        result = daily_statistics(cube, 'mean')
+        expected = np.array([1.5, 5.5])
+        assert_array_equal(result.data, expected)
+
+    def test_median(self):
+        """Test median of a 1D field."""
+        data = np.arange(8)
+        times = np.arange(0, 48, 6)
+        cube = self._create_cube(data, times)
+
+        result = daily_statistics(cube, 'median')
+        expected = np.array([1.5, 5.5])
+        assert_array_equal(result.data, expected)
+
+    def test_min(self):
+        """Test min of a 1D field."""
+        data = np.arange(8)
+        times = np.arange(0, 48, 6)
+        cube = self._create_cube(data, times)
+
+        result = daily_statistics(cube, 'min')
+        expected = np.array([0., 4.])
+        assert_array_equal(result.data, expected)
+
+    def test_max(self):
+        """Test max of a 1D field."""
+        data = np.arange(8)
+        times = np.arange(0, 48, 6)
+        cube = self._create_cube(data, times)
+
+        result = daily_statistics(cube, 'max')
+        expected = np.array([3., 7.])
        assert_array_equal(result.data, expected)
@@ -305,7 +549,6 @@ def setUp(self):
            ),
            0,
        )
-        print(self.cube_1)
        add_auxiliary_coordinate([self.cube_1, self.cube_2])

    def test_regrid_time_6hour(self):
@@ -353,7 +596,6 @@ def setUp(self):
            ),
            0,
        )
-        print(self.cube_1)
        add_auxiliary_coordinate([self.cube_1, self.cube_2])

    def test_regrid_time_3hour(self):
@@ -443,7 +685,7 @@ def test_annual_average(existing_coord):
    if existing_coord:
        iris.coord_categorisation.add_year(cube, 'time')

-    result = annual_mean(cube, decadal=False)
+    result = annual_statistics(cube)
    expected = np.array([1., 1.])
    assert_array_equal(result.data, expected)
    expected_time = np.array([180., 540.])
@@ -464,12 +706,86 @@ def get_decade(coord, value):
        iris.coord_categorisation.add_categorised_coord(
            cube, 'decade', 'time', get_decade)

-    result = annual_mean(cube, decadal=True)
+    result = decadal_statistics(cube)
    expected = np.array([1., 1.])
    assert_array_equal(result.data, expected)
    expected_time = np.array([1800., 5400.])
    assert_array_equal(result.coord('time').points, expected_time)

+def make_map_data(number_years=2):
+    """Make a cube with time, lat and lon dimensions."""
+    times = np.arange(0.5, number_years * 360)
+    bounds = np.stack(((times - 0.5), (times + 0.5)), 1)
+    time = iris.coords.DimCoord(
+        times,
+        bounds=bounds,
+        standard_name='time',
+        units=Unit('days since 1950-01-01', calendar='360_day'))
+    lat = iris.coords.DimCoord(
+        range(2),
+        standard_name='latitude',
+    )
+    lon = iris.coords.DimCoord(
+        range(2),
+        standard_name='longitude',
+    )
+    data = np.array([[[0], [1], ], [[1], [0], ]]) * times
+    cube = iris.cube.Cube(
+        data,
+        dim_coords_and_dims=[(lon, 0), (lat, 1), (time, 2)]
+    )
+    return cube
+
+
+@pytest.mark.parametrize('period', ['full', 'day', 'month', 'season'])
+def test_anomalies(period):
+    """Test anomalies for all supported periods."""
+    cube = make_map_data(number_years=2)
+    result = anomalies(cube, period)
+    if period == 'full':
+        anom = np.arange(-359.5, 360, 1)
+        zeros = np.zeros_like(anom)
+        assert_array_equal(
+            result.data,
+            np.array([[zeros, anom], [anom, zeros]])
+        )
+    elif period == 'day':
+        anom = np.concatenate((np.ones(360) * -180, np.ones(360) * 180))
+        zeros = np.zeros_like(anom)
+        assert_array_equal(
+            result.data,
+            np.array([[zeros, anom], [anom, zeros]])
+        )
+    elif period == 'month':
+        anom1 = np.concatenate([np.arange(-194.5, -165) for x in range(12)])
+        anom2 = np.concatenate([np.arange(165.5, 195) for x in range(12)])
+        anom = np.concatenate((anom1, anom2))
+        zeros = np.zeros_like(anom)
+        assert_array_equal(
+            result.data,
+            np.array([[zeros, anom], [anom, zeros]])
+        )
+    elif period == 'season':
+        anom = np.concatenate((
+            np.arange(-314.5, -255),
+            np.arange(-224.5, -135),
+            np.arange(-224.5, -135),
+            np.arange(-224.5, -135),
+            np.arange(15.5, 105),
+            np.arange(135.5, 225),
+            np.arange(135.5, 225),
+            np.arange(135.5, 225),
+            np.arange(375.5, 405),
+        ))
+        zeros = np.zeros_like(anom)
+        assert_array_equal(
+            result.data,
+            np.array([[zeros, anom], [anom, zeros]])
+        )
+    assert_array_equal(result.coord('time').points, cube.coord('time').points)
+
+
if __name__ == '__main__':
    unittest.main()
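For reference, a minimal end-to-end sketch of how the renamed statistics
functions compose after this change (illustrative only; the synthetic cube
and all values are made up):

.. code-block:: python

    import numpy as np
    import iris.coords
    import iris.cube
    from cf_units import Unit

    from esmvalcore.preprocessor import anomalies, climate_statistics

    # Two years of synthetic monthly data on a 360-day calendar.
    times = np.arange(15., 735., 30.)
    bounds = np.stack([times - 15., times + 15.], axis=1)
    time = iris.coords.DimCoord(
        times, bounds=bounds, standard_name='time',
        units=Unit('days since 1950-01-01', calendar='360_day'))
    cube = iris.cube.Cube(np.arange(24.), dim_coords_and_dims=[(time, 0)])

    # Monthly climatology: one value per calendar month (12 in total).
    clim = climate_statistics(cube, operator='mean', period='month')

    # Departure of every time step from its monthly climatology.
    anom = anomalies(cube, period='month')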