
CDO-like convenience methods to select times #557

Open
markelg opened this issue Sep 2, 2015 · 9 comments

@markelg (Contributor) commented Sep 2, 2015

I feel the time-selection features of xray can be improved. Currently, some common operations are too involved or verbose, like selecting the data for a group of months that is not a standard season (e.g. the Indian monsoon season, JJAS), or for non-consecutive years (e.g. El Niño years). I think it would be great (and easy) to implement some methods inspired by the widely used Climate Data Operators https://code.zmaw.de/projects/cdo For example: selyear, selmon, selday and selhour.

Then we could easily do a composite of JJAS seasons in El Niño years like this:

pr_dataset = xray.open_dataset(my_precipitation_dataset)
elnino_years = [year list here...]
pr_dataset.selyear(elnino_years).selmon([6, 7, 8, 9]).mean('time')

This would make me very happy. The way to go would be to write methods that call groupby, then select the years/months, merge them, and return the corresponding dataset/dataarray, but I am not sure what the most efficient way to do this is.
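
A rough sketch of what such a groupby-based helper might look like (selmon here is a hypothetical name, not an existing xray method; this assumes iterating over a GroupBy object yields (label, group) pairs):

import xray  # the project has since been renamed to xarray

def selmon(ds, months):
    # Group along the time dimension by calendar month, keep only the
    # requested groups, and concatenate them back together along time.
    kept = [group for month, group in ds.groupby("time.month")
            if month in months]
    return xray.concat(kept, dim="time")

# JJAS composite: selmon(pr_dataset, [6, 7, 8, 9]).mean("time")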

@markelg markelg changed the title CDO like convenience methods to select times CDO-like convenience methods to select times Sep 2, 2015
@clarkfitzg (Member) commented:

Just curious: how would we currently do this?

@shoyer (Member) commented Sep 2, 2015

Currently, the way to do this is to create a boolean indexer, with something like the following:

pr_dataset.sel(time=np.in1d(pr_dataset['time.year'], elnino_years))
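
For a self-contained illustration of this boolean-indexer approach (the toy dataset below is made up for the example):

import numpy as np
import pandas as pd
import xray  # now called xarray

times = pd.date_range("1950-01-01", "2000-12-01", freq="MS")
pr_dataset = xray.Dataset({"pr": ("time", np.random.rand(times.size))},
                          coords={"time": times})

elnino_years = [1958, 1973, 1983, 1987, 1998]
# 'time.year' exposes the year component of the time coordinate, and the
# resulting boolean mask selects the matching time steps.
composite = pr_dataset.sel(time=np.in1d(pr_dataset["time.year"], elnino_years))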

I agree that this is overly verbose and we can come up with something better. I'm not quite happy with selyear, though:

  • It's ambiguous which datetime variable the year refers to (there can be more than one time variable on some datasets).
  • I'm also not a big fan of adding a bunch of new API methods that are datetime-specific -- it creates a lot of noise (pandas has an issue with this).

Something like pr_dataset.sel(year=elnino_years) would be the ideal fix for this second concern (we've discussed this in another issue, can't remember which one now), but it's still ambiguous which time variable it refers to.

So, some other possible ways to spell this:

  1. pr_dataset.sel('time', year=elnino_years)
  2. pr_dataset.sel('time.year', elnino_years)
  3. pr_dataset.sel.time.year(elnino_years)

@jhamman (Member) commented Sep 2, 2015

I'd also like to see this come into being.

My two cents on the syntax is that your second idea is best:

pr_dataset.sel('time.year', elnino_years)

I don't care much for option 3 since the variable name is being used as an attribute (what if my time variable is called "Times"?).

@markelg (Contributor, Author) commented Sep 3, 2015

I agree with your arguments against creating too many datetime-specific methods. However, I am not convinced about using .sel with 'time.year'; it looks a bit hackish to me. Maybe all the "CDO methods" could be integrated into a single "seltimes" method. Please look at the function below. The time_coord argument lets you choose the time coordinate that you want to filter, and by using numpy.logical_and it is possible to do the whole operation in a single function call. It could be turned into a method. What do you think?

%matplotlib inline
import xray
import numpy as np
from matplotlib import pyplot as plt

ifile = "HadCRUT.4.3.0.0.median.nc"
ixd = xray.open_dataset(ifile)

You can download the example file from http://www.metoffice.gov.uk/hadobs/hadcrut4/data/4.3.0.0/gridded_fields/HadCRUT.4.3.0.0.median_netcdf.zip. Now comes the function.

def seltime(ixd, time_coord, **kwargs):
    """
    Select time steps by groups of years, months, etc.

    Parameters
    ----------
    ixd : xray.Dataset or xray.DataArray
        Input dataset. Will be "self" if the function is turned into a method.
    time_coord : str
        Name of the time coordinate to use. Needed to avoid ambiguities.
    **kwargs : scalar or iterable
        Currently year, month, day and hour are supported.

    Returns
    -------
    xray.Dataset or xray.DataArray filtered by the constraints defined
    by the kwargs.
    """
    ntimes = len(ixd.coords[time_coord])
    time_mask = np.repeat(True, ntimes)

    for time_unit, time_values in kwargs.items():
        if time_unit not in ("year", "month", "day", "hour"):
            raise KeyError(time_unit)
        time_str = "{}.{}".format(time_coord, time_unit)
        time_mask_temp = np.in1d(ixd[time_str], time_values)
        time_mask = np.logical_and(time_mask, time_mask_temp)
    oxd = ixd.sel(**{time_coord : time_mask})
    return oxd

It works, and it looks fast enough.

elnino_years = (1958, 1973, 1983, 1987, 1998)
months = (1, 2, 3, 4)
%timeit elnino_xd = seltime(ixd, "time", year=elnino_years, month=months)
elnino_xd.time
    1000 loops, best of 3: 1.05 ms per loop

    <xray.DataArray 'time' (time: 20)>
    array(['1958-01-16T12:00:00.000000000Z', '1958-02-15T00:00:00.000000000Z',
           '1958-03-16T12:00:00.000000000Z', '1958-04-16T00:00:00.000000000Z',
           '1973-01-16T13:00:00.000000000+0100',
           '1973-02-15T01:00:00.000000000+0100',
           '1973-03-16T13:00:00.000000000+0100',
           '1973-04-16T01:00:00.000000000+0100',
           '1983-01-16T13:00:00.000000000+0100',
           '1983-02-15T01:00:00.000000000+0100',
           '1983-03-16T13:00:00.000000000+0100',
           '1983-04-16T02:00:00.000000000+0200',
           '1987-01-16T13:00:00.000000000+0100',
           '1987-02-15T01:00:00.000000000+0100',
           '1987-03-16T13:00:00.000000000+0100',
           '1987-04-16T02:00:00.000000000+0200',
           '1998-01-16T13:00:00.000000000+0100',
           '1998-02-15T01:00:00.000000000+0100',
           '1998-03-16T13:00:00.000000000+0100',
           '1998-04-16T02:00:00.000000000+0200'], dtype='datetime64[ns]')
    Coordinates:
      * time     (time) datetime64[ns] 1958-01-16T12:00:00 1958-02-15 ...
    Attributes:
        start_month: 1
        start_year: 1850
        end_month: 4
        end_year: 2015
        long_name: time
        standard_name: time
        axis: T

If we average in time we can see the warm tongue in the Pacific.

plt.figure(figsize=(12, 5))
elnino_xd.temperature_anomaly.mean("time").plot()
plt.gca().invert_yaxis()

[Figure: time-mean temperature anomaly for the selected El Niño months, showing the warm tongue in the Pacific]

@rabernat (Contributor) commented Oct 4, 2018

Is there a way to do this easily now with vectorized indexing?

@shoyer (Member) commented Oct 5, 2018

I don't think there's an easy way to do this with vectorized indexing, but if we supported multidimensional indexing with boolean keys as proposed in #1887 (equivalent to where with drop=True), we could write pr_dataset[pr_dataset['time.year'].isin(elnino_years)]
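
A sketch of the where-based equivalent mentioned above, which should already work on recent versions (assuming a datetime64 time coordinate):

# Keep only the time steps whose year is in elnino_years; drop=True removes
# the masked-out steps instead of filling them with NaN.
pr_dataset.where(pr_dataset["time.year"].isin(elnino_years), drop=True)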

@markelg (Contributor, Author) commented Oct 5, 2018

I thought this issue was long forgotten ;) The fact is that today I still call this seltime function very often in my code, so I am glad to see it back. Thank you.

I like the .isin(elnino_years) syntax, and I see that it is consistent with pandas. Something like dataset.time.year.isin(elnino_years) would be very nice too; right now it is just a to_index() away, as this works: dataset.time.to_index().year.isin(elnino_years). However, I am not sure how this relates to multidimensional indexing, as time is only one dimension.

@shoyer (Member) commented Oct 5, 2018

dataset.time.dt.year.isin(elnino_years) should already work.
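
Putting the pieces together, the JJAS / El Niño composite from the original post could then be written as something like the following (a sketch, not verified against a specific xarray version):

# Combine a year mask and a month mask, then select along time and average.
mask = (pr_dataset.time.dt.year.isin(elnino_years)
        & pr_dataset.time.dt.month.isin([6, 7, 8, 9]))
jjas_elnino_mean = pr_dataset.sel(time=mask).mean("time")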

@stale (bot) commented Dec 25, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Dec 25, 2020
@dcherian dcherian added API design and removed stale labels Apr 18, 2022