-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDO-like convenience methods to select times #557
Comments
Just curious- how would we currently do this? |
Currently, the way to do this is to create a boolean indexer, with something like the following:
I agree that this is overly verbose and we can come up with something better. I'm not quite happy with
Something like So, some other possible ways to spell this:
|
I'd also like to see this come into being. My two cents on the syntax is that your second idea is best: pr_dataset.sel('time.year', elnino_years) I don't care much for option 3 since the variable name is being used as an attribute (what if my time variable is called "Times"?). |
I agree with your arguments against creating too many datetime specific methods. However, I am not convinced about using .sel with 'time.year', it looks a bit hackish to me. Maybe all "CDO methods" can be integrated in one single "seltimes" method. Please look at the function below. The time_coord arguments lets you choose the time coordinate that you want to filter, and by using numpy.logical_and is it possible to do the whole operation in one single function call. It could be turned into a method. What do you think? %matplotlib inline
import xray
import numpy as np
from matplotlib import pyplot as plt
ifile = "HadCRUT.4.3.0.0.median.nc"
ixd = xray.open_dataset(ifile) You can download the example file from here http://www.metoffice.gov.uk/hadobs/hadcrut4/data/4.3.0.0/gridded_fields/HadCRUT.4.3.0.0.median_netcdf.zip Now comes the function. def seltime(ixd, time_coord, **kwargs):
"""
Select time steps by groups of years, months, etc.
Parameters
----------
ixd : xray.Dataset or xray.DataArray, input dataset. Will be "self" if
the function is turned into a method.
time_coord: str, name of the time coordinate to use. Needed to avoid ambiguities.
**kwargs: String or interable: Currently years, months, days and hours are supported.
Returns
----------
xray.Dataset or xrat.DataArray filtered by the constraints defined
by the kwargs.
"""
ntimes = len(ixd.coords[time_coord])
time_mask = np.repeat(True, ntimes)
for time_unit, time_values in kwargs.iteritems():
if time_unit not in ("year", "month", "day", "hour"):
raise KeyError(time_unit)
time_str = "{}.{}".format(time_coord, time_unit)
time_mask_temp = np.in1d(ixd[time_str], time_values)
time_mask = np.logical_and(time_mask, time_mask_temp)
oxd = ixd.sel(**{time_coord : time_mask})
return oxd It works, and it looks fast enough. elnino_years = (1958, 1973, 1983, 1987, 1998)
months = (1, 2, 3, 4)
%timeit elnino_xd = seltime(ixd, "time", year=elnino_years, month=months)
elnino_xd.time
If we average in time we can see the warm tonge in the pacific. plt.figure(figsize=(12, 5))
elnino_xd.temperature_anomaly.mean("time").plot()
plt.gca().invert_yaxis() |
Is there a way to do this easily now with vectorized indexing? |
I don’t think there’s an easy way to this with vectorized indexing, but if we supported multidimensional indexing with boolean keys as proposed in #1887 (equivalent to where with drop=True) we could write |
I though this issue was long forgotten ; ) The fact is that today I still call this seltime function very often in my code so I am glad to see it back, thank you. I like the .isin(elnino_years) syntax, and I see that it is consistent with pandas. Something like dataset.time.year.isin(elnino_years) would be very nice too, right now is just a "to_index()" away, as this works: dataset.time.to_index().year.isin(elnino_years). However I am not sure about how this is related with multidimensional indexing, as time is only one dimension. |
|
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity If this issue remains relevant, please comment here or remove the |
I feel like the time selecting features of xray can be improved. Currently, some common operations are too involved or verbose, like selecting the data in a group of months that are not a standard season (e.g. the monsoon season in india JJAS), or in non consecutive years (e.g. El Niño years). I think it would be great to implement (and easy), some methods inspired in the widely used Climate Data Operators https://code.zmaw.de/projects/cdo For example: selyear, selmon, selday and selhour.
Then we could easily do a composite of JJAS seasons in El Niño years like this:
This would make me very happy. The way to go would be to write methods that call grouby, then select the years/months, merge them, and return the corresponding dataset/dataarray, but I am not sure about what is the most efficient way to do this.
The text was updated successfully, but these errors were encountered: