Added Coarsen #2612
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -50,6 +50,10 @@ Enhancements | |
- :py:class:`CFTimeIndex` uses slicing for string indexing when possible (like | ||
:py:class:`pandas.DatetimeIndex`), which avoids unnecessary copies. | ||
By `Stephan Hoyer <https://github.com/shoyer>`_ | ||
- :py:meth:`~xarray.DataArray.coarsen` and | ||
Review comment: This now needs to move up to the section for 0.11.2. Also, it would be nice to add a link to the new doc section "Coarsen large arrays". |
||
:py:meth:`~xarray.Dataset.coarsen` are newly added. | ||
(:issue:`2525`) | ||
By `Keisuke Fujii <https://github.com/fujiisoup>`_. | ||
- Enable passing ``rasterio.io.DatasetReader`` or ``rasterio.vrt.WarpedVRT`` to | ||
``open_rasterio`` instead of file path string. Allows for in-memory | ||
reprojection, see (:issue:`2588`). | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -590,6 +590,65 @@ def rolling(self, dim=None, min_periods=None, center=False, **dim_kwargs): | |
return self._rolling_cls(self, dim, min_periods=min_periods, | ||
center=center) | ||
|
||
def coarsen(self, dim=None, boundary='exact', side='left', | ||
coord_func='mean', **dim_kwargs): | ||
""" | ||
Coarsen object. | ||
|
||
Parameters | ||
---------- | ||
dim: dict, optional | ||
Mapping from the dimension name to the window size. | ||
dim : str | ||
Name of the dimension to create the rolling iterator | ||
along (e.g., `time`). | ||
window : int | ||
Size of the moving window. | ||
boundary : 'exact' | 'trim' | 'pad' | ||
If 'exact', a ValueError will be raised if dimension size is not a | ||
multiple of the window size. If 'trim', the excess entries are | ||
dropped. If 'pad', NA will be padded. | ||
side : 'left' or 'right' or mapping from dimension to 'left' or 'right' | ||
coord_func: function (name) that is applied to the coordinates, | ||
or a mapping from coordinate name to function (name). | ||
|
||
Returns | ||
------- | ||
Coarsen object (core.rolling.DataArrayCoarsen for DataArray, | ||
core.rolling.DatasetCoarsen for Dataset.) | ||
|
||
Examples | ||
-------- | ||
Coarsen the long time series by averaging over every four days. | ||
|
||
>>> da = xr.DataArray(np.linspace(0, 364, num=364), | ||
... dims='time', | ||
... coords={'time': pd.date_range( | ||
... '15/12/1999', periods=364)}) | ||
>>> da | ||
>>> <xarray.DataArray (time: 364)> | ||
>>> array([ 0. , 1.002755, 2.00551 , ..., 362.997245, | ||
364. ]) | ||
>>> Coordinates: | ||
>>> * time (time) datetime64[ns] 1999-12-15 ... 2000-12-12 | ||
>>> | ||
Review comment: nit: the results here should not be prefaced with |
||
>>> da.coarsen(time=4).mean() | ||
>>> <xarray.DataArray (time: 91)> | ||
>>> array([ 1.504132, 5.515152, 9.526171, 13.53719 , ..., | ||
>>> 362.495868]) | ||
>>> Coordinates: | ||
>>> * time (time) datetime64[ns] 1999-12-16T12:00:00 ... | ||
|
||
See Also | ||
-------- | ||
core.rolling.DataArrayCoarsen | ||
core.rolling.DatasetCoarsen | ||
""" | ||
dim = either_dict_or_kwargs(dim, dim_kwargs, 'coarsen') | ||
return self._coarsen_cls( | ||
self, dim, boundary=boundary, side=side, | ||
coord_func=coord_func) | ||
|
||
def resample(self, indexer=None, skipna=None, closed=None, label=None, | ||
base=0, keep_attrs=None, loffset=None, **indexer_kwargs): | ||
"""Returns a Resample object for performing resampling operations. | ||
|
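The `boundary` options described in the `coarsen` docstring above can be illustrated with a small NumPy-only sketch. This is not the code from this PR: `coarsen_mean_1d` is a hypothetical helper, it handles only one dimension, and it assumes that trimming and padding happen at the end of the array and that padded NaN values are skipped when averaging.

```python
import numpy as np


def coarsen_mean_1d(values, window, boundary="exact"):
    """Block-average a 1-D array (hypothetical helper, not xarray's code)."""
    values = np.asarray(values, dtype=float)
    excess = values.size % window
    if excess:
        if boundary == "exact":
            # Mirrors the documented behaviour: refuse sizes that are not
            # a multiple of the window.
            raise ValueError(
                "size {} is not a multiple of window {}".format(
                    values.size, window))
        elif boundary == "trim":
            # Drop the entries that do not fill a whole window
            # (assumed here to be the trailing ones).
            values = values[:values.size - excess]
        elif boundary == "pad":
            # Pad with NaN up to the next multiple of the window.
            pad = np.full(window - excess, np.nan)
            values = np.concatenate([values, pad])
        else:
            raise ValueError("unknown boundary: {!r}".format(boundary))
    # One row per window, then average each row while skipping NaN.
    return np.nanmean(values.reshape(-1, window), axis=1)


print(coarsen_mean_1d(np.arange(8), 4))
print(coarsen_mean_1d(np.arange(10), 4, boundary="trim"))
print(coarsen_mean_1d(np.arange(10), 4, boundary="pad"))
```

With `boundary="pad"` the last window holds only two valid values, so its mean is taken over those two entries in this sketch.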
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,7 +13,7 @@ | |
import numpy as np | ||
import pandas as pd | ||
|
||
from . import dask_array_ops, dtypes, npcompat, nputils | ||
from . import dask_array_ops, dtypes, npcompat, nputils, utils | ||
from .nputils import nanfirst, nanlast | ||
from .pycompat import dask_array_type | ||
|
||
|
@@ -261,8 +261,6 @@ def f(values, axis=None, skipna=None, **kwargs): | |
sum = _create_nan_agg_method('sum') | ||
sum.numeric_only = True | ||
sum.available_min_count = True | ||
mean = _create_nan_agg_method('mean') | ||
mean.numeric_only = True | ||
std = _create_nan_agg_method('std') | ||
std.numeric_only = True | ||
var = _create_nan_agg_method('var') | ||
|
@@ -278,6 +276,25 @@ def f(values, axis=None, skipna=None, **kwargs): | |
cumsum_1d.numeric_only = True | ||
|
||
|
||
_mean = _create_nan_agg_method('mean') | ||
|
||
|
||
def mean(array, axis=None, skipna=None, **kwargs): | ||
Review comment: I would like to make this compatible with CFTime index. @spencerkclark, could you comment on this?

Reply: Thanks @fujiisoup! I think something like the following would work:

    from ..coding.times import format_cftime_datetime
    from .common import contains_cftime_datetimes

    def mean(array, axis=None, skipna=None, **kwargs):
        array = asarray(array)
        if array.dtype.kind == 'M':
            offset = min(array)
            # infer the compatible timedelta dtype
            dtype = (np.empty((1,), dtype=array.dtype) - offset).dtype
            return _mean(utils.datetime_to_numeric(array, offset), axis=axis,
                         skipna=skipna, **kwargs).astype(dtype) + offset
        elif contains_cftime_datetimes(xr.DataArray(array)):
            import cftime
            offset = min(array)
            numeric_dates = utils.datetime_to_numeric(xr.DataArray(array), offset,
                                                      datetime_unit='s').data
            mean_dates = _mean(numeric_dates, axis=axis, skipna=skipna, **kwargs)
            units = 'seconds since {}'.format(format_cftime_datetime(offset))
            calendar = offset.calendar
            return cftime.num2date(mean_dates, units=units, calendar=calendar,
                                   only_use_cftime_datetimes=True)
        else:
            return _mean(array, axis=axis, skipna=skipna, **kwargs)

Ideally we would modify

Reply: Thanks, @spencerkclark. It would be nice if you could send a follow-up PR :)

Reply: Sure thing, I'd be happy to take care of making this compatible with cftime dates. |
||
""" in-house mean that can handle datetime dtype """ | ||
array = asarray(array) | ||
if array.dtype.kind == 'M': | ||
offset = min(array) | ||
# infer the compatible timedelta dtype | ||
dtype = (np.empty((1,), dtype=array.dtype) - offset).dtype | ||
Review comment: This is just to find the corresponding timedelta from datetime. Is there any good function to find an appropriate dtype?

Reply: I could be missing something, but since xarray always coerces all NumPy dates to

Reply: Thanks. I just realized we are always using [ns] for datetime. Updated. |
||
return _mean(utils.datetime_to_numeric(array, offset), axis=axis, | ||
skipna=skipna, **kwargs).astype(dtype) + offset | ||
else: | ||
return _mean(array, axis=axis, skipna=skipna, **kwargs) | ||
|
||
|
||
mean.numeric_only = True | ||
|
||
|
||
def _nd_cum_func(cum_func, array, axis, **kwargs): | ||
array = asarray(array) | ||
if axis is None: | ||
|
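The offset trick used by `mean` in the hunk above (subtract the earliest date, average the numeric differences, then add the offset back) can be sketched with plain NumPy. This is only an illustration under stated assumptions: `datetime_mean` is a hypothetical name, it does not use xarray's `utils.datetime_to_numeric`, it ignores `skipna`, and it assumes the input contains no `NaT` values.

```python
import numpy as np


def datetime_mean(array, axis=None):
    """Mean that also works for datetime64 input (hypothetical sketch)."""
    array = np.asarray(array)
    if array.dtype.kind != "M":
        # Not a datetime array: fall back to the ordinary mean.
        return array.mean(axis=axis)
    # Use the earliest date as the reference point.
    offset = array.min()
    deltas = array - offset  # timedelta64 with the same unit as the input
    # Average the integer tick counts as floats.
    numeric = deltas.astype("int64").astype(float)
    mean_ticks = np.asarray(np.round(numeric.mean(axis=axis)))
    # Convert back to timedelta64 and restore the offset.
    return offset + mean_ticks.astype("int64").astype(deltas.dtype)


dates = np.array(["2000-01-01", "2000-01-03"], dtype="datetime64[ns]")
print(datetime_mean(dates))
```

Working relative to the minimum keeps the numeric values small, which limits floating-point rounding compared with averaging raw nanosecond counts since the epoch.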
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,7 +39,6 @@ | |
|
||
|
||
import numpy as np | ||
import pandas as pd | ||
|
||
|
||
# for pandas 0.19 | ||
|
Review comment: This should be `coord_func`, not `coordinate_func`.