ds.mean bugs with cftime objects #5897

Open
aulemahal opened this issue Oct 25, 2021 · 1 comment

Comments

@aulemahal
Contributor

What happened:
Given a dataset with a variable containing cftime objects along dimension A, taking the mean leads to buggy behaviour:

  1. Averaging over 'A' drops the variable instead of averaging it.
  2. Averaging over any other dimension fails if that variable is backed by dask.

What you expected to happen:

  1. I expected the mean to fail for a dask-backed cftime variable, given that this code exists:
    elif _contains_cftime_datetimes(array):
        if is_duck_dask_array(array):
            raise NotImplementedError(
                "Computing the mean of an array containing "
                "cftime.datetime objects is not yet implemented on "
                "dask arrays."
            )
        offset = min(array)
        timedeltas = datetime_to_numeric(array, offset, datetime_unit="us")
        mean_timedeltas = _mean(timedeltas, axis=axis, skipna=skipna, **kwargs)
        return _to_pytimedelta(mean_timedeltas, unit="us") + offset

With the numpy backend, I expected the mean to work rather than drop the variable.

  2. I expected the use of dask to be irrelevant to the result, and the mean to preserve the cftime variable as-is, since it does not include the averaged dimension.
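The cftime branch quoted in item 1 averages datetimes by subtracting the earliest value, averaging the resulting numeric offsets (in microseconds), and adding the earliest value back. The sketch below illustrates those same steps using the standard library's `datetime.datetime` as a stand-in for `cftime.datetime` (an assumption made so it runs without cftime); `mean_datetimes` is a hypothetical helper, not an xarray function.

```python
from datetime import datetime, timedelta


def mean_datetimes(values):
    """Average datetimes via offsets from the earliest value.

    Hypothetical helper: stdlib datetime stands in for cftime.datetime.
    """
    if not values:
        raise ValueError("cannot average an empty sequence")
    offset = min(values)
    # Integer number of microseconds between each value and the offset
    micros = [(v - offset) // timedelta(microseconds=1) for v in values]
    mean_micros = sum(micros) / len(micros)
    # Turn the mean offset back into a timedelta and re-add the earliest value
    return offset + timedelta(microseconds=mean_micros)


# Ten daily values starting on 2021-10-31, as in the MCVE below
dates = [datetime(2021, 10, 31) + timedelta(days=i) for i in range(10)]
print(mean_datetimes(dates))  # 2021-11-04 12:00:00
```

Working with offsets from the minimum keeps the numbers small and avoids needing a fixed epoch, which matters for non-standard cftime calendars.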

Minimal Complete Verifiable Example:

import xarray as xr

ds = xr.Dataset({
    'var1': (('time',), xr.cftime_range('2021-10-31', periods=10, freq='D')),
    'var2': (('x',), list(range(10)))
})
# var1 contains cftime objects
# var2 contains integers
# They do not share dims

ds.mean('time')  # var1 has disappeared instead of being averaged

ds.mean('x') # Everything ok

dsc = ds.chunk({})

dsc.mean('time') # var1 has disappeared. I would expect this line to fail.

dsc.mean('x') # Raises NotImplementedError. I would expect this line to run flawlessly.

Anything else we need to know?:
A possible culprit is #5393, though the bug may be older. I think the change introduced there causes issue (2) above.

In duck_array_ops.py, the mean operation is declared numeric_only, which is inconsistent with an implementation that supports means of datetime objects. This setting causes issue (1) above.
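To make the two observations concrete, here is a hypothetical, simplified model of how a numeric_only Dataset reduction might decide which data variables to hand to the reduction function. The selection rule is an assumption inferred from the behaviour reported above, not xarray's actual code, and `variables_to_reduce` is a made-up name.

```python
def variables_to_reduce(variables, dim, numeric_only=True):
    """Return the names of data variables passed to the reduction function.

    Hypothetical model of the reported behaviour, not xarray's code:
    a variable is passed on if it lacks the reduced dimension, if
    numeric_only is False, or if it is numeric. Anything else is dropped.
    """
    selected = []
    for name, (dims, is_numeric) in variables.items():
        lacks_dim = dim not in dims
        if lacks_dim or not numeric_only or is_numeric:
            selected.append(name)
    return selected


# Mirrors the MCVE: var1 holds cftime objects (object dtype, not numeric),
# var2 holds integers, and the two variables share no dimension.
ds_vars = {
    "var1": (("time",), False),
    "var2": (("x",), True),
}

# mean('time'): var1 has the reduced dim but is not numeric, so it is dropped.
print(variables_to_reduce(ds_vars, "time"))  # ['var2']
# mean('x'): var1 lacks 'x' but is still sent through the mean function,
# which is where a dask-backed cftime variable would hit NotImplementedError.
print(variables_to_reduce(ds_vars, "x"))  # ['var1', 'var2']
```

Under this rule, issue (1) comes from the numeric_only check and issue (2) from passing a variable that lacks the reduced dimension to the mean function at all, rather than keeping it unchanged.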

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: fdabf3b
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.14.12-arch1-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.utf8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.19.1.dev89+gfdabf3be
pandas: 1.3.4
numpy: 1.21.3
scipy: 1.7.1
netCDF4: 1.5.7
pydap: installed
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: 1.4.0
PseudoNetCDF: installed
rasterio: 1.2.10
cfgrib: 0.9.9.1
iris: 3.1.0
bottleneck: 1.3.2
dask: 2021.10.0
distributed: 2021.10.0
matplotlib: 3.4.3
cartopy: 0.20.1
seaborn: 0.11.2
numbagg: 0.2.1
fsspec: 2021.10.1
cupy: None
pint: 0.17
sparse: 0.13.0
setuptools: 58.2.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.28.0
sphinx: None

@andersy005
Member

This also applies to pandas datetime objects (#5898).
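If the numeric_only filter is a NumPy dtype check along the lines sketched below (an assumption; `passes_numeric_only` is a hypothetical name), then pandas datetimes, which xarray stores with a `datetime64` dtype, fail it just like cftime values stored with `object` dtype. That would explain why the same behaviour shows up for both.

```python
import numpy as np


def passes_numeric_only(dtype):
    """Hypothetical numeric_only test: numeric and boolean dtypes pass."""
    dtype = np.dtype(dtype)
    return bool(np.issubdtype(dtype, np.number) or dtype == np.bool_)


# cftime values are stored with object dtype; pandas datetimes use datetime64.
for name in ["int64", "float64", "bool", "object", "datetime64[ns]"]:
    print(name, passes_numeric_only(name))
```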
