
Switch enable_cftimeindex to True by default #2516

Merged Nov 1, 2018 (17 commits)
Changes from 8 commits
1 change: 1 addition & 0 deletions doc/api-hidden.rst
@@ -153,3 +153,4 @@
plot.FacetGrid.map

CFTimeIndex.shift
CFTimeIndex.to_datetimeindex
103 changes: 64 additions & 39 deletions doc/time-series.rst
@@ -71,10 +71,11 @@ One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
native representation of dates to those that fall between the years 1678 and
2262. When a netCDF file contains dates outside of these bounds, dates will be
returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex`
can be used for indexing. The :py:class:`~xarray.CFTimeIndex` enables only a subset of
the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only enabled
when using the standalone version of ``cftime`` (not the version packaged with
earlier versions of ``netCDF4``). See :ref:`CFTimeIndex` for more information.
will be used for indexing. :py:class:`~xarray.CFTimeIndex` enables a subset of
the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only
fully compatible with the standalone version of ``cftime`` (not the version
packaged with earlier versions of ``netCDF4``). See :ref:`CFTimeIndex` for more
information.

Datetime indexing
-----------------
@@ -221,18 +222,28 @@ Non-standard calendars and dates outside the Timestamp-valid range
Through the standalone ``cftime`` library and a custom subclass of
:py:class:`pandas.Index`, xarray supports a subset of the indexing
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars or dates using a standard calendar, but
outside the `Timestamp-valid range`_ (approximately between years 1678 and
2262). This behavior has not yet been turned on by default; to take advantage
of this functionality, you must have the ``enable_cftimeindex`` option set to
``True`` within your context (see :py:func:`~xarray.set_options` for more
information). It is expected that this will become the default behavior in
xarray version 0.11.

For instance, you can create a DataArray indexed by a time
coordinate with a no-leap calendar within a context manager setting the
``enable_cftimeindex`` option, and the time index will be cast to a
:py:class:`~xarray.CFTimeIndex`:
dates from non-standard calendars commonly used in climate science or dates
using a standard calendar, but outside the `Timestamp-valid range`_
(approximately between years 1678 and 2262).

.. note::

As of xarray version 0.11, by default, :py:class:`cftime.datetime` objects
will be used to represent times (either in indexes, as a
:py:class:`~xarray.CFTimeIndex`, or in data arrays with dtype object) if
any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the Timestamp-valid range.

Otherwise pandas-compatible dates from a standard calendar will be
represented with the ``np.datetime64[ns]`` data type, enabling the use of a
:py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
and their full set of associated features.

For example, you can create a DataArray indexed by a time
coordinate with dates from a no-leap calendar and a
:py:class:`~xarray.CFTimeIndex` will automatically be used:

.. ipython:: python

@@ -241,27 +252,11 @@ coordinate with a no-leap calendar within a context manager setting the

dates = [DatetimeNoLeap(year, month, 1) for year, month in
product(range(1, 3), range(1, 13))]
with xr.set_options(enable_cftimeindex=True):
da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'],
name='foo')
da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')

.. note::

With the ``enable_cftimeindex`` option activated, a :py:class:`~xarray.CFTimeIndex`
will be used for time indexing if any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the Timestamp-valid range

Otherwise a :py:class:`pandas.DatetimeIndex` will be used. In addition, if any
variable (not just an index variable) is encoded using a non-standard
calendar, its times will be decoded into :py:class:`cftime.datetime` objects,
regardless of whether or not they can be represented using
``np.datetime64[ns]`` objects.

xarray also includes a :py:func:`~xarray.cftime_range` function, which enables
creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For instance, we can
create the same dates and DataArray we created above using:
creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For
instance, we can create the same dates and DataArray we created above using:

.. ipython:: python

@@ -317,13 +312,43 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:

.. ipython:: python

da.to_netcdf('example.nc')
xr.open_dataset('example.nc')
da.to_netcdf('example-no-leap.nc')
xr.open_dataset('example-no-leap.nc')

.. note::

Currently resampling along the time dimension for data indexed by a
:py:class:`~xarray.CFTimeIndex` is not supported.
While much of the time series functionality that is possible for standard
dates has been implemented for dates from non-standard calendars, there are
still some remaining important features that have yet to be implemented,
for example:

- Resampling along the time dimension for data indexed by a
:py:class:`~xarray.CFTimeIndex` (:issue:`2191`, :issue:`2458`)
- Built-in plotting of data with :py:class:`cftime.datetime` coordinate axes
(:issue:`2164`).

For some use-cases it may still be useful to convert from
a :py:class:`~xarray.CFTimeIndex` to a :py:class:`pandas.DatetimeIndex`,
despite the difference in calendar types (e.g. to allow the use of some
forms of resample with non-standard calendars). The recommended way of doing
this is to use the built-in :py:meth:`~xarray.CFTimeIndex.to_datetimeindex`
method:

.. ipython:: python

modern_times = xr.cftime_range('2000', periods=24, freq='MS', calendar='noleap')
da = xr.DataArray(range(24), [('time', modern_times)])
da
datetimeindex = da.indexes['time'].to_datetimeindex()
da['time'] = datetimeindex
da.resample(time='Y').mean('time')

However, in this case one should
use caution to only perform operations which do not depend on differences
between dates (e.g. differentiation, interpolation, or upsampling with
resample), as these could introduce subtle and silent errors due to the
difference in calendar types between the dates encoded in your data and the
dates stored in memory.

.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#timestamp-limitations
.. _ISO 8601-format: https://en.wikipedia.org/wiki/ISO_8601
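The caution above about operations that depend on differences between dates can be illustrated without xarray or cftime. The sketch below is a minimal, stdlib-only illustration, assuming a hand-written table of no-leap month lengths (the helper names are hypothetical, not xarray or cftime API). The same year/month/day labels are 28 days apart in February 2000 under the no-leap calendar but 29 days apart once they are reinterpreted as standard-calendar dates, which is exactly what happens after `to_datetimeindex`.

```python
from datetime import date

# Month lengths in a "noleap" (365_day) calendar. The table is written out by
# hand for this illustration; it is not read from cftime.
NOLEAP_MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def noleap_month_length(year, month):
    """Days from the 1st of `month` to the 1st of the next month (noleap).

    `year` is unused: every noleap year has the same month lengths.
    """
    return NOLEAP_MONTH_DAYS[month - 1]


def standard_month_length(year, month):
    """The same quantity after relabeling the dates as standard-calendar dates."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (end - start).days


# February 2000: 28 days apart in the noleap calendar, 29 after conversion.
print(noleap_month_length(2000, 2), standard_month_length(2000, 2))  # 28 29

# The 24 months starting in January 2000 (the span of the modern_times
# example above) total one day more after conversion.
months = [(y, m) for y in (2000, 2001) for m in range(1, 13)]
noleap_total = sum(noleap_month_length(y, m) for y, m in months)
standard_total = sum(standard_month_length(y, m) for y, m in months)
print(noleap_total, standard_total)  # 730 731
```

Differentiation or interpolation along the converted axis would silently use the 29-day and 731-day spacings, even though the data were written on a 28-day and 730-day axis.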
11 changes: 11 additions & 0 deletions doc/whats-new.rst
@@ -33,6 +33,17 @@ v0.11.0 (unreleased)
Breaking changes
~~~~~~~~~~~~~~~~

- For non-standard calendars commonly used in climate science, xarray will now
always use :py:class:`cftime.datetime` objects, rather than by default try to
coerce them to ``np.datetime64[ns]`` objects. A
:py:class:`~xarray.CFTimeIndex` will be used for indexing along time
coordinates in these cases. A new method,
:py:meth:`~xarray.CFTimeIndex.to_datetimeindex`, has been added
to aid in converting from a :py:class:`~xarray.CFTimeIndex` to a
:py:class:`pandas.DatetimeIndex` for the remaining use-cases where
using a :py:class:`~xarray.CFTimeIndex` is still a limitation (e.g. for
resample or plotting). Setting the ``enable_cftimeindex`` option is now a
no-op and emits a ``FutureWarning``.
- ``Dataset.T`` has been removed as a shortcut for :py:meth:`Dataset.transpose`.
Call :py:meth:`Dataset.transpose` directly instead.
- Iterating over a ``Dataset`` now includes only data variables, not coordinates.
47 changes: 47 additions & 0 deletions xarray/coding/cftimeindex.py
@@ -42,6 +42,7 @@
from __future__ import absolute_import

import re
import warnings
from datetime import timedelta

import numpy as np
@@ -50,6 +51,8 @@
from xarray.core import pycompat
from xarray.core.utils import is_scalar

from .times import cftime_to_nptime, infer_calendar_name, _STANDARD_CALENDARS


def named(name, pattern):
return '(?P<' + name + '>' + pattern + ')'
@@ -381,6 +384,50 @@ def _add_delta(self, deltas):
# pandas. No longer used as of pandas 0.23.
return self + deltas

def to_datetimeindex(self):
"""If possible, convert this index to a pandas.DatetimeIndex.

Returns
-------
pandas.DatetimeIndex

Raises
------
ValueError
If the CFTimeIndex contains dates that are not possible in the
standard calendar or outside the pandas.Timestamp-valid range.

Warns
-----
RuntimeWarning
If converting from a non-standard calendar to a DatetimeIndex.

Warnings
--------
Note that for non-standard calendars, this will change the calendar
type of the index. In that case the result of this method should be
used with caution.

Examples
--------
>>> import xarray as xr
>>> times = xr.cftime_range('2000', periods=2, calendar='gregorian')
>>> times
CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00], dtype='object')
>>> times.to_datetimeindex()
DatetimeIndex(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]', freq=None)
""" # noqa: E501
nptimes = cftime_to_nptime(self)
calendar = infer_calendar_name(self)
if calendar not in _STANDARD_CALENDARS:
warnings.warn(
'Converting a CFTimeIndex with dates from a non-standard '
'calendar, {!r}, to a pandas.DatetimeIndex, which uses dates '
'from the standard calendar. This may lead to subtle errors '
'in operations that depend on the length of time between '
'dates.'.format(calendar), RuntimeWarning)
Review comment (Member):

This looks great, but can we add a keyword argument for silencing the warning? Maybe index.to_datetimeindex(unsafe=True) could silence the warning?

return pd.DatetimeIndex(nptimes)


def _parse_iso8601_without_reso(date_type, datetime_str):
date, _ = _parse_iso8601_with_reso(date_type, datetime_str)
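The reviewer's request above, a keyword that silences the calendar warning, could be sketched as follows. This is a hypothetical, stdlib-only illustration rather than the method in this diff: `SimpleDate` stands in for `cftime.datetime`, and `to_standard_datetimes` stands in for `to_datetimeindex` but returns `datetime.datetime` objects instead of a `pandas.DatetimeIndex`, so it does not reproduce the nanosecond range check. `unsafe` is the name the reviewer proposed, and `STANDARD_CALENDARS` is assumed to mirror `_STANDARD_CALENDARS`.

```python
import warnings
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for cftime.datetime, carrying only the fields used below.
SimpleDate = namedtuple(
    "SimpleDate", "year month day hour minute second microsecond")

# Assumed to match xarray's _STANDARD_CALENDARS; not imported from xarray.
STANDARD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}


def to_standard_datetimes(dates, calendar, unsafe=False):
    """Convert dates to datetime.datetime objects in the standard calendar.

    Emits a RuntimeWarning for non-standard calendars unless unsafe=True,
    and raises ValueError for dates that do not exist in the standard
    calendar (for example 30 February from a 360_day calendar).
    """
    if calendar not in STANDARD_CALENDARS and not unsafe:
        warnings.warn(
            "Converting dates from a non-standard calendar, {!r}, to the "
            "standard calendar. This may lead to subtle errors in operations "
            "that depend on the length of time between dates.".format(calendar),
            RuntimeWarning)
    converted = []
    for d in dates:
        try:
            converted.append(datetime(d.year, d.month, d.day, d.hour,
                                      d.minute, d.second, d.microsecond))
        except ValueError as e:
            raise ValueError("Cannot convert date {} to a date in the "
                             "standard calendar. Reason: {}.".format(d, e))
    return converted


# Usage: one warning for the noleap call, none for the unsafe=True call.
dates = [SimpleDate(2000, 1, 1, 0, 0, 0, 0)]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    converted = to_standard_datetimes(dates, "noleap")
    to_standard_datetimes(dates, "noleap", unsafe=True)
print(converted, [w.category.__name__ for w in caught])

# A 360_day date with no standard-calendar equivalent is still rejected,
# even with unsafe=True: the keyword only silences the warning.
error_message = None
try:
    to_standard_datetimes([SimpleDate(2001, 2, 30, 0, 0, 0, 0)], "360_day",
                          unsafe=True)
except ValueError as e:
    error_message = str(e)
print(error_message)
```

Defaulting `unsafe` to `False` keeps the warning on unless a caller opts out explicitly, while invalid dates raise regardless of the flag.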
56 changes: 23 additions & 33 deletions xarray/coding/times.py
@@ -12,7 +12,6 @@
from ..core import indexing
from ..core.common import contains_cftime_datetimes
from ..core.formatting import first_n_items, format_timestamp, last_item
from ..core.options import OPTIONS
from ..core.pycompat import PY3
from ..core.variable import Variable
from .variables import (
@@ -61,8 +60,9 @@ def _require_standalone_cftime():
try:
import cftime # noqa: F401
except ImportError:
raise ImportError('Using a CFTimeIndex requires the standalone '
'version of the cftime library.')
raise ImportError('Decoding times with non-standard calendars '
'or outside the pandas.Timestamp-valid range '
'requires the standalone cftime package.')


def _netcdf_to_numpy_timeunit(units):
@@ -84,41 +84,32 @@ def _unpack_netcdf_time_units(units):
return delta_units, ref_date


def _decode_datetime_with_cftime(num_dates, units, calendar,
enable_cftimeindex):
def _decode_datetime_with_cftime(num_dates, units, calendar):
cftime = _import_cftime()
if enable_cftimeindex:
_require_standalone_cftime()

if cftime.__name__ == 'cftime':
dates = np.asarray(cftime.num2date(num_dates, units, calendar,
only_use_cftime_datetimes=True))
else:
# Must be using num2date from an old version of netCDF4 which
# does not have the only_use_cftime_datetimes option.
dates = np.asarray(cftime.num2date(num_dates, units, calendar))

if (dates[np.nanargmin(num_dates)].year < 1678 or
dates[np.nanargmax(num_dates)].year >= 2262):
if not enable_cftimeindex or calendar in _STANDARD_CALENDARS:
if calendar in _STANDARD_CALENDARS:
warnings.warn(
'Unable to decode time axis into full '
'numpy.datetime64 objects, continuing using dummy '
'cftime.datetime objects instead, reason: dates out '
'of range', SerializationWarning, stacklevel=3)
else:
if enable_cftimeindex:
if calendar in _STANDARD_CALENDARS:
dates = cftime_to_nptime(dates)
else:
try:
dates = cftime_to_nptime(dates)
except ValueError as e:
warnings.warn(
'Unable to decode time axis into full '
'numpy.datetime64 objects, continuing using '
'dummy cftime.datetime objects instead, reason:'
'{0}'.format(e), SerializationWarning, stacklevel=3)
if calendar in _STANDARD_CALENDARS:
dates = cftime_to_nptime(dates)
return dates


def _decode_cf_datetime_dtype(data, units, calendar, enable_cftimeindex):
def _decode_cf_datetime_dtype(data, units, calendar):
# Verify that at least the first and last date can be decoded
# successfully. Otherwise, tracebacks end up swallowed by
# Dataset.__repr__ when users try to view their lazily decoded array.
@@ -128,8 +119,7 @@ def _decode_cf_datetime_dtype(data, units, calendar, enable_cftimeindex):
last_item(values) or [0]])

try:
result = decode_cf_datetime(example_value, units, calendar,
enable_cftimeindex)
result = decode_cf_datetime(example_value, units, calendar)
except Exception:
calendar_msg = ('the default calendar' if calendar is None
else 'calendar %r' % calendar)
@@ -145,8 +135,7 @@ def _decode_cf_datetime_dtype(data, units, calendar, enable_cftimeindex):
return dtype


def decode_cf_datetime(num_dates, units, calendar=None,
enable_cftimeindex=False):
def decode_cf_datetime(num_dates, units, calendar=None):
"""Given an array of numeric dates in netCDF format, convert it into a
numpy array of date time objects.

@@ -200,8 +189,7 @@ def decode_cf_datetime(num_dates, units, calendar=None,

except (OutOfBoundsDatetime, OverflowError):
dates = _decode_datetime_with_cftime(
flat_num_dates.astype(np.float), units, calendar,
enable_cftimeindex)
flat_num_dates.astype(np.float), units, calendar)

return dates.reshape(num_dates.shape)

@@ -291,7 +279,12 @@ def cftime_to_nptime(times):
times = np.asarray(times)
new = np.empty(times.shape, dtype='M8[ns]')
for i, t in np.ndenumerate(times):
dt = datetime(t.year, t.month, t.day, t.hour, t.minute, t.second)
Review comment (Member Author):

Was there a reason we did not include the microsecond attribute of the datetime here previously?

Reply (Member):

Possibly microsecond is only a recent addition to cftime.datetime?

try:
dt = pd.Timestamp(t.year, t.month, t.day, t.hour, t.minute,
Review comment (Member Author):

I switched to using pd.Timestamp here, because directly converting from datetime.datetime to np.datetime64[ns] did not raise an error for dates outside 1678 to 2262:

In [1]: from datetime import datetime; import numpy as np

In [2]: np.datetime64(datetime(1, 1, 1), 'ns')
Out[2]: numpy.datetime64('1754-08-30T22:43:41.128654848')

Reply (Member):

sounds good -- please note this in a comment in the code

The usual rule is that if you have to comment on your pull request to explain why you did something, you should just comment the code instead! :)

t.second)
except ValueError as e:
raise ValueError('Cannot convert date {} to a date in the '
'standard calendar. Reason: {}.'.format(t, e))
new[i] = np.datetime64(dt)
return new
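The review thread above turns on how a plain `datetime` becomes a signed 64-bit nanosecond count. The stdlib-only sketch below models both behaviours with hypothetical helper names. It assumes that the unchecked numpy cast simply wraps the count modulo 2**64 (two's complement); under that assumption year 1 maps to 1754-08-30 22:43:41.128654, consistent with the IPython output quoted in the thread (shown here only to microsecond precision, since `datetime` has no nanoseconds). The same arithmetic shows where the "1678 to 2262" limits come from, and the conversion keeps `microsecond`, the field the thread asks about.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1  # numpy reserves INT64_MIN for NaT


def ns_since_epoch(dt):
    """Exact integer nanoseconds from EPOCH to a naive datetime (keeps microsecond)."""
    delta = dt - EPOCH
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def from_ns(ns):
    """Convert nanoseconds since EPOCH back to a datetime, flooring to microseconds."""
    return EPOCH + timedelta(microseconds=ns // 1000)


def wrapping_to_ns(dt):
    """Model of an unchecked cast: wrap the count into int64 (two's complement)."""
    return (ns_since_epoch(dt) - INT64_MIN) % 2**64 + INT64_MIN


def checked_to_ns(dt):
    """Raise for out-of-range dates instead of wrapping (what the pd.Timestamp switch relies on)."""
    ns = ns_since_epoch(dt)
    if not INT64_MIN < ns <= INT64_MAX:
        raise ValueError("{} is outside the datetime64[ns] range".format(dt))
    return ns


# The representable range is roughly 1970 +/- 292 years.
print(from_ns(INT64_MIN + 1), from_ns(INT64_MAX))
# 1677-09-21 00:12:43.145224 2262-04-11 23:47:16.854775

# Year 1 silently wraps to a date in 1754 under the wrapping model.
print(from_ns(wrapping_to_ns(datetime(1, 1, 1))))
# 1754-08-30 22:43:41.128654

# The checked conversion refuses the same date.
year_one_error = None
try:
    checked_to_ns(datetime(1, 1, 1))
except ValueError as e:
    year_one_error = str(e)
print(year_one_error)
```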

@@ -399,15 +392,12 @@ def encode(self, variable, name=None):
def decode(self, variable, name=None):
dims, data, attrs, encoding = unpack_for_decoding(variable)

enable_cftimeindex = OPTIONS['enable_cftimeindex']
if 'units' in attrs and 'since' in attrs['units']:
units = pop_to(attrs, encoding, 'units')
calendar = pop_to(attrs, encoding, 'calendar')
dtype = _decode_cf_datetime_dtype(
data, units, calendar, enable_cftimeindex)
dtype = _decode_cf_datetime_dtype(data, units, calendar)
transform = partial(
decode_cf_datetime, units=units, calendar=calendar,
enable_cftimeindex=enable_cftimeindex)
decode_cf_datetime, units=units, calendar=calendar)
data = lazy_elemwise_func(data, transform, dtype)

return Variable(dims, data, attrs, encoding)
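With the option gone, `decode` above no longer consults `OPTIONS`. The representation you get depends only on the calendar and the range of the dates, as the `time-series.rst` note in this PR describes. The sketch below restates that rule as a small function. It is not xarray code: `decoded_time_type` is a hypothetical name, and the calendar set is assumed to mirror `_STANDARD_CALENDARS`. A missing calendar is treated as `"standard"`, the CF conventions default. The range test copies the year-based check in `_decode_datetime_with_cftime` (`year < 1678` or `year >= 2262`), which is slightly coarser than the exact `pandas.Timestamp` bounds.

```python
# Assumed to match xarray's _STANDARD_CALENDARS; not imported from xarray.
STANDARD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}


def decoded_time_type(min_year, max_year, calendar=None):
    """Name the representation decoded times would get under the documented rule.

    Hypothetical helper: cftime.datetime for non-standard calendars or when any
    year falls outside [1678, 2262), datetime64[ns] otherwise.
    """
    if calendar is None:
        calendar = "standard"  # CF conventions default when no calendar is given
    if calendar not in STANDARD_CALENDARS:
        return "cftime.datetime"
    if min_year < 1678 or max_year >= 2262:
        return "cftime.datetime"
    return "datetime64[ns]"


for args in [(2000, 2001, "noleap"), (1, 2, "standard"), (2000, 2001, None)]:
    print(args, decoded_time_type(*args))
```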