Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch enable_cftimeindex to True by default #2516

Merged
merged 17 commits into from
Nov 1, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 50 additions & 36 deletions doc/time-series.rst
Original file line number Diff line number Diff line change
Expand Up @@ -71,10 +71,11 @@ One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
native representation of dates to those that fall between the years 1678 and
2262. When a netCDF file contains dates outside of these bounds, dates will be
returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex`
can be used for indexing. The :py:class:`~xarray.CFTimeIndex` enables only a subset of
the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only enabled
when using the standalone version of ``cftime`` (not the version packaged with
earlier versions ``netCDF4``). See :ref:`CFTimeIndex` for more information.
will be used for indexing. :py:class:`~xarray.CFTimeIndex` enables a subset of
the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only
fully compatible with the standalone version of ``cftime`` (not the version
packaged with earlier versions ``netCDF4``). See :ref:`CFTimeIndex` for more
information.

Datetime indexing
-----------------
Expand Down Expand Up @@ -223,16 +224,26 @@ Through the standalone ``cftime`` library and a custom subclass of
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars or dates using a standard calendar, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify "non-standard calendars" -> "non-standard calendars commonly used in climate science"

This paragraph is missing a bit of context for non-climate users :)

outside the `Timestamp-valid range`_ (approximately between years 1678 and
2262). This behavior has not yet been turned on by default; to take advantage
of this functionality, you must have the ``enable_cftimeindex`` option set to
``True`` within your context (see :py:func:`~xarray.set_options` for more
information). It is expected that this will become the default behavior in
xarray version 0.11.
2262).

For instance, you can create a DataArray indexed by a time
coordinate with a no-leap calendar within a context manager setting the
``enable_cftimeindex`` option, and the time index will be cast to a
:py:class:`~xarray.CFTimeIndex`:
.. note::

As of xarray version 0.11, by default, :py:class:`cftime.datetime` objects
will be used to represent times (either in indexes, as a
:py:class:`~xarray.CFTimeIndex`, or in data arrays with dtype object) if
any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the Timestamp-valid range.

Otherwise pandas-compatible dates from a standard calendar will be
represented with the ``np.datetime64[ns]`` data type, enabling the use of a
:py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
and their full set of associated features.

For example, you can create a DataArray indexed by a time
coordinate with dates from a no-leap calendar and a
:py:class:`~xarray.CFTimeIndex` will automatically be used:

.. ipython:: python

Expand All @@ -241,27 +252,11 @@ coordinate with a no-leap calendar within a context manager setting the

dates = [DatetimeNoLeap(year, month, 1) for year, month in
product(range(1, 3), range(1, 13))]
with xr.set_options(enable_cftimeindex=True):
da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'],
name='foo')
da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')

.. note::

With the ``enable_cftimeindex`` option activated, a :py:class:`~xarray.CFTimeIndex`
will be used for time indexing if any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the Timestamp-valid range

Otherwise a :py:class:`pandas.DatetimeIndex` will be used. In addition, if any
variable (not just an index variable) is encoded using a non-standard
calendar, its times will be decoded into :py:class:`cftime.datetime` objects,
regardless of whether or not they can be represented using
``np.datetime64[ns]`` objects.

xarray also includes a :py:func:`~xarray.cftime_range` function, which enables
creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For instance, we can
create the same dates and DataArray we created above using:
creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For
instance, we can create the same dates and DataArray we created above using:

.. ipython:: python

Expand Down Expand Up @@ -317,13 +312,32 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:

.. ipython:: python

da.to_netcdf('example.nc')
xr.open_dataset('example.nc')
da.to_netcdf('example-no-leap.nc')
xr.open_dataset('example-no-leap.nc')

.. note::

Currently resampling along the time dimension for data indexed by a
:py:class:`~xarray.CFTimeIndex` is not supported.
While much of the time series functionality that is possible for standard
dates has been implemented for dates from non-standard calendars, there are
still some remaining important features that have yet to be implemented,
for example:

- Resampling along the time dimension for data indexed by a
:py:class:`~xarray.CFTimeIndex`
- Built-in plotting of data with :py:class:`cftime.datetime` coordinate axes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to link to the GitHub issues for these.


If at any time you would like to restore the old default behavior, which was
to attempt to decode datetimes into ``np.datetime64[ns]`` objects whenever
possible (regardless of calendar type), you can set ``enable_cftimeindex`` to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather encourage people to convert their data to a standard calendar via normal xarray/cftime APIs. That's a cleaner solution than setting global configuration.

Can we add an example of this with existing APIs? Or maybe we should augment CFTimeIndex with a new to_datetime64 method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather encourage people to convert their data to a standard calendar via normal xarray/cftime APIs. That's a cleaner solution than setting global configuration.

This is a really good point and would make things a lot easier to understand. Currently we use xarray.coding.times.cftime_to_nptime internally; I can convert it to a better-documented public function (perhaps with friendlier error handling too).

``False`` within your context when opening a file (see
:py:func:`~xarray.set_options` for more information). For some use-cases
this behavior may still be useful (e.g. to allow the use of some forms
resample with non-standard calendars); however in this case one should use
caution to only perform operations which do not depend on differences
between dates (e.g. differentiation, interpolation, or upsampling with
resample), as these could introduce subtle and silent errors due to the
difference in calendar types between the dates encoded in your data and the
dates stored in memory.

.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#timestamp-limitations
.. _ISO 8601-format: https://en.wikipedia.org/wiki/ISO_8601
Expand Down
6 changes: 6 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,12 @@ v0.11.0 (unreleased)
Breaking changes
~~~~~~~~~~~~~~~~

- The option ``enable_cftimeindex`` has now been set to ``True`` by default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about eventually removing this option entirely, and making it simply a "no-op" for now (raising FutureWarning). In the long term I think we would prefer to avoid keeping around extraneous global options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense -- that will clean up the datetime decoding logic (and tests) quite a bit.

This means that by default any dates encoded using a non-standard calendar
will be decoded into objects of type :py:class:`cftime.datetime`, regardless
of whether or not it might be possible to coerce them into
``np.datetime64[ns]`` objects. One can explicitly set the option to
``False`` to restore the old behavior.
- ``Dataset.T`` has been removed as a shortcut for :py:meth:`Dataset.transpose`.
Call :py:meth:`Dataset.transpose` directly instead.
- Iterating over a ``Dataset`` now includes only data variables, not coordinates.
Expand Down
4 changes: 2 additions & 2 deletions xarray/core/options.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
OPTIONS = {
DISPLAY_WIDTH: 80,
ARITHMETIC_JOIN: 'inner',
ENABLE_CFTIMEINDEX: False,
ENABLE_CFTIMEINDEX: True,
FILE_CACHE_MAXSIZE: 128,
CMAP_SEQUENTIAL: 'viridis',
CMAP_DIVERGENT: 'RdBu_r',
Expand Down Expand Up @@ -52,7 +52,7 @@ class set_options(object):
Default: ``'inner'``.
- ``enable_cftimeindex``: flag to enable using a ``CFTimeIndex``
for time indexes with non-standard calendars or dates outside the
Timestamp-valid range. Default: ``False``.
Timestamp-valid range. Default: ``True``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this now -- no need to document a deprecated/unused option.

- ``file_cache_maxsize``: maximum number of open files to hold in xarray's
global least-recently-usage cached. This should be smaller than your
system's per-process file descriptor limit, e.g., ``ulimit -n`` on Linux.
Expand Down