Merge branch 'master' into open_mfzarr

pydata · Sep 21, 2020 · ca4e526
2 parents 40c4d46 + 1155f56
commit ca4e526
Showing 19 changed files with 259 additions and 73 deletions.
diff --git a/HOW_TO_RELEASE.md b/HOW_TO_RELEASE.md
@@ -3,28 +3,34 @@
 Time required: about an hour.
 
 These instructions assume that `upstream` refers to the main repository:
-```
+
+```sh
 $ git remote -v
 {...}
 upstream        https://github.com/pydata/xarray (fetch)
 upstream        https://github.com/pydata/xarray (push)
 ```
 
+<!-- markdownlint-disable MD031 -->
+
  1. Ensure your master branch is synced to upstream:
      ```sh
      git pull upstream master
      ```
- 2. Get a list of contributors with:
+ 2. Add a list of contributors with:
     ```sh
     git log "$(git tag --sort="v:refname" | sed -n 'x;$p').." --format=%aN | sort -u | perl -pe 's/\n/$1, /'
     ```
     or by substituting the _previous_ release in {0.X.Y-1}:
     ```sh
     git log v{0.X.Y-1}.. --format=%aN | sort -u | perl -pe 's/\n/$1, /'
     ```
-    Add these into `whats-new.rst` somewhere :)
+    This will return the number of contributors:
+    ```sh
+    git log v{0.X.Y-1}.. --format=%aN | sort -u | wc -l
+    ```
  3. Write a release summary: ~50 words describing the high level features. This
-    will be used in the release emails, tweets, GitHub release notes, etc. 
+    will be used in the release emails, tweets, GitHub release notes, etc.
  4. Look over whats-new.rst and the docs. Make sure "What's New" is complete
     (check the date!) and add the release summary at the top.
     Things to watch out for:
@@ -45,7 +51,7 @@ upstream        https://github.com/pydata/xarray (push)
       ```
  8. Check that the ReadTheDocs build is passing.
  9. On the master branch, commit the release in git:
-      ```s
+      ```sh
       git commit -am 'Release v{0.X.Y}'
       ```
 10. Tag the release:
@@ -67,7 +73,7 @@ upstream        https://github.com/pydata/xarray (push)
       twine upload dist/xarray-{0.X.Y}*
       ```
     You will need to be listed as a package owner at
-    https://pypi.python.org/pypi/xarray for this to work.
+    <https://pypi.python.org/pypi/xarray> for this to work.
 14. Push your changes to master:
       ```sh
       git push upstream master
@@ -80,11 +86,11 @@ upstream        https://github.com/pydata/xarray (push)
       git push --force upstream stable
       git checkout master
      ```
-    It's OK to force push to 'stable' if necessary. (We also update the stable 
-    branch with `git cherry-pick` for documentation only fixes that apply the 
+    It's OK to force push to 'stable' if necessary. (We also update the stable
+    branch with `git cherry-pick` for documentation-only fixes that apply to the
     current released version.)
 16. Add a section for the next release {0.X.Y+1} to doc/whats-new.rst:
-     ```
+     ```rst
      .. _whats-new.{0.X.Y+1}:
 
      v{0.X.Y+1} (unreleased)
@@ -116,12 +122,12 @@ upstream        https://github.com/pydata/xarray (push)
       ```
     You're done pushing to master!
 18. Issue the release on GitHub. Click on "Draft a new release" at
-    https://github.com/pydata/xarray/releases. Type in the version number
+    <https://github.com/pydata/xarray/releases>. Type in the version number
     and paste the release summary in the notes.
-19. Update the docs. Login to https://readthedocs.org/projects/xray/versions/
+19. Update the docs. Login to <https://readthedocs.org/projects/xray/versions/>
     and switch your new release tag (at the bottom) from "Inactive" to "Active".
     It should now build automatically.
-20. Issue the release announcement to mailing lists & Twitter. For bug fix releases, I 
+20. Issue the release announcement to mailing lists & Twitter. For bug fix releases, I
     usually only email [email protected]. For major/feature releases, I will email a broader
     list (no more than once every 3-6 months):
       - [email protected]
@@ -133,6 +139,8 @@ upstream        https://github.com/pydata/xarray (push)
     Google search will turn up examples of prior release announcements (look for
     "ANN xarray").
 
+<!-- markdownlint-enable MD031 -->
+
 ## Note on version numbering
 
 We follow a rough approximation of semantic versioning. Only major releases (0.X.0)

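The contributor-list step in `HOW_TO_RELEASE.md` pipes `git log` output through `sort -u` and a `perl` join. As a rough illustration only, the same post-processing might look like this in Python. The helper names `unique_authors` and `format_contributors` and the sample input are hypothetical, and `sorted` orders by code point, which may differ from a locale-aware `sort`.

```python
from typing import Iterable, List


def unique_authors(log_lines: Iterable[str]) -> List[str]:
    """Return unique, non-empty author names in sorted order (similar to `sort -u`)."""
    names = {line.strip() for line in log_lines if line.strip()}
    return sorted(names)


def format_contributors(log_lines: Iterable[str]) -> str:
    """Join the unique author names with ", " for pasting into whats-new.rst."""
    return ", ".join(unique_authors(log_lines))


# Hypothetical sample standing in for `git log v{0.X.Y-1}.. --format=%aN` output.
sample = ["keewis", "Deepak Cherian", "keewis", "Aaron Spring", ""]

contributor_line = format_contributors(sample)
contributor_count = len(unique_authors(sample))  # counterpart of `| wc -l`
print(contributor_line)
print(contributor_count)
```

Unlike the `perl` one-liner, the join leaves no trailing separator, so the result can be pasted as-is.
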
diff --git a/azure-pipelines.yml b/azure-pipelines.yml
@@ -74,6 +74,18 @@ jobs:
   - bash: black --check .
     displayName: black formatting check
 
+- job: Doctests
+  variables:
+    conda_env: py38
+  pool:
+    vmImage: 'ubuntu-16.04'
+  steps:
+    - template: ci/azure/install.yml
+    - bash: |
+        source activate xarray-tests
+        python -m pytest --doctest-modules xarray --ignore xarray/tests
+      displayName: Run doctests
+
 - job: TypeChecking
   variables:
     conda_env: py38
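
The new `Doctests` job runs `pytest --doctest-modules`, which executes the examples embedded in docstrings and compares their output with what the docstring shows. A minimal standard-library sketch of that idea follows; the `scale` function is a hypothetical example and not part of xarray, and pytest's own collection machinery is not reproduced here.

```python
import doctest


def scale(values, factor):
    """Multiply every value by ``factor``.

    >>> scale([1, 2, 3], 2)
    [2, 4, 6]
    """
    return [v * factor for v in values]


# Parse the docstring into a doctest and run it against the real function.
parser = doctest.DocTestParser()
test = parser.get_doctest(scale.__doc__, {"scale": scale}, "scale", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

# `results` holds the number of failed and attempted examples.
print(results.failed, results.attempted)
```

If the docstring output drifted from the real behaviour, `results.failed` would be non-zero, which is the kind of mismatch the CI job is meant to surface.
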

diff --git a/ci/requirements/py36-min-nep18.yml b/ci/requirements/py36-min-nep18.yml
@@ -10,7 +10,7 @@ dependencies:
   - distributed=2.9
   - numpy=1.17
   - pandas=0.25
-  - pint=0.13
+  - pint=0.15
   - pip
   - pytest
   - pytest-cov

diff --git a/doc/indexing.rst b/doc/indexing.rst
@@ -339,7 +339,7 @@ MATLAB, or after using the :py:func:`numpy.ix_` helper:
         coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]},
     )
     da
-    da[[0, 1], [1, 1]]
+    da[[0, 2, 2], [1, 3]]
 
 For more flexibility, you can supply :py:meth:`~xarray.DataArray` objects
 as indexers.
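
The updated example `da[[0, 2, 2], [1, 3]]` uses index lists of different lengths, which makes the outer (orthogonal) behaviour easier to see: every selected row is combined with every selected column. The plain-Python sketch below mimics that on nested lists. The 3×4 values are an assumption (the array's data is not shown in this hunk), and `outer_index` and `pointwise_index` are hypothetical helpers, not xarray functions.

```python
def outer_index(data, rows, cols):
    """Select the cross product of `rows` and `cols` (orthogonal indexing)."""
    return [[data[r][c] for c in cols] for r in rows]


def pointwise_index(data, rows, cols):
    """Select one element per (row, col) pair; the index lists must match in length."""
    if len(rows) != len(cols):
        raise ValueError("rows and cols must have the same length")
    return [data[r][c] for r, c in zip(rows, cols)]


# Assumed 3x4 data, one inner list per row.
data = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]

outer = outer_index(data, [0, 2, 2], [1, 3])   # 3 rows x 2 columns
paired = pointwise_index(data, [0, 1], [1, 1])  # 2 single elements
print(outer)
print(paired)
```

With equal-length lists such as `[0, 1]` and `[1, 1]`, the two interpretations are easy to confuse; unequal lengths leave only the outer reading.
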

diff --git a/doc/io.rst b/doc/io.rst
@@ -26,7 +26,7 @@ The recommended way to store xarray data structures is `netCDF`__, which
 is a binary file format for self-described datasets that originated
 in the geosciences. xarray is based on the netCDF data model, so netCDF files
 on disk directly correspond to :py:class:`Dataset` objects (more accurately,
-a group in a netCDF file directly corresponds to a to :py:class:`Dataset` object.
+a group in a netCDF file directly corresponds to a :py:class:`Dataset` object.
 See :ref:`io.netcdf_groups` for more.)
 
 NetCDF is supported on almost all platforms, and parsers exist

diff --git a/doc/quick-overview.rst b/doc/quick-overview.rst
@@ -46,7 +46,7 @@ Here are the key properties for a ``DataArray``:
 Indexing
 --------
 
-xarray supports four kind of indexing. Since we have assigned coordinate labels to the x dimension we can use label-based indexing along that dimension just like pandas. The four examples below all yield the same result (the value at `x=10`) but at varying levels of convenience and intuitiveness.
+xarray supports four kinds of indexing. Since we have assigned coordinate labels to the x dimension we can use label-based indexing along that dimension just like pandas. The four examples below all yield the same result (the value at `x=10`) but at varying levels of convenience and intuitiveness.
 
 .. ipython:: python
 

diff --git a/doc/whats-new.rst b/doc/whats-new.rst
@@ -14,49 +14,93 @@ What's New
 
     np.random.seed(123456)
 
+
+.. _whats-new.0.16.2:
+
+v0.16.2 (unreleased)
+--------------------
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+
+New Features
+~~~~~~~~~~~~
+
+- :py:func:`open_dataset` and :py:func:`open_mfdataset`
+  now work with ``engine="zarr"`` (:issue:`3668`, :pull:`4003`, :pull:`4187`).
+  By `Miguel Jimenez <https://github.com/Mikejmnez>`_ and `Wei Ji Leong <https://github.com/weiji14>`_.
+
+Bug fixes
+~~~~~~~~~
+
+
+Documentation
+~~~~~~~~~~~~~
+
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+
 .. _whats-new.0.16.1:
 
-v0.16.1 (unreleased)
+v0.16.1 (2020-09-20)
 ---------------------
 
+This patch release fixes an incompatibility with a recent pandas change, which
+was causing an issue indexing with a ``datetime64``. It also includes
+improvements to ``rolling``, ``to_dataframe``, ``cov`` & ``corr`` methods and
+bug fixes. Our documentation has a number of improvements, including fixing all
+doctests and confirming their accuracy on every commit.
+
+Many thanks to the 36 contributors who contributed to this release:
+
+Aaron Spring, Akio Taniguchi, Aleksandar Jelenak, Alexandre Poux,
+Caleb, Dan Nowacki, Deepak Cherian, Gerardo Rivera, Jacob Tomlinson, James A.
+Bednar, Joe Hamman, Julia Kent, Kai Mühlbauer, Keisuke Fujii, Mathias Hauser,
+Maximilian Roos, Nick R. Papior, Pascal Bourgault, Peter Hausamann, Romain
+Martinez, Russell Manser, Samnan Rahee, Sander, Spencer Clark, Stephan Hoyer,
+Thomas Zilio, Tobias Kölling, Tom Augspurger, alexamici, crusaderky, darikg,
+inakleinbottle, jenssss, johnomotani, keewis, and rpgoldman.
+
 Breaking changes
 ~~~~~~~~~~~~~~~~
+
 - :py:meth:`DataArray.astype` and :py:meth:`Dataset.astype` now preserve attributes. Keep the
   old behavior by passing `keep_attrs=False` (:issue:`2049`, :pull:`4314`).
   By `Dan Nowacki <https://github.com/dnowacki-usgs>`_ and `Gabriel Joel Mitchell <https://github.com/gajomi>`_.
 
 New Features
 ~~~~~~~~~~~~
-- Support multiple outputs in :py:func:`xarray.apply_ufunc` when using ``dask='parallelized'``. (:issue:`1815`, :pull:`4060`)
-  By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
+
 - :py:meth:`~xarray.DataArray.rolling` and :py:meth:`~xarray.Dataset.rolling`
   now accept more than 1 dimension. (:pull:`4219`)
   By `Keisuke Fujii <https://github.com/fujiisoup>`_.
+- :py:meth:`~xarray.DataArray.to_dataframe` and :py:meth:`~xarray.Dataset.to_dataframe`
+  now accept a ``dim_order`` parameter that specifies the order of the resulting
+  dataframe's dimensions (:issue:`4331`, :pull:`4333`).
+  By `Thomas Zilio <https://github.com/thomas-z>`_.
+- Support multiple outputs in :py:func:`xarray.apply_ufunc` when using
+  ``dask='parallelized'`` (:issue:`1815`, :pull:`4060`).
+  By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
 - ``min_count`` can be supplied to reductions such as ``.sum`` when specifying
-  multiple dimension to reduce over. (:pull:`4356`) 
+  multiple dimensions to reduce over (:pull:`4356`).
   By `Maximilian Roos <https://github.com/max-sixty>`_.
-- :py:func:`xarray.cov` and :py:func:`xarray.corr` now handle missing values. (:pull:`4351`)
+- :py:func:`xarray.cov` and :py:func:`xarray.corr` now handle missing values (:pull:`4351`).
   By `Maximilian Roos <https://github.com/max-sixty>`_.
+- Add support for parsing datetime strings formatted following the default
+  string representation of cftime objects, i.e. YYYY-MM-DD hh:mm:ss, in
+  partial datetime string indexing, as well as :py:meth:`~xarray.cftime_range`
+  (:issue:`4337`). By `Spencer Clark <https://github.com/spencerkclark>`_.
 - Build ``CFTimeIndex.__repr__`` explicitly as :py:class:`pandas.Index`. Add ``calendar`` as a new
   property for :py:class:`CFTimeIndex` and show ``calendar`` and ``length`` in
   ``CFTimeIndex.__repr__`` (:issue:`2416`, :pull:`4092`)
   By `Aaron Spring <https://github.com/aaronspring>`_.
-- Relaxed the :ref:`mindeps_policy` to support:
-
-  - all versions of setuptools released in the last 42 months (but no older than 38.4)
-  - all versions of dask and dask.distributed released in the last 12 months (but no
-    older than 2.9)
-  - all versions of other packages released in the last 12 months
-
-  All are  up from 6 months (:issue:`4295`)
-  `Guido Imperiale <https://github.com/crusaderky>`_.
 - Use a wrapped array's ``_repr_inline_`` method to construct the collapsed ``repr``
   of :py:class:`DataArray` and :py:class:`Dataset` objects and
   document the new method in :doc:`internals`. (:pull:`4248`).
   By `Justus Magin <https://github.com/keewis>`_.
-- :py:func:`open_dataset` and :py:func:`open_mfdataset`
-  now works with ``engine="zarr"`` (:issue:`3668`, :pull:`4003`, :pull:`4187`).
-  By `Miguel Jimenez <https://github.com/Mikejmnez>`_ and `Wei Ji Leong <https://github.com/weiji14>`_.
 - Add support for parsing datetime strings formatted following the default
   string representation of cftime objects, i.e. YYYY-MM-DD hh:mm:ss, in
   partial datetime string indexing, as well as :py:meth:`~xarray.cftime_range`
@@ -65,12 +109,18 @@ New Features
   now accept a ``dim_order`` parameter that specifies the order of the resulting
   dataframe's dimensions (:issue:`4331`, :pull:`4333`).
   By `Thomas Zilio <https://github.com/thomas-z>`_.
+- Allow per-variable fill values in most functions. (:pull:`4237`).
+  By `Justus Magin <https://github.com/keewis>`_.
 - Expose ``use_cftime`` option in :py:func:`~xarray.open_zarr` (:issue:`2886`, :pull:`3229`)
   By `Samnan Rahee <https://github.com/Geektrovert>`_ and `Anderson Banihirwe <https://github.com/andersy005>`_.
 
 
 Bug fixes
 ~~~~~~~~~
+
+- Fix indexing with datetime64 scalars with pandas 1.1 (:issue:`4283`).
+  By `Stephan Hoyer <https://github.com/shoyer>`_ and
+  `Justus Magin <https://github.com/keewis>`_.
 - Variables which are chunked using dask only along some dimensions can be chunked while storing with zarr along previously
   unchunked dimensions (:pull:`4312`) By `Tobias Kölling <https://github.com/d70-t>`_.
 - Fixed a bug in backend caused by basic installation of Dask (:issue:`4164`, :pull:`4318`)
@@ -80,7 +130,7 @@ Bug fixes
   and :py:meth:`DataArray.str.wrap` (:issue:`4334`). By `Mathias Hauser <https://github.com/mathause>`_.
 - Fixed overflow issue causing incorrect results in computing means of :py:class:`cftime.datetime`
   arrays (:issue:`4341`). By `Spencer Clark <https://github.com/spencerkclark>`_.
-- Fixed :py:meth:`Dataset.coarsen`, :py:meth:`DataArray.coarsen` dropping attributes on original object (:issue:`4120`, :pull:`4360`). by `Julia Kent <https://github.com/jukent>`_.
+- Fixed :py:meth:`Dataset.coarsen`, :py:meth:`DataArray.coarsen` dropping attributes on original object (:issue:`4120`, :pull:`4360`). By `Julia Kent <https://github.com/jukent>`_.
 - Fix the signature of the plot methods. (:pull:`4359`) By `Justus Magin <https://github.com/keewis>`_.
 - Fix :py:func:`xarray.apply_ufunc` with ``vectorize=True`` and ``exclude_dims`` (:issue:`3890`).
   By `Mathias Hauser <https://github.com/mathause>`_.
@@ -89,9 +139,15 @@ Bug fixes
   By `Jens Svensmark <https://github.com/jenssss>`_
 - Fix incorrect legend labels for :py:meth:`Dataset.plot.scatter` (:issue:`4126`).
   By `Peter Hausamann <https://github.com/phausamann>`_.
-- Fix indexing with datetime64 scalars with pandas 1.1 (:issue:`4283`).
-  By `Stephan Hoyer <https://github.com/shoyer>`_ and
-  `Justus Magin <https://github.com/keewis>`_.
+- Fix ``dask.optimize`` on ``DataArray`` producing an invalid Dask task graph (:issue:`3698`)
+  By `Tom Augspurger <https://github.com/TomAugspurger>`_
+- Fix ``pip install .`` when no ``.git`` directory exists; namely when the xarray source
+  directory has been rsync'ed by PyCharm Professional for a remote deployment over SSH.
+  By `Guido Imperiale <https://github.com/crusaderky>`_
+- Preserve dimension and coordinate order during :py:func:`xarray.concat` (:issue:`2811`, :issue:`4072`, :pull:`4419`).
+  By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
+- Avoid relying on :py:class:`set` objects for the ordering of the coordinates (:pull:`4409`)
+  By `Justus Magin <https://github.com/keewis>`_.
 
 Documentation
 ~~~~~~~~~~~~~
@@ -101,15 +157,28 @@ Documentation
 - Removed skipna argument from :py:meth:`DataArray.count`, :py:meth:`DataArray.any`, :py:meth:`DataArray.all`. (:issue:`755`)
   By `Sander van Rijn <https://github.com/sjvrijn>`_
 - Update the contributing guide to use merges instead of rebasing and state
-  that we squash-merge. (:pull:`4355`) By `Justus Magin <https://github.com/keewis>`_.
+  that we squash-merge. (:pull:`4355`). By `Justus Magin <https://github.com/keewis>`_.
+- Make sure the examples from the docstrings actually work (:pull:`4408`).
+  By `Justus Magin <https://github.com/keewis>`_.
+- Updated Vectorized Indexing to a clearer example.
+  By `Maximilian Roos <https://github.com/max-sixty>`_
 
 Internal Changes
 ~~~~~~~~~~~~~~~~
+
+- Fixed all doctests and enabled their running in CI.
+  By `Justus Magin <https://github.com/keewis>`_.
+- Relaxed the :ref:`mindeps_policy` to support:
+
+  - all versions of setuptools released in the last 42 months (but no older than 38.4)
+  - all versions of dask and dask.distributed released in the last 12 months (but no
+    older than 2.9)
+  - all versions of other packages released in the last 12 months
+
+  All are up from 6 months (:issue:`4295`).
+  By `Guido Imperiale <https://github.com/crusaderky>`_.
 - Use :py:func:`dask.array.apply_gufunc` instead of :py:func:`dask.array.blockwise` in
   :py:func:`xarray.apply_ufunc` when using ``dask='parallelized'``. (:pull:`4060`, :pull:`4391`, :pull:`4392`)
-- Fix ``pip install .`` when no ``.git`` directory exists; namely when the xarray source
-  directory has been rsync'ed by PyCharm Professional for a remote deployment over SSH.
-  By `Guido Imperiale <https://github.com/crusaderky>`_
 - Align ``mypy`` versions to ``0.782`` across ``requirements`` and
   ``.pre-commit-config.yml`` files. (:pull:`4390`)
   By `Maximilian Roos <https://github.com/max-sixty>`_

diff --git a/xarray/core/combine.py b/xarray/core/combine.py
@@ -711,8 +711,8 @@ def combine_by_coords(
     <xarray.Dataset>
     Dimensions:        (x: 3, y: 4)
     Coordinates:
-      * x              (x) int64 10 20 30
       * y              (y) int64 0 1 2 3
+      * x              (x) int64 10 20 30
     Data variables:
         temperature    (y, x) float64 10.98 14.3 12.06 10.9 ... 1.743 0.4044 16.65
         precipitation  (y, x) float64 0.4376 0.8918 0.9637 ... 0.7992 0.4615 0.7805

diff --git a/xarray/core/concat.py b/xarray/core/concat.py
@@ -349,7 +349,11 @@ def _parse_datasets(
         all_coord_names.update(ds.coords)
         data_vars.update(ds.data_vars)
 
-        for dim in set(ds.dims) - dims:
+        # preserves ordering of dimensions
+        for dim in ds.dims:
+            if dim in dims:
+                continue
+
             if dim not in dim_coords:
                 dim_coords[dim] = ds.coords[dim].variable
         dims = dims | set(ds.dims)
@@ -459,6 +463,9 @@ def ensure_common_dims(vars):
             combined = concat_vars(vars, dim, positions)
             assert isinstance(combined, Variable)
             result_vars[k] = combined
+        elif k in result_vars:
+            # preserves original variable order
+            result_vars[k] = result_vars.pop(k)
 
     result = Dataset(result_vars, attrs=result_attrs)
     absent_coord_names = coord_names - set(result.variables)
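
Both hunks in `concat.py` replace order-losing operations with order-preserving ones. The sketch below illustrates the two idioms on plain Python containers: filtering an ordered sequence instead of taking a set difference, and popping and re-inserting a dict key to move it to the end. The function names and sample values are hypothetical and only stand in for the real `Dataset` internals.

```python
def new_dims_ordered(ds_dims, seen):
    """Return the dims not yet seen, keeping the order they appear in `ds_dims`.

    `set(ds_dims) - seen` yields the same elements, but a set does not
    remember the order of `ds_dims`.
    """
    return [dim for dim in ds_dims if dim not in seen]


def reorder_like(mapping, desired_order):
    """Reorder `mapping` in place by re-inserting keys in `desired_order`.

    Popping a key and assigning it again moves it to the end of the dict,
    so doing this for each key in turn reproduces the desired order.
    """
    for key in desired_order:
        if key in mapping:
            mapping[key] = mapping.pop(key)
    return mapping


dims = new_dims_ordered(["time", "y", "x"], seen={"time"})
result_vars = reorder_like({"x": 1, "temperature": 2, "y": 3}, ["y", "x", "temperature"])
print(dims)
print(list(result_vars))
```

This relies on dicts preserving insertion order, which the language guarantees from Python 3.7 onward.
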

diff --git a/xarray/core/coordinates.py b/xarray/core/coordinates.py
@@ -215,7 +215,9 @@ def __getitem__(self, key: Hashable) -> "DataArray":
 
     def to_dataset(self) -> "Dataset":
         """Convert these coordinates into a new Dataset"""
-        return self._data._copy_listed(self._names)
+
+        names = [name for name in self._data._variables if name in self._names]
+        return self._data._copy_listed(names)
 
     def _update_coords(
         self, coords: Dict[Hashable, Variable], indexes: Mapping[Hashable, pd.Index]
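
The `to_dataset` change stops passing the coordinate names straight through and instead walks the underlying variables in their stored order, keeping only the coordinate names. A small sketch of that filtering step, using a plain dict and set as hypothetical stand-ins for `self._data._variables` and `self._names`:

```python
# Hypothetical stand-ins: an insertion-ordered mapping of all variables
# and an unordered set of the names that are coordinates.
variables = {"temperature": "...", "y": "...", "x": "...", "precipitation": "..."}
coord_names = {"x", "y"}

# Iterate the ordered mapping and filter by membership, so the result
# follows the mapping's order rather than the set's arbitrary order.
names = [name for name in variables if name in coord_names]
print(names)
```

Because the order now comes from the variable mapping, the result is deterministic, which is consistent with the `combine_by_coords` docstring in this commit listing `y` before `x`.
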