Flexible indexes: add Index base class and xindexes properties #5102

benbovy · 2021-04-02T16:18:07Z

This PR clears up the path for flexible indexes:

it adds a new ~~IndexAdapter~~ Index base class that is meant to be inherited by all xarray-compatible indexes (built-in or 3rd-party)
PandasIndexAdapter now inherits from ~~IndexAdapter~~ Index
the xarray_obj.xindexes properties return Index (PandasIndexAdapter) instances. xarray_obj.indexes properties still return pandas.Index instances.

~~The latter is a breaking change, although I'm not sure if the indexes property has been made public yet.~~

This is still a work in progress; there are many broken tests that are not fixed yet. (EDIT: all tests should be fixed now.)

There are a lot of dirty fixes, both to avoid circular dependencies and in the many places where we still need direct access to the pandas.Index objects, but I expect that these will be cleaned up later in the refactoring.
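
To illustrate the split between the two properties described above, here is a minimal, dependency-free sketch. It is not xarray's implementation: `DataObject` is a hypothetical stand-in for a Dataset/DataArray, the `Index` and `PandasIndex` classes are simplified stand-ins, and a plain tuple stands in for the wrapped pandas.Index.

```python
from typing import Dict, Tuple


class Index:
    """Simplified stand-in for the base class that all indexes inherit from."""


class PandasIndex(Index):
    """Simplified wrapper; `array` stands in for the wrapped pandas.Index."""

    def __init__(self, array: Tuple) -> None:
        self.array = array


class DataObject:
    """Hypothetical stand-in for an xarray object that holds indexes."""

    def __init__(self, indexes: Dict[str, PandasIndex]) -> None:
        self._indexes = dict(indexes)

    @property
    def xindexes(self) -> Dict[str, Index]:
        # New property: returns the wrapper (Index) objects.
        return dict(self._indexes)

    @property
    def indexes(self) -> Dict[str, Tuple]:
        # Existing property: keeps returning the unwrapped objects.
        return {name: idx.array for name, idx in self._indexes.items()}


obj = DataObject({"x": PandasIndex((10, 20, 30))})
```

With this layout, existing callers of `indexes` see no change, while new code can go through `xindexes`.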

shoyer · 2021-04-02T23:30:53Z

the xarray_obj.indexes properties now returns IndexAdapter (PandasIndexAdapter) instances instead of pandas.Index instances

The latter is a breaking change, although I'm not sure if the indexes property has been made public yet.

This is indeed unfortunately a public API, so we should think about how to roll this out with minimal disruption.

For example: maybe .indexes should continue to return pandas.Index objects for now, by unwrapping IndexAdapters stored in ._indexes?

benbovy · 2021-04-03T08:46:23Z

maybe .indexes should continue to return pandas.Index objects for now, by unwrapping IndexAdapters stored in ._indexes?

Yes, we could make a special case for pandas indexes. This would also make the refactoring easier now, since .indexes is used internally in quite a few places that expect pandas.Index objects. Not sure if it's a good solution in the mid/long term, though.

For example, it would be nice to move the logic implemented in convert_label_indexer into the PandasIndexAdapter and PandasMultiIndexAdapter classes and call methods of those classes instead of dealing directly with pandas index objects. For this example (and perhaps others) we could use _indexes internally. Alternatively, it might make sense to have an .index_adapters property to make things a bit clearer (this may be welcome even for internal purposes, IMO).
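
A rough sketch of the idea of moving label lookup into the index class, so that callers ask the index instead of handling the wrapped object themselves. Everything here is hypothetical: the method name `label_to_position` and the helper `select` are made up for illustration, and a plain list stands in for the pandas index.

```python
from typing import Any, List, Sequence


class PandasIndexAdapter:
    """Simplified stand-in; `labels` plays the role of the wrapped pandas index."""

    def __init__(self, labels: Sequence[Any]) -> None:
        self.labels: List[Any] = list(labels)

    def label_to_position(self, label: Any) -> int:
        # Hypothetical method: the lookup logic lives on the index itself.
        try:
            return self.labels.index(label)
        except ValueError:
            raise KeyError(label) from None


def select(values: Sequence[Any], index: PandasIndexAdapter, label: Any) -> Any:
    # The caller only talks to the adapter, never to the wrapped object.
    return values[index.label_to_position(label)]


picked = select([1.5, 2.5, 3.5], PandasIndexAdapter(["a", "b", "c"]), "b")
```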

shoyer · 2021-04-04T19:56:51Z

Rather than xarray.IndexAdapter, maybe we should just call this new object xarray.Index? Calling this object an "adapter" diminishes its importance in Xarray's future API.

I agree that switching the return type of .indexes is probably worthy of a breaking change -- but that breaking change should be done intentionally, once the new indexing functionality works and we are ready to make a major release. We may also want a deprecation cycle. What we don't want to do is change things in an incomplete way now, in a way that makes it hard for us to issue a bug-fix release.

To make development easier, I would suggest adding a new attribute to xarray.Dataset and xarray.DataArray that exposes the new data model, e.g., perhaps .xindexes as short for "xarray indexes". We would then:

Immediately switch xarray to use .xindexes instead of .indexes internally.
Once the new indexing functionality is ready, encourage users to gradually switch from .indexes -> .xindexes by issuing a FutureWarning.
After an appropriate period of time, consider making .indexes an alias for .xindexes in a breaking release.
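
The second step of this plan (warning users who still call .indexes) could be sketched as follows. This is only an illustration of the deprecation mechanism using the standard `warnings` module; the class names and the warning text are hypothetical and not taken from xarray.

```python
import warnings
from typing import Any, Dict


class WrappedIndex:
    """Hypothetical wrapper around a legacy index object."""

    def __init__(self, raw: Any) -> None:
        self.raw = raw


class DataObject:
    """Hypothetical stand-in for an xarray object."""

    def __init__(self, indexes: Dict[str, WrappedIndex]) -> None:
        self._indexes = dict(indexes)

    @property
    def xindexes(self) -> Dict[str, WrappedIndex]:
        return dict(self._indexes)

    @property
    def indexes(self) -> Dict[str, Any]:
        # Old behavior is preserved, but users are nudged towards .xindexes.
        warnings.warn(
            "the return type of .indexes may change in a future release; "
            "use .xindexes instead",
            FutureWarning,
            stacklevel=2,
        )
        return {name: idx.raw for name, idx in self._indexes.items()}


obj = DataObject({"x": WrappedIndex((1, 2, 3))})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = obj.indexes
```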

benbovy · 2021-04-06T11:09:38Z

Agreed on adding another property, along with a couple of deprecation cycles for a smooth transition in what .indexes returns. I suggested .index_adapters in my previous comment, but I prefer .xindexes.

xarray.Index makes sense too. For subclasses wrapping another index class, it might still be relevant to use the Adapter suffix, for example to distinguish between a pandas.Index object and an xarray.PandasIndexAdapter object.

pep8speaks · 2021-04-29T14:13:05Z

Hello @benbovy! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-10 15:16:46 UTC

benbovy · 2021-04-30T14:53:12Z

This is ready for review!

I implemented @shoyer's suggestions and finished updating the (many) places where Xarray directly uses pandas indexes.

I also added a to_pandas_index() method to the Index base class so that we can easily raise when an Xarray operation doesn't support flexible indexes (i.e., indexes that cannot be cast to a pd.Index).
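
A minimal sketch of how such a method lets an operation fail early for indexes that cannot be converted. The classes are simplified stand-ins (a tuple replaces pd.Index), and `TreeIndex` and `legacy_operation` are hypothetical names used only for this example.

```python
from typing import Tuple


class Index:
    """Simplified base class: by default an index cannot be cast."""

    def to_pandas_index(self) -> Tuple:
        raise TypeError(
            f"{type(self).__name__!r} cannot be cast to a pandas.Index object"
        )


class PandasIndex(Index):
    """Simplified wrapper; `array` stands in for the wrapped pd.Index."""

    def __init__(self, array: Tuple) -> None:
        self.array = array

    def to_pandas_index(self) -> Tuple:
        return self.array


class TreeIndex(Index):
    """Hypothetical custom index with no pandas equivalent."""


def legacy_operation(index: Index) -> int:
    # An operation that only works on pandas-like indexes: the call below
    # raises for any index that does not override to_pandas_index().
    pd_index = index.to_pandas_index()
    return len(pd_index)
```

Here `legacy_operation(PandasIndex((1, 2, 3)))` succeeds, while `legacy_operation(TreeIndex())` raises a `TypeError`.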

There might still be some quick fixes that we could clean up now, but I think that most things will have to be cleaned up later in the refactoring once additional features/classes/etc. are implemented.

I haven't written tests for the Index base class yet, as this is still very much preliminary work. I plan to do it in follow-up PRs once the API becomes more stable.

shoyer

great start! I have only a few minor concerns :)

benbovy · 2021-05-04T09:01:10Z

Thanks for the review @shoyer, I addressed your comments.

Illviljan

Some minor comments from someone who's excited for fast and lazy coords. :)

Illviljan · 2021-05-04T16:51:11Z

xarray/core/indexes.py

+    @property
+    def shape(self) -> Tuple[int]:
+        return (len(self.array),)


These len() operations add up, and shape is a very popular property in xarray code. I think you should consider caching the property with, for example, @pandas.util.cache_readonly to speed it up quite a bit.

The cache would have to be cleared if self.array changed size after init, but should that even be allowed?

That would probably be welcome indeed. Let's save this for later; we'll need to see if/how we can formalize an (optional) nd-array interface for any xarray.Index (which would be required to reuse the index data as coordinate data, as is the case for pandas indexes and maybe for other xarray indexes in the future).

The cache would have to be cleared if self.array changed size after init, but should that even be allowed?

That should not be allowed.
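
The caching idea discussed in this thread could look roughly like the sketch below. Note the assumptions: it uses the standard library's `functools.cached_property` (Python 3.8+) rather than the pandas decorator mentioned above, `ArrayBackedIndex` is a hypothetical class, and `len_calls` exists only to make the caching visible.

```python
from functools import cached_property
from typing import Iterable, Tuple


class ArrayBackedIndex:
    """Hypothetical index whose data never changes size after init."""

    def __init__(self, array: Iterable) -> None:
        # Stored as a tuple so the cached shape cannot become stale.
        self.array = tuple(array)
        self.len_calls = 0  # instrumentation for this example only

    @cached_property
    def shape(self) -> Tuple[int]:
        # Computed on first access, then served from the instance cache.
        self.len_calls += 1
        return (len(self.array),)


index = ArrayBackedIndex(range(5))
first = index.shape
second = index.shape
```

Because the array is immutable here, no cache invalidation is needed, which matches the position that resizing after init should not be allowed.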

mathause

Looks good.

mathause · 2021-05-04T17:30:10Z

xarray/core/indexes.py

+
+        return result
+
+    def transpose(self, order) -> pd.Index:


Why does this not return a PandasIndex?

Good catch. That's something I overlooked, probably among other things. It would indeed make sense to return a PandasIndex, although it might not be necessary (at least for now), as the returned pd.Index is later converted into a PandasIndex elsewhere (e.g., in Variable.transpose). It might not even be desirable since currently (if I'm correct) creating a new PandasIndex from a PandasIndex rebuilds the whole underlying pd.Index. Alternatively, we would need a fastpath creation for this specific case.
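
One possible shape for such a fast path, as a sketch only: when the constructor receives an existing wrapper, it reuses the already-built underlying object instead of rebuilding it. The class below is a simplified stand-in (a tuple replaces pd.Index) and `build_count` is instrumentation added for the example.

```python
from typing import Iterable, Union


class PandasIndex:
    """Simplified stand-in; `array` plays the role of the wrapped pd.Index."""

    build_count = 0  # counts expensive rebuilds, for this example only

    def __init__(self, data: Union["PandasIndex", Iterable]) -> None:
        if isinstance(data, PandasIndex):
            # Fast path: reuse the underlying object, no rebuild.
            self.array = data.array
        else:
            # Slow path: build the underlying object from raw data.
            PandasIndex.build_count += 1
            self.array = tuple(data)


original = PandasIndex([3, 1, 2])
rewrapped = PandasIndex(original)
```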

benbovy · 2021-05-10T07:12:36Z

Should we merge this?

In follow-up PRs, I plan to:

Create a PandasMultiIndex class and refactor all places in Xarray that rely on pd.MultiIndex (internal changes only).
Add public API to Xarray Index for label-based data selection and refactor/move the logic in convert_label_indexer into the Xarray PandasIndex and PandasMultiIndex classes.

shoyer

I agree -- let's merge this and do improvements / further work in follow-up PRs :)

shoyer · 2021-05-10T07:49:58Z

(Feel free to self-merge after fixing the merge conflict! My suggested fix can be done later, I don't want this to block you)

benbovy · 2021-05-11T08:21:03Z

All right let's merge this! Thanks everyone for your review comments.

* upstream/master:
  combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041)
  Explained what a deprecation cycle is (pydata#5289)
  Code cleanup (pydata#5234)
  FacetGrid docstrings (pydata#5293)
  Add whats new for dataset interpolation with non-numerics (pydata#5297)
  Allow dataset interpolation with different datatypes (pydata#5008)
  Flexible indexes: add Index base class and xindexes properties (pydata#5102)
  pre-commit: autoupdate hook versions (pydata#5280)
  convert the examples for apply_ufunc to doctest (pydata#5279)
  fix the new whatsnew section
  Ensure `HighLevelGraph` layers are `Layer` instances (pydata#5271)

Revert some changes made in pydata#5102 + additional (temporary) fixes.

* split index / coordinate variable(s)
  - Pass Variable objects to xarray.Index constructor
  - The index should create IndexVariable objects (`coords` attribute)
  - PandasIndex: IndexVariable wraps PandasIndexingAdpater wraps pd.Index
* one PandasIndexingAdapter subclass for multiindex
* fastpath Index init + from_pandas_index classmethods
* use classmethod constructors instead
* add Index.copy and Index.__getitem__ methods
* wip: clean-up
  Revert some changes made in #5102 + additional (temporary) fixes.
* clean-up
* add PandasIndex and PandasMultiIndex tests
* remove unused import
* doc: update what's new
* use xindexes in map_blocks + temp fix
  Dataset constructor doesn't accept xarray indexes yet. Create new coordinates from the underlying pandas indexes.
* update what's new with #5670
* typo

benbovy added 2 commits April 2, 2021 10:42

add IndexAdapter class + move PandasIndexAdapter

face5db

wip: xarray_obj.indexes -> IndexAdapter objects

ce3e185

spencerkclark mentioned this pull request Apr 4, 2021

Converting cftime.datetime objects to np.datetime64 values through astype #5107

Open

dcherian added the grant-czi label Apr 19, 2021

fix more broken tests

7dd1d0e

benbovy marked this pull request as ready for review April 29, 2021 14:09

benbovy marked this pull request as draft April 29, 2021 14:10

Merge branch 'master' into index-adapter-base-classes

16dc836

benbovy added 6 commits April 29, 2021 16:24

fix merge glitch

7b3e39c

fix group bins tests

c1ecd49

add xindexes property

51074d8

Use it internally instead of indexes

rename IndexAdapter -> Index

89e018a

rename _to_index_adpater (typo) -> _to_xindex

c8a5dd8

add Index.to_pandas_index() method

c492e3e

Also improve xarray_obj.indexes property implementation

benbovy marked this pull request as ready for review April 30, 2021 14:36

benbovy changed the title ~~Flexible indexes: add and use IndexAdapter base class~~ Flexible indexes: add Index base class and xindexes properties Apr 30, 2021

shoyer reviewed May 1, 2021

djhoese mentioned this pull request May 1, 2021

Checkout pyresample xarray-contrib/xoak#14

Open

benbovy added 6 commits May 3, 2021 17:33

rename PandasIndexAdpater -> PandasIndex

39e7842

update index type in tests

44230cc

ensure .indexes only returns pd.Index objects

6f2cd91

PandasIndex: normalize other index in cmp funcs

c3a2d60

Merge branch 'master' into index-adapter-base-classes

3f1be92

fix merge lint errors

f8b8ff4

[skip-ci] add TODO comment about index sizes

e25348e

benbovy mentioned this pull request May 4, 2021

Alignment with tolerance2 #4489

Open

5 tasks

Illviljan reviewed May 4, 2021

shoyer approved these changes May 4, 2021

mathause reviewed May 4, 2021

benbovy added 2 commits May 4, 2021 22:12

address more PR comments

b8f5de8

[skip-ci] update what's new

5ee8307

shoyer approved these changes May 10, 2021

benbovy added 3 commits May 10, 2021 17:06

Merge branch 'master' into index-adapter-base-classes

fc06f96

fix coord_names normalization

ec0a2d6

move what's new entry to unreleased section

ce59dec

benbovy merged commit 6e14df6 into pydata:master May 11, 2021

Illviljan mentioned this pull request May 13, 2021

Allow dataset interpolation with different datatypes #5008

Merged

4 tasks

benbovy mentioned this pull request May 17, 2021

Internal refactor of label-based data selection #5322

Merged

benbovy mentioned this pull request May 27, 2021

Regression: "ValueError: cannot unstack dimensions that do not have a MultiIndex" when unstacking a MultiIndex #5384

Closed

benbovy mentioned this pull request Jun 7, 2021

Idea: functionally-derived non-dimensional coordinates #3620

Open

benbovy mentioned this pull request Jul 26, 2021

Refactor index vs. coordinate variable(s) #5636

Merged

4 tasks

benbovy added a commit to benbovy/xarray that referenced this pull request Jul 29, 2021

wip: clean-up

84cbf15

Revert some changes made in pydata#5102 + additional (temporary) fixes.

benbovy deleted the index-adapter-base-classes branch March 29, 2022 07:10

snowman2 mentioned this pull request Jun 7, 2022

custom numpy dtypes geoxarray/geoxarray#20

Open
