-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flexible indexes: add Index base class and xindexes properties #5102
Conversation
This is indeed unfortunately a public API, so we should think about how to roll this out with minimal disruption. For example: maybe |
Yes we could make a special case for pandas indexes. This would also make the refactoring easier now since For example, it would be nice to move the logic implemented in |
Rather than I agree that switching the return type of To make development easier, I would suggest adding a new attribute to
|
Agreed for adding another property along with a couple of depreciation cycles for smooth transition on what is returned by
|
Use it internally instead of indexes
Also improve xarray_obj.indexes property implementation
This is ready for review! I implemented @shoyer's suggestions and finished updating the (many) places where Xarray directly uses pandas indexes. I also added a There might still be some quick fixes that we could clean up now, but I think that most things will have to be cleaned up later in the refactoring once additional features/classes/etc. are implemented. I haven't wrote tests for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great start! I have only a few minor concerns :)
Thanks for the review @shoyer, I addressed your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments from someone who's excited for fast and lazy coords. :)
@property | ||
def shape(self) -> Tuple[int]: | ||
return (len(self.array),) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These len() operations adds up and shape
is a very popular property in xarray code. I think you should consider caching the property with for example @pandas.util.cache_readonly
to speed it up quite a bit.
The cache would have to be cleared if self.array changed size after init, but should that even be allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would probably be welcome indeed. Let's save this for later, we'll need to see if/how we can formalize an (optional) nd-array interface for any xarray.Index
(which would be required to reuse the index data as coordinate data like it's the case for pandas indexes but maybe other xarray indexes in the future).
The cache would have to be cleared if self.array changed size after init, but should that even be allowed?
That should not be allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
|
||
return result | ||
|
||
def transpose(self, order) -> pd.Index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this not return a PandasIndex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. That's something I overlooked, probably among other things. That would indeed make sense to return a PandasIndex
, although it might not be necessary (at least for now) as the returned pd.Index
is later converted into a PandasIndex
elsewhere (e.g., like in Variable.transpose
). That might even not be desirable as currently (if I'm correct) creating a new PandasIndex
from a PandasIndex
re-builds the whole underlying pd.Index
. Or alternatively we need a fastpath creation for this specific case.
Should we merge this? In follow-up PRs, I plan to:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- let's merge this and do improvements / further work in follow-up PRs :)
(Feel free to self-merge after fixing the merge conflict! My suggested fix can be done later, I don't want this to block you) |
All right let's merge this! Thanks everyone for your review comments. |
* upstream/master: combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041) Explained what a deprecation cycle is (pydata#5289) Code cleanup (pydata#5234) FacetGrid docstrings (pydata#5293) Add whats new for dataset interpolation with non-numerics (pydata#5297) Allow dataset interpolation with different datatypes (pydata#5008) Flexible indexes: add Index base class and xindexes properties (pydata#5102) pre-commit: autoupdate hook versions (pydata#5280) convert the examples for apply_ufunc to doctest (pydata#5279) fix the new whatsnew section Ensure `HighLevelGraph` layers are `Layer` instances (pydata#5271)
* upstream/master: (23 commits) combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041) Explained what a deprecation cycle is (pydata#5289) Code cleanup (pydata#5234) FacetGrid docstrings (pydata#5293) Add whats new for dataset interpolation with non-numerics (pydata#5297) Allow dataset interpolation with different datatypes (pydata#5008) Flexible indexes: add Index base class and xindexes properties (pydata#5102) pre-commit: autoupdate hook versions (pydata#5280) convert the examples for apply_ufunc to doctest (pydata#5279) fix the new whatsnew section Ensure `HighLevelGraph` layers are `Layer` instances (pydata#5271) New whatsnew section Release-workflow: Bug fix (pydata#5273) more maintenance on whats-new.rst (pydata#5272) v0.18.0 release highlights (pydata#5266) Fix exception when display_expand_data=False for file-backed array. (pydata#5235) Warn ignored keep attrs (pydata#5265) Disable workflows on forks (pydata#5267) fix the built wheel test (pydata#5270) pypi upload workflow maintenance (pydata#5269) ...
…e_units * upstream/master: combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041) Explained what a deprecation cycle is (pydata#5289) Code cleanup (pydata#5234) FacetGrid docstrings (pydata#5293) Add whats new for dataset interpolation with non-numerics (pydata#5297) Allow dataset interpolation with different datatypes (pydata#5008) Flexible indexes: add Index base class and xindexes properties (pydata#5102)
Revert some changes made in pydata#5102 + additional (temporary) fixes.
* split index / coordinate variable(s) - Pass Variable objects to xarray.Index constructor - The index should create IndexVariable objects (`coords` attribute) - PandasIndex: IndexVariable wraps PandasIndexingAdpater wraps pd.Index * one PandasIndexingAdapter subclass for multiindex * fastpath Index init + from_pandas_index classmethods * use classmethod constructors instead * add Index.copy and Index.__getitem__ methods * wip: clean-up Revert some changes made in #5102 + additional (temporary) fixes. * clean-up * add PandasIndex and PandasMultiIndex tests * remove unused import * doc: update what's new * use xindexes in map_blocks + temp fix Dataset constructor doesn't accept xarray indexes yet. Create new coordinates from the underlying pandas indexes. * update what's new with #5670 * typo
This PR clears up the path for flexible indexes:
IndexAdapter
Index
base class that is meant to be inherited by all xarray-compatible indexes (built-in or 3rd-party)PandasIndexAdapter
now inherits fromIndexAdapter
Index
xarray_obj.xindexes
properties returnIndex
(PandasIndexAdapter
) instances.xarray_obj.indexes
properties still returnpandas.Index
instances.The latter is a breaking change, although I'm not sure if theindexes
property has been made public yet.This is still work in progress, there are many broken tests that are not fixed yet. (EDIT: all tests should be fixed now).
There's a lot of dirty fixes to avoid circular dependencies and in the many places where we still need direct access to the
pandas.Index
objects, but I'd expect that these will be cleaned-up further in the refactoring.