Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py38-upstream-dev failure: 'CFTimeIndex' object has no attribute '_id' #4516

Closed
mathause opened this issue Oct 16, 2020 · 6 comments
Closed

Comments

@mathause
Copy link
Collaborator

Now that #4502 was fixed upstream we get a new issue in py38-upstream-dev (which I am pretty sure is unrelated as dask is not involved). xarray/tests/test_cftimeindex_resample.py::test_resample fails with:

AttributeError: 'CFTimeIndex' object has no attribute '_id'

See: https://dev.azure.com/xarray/xarray/_build/results?buildId=4038&view=logs&jobId=603f3fdc-8af6-5e0a-f594-fa71bc949352&j=603f3fdc-8af6-5e0a-f594-fa71bc949352&t=51624cf6-228d-5319-1d6f-8cd30bcca2e7

The failure did not happen on the 14. Oct. (db4f03e the failures here are #4502) but it appeared on the 15. Oct. (1553749)

Maybe a change in pandas? I have not looked at it closely - maybe @spencerkclark sees what's going on?

@mathause
Copy link
Collaborator Author

pandas-dev/pandas#37087 recently touched Index._id but doesn't seem to change its functionality so might not be the cause.

@spencerkclark
Copy link
Member

spencerkclark commented Oct 17, 2020

Indeed a bisect reveals pandas-dev/pandas#37087 was the cause:

abd3acf05516611e9e90d57ae363f1567e30f49a is the first bad commit
commit abd3acf05516611e9e90d57ae363f1567e30f49a
Author: Terji Petersen <[email protected]>
Date:   Wed Oct 14 13:26:45 2020 +0100

    CLN: clean Index._id (#37087)

 pandas/core/indexes/base.py            | 21 +++++++++++++--------
 pandas/core/indexes/multi.py           |  4 +++-
 pandas/tests/arithmetic/test_object.py |  3 ++-
 3 files changed, 18 insertions(+), 10 deletions(-)

I think it's that _id used to automatically be initialized to None and now it needs to be explicitly initialized within the constructor. They added some try-except logic in the is_ method to handle the case where _id has not been initialized, but there's an error happening now within the view method. Here's a more minimal example:

In [1]: import xarray as xr

In [2]: xr.cftime_range("2000", periods=2).view()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-5226cd5e3bc1> in <module>
----> 1 xr.cftime_range("2000", periods=2).view()

~/Software/pandas/pandas/core/indexes/base.py in view(self, cls)
    630             result = self._shallow_copy()
    631         if isinstance(result, Index):
--> 632             result._id = self._id
    633         return result
    634

AttributeError: 'CFTimeIndex' object has no attribute '_id'

I'll have to think about things a little more to see if we should pursue an upstream fix or whether we can address it in xarray.

@spencerkclark
Copy link
Member

I posted a question in the pandas repo related to this: pandas-dev/pandas#37213.

@mathause mathause mentioned this issue Oct 19, 2020
2 tasks
@topper-123
Copy link

I’ve looked at the CFTimeIndex.__new__ class method, and recommend you add a line result._id = result._reset_id() there.

This will fix this bug, but should be done even if this particular error didn`t appear, because it will give a performance boost when comparing indexes.

@spencerkclark
Copy link
Member

Thanks @topper-123 -- I see you also made pandas-dev/pandas#37321, making Index._id optional again, which I appreciate!

We do rely on private attributes in some places, but we're typically cautious about doing that, because their behavior can change without warning. For instance, in this case pandas-dev/pandas#37087 would have been a breaking change in a different way, since _reset_identity went from returning an Index, to operating on the Index in place.

Looking at the history of pandas.Index, it generally seems like outside the recent change, _reset_identity has been pretty stable over the years, though we'd still need to mull over the trade-off between the performance benefit and the robustness of the subclass to changes in pandas.

@spencerkclark
Copy link
Member

Thanks to @topper-123's PR being merged upstream, this is now fixed. The next time our CI runs it should be green (I tested things out offline). If we want we can consider leveraging _id in CFTimeIndex later, but this particular issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants