Index([1,2,np.nan]).get_indexer([np.nan]) returns wrong value? #7820
Comments
This is how |
@JanSchulz not sure if that answers your question, but maybe sheds some light on why you're getting this result |
Where is this failing for you? I get a failure during |
also i would expect |
Note that the above is run on top of #7768, which already has a fix for encoding
In Categoricals, the order that is passed in to the index is enough (no sorting necessary), so IMO it should return the position of the
In some other cases this actually happens when I use the internal Cython code, e.g. in
So IMO a float index should also consider
If NaN will be a proper value in a float index, I can use that. Otherwise I will probably ensure that I have an object index if NaN is in levels, because this works:
|
@JanSchulz where are you directly using |
Categorical does not use
|
Note that this bug is not so much about the Categorical usage, but about the inconsistency between Object and Float index, which handle
IMO there are three ways:
From my standpoint both of the last two are fine: either I change the Categorical to change the levels to dtype object if a NaN is in levels, or I don't have anything to do in Categorical :-) |
hmm. @cpcloud how tricky to change object dtypes are no longer used for |
Hm i'll see ... can take a look tonight |
@JanSchulz this ONLY works for a unique index and ONLY for object, e.g.
I think these could return the same result
These both don't know what to do (and so return -1)
So @JanSchulz still need to be wary of the multiple |
Uniqueness isn't a problem with (unique) Categorical levels. But why not do the same as with this:
|
since you cannot compare
This is what @cpcloud will look at (for a single NaN case it IS possible to get an indexer). |
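To make the single-NaN case concrete, here is a minimal pure-Python sketch of how a lookup could return a position for NaN. `get_indexer_sketch` is a hypothetical helper written for illustration, not pandas code; it assumes a unique index and tracks the NaN position separately, because an equality-based lookup cannot match NaN.

```python
import math


def _is_nan(value):
    # Only floats can be NaN here; other types are never treated as NaN.
    return isinstance(value, float) and math.isnan(value)


def get_indexer_sketch(index_values, targets):
    """Return the position of each target in index_values, or -1 if absent.

    Hypothetical helper, not the pandas implementation. NaN is stored in a
    separate slot because NaN != NaN defeats a plain equality lookup.
    """
    positions = {}
    nan_position = -1
    for pos, value in enumerate(index_values):
        if _is_nan(value):
            if nan_position != -1:
                # With several NaNs there is no single position to return.
                raise ValueError("index must be unique: more than one NaN")
            nan_position = pos
        else:
            if value in positions:
                raise ValueError("index must be unique")
            positions[value] = pos

    result = []
    for target in targets:
        if _is_nan(target):
            result.append(nan_position)
        else:
            result.append(positions.get(target, -1))
    return result


print(get_indexer_sketch([1.0, 2.0, float("nan")], [float("nan"), 1.0, 5.0]))
```

With more than one NaN in the index the sketch raises instead of guessing, which mirrors the uniqueness caveat raised above.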
NaNs are not well-behaved objects. They have the property that
I looked into the C code here to finally put my curiosity about
    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */
    if (v == w) {
        if (op == Py_EQ)
            return 1;
        else if (op == Py_NE)
            return 0;
    }
So in the case of
Finally, I should note that |
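The identity shortcut in the C snippet above can be observed from plain Python. This is a small illustration using only built-in containers, not pandas internals; the variable names are made up for the example. It may help explain why a lookup that goes through Python objects can find the very same NaN object even though NaN never compares equal to itself.

```python
nan = float("nan")
other_nan = float("nan")  # a distinct NaN object

# Direct comparison follows IEEE 754: NaN is never equal to itself.
direct_equal = nan == nan                    # False

# Containers check identity before equality, so the same object is found.
same_object_in_list = nan in [nan]           # True
position_in_list = [nan].index(nan)          # 0
lists_equal = [nan] == [nan]                 # True

# A different NaN object gets no identity shortcut and is not found.
other_object_in_list = other_nan in [nan]    # False

# Dict lookups behave the same way: identity first, then equality.
lookup_same = {nan: 0}.get(nan)              # 0
lookup_other = {nan: 0}.get(other_nan)       # None

print(direct_equal, same_object_in_list, other_object_in_list)
print(lookup_same, lookup_other)
```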
I still don't get why it would be so problematic to return the
|
If this is fixed, the workaround in categorical.py should be removed:
[This is in #8007, so not yet merged] |
yep, @JanSchulz put this on the list of future fixmes for categorical (create a new issue) |
Isn't it easier to add a todo to this issue that this is taken out when this issue is fixed? |
Seems someone already did in #7855 :-) So I think this is taken care of. |
I did that. Yes, it's easier if there is a specific issue involved; the problem is that unless it's at the top, it's not obvious and won't get done. |
this is ok in master, just needs tests to close |
When this is merged, make the below change in Categorical as well!
Having np.nan in an Index is not returning the position of NaN but -1:
I'm not sure if that's a bug or intended. What happens is that this (new) test for Categoricals fails: