
On Series with CategoricalIndex, __getitem__ not equal to .loc #15470

Closed
kernc opened this issue Feb 21, 2017 · 3 comments · Fixed by #45023
Labels
Categorical Categorical Data Type Docs Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@kernc
Contributor

kernc commented Feb 21, 2017

Code Sample, a copy-pastable example if possible

>>> s = pd.Series([2, 1, 0], index=pd.CategoricalIndex([2, 1, 0]))
>>> s.index
CategoricalIndex([2, 1, 0], categories=[0, 1, 2], ordered=False, dtype='category')
>>> s[[0, 1, 2]].equals(s.loc[[0, 1, 2]])  # Should be True
False
>>> s[[0, 1, 2]].equals(s.iloc[[0, 1, 2]])  # Should be False
True

Problem description

For a Series with a CategoricalIndex, __getitem__ indexing behaves differently from .loc. On a Series with a plain integer index, these two accessors return the same result:

>>> s = pd.Series([2, 1, 0], index=[2, 1, 0])
>>> s[[0, 1, 2]].equals(s.loc[[0, 1, 2]])  # Should be True
True
>>> s[[0, 1, 2]].equals(s.iloc[[0, 1, 2]])  # Should be False
False
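The explicit accessors do not have this ambiguity. The following is a minimal sketch (the variable names are only illustrative) that builds the same data with both index types and selects the keys [0, 1, 2] through .loc (by label) and through .iloc (by position). It avoids plain s[...] on purpose, since that is the behaviour under discussion and may differ between pandas versions.

```python
import pandas as pd

# Same values, two index types: a plain integer index and a
# CategoricalIndex whose categories happen to be integers.
values = [2, 1, 0]
s_int = pd.Series(values, index=[2, 1, 0])
s_cat = pd.Series(values, index=pd.CategoricalIndex([2, 1, 0]))

keys = [0, 1, 2]

# .loc selects by label, whatever the index type.
loc_int = s_int.loc[keys]
loc_cat = s_cat.loc[keys]

# .iloc selects by position, whatever the index type.
iloc_int = s_int.iloc[keys]
iloc_cat = s_cat.iloc[keys]

print(loc_int.tolist(), loc_cat.tolist())    # [0, 1, 2] [0, 1, 2]
print(iloc_int.tolist(), iloc_cat.tolist())  # [2, 1, 0] [2, 1, 0]
```

Both index types agree under .loc and under .iloc, so only s[...] depends on the index type.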

Expected Output

>>> s = pd.Series([2, 1, 0], index=pd.CategoricalIndex([2, 1, 0]))
>>> s[[0, 1, 2]].equals(s.loc[[0, 1, 2]])
True

Output of pd.show_versions()

0.19.0+479.git

@jorisvandenbossche
Member

jorisvandenbossche commented Feb 21, 2017

You have encountered a specific gotcha of __getitem__ indexing. With a numerical index, integer keys are treated as labels (your second example). With other types of index, integer keys fall back to positional indexing (e.g. try it with a string index).
The question here, of course, is whether a CategoricalIndex with numerical categories counts as a numerical index or as an 'other type' of index.

If we say it is not a numerical index (which I think we should, as the indexing behaviour of a CategoricalIndex should IMO not depend on the type of its categories), the current behaviour is correct. But I admit this is rather confusing.
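The ambiguity shows up directly in the dtypes. In the rough sketch below, the CategoricalIndex itself reports a 'category' dtype, while its categories have an integer dtype. pandas.api.types.is_integer_dtype is used here only to make that visible. This is an illustration, not the check pandas uses internally to choose between label-based and positional __getitem__.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

idx_int = pd.Index([2, 1, 0])
idx_cat = pd.CategoricalIndex([2, 1, 0])

# The plain index has an integer dtype.
print(idx_int.dtype, is_integer_dtype(idx_int.dtype))

# The categorical index has dtype 'category', which is not an integer dtype...
print(idx_cat.dtype, is_integer_dtype(idx_cat.dtype))

# ...even though the categories it wraps are integers.
print(idx_cat.categories.dtype, is_integer_dtype(idx_cat.categories.dtype))
```

Whether __getitem__ should look at the outer dtype or at the categories' dtype is exactly the design question raised above.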

@jorisvandenbossche jorisvandenbossche added the Indexing Related to indexing on series/frames, not to indexes themselves label Feb 21, 2017
@jorisvandenbossche
Member

For more discussion of __getitem__ indexing, see #9595. The issue you raise here is yet another example of __getitem__ behaving in a confusing or unpredictable way because it mixes label-based and positional indexing.

@jreback jreback added API Design Categorical Categorical Data Type labels Mar 7, 2017
@jreback jreback added this to the Next Major Release milestone Mar 7, 2017
@jorisvandenbossche
Member

This issue seems to occur only for list-like indexers; a scalar label works correctly. With the example above:

In [53]: s[0]
Out[53]: 0    <---- label based

In [54]: s[[0]]
Out[54]: 
2    2      <---- position based
dtype: int64

In [55]: s.loc[[0]]
Out[55]: 
0    0
dtype: int64
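With the explicit accessors, scalar and list-like keys behave consistently. The minimal sketch below reuses the series from the report. .loc treats both 0 and [0] as the label 0, and .iloc treats [0] as the first position (label 2). The variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([2, 1, 0], index=pd.CategoricalIndex([2, 1, 0]))

# Scalar label and single-element list through .loc: label 0 -> value 0.
scalar_by_label = s.loc[0]
list_by_label = s.loc[[0]]

# The same key through .iloc means position 0, i.e. label 2 -> value 2.
list_by_position = s.iloc[[0]]

print(scalar_by_label)            # 0
print(list_by_label.tolist())     # [0]
print(list_by_position.tolist())  # [2]
```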

4 participants