BUG: Error raised accessing tuple value in Series when index contains duplicates #37800

Open
rhshadrach opened this issue Nov 12, 2020 · 5 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Series Series data structure

Comments

@rhshadrach
Member

E.g.

s = pd.Series([1, 1], index=[(1, 1), (1, 1)])
s[(1, 1)]

raises KeyError: (1, 1). Within pandas._libs.index.IndexEngine._get_loc_duplicates, we're using ndarray.searchsorted which interprets the tuple as an array-like of values to search for rather than a single tuple.
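To illustrate the searchsorted behavior described above, here is a minimal NumPy-only sketch. It does not reproduce the pandas internals; the array names and the one-element wrapper are assumptions made for the illustration, not what `_get_loc_duplicates` does.

```python
import numpy as np

# A tuple passed to searchsorted is treated as an array-like of values,
# so two positions come back instead of one.
sorted_ints = np.array([1, 2, 3])
positions = sorted_ints.searchsorted((1, 3))
print(positions)

# Build a 1-D object array whose elements are tuples. Element-wise
# assignment keeps NumPy from expanding the tuples into a 2-D array.
labels = np.empty(2, dtype=object)
labels[0] = (1, 1)
labels[1] = (1, 1)

# Wrapping the key in a one-element object array keeps it a single label.
key = np.empty(1, dtype=object)
key[0] = (1, 1)
left = labels.searchsorted(key, side="left")
right = labels.searchsorted(key, side="right")
print(left, right)
```

The left and right positions bracket the run of duplicate labels, which is the kind of range a duplicate-aware lookup needs.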

@rhshadrach rhshadrach added Bug Indexing Related to indexing on series/frames, not to indexes themselves Series Series data structure labels Nov 12, 2020
@rhshadrach rhshadrach added this to the Contributions Welcome milestone Nov 12, 2020
@phofl
Member

phofl commented Nov 13, 2020

There are similar problems with loc, and there the index does not even have to be non-unique:

s = pd.Series([1, 1], index=[(1, 2), (1, 1)])

s.loc[(1, 2)]
pandas.core.indexing.IndexingError: Too many indexers

Tuple support is not very good here.
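Until tuple labels are handled better, one possible workaround is to compare labels one at a time and select by position. The helper below is hypothetical (`select_tuple_label` is not a pandas function), and it scans the whole index, so it is a sketch for small data rather than a fix.

```python
import numpy as np
import pandas as pd


def select_tuple_label(series, label):
    """Hypothetical helper: return every row whose index label equals `label`."""
    # Compare labels one by one so the tuple is never unpacked into
    # several keys or into MultiIndex levels.
    mask = np.fromiter(
        (item == label for item in series.index), dtype=bool, count=len(series)
    )
    # Positional selection sidesteps label-based lookup entirely.
    return series.iloc[np.flatnonzero(mask)]


dup = pd.Series([10, 20], index=[(1, 1), (1, 1)])   # duplicated tuple labels
uniq = pd.Series([10, 20], index=[(1, 2), (1, 1)])  # unique tuple labels

print(select_tuple_label(dup, (1, 1)))
print(select_tuple_label(uniq, (1, 2)))
```

The helper always returns a Series, even for a single match, so callers do not have to special-case unique versus duplicated labels.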

@GYHHAHA
Contributor

GYHHAHA commented Nov 23, 2020

This conflicts with the MultiIndex selector, which uses a tuple to hold the labels for the different levels.

>>> s = pd.Series([1, 1], index=pd.Index([(1, 2), (1, 1)]))
>>> s.loc[(1, 2)]
1

So a tuple key is ambiguous when the index itself is built from tuples.

@rhshadrach
Member Author

@GYHHAHA what is ambiguous? When Series has a multi-index, a tuple consists of labels for different levels. When Series doesn't have a multi-index, a tuple is a single label. Or is there some other case(s)?
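The two cases can be told apart by how the index was built. A small sketch, assuming the default `tupleize_cols=True` of `pd.Index` (the variable names are only illustrative):

```python
import pandas as pd

tuples = [(1, 2), (1, 1)]

# By default, pd.Index turns a list of tuples into a MultiIndex.
multi = pd.Index(tuples)
# With tupleize_cols=False, each tuple stays a single label.
flat = pd.Index(tuples, tupleize_cols=False)

print(type(multi).__name__, multi.nlevels)
print(type(flat).__name__, flat.nlevels)

# On the MultiIndex, the tuple is read as one label per level.
s = pd.Series([10, 20], index=multi)
print(s.loc[(1, 2)])
```

Passing the list of tuples straight to `pd.Series(..., index=...)`, as in the examples earlier in this thread, appears to take the second route and keep a flat index of tuples.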

@GYHHAHA
Contributor

GYHHAHA commented Nov 24, 2020

>>> s = pd.Series([1, 2], index=pd.Index([((1, 1), 1), ((1, 2), 2)]))
>>> s.loc[(1, 1)]
1    1
dtype: int64

This happens when the tuples are the labels of level 0 of a MultiIndex.

But the change still seems reasonable when applied to a non-MultiIndex.
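To make the nested case concrete, the sketch below only inspects how such an index is structured; it does not assert what `loc` should return for it. The variable names are illustrative.

```python
import pandas as pd

# Each entry is a 2-tuple whose first element is itself a tuple.
nested = pd.Index([((1, 1), 1), ((1, 2), 2)])

# pandas builds a two-level MultiIndex, so the inner tuples end up as the
# labels of level 0 rather than as part of a single flat label.
print(type(nested).__name__, nested.nlevels)
print(nested.tolist())
```

With this layout, a key such as `(1, 1)` can be read either as one level-0 label or as a pair of labels for levels 0 and 1, which is where the ambiguity comes from.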

@rhshadrach
Member Author

Ack, I see. A similar example:

s = pd.Series([1, 2], index=pd.Index([((1, 1), 1), (1, 1)]))
print(s.loc[(1, 1)])

one might expect to get 2 rather than the actual output of 1. In any case, oddities in the case of nested tuples should not imply that we shouldn't make improvements on supporting tuples.

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022