Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: casting loc to labels dtype before searchsorted #14551

Merged
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions pandas/indexes/multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -1907,6 +1907,7 @@ def convert_indexer(start, stop, step, indexer=indexer, labels=labels):
return np.array(labels == loc, dtype=bool)
else:
# sorted, so can return slice object -> view
loc = labels.dtype.type(loc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe should do this like

loc, orig_loc = lables.dtype.type(loc), loc
if  loc != orig_loc:
    loc = orig_loc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I am not sure how much checking would be needed here.

My understanding is that it probably is not needed in this case, as loc is coming from loc = level_index.get_loc(key), and labels and level_index are from the same MultiIndex. So I would assume that get_loc can only return an existing label, and so should fit in the dtype of labels?

(but probably also not that a perf issue to do the check)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds right; if the test suite passes, prob ok!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently not :-)
But it's only a partial indexing test that fails. So loc could also be a slice, and it is expected in partial indexing that this raises an error in searchsorted, but now it raises already a slightly different error in dtype.type(loc). So that can easily be solved with:

                try:
                    loc = labels.dtype.type(loc)
                except TypeError:
                    # this occurs when loc is a slice (partial string indexing)
                    # but the TypeError raised by searchsorted in this case
                    # is catched in Index._has_valid_type()
                    pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, or just tests if its a scalar to begin with

i = labels.searchsorted(loc, side='left')
j = labels.searchsorted(loc, side='right')
return slice(i, j)
Expand Down