CLN: handle EAs and fast path (no bounds checking) in safe_sort #25696
Hello @jorisvandenbossche! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻
Comment last updated at 2019-05-06 18:52:38 UTC
pandas/core/sorting.py (outdated)
@@ -425,6 +427,10 @@ def safe_sort(values, labels=None, na_sentinel=-1, assume_unique=False):
assume_unique : bool, default False
When True, ``values`` are assumed to be unique, which can speed up
the calculation. Ignored when ``labels`` is None.
check_outofbounds : bool, default True
This is not a bad name, but it is not consistent across pandas; we use verify elsewhere.
Can you update this and add a versionadded tag?
# deal with them here without performance loss using `mode='wrap'`.)
new_labels = reverse_indexer.take(labels, mode='wrap')
np.putmask(new_labels, mask, na_sentinel)
if na_sentinel == -1:
I would rather just fix take_1d.
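The comment in the hunk above relies on a NumPy detail: with the default mode='raise', ndarray.take rejects an index outside the array, whereas mode='wrap' reduces every index modulo the length, so the call never fails and any garbage it produces for a sentinel label can be overwritten afterwards with np.putmask. The sketch below illustrates only that step; the reverse_indexer contents and the sentinel value 99 are made-up example data, not taken from the PR.

```python
import numpy as np

# Example mapping from old positions to sorted positions (made-up data).
reverse_indexer = np.array([2, 0, 1])
na_sentinel = 99  # a custom sentinel far outside the valid range [0, 3)
labels = np.array([0, 1, na_sentinel, 2])

# With the default mode='raise', the sentinel label is out of bounds.
try:
    reverse_indexer.take(labels)
    raised = False
except IndexError:
    raised = True

# mode='wrap' takes each index modulo 3 instead, so the call always succeeds;
# the entry produced for the sentinel is meaningless and is overwritten here.
new_labels = reverse_indexer.take(labels, mode='wrap')
np.putmask(new_labels, labels == na_sentinel, na_sentinel)
```

Here new_labels ends up as [2, 0, 99, 1]: no separate bounds pass is needed, provided every out-of-range label is also covered by the mask.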
Codecov Report
@@ Coverage Diff @@
## master #25696 +/- ##
==========================================
+ Coverage 91.28% 91.28% +<.01%
==========================================
Files 173 173
Lines 52967 52969 +2
==========================================
+ Hits 48351 48353 +2
Misses 4616 4616
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #25696 +/- ##
==========================================
- Coverage 91.98% 91.98% -0.01%
==========================================
Files 175 175
Lines 52374 52375 +1
==========================================
- Hits 48178 48175 -3
- Misses 4196 4200 +4
Continue to review full report at Codecov.
pandas/core/sorting.py (outdated)
@@ -461,7 +467,8 @@ def sort_mixed(values):
return np.concatenate([nums, np.asarray(strs, dtype=object)])

sorter = None
if PY3 and lib.infer_dtype(values, skipna=False) == 'mixed-integer':
if (PY3 and not isinstance(values, ABCExtensionArray)
Hah, no more PY3 needed!
Use is_extension_array.
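The hunk discussed in this thread gates the "ints before strings" sort on the inferred dtype. The sketch below is a minimal illustration, assuming the two review suggestions are applied: drop the PY3 flag and skip extension arrays. sort_mixed mirrors the return line visible in the hunk. sort_uniques is a hypothetical dispatcher, not a pandas function. It uses the public pandas.api.types.is_extension_array_dtype, which may or may not be the helper the reviewer had in mind.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype, is_extension_array_dtype


def sort_mixed(values):
    # Order numbers before strings: comparing int with str raises TypeError
    # on Python 3, so each group is sorted separately, then concatenated.
    str_pos = np.array([isinstance(x, str) for x in values], dtype=bool)
    nums = np.sort(values[~str_pos])
    strs = np.sort(values[str_pos])
    return np.concatenate([nums, np.asarray(strs, dtype=object)])


def sort_uniques(values):
    # Hypothetical dispatcher: extension arrays sort through their own
    # argsort/take interface, mixed int/str object arrays use sort_mixed,
    # and no Python 2/3 check is needed anymore.
    if is_extension_array_dtype(values):
        return values.take(values.argsort())
    if infer_dtype(values, skipna=False) == 'mixed-integer':
        return sort_mixed(values)
    return np.sort(values)
```

For example, an object array holding 3, 'b', 1, 'a' sorts to 1, 3, 'a', 'b'. An Int64 extension array never reaches infer_dtype.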
Coming back to this. For …
…s) + add whatsnew
Can you merge master?
@jreback Conflicts resolved, if you can take another look.
Thanks @jorisvandenbossche, nice cleanup & tests.
This is a possible alternative solution to what we have been discussing in #25592.
This moves the logic into safe_sort, with:
- a check_outofbounds keyword to disable the extra checks (otherwise the performance benefit of take_1d is lost)
- support for EAs in safe_sort
The check_outofbounds keyword makes it a bit more complicated, but without it, we can't benefit from the performance improvement for which take_1d was originally used in factorize.
(Another solution is to simply decide that this performance improvement is not worth the extra code, and use the current safe_sort (but fixed to work for EAs) in factorize.)
Need to add some more tests for the combination of EAs with a custom na_sentinel (a case that is currently broken).