DEPR MultiIndex.is_lexsorted and MultiIndex.lexsort_depth #38701
pandas/core/indexes/multi.py (outdated)

        return self._get_lexsort_depth()

    @cache_readonly
    def _get_lexsort_depth(self):
There's already a _lexsort_depth function, so I'm replacing this with _get_lexsort_depth.
umm that is extra confusing. pls don't do this
you can just move the .sortorder stuff to _lexsort_depth
Thanks for your review. The difficulty I'm running into is that in pandas/core/indexes/multi.py there's

    if self.sortorder is not None:
        if self.sortorder > self._lexsort_depth():
            raise ValueError(
                "Value for sortorder must be inferior or equal to actual "
                f"lexsort_depth: sortorder {self.sortorder} "
                f"with lexsort_depth {self._lexsort_depth()}"
            )

so I can't just move self.sortorder into self._lexsort_depth, else this will never raise, as self._lexsort_depth() would always just return self.sortorder in this snippet.
For now I've factored out the part which is used in the above check into

    def _codes_lexsort_depth(self) -> int:
        int64_codes = [ensure_int64(level_codes) for level_codes in self.codes]
        for k in range(self.nlevels, 0, -1):
            if libalgos.is_lexsorted(int64_codes[:k]):
                return k
        return 0

but I'd imagine this will be considered equally confusing. Any suggestions?
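To make the constraint above concrete, here is a minimal pure-Python sketch. It assumes codes are plain lists of non-negative integers and that libalgos.is_lexsorted checks that the per-row code tuples never decrease lexicographically. The functions is_lexsorted, codes_lexsort_depth, lexsort_depth and verify_sortorder are hypothetical stand-ins, not pandas API. The point it illustrates: the integrity check has to call the codes-only computation, because the sortorder-aware one simply hands sortorder back.

```python
from typing import List, Optional, Sequence


def is_lexsorted(codes: Sequence[Sequence[int]]) -> bool:
    """Stand-in for libalgos.is_lexsorted: rows of codes never decrease lexicographically."""
    rows = list(zip(*codes))
    return all(prev <= cur for prev, cur in zip(rows, rows[1:]))


def codes_lexsort_depth(codes: List[List[int]], nlevels: int) -> int:
    """Largest k such that the first k levels' codes are lexsorted (0 if none)."""
    for k in range(nlevels, 0, -1):
        if is_lexsorted(codes[:k]):
            return k
    return 0


def lexsort_depth(codes: List[List[int]], nlevels: int, sortorder: Optional[int]) -> int:
    """Depth reported to callers: a declared sortorder short-circuits the scan."""
    if sortorder is not None:
        return sortorder
    return codes_lexsort_depth(codes, nlevels)


def verify_sortorder(codes: List[List[int]], nlevels: int, sortorder: Optional[int]) -> None:
    """Integrity check: compare the declared sortorder with the depth computed from codes."""
    if sortorder is None:
        return
    # Calling lexsort_depth(codes, nlevels, sortorder) here would return
    # sortorder itself, so the comparison below could never fail.
    actual = codes_lexsort_depth(codes, nlevels)
    if sortorder > actual:
        raise ValueError(
            "Value for sortorder must be inferior or equal to actual "
            f"lexsort_depth: sortorder {sortorder} with lexsort_depth {actual}"
        )


codes = [[0, 0, 1, 1], [1, 0, 1, 0]]  # level 0 sorted, level 1 not within level 0
print(codes_lexsort_depth(codes, 2))
```

Keeping the codes-only scan in its own function lets both the cached depth and the integrity check share it; the thread later settles on exactly that shape, as a module-level _lexsort_depth(codes, nlevels).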
pandas/core/indexes/multi.py (outdated)

-    >>> pd.MultiIndex.from_arrays([['a', 'b', 'c'], ['d', 'e', 'f']]).is_lexsorted()
+    >>> pd.MultiIndex.from_arrays([['a', 'b'], ['d', 'e']])._is_lexsorted()
     True
-    >>> pd.MultiIndex.from_arrays([['a', 'b', 'c'], ['d', 'f', 'e']]).is_lexsorted()
+    >>> pd.MultiIndex.from_arrays([['a', 'b'], ['d', 'f']])._is_lexsorted()
making these a bit shorter so they fit on a single line
Some of the CI failures are caused by #38703
I haven't checked yet, but thanks for looking into it!
@@ -35,15 +35,15 @@ def test_sort_index_and_reconstruction_doc_example(self):
             ),
         )
         result = df.sort_index()
-        assert result.index.is_lexsorted()
+        assert result.index._is_lexsorted()
for the cases where these are the same as is_monotonic (vast majority), ok with removing the _is_lexsorted assert
    @cache_readonly
    def lexsort_depth(self):
        warnings.warn(
mention in the whatsnew as well
doc/source/whatsnew/v0.20.0.rst (outdated)

    value
    a aa 2
    bb 1
    b aa 4
    bb 3
something went wrong with pasting the indents here, will fix it in the next commit
pandas/core/indexes/multi.py (outdated)

            return self.sortorder
        return self._codes_lexsort_depth()

    def _codes_lexsort_depth(self) -> int:
This is very confusing; why do you think you need yet another method here? We want to reduce the API.
Yes, I totally agree - see this comment for why I thought it was necessary: #38701 (comment):

> the difficulty I'm running into is that in pandas/core/indexes/multi.py there's
>
>     if self.sortorder is not None:
>         if self.sortorder > self._lexsort_depth():
>             raise ValueError(
>                 "Value for sortorder must be inferior or equal to actual "
>                 f"lexsort_depth: sortorder {self.sortorder} "
>                 f"with lexsort_depth {self._lexsort_depth()}"
>             )
>
> so I can't just move self.sortorder into self._lexsort_depth, else this will never raise, as self._lexsort_depth() would always just return self.sortorder in this snippet.
Would it be less confusing if this was a module-level function? There must be some way of calling only this part of _lexsort_depth, because it's used in _verify_integrity in pandas/core/indexes/multi.py:

pandas/pandas/core/indexes/multi.py, lines 393 to 399 in d642b67

    if self.sortorder is not None:
        if self.sortorder > self._lexsort_depth():
            raise ValueError(
                "Value for sortorder must be inferior or equal to actual "
                f"lexsort_depth: sortorder {self.sortorder} "
                f"with lexsort_depth {self._lexsort_depth()}"
            )
I've made it a module-level function, is this clearer?
            return self.sortorder

        return self._lexsort_depth()
        warnings.warn(
you don't need the cache on this one (as it's already on _lexsort_depth), and this is now user-facing
lgtm, ping on green.
pandas/core/indexes/multi.py (outdated)

@@ -176,6 +176,15 @@ def new_meth(self_or_cls, *args, **kwargs):
    return new_meth


def _lexsort_depth(codes: List[np.ndarray], nlevels: int) -> int:
can you move this to almost the end of the file, e.g. where the module level functions are
thanks @MarcoGorelli

        return self._lexsort_depth()
        warnings.warn(
            "MultiIndex.is_lexsorted is deprecated as a public function, "
@MarcoGorelli should the message here refer to lexsort_depth instead of is_lexsorted?
this warning isn't being caught in a bunch of tests; not sure why that isn't caught by the npdev build
Yes, looks like a typo, thanks for catching! We probably want to always use match= when catching warnings to help prevent this; I'll fix this now.

> this warning isn't being caught in a bunch of tests; not sure why that isn't caught by the npdev build

Not sure what you mean here, sorry.
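The match= suggestion can be illustrated with the standard library alone. This is a hedged sketch: assert_warns_matching and lexsort_depth_with_typo are hypothetical helpers. In a pytest suite, pytest.warns(FutureWarning, match=...) plays the same role, since it applies re.search to the warning message. A check that looks only at the warning category accepts the mislabelled message; one that also matches the text catches the typo.

```python
import re
import warnings
from typing import Callable, Type


def assert_warns_matching(
    func: Callable[[], object], category: Type[Warning], pattern: str
) -> None:
    """Run func; pass only if it emits `category` with a message matching `pattern`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, no deduplication
        func()
    for w in caught:
        if issubclass(w.category, category) and re.search(pattern, str(w.message)):
            return
    messages = [str(w.message) for w in caught]
    raise AssertionError(f"no {category.__name__} matching {pattern!r}; got {messages}")


def lexsort_depth_with_typo() -> int:
    # Hypothetical deprecated accessor whose message names the wrong method.
    warnings.warn(
        "MultiIndex.is_lexsorted is deprecated as a public function",
        FutureWarning,
        stacklevel=2,
    )
    return 2


# Category-only check (empty pattern): passes despite the wrong message.
assert_warns_matching(lexsort_depth_with_typo, FutureWarning, "")

# Message-aware check: fails, exposing the typo.
try:
    assert_warns_matching(lexsort_depth_with_typo, FutureWarning, r"lexsort_depth")
except AssertionError as err:
    print("caught:", err)
```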
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
The issue is that is_lexsorted only looks at whether the codes are sorted - the order of the codes doesn't necessarily reflect the sorting of the index values, e.g. if the index is created from a .groupby(sort=False) operation (see the linked issue for an example). So, instead of is_lexsorted, users should use is_monotonic_increasing. There isn't an equivalent function to point them to for lexsort_depth, right?
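Here is a pure-Python sketch of the mismatch described above. It assumes only that codes index into levels and that levels need not be stored in sorted order. The hypothetical factorize_in_order helper builds levels in first-appearance order, which is the kind of unsorted level the linked issue attributes to groupby(sort=False). The codes come out lexsorted even though the actual label tuples are not monotonic increasing. The two checks are simplified stand-ins for is_lexsorted and is_monotonic_increasing; they ignore missing values.

```python
from typing import List, Sequence, Tuple


def factorize_in_order(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: build a level in first-appearance order plus codes into it."""
    level: List[str] = []
    codes: List[int] = []
    for v in values:
        if v not in level:
            level.append(v)
        codes.append(level.index(v))
    return level, codes


def codes_are_lexsorted(codes: List[List[int]]) -> bool:
    """What is_lexsorted inspects: only the integer codes."""
    rows = list(zip(*codes))
    return all(a <= b for a, b in zip(rows, rows[1:]))


def values_are_monotonic_increasing(levels: List[List[str]], codes: List[List[int]]) -> bool:
    """What is_monotonic_increasing cares about: the actual label tuples."""
    labels = [tuple(levels[i][c] for i, c in enumerate(row)) for row in zip(*codes)]
    return all(a <= b for a, b in zip(labels, labels[1:]))


outer_levels, outer_codes = factorize_in_order(["b", "b", "a"])  # levels ["b", "a"]
inner_levels, inner_codes = factorize_in_order(["x", "y", "x"])  # levels ["x", "y"]
levels = [outer_levels, inner_levels]
codes = [outer_codes, inner_codes]

print(codes_are_lexsorted(codes))                      # True
print(values_are_monotonic_increasing(levels, codes))  # False
```

When levels are themselves sorted, which is the common case the reviewer refers to, the two checks agree; they only diverge when level order and value order differ, as here.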