-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: fix multi-column sort that includes Categoricals / concat (GH7848/GH7864) #7850
Conversation
xref #7768 cc @JanSchulz I added
But this impacts your PR #7768 as you are doing essentially what this routine is doing. |
t.map_locations(levels) | ||
return com._ensure_platform_int(t.lookup(values)) | ||
|
||
def _codes_as_ordered(codes, levels, order=True, na_position='last', copy=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a different name or should also include the np.sort(...)
: When I first looked at this patch on my mobile, I expected from the name that this method does sorting, but actually it only does something with the NaN values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was your original routine. its internal. I think its ok
From my understanding of that function (I haven't run it), it changes NaN value from a -1 in codes to either a) this is wrong: sorting the values shouldn't change the the levels! and So if I understand your patch right, sorting NaN to the beginning will add I think adding # untested and you probably have a better idea how to do this :-)
codes = np.sort(self._codes.copy()) # -1,0,1,2,3,4
if not ascending
code = codes[::-1] # 4,3,2,1,0,-1
if na_position='first' and not ascending:
new_codes = codes.copy()
mask = (codes == -1)
n_nans = len(codes[mask])
# in this case sort to the front,
new_codes[1:n_nans] = -1
new_codes[n_nans:] = codes[mask]
if na_position='last' and ascending:
# ... |
@JanSchulz this doesn't change anything, except add the ability to sort by na_position by keyword to |
Ok, the I now see where the reverse sorting happens in There are also still two bugs, which didn't show up in the unittests :-( : edit: The second (and third) is the problem with the "-1 to 0/len(levels)" conversation. Not sure why the first shows up, but I think because
|
I didn't inline because I actually needed it for I'll look at those tests |
side issue: should (from your first test)
should res reverse the levels too? e.g. |
No, ordering should not reverse (or in any way change) the levels, that's what |
hmm, so then you have |
No: in normal ints you also wouldn't change the "underlying order" from "1 is in front of 2" to "2 is in front of 1" if you have to order an int array. If you would sort both the codes array and the levels, the values would actually be in ascending order, as afterwards a
|
CLN: refactor _lexsort_indexer to use Categoricals
BUG: fix multi-column sort that includes Categoricals / concat (GH7848/GH7864)
@JanSchulz I merged this in. After @cpcloud fix, then can deal (in your PR), the remaining nan issues. |
Thanks. Categorical is a real treat and all the fixes are working well. |
@has2k1 no thank you for fiinding issues! |
@jreback will try to get the fix in Categorical in in my PR, but i could happen that this takes another week as I'm on a Conference this weekend until Wednesday. |
no prob |
CLN: refactor _lexsort_indexer to use Categoricals
closes #7848
closes #7864