Categorical in dataframe is sorted lexically. #7848

Closed
has2k1 opened this issue Jul 26, 2014 · 1 comment · Fixed by #7850
Labels: Bug, Categorical (Categorical Data Type)

has2k1 commented Jul 26, 2014

code

import pandas as pd

df = pd.DataFrame({"id": [6, 5, 4, 3, 2, 1], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
df["grade"] = pd.Categorical(df["raw_grade"])
# set the level order to b < e < a
df['grade'].cat.reorder_levels(['b', 'e', 'a'])

# sorts 'grade' according to the order of the levels
df.sort(columns=['grade'])

correct output

  id   raw_grade  grade
1 5    b          b
2 4    b          b
5 1    e          e
0 6    a          a
3 3    a          a
4 2    a          a

code

# sorts 'grade' lexically
df.sort(columns=['grade', 'id'])

wrong output

  id   raw_grade  grade
4 2    a          a
3 3    a          a
0 6    a          a
2 4    b          b
1 5    b          b
5 1    e          e

When the columns list contains more than one element, Categorical columns are sorted lexically instead of by the order of their levels.
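
The difference between the two orderings can be shown without pandas. The following is a minimal plain-Python sketch, not the pandas implementation: `rows`, `levels`, and `code` are hypothetical names, and the tuples simply mirror the (index, id, grade) values of the frame above. Keying on each grade's position in the level list gives the expected order, while keying on the raw string reproduces the wrong output.

```python
# Hypothetical stand-in for the DataFrame rows: (index, id, grade).
rows = [(0, 6, "a"), (1, 5, "b"), (2, 4, "b"),
        (3, 3, "a"), (4, 2, "a"), (5, 1, "e")]

# Level order requested in the report: b < e < a.
levels = ["b", "e", "a"]
code = {level: position for position, level in enumerate(levels)}

# Expected: compare grades by their position in `levels`, then by id.
by_level = sorted(rows, key=lambda r: (code[r[2]], r[1]))

# Reported behaviour: grades compared as plain strings, then by id.
lexical = sorted(rows, key=lambda r: (r[2], r[1]))

print([r[0] for r in by_level])  # [2, 1, 5, 4, 3, 0]
print([r[0] for r in lexical])   # [4, 3, 0, 2, 1, 5]
```

The second list matches the index column of the wrong output above.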

pandas: 0.14.1-78-g24b309f

jreback commented Jul 26, 2014

After #7850, both sorts follow the order of the levels:

In [1]: df = pd.DataFrame({"id":[6,5,4,3,2,1], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})

In [2]: df["grade"] = pd.Categorical(df["raw_grade"])

In [3]: df['grade'].cat.reorder_levels(['b', 'e', 'a'])

In [4]: df
Out[4]: 
   id raw_grade grade
0   6         a     a
1   5         b     b
2   4         b     b
3   3         a     a
4   2         a     a
5   1         e     e

In [5]: df.dtypes
Out[5]: 
id              int64
raw_grade      object
grade        category
dtype: object

In [6]: df.sort(columns=['grade'])
Out[6]: 
   id raw_grade grade
1   5         b     b
2   4         b     b
5   1         e     e
0   6         a     a
3   3         a     a
4   2         a     a

In [7]: df.sort(columns=['grade', 'id'])
Out[7]: 
   id raw_grade grade
2   4         b     b
1   5         b     b
5   1         e     e
4   2         a     a
3   3         a     a
0   6         a     a
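
For readers on current pandas releases, where `DataFrame.sort` and `cat.reorder_levels` are no longer available, the same check can be sketched roughly as follows. This is an assumed modern equivalent rather than the code used in this issue: it relies on `sort_values` and on passing `categories` and `ordered=True` to `pd.Categorical`, and the variable names `by_grade` and `by_grade_and_id` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"id": [6, 5, 4, 3, 2, 1],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})

# Declare the category order up front: b < e < a.
df["grade"] = pd.Categorical(df["raw_grade"],
                             categories=["b", "e", "a"],
                             ordered=True)

# Single-key sort: rows are grouped in category order, not alphabetically.
by_grade = df.sort_values(by=["grade"])

# Multi-key sort: category order first, then ascending id within each grade.
by_grade_and_id = df.sort_values(by=["grade", "id"])

print(by_grade_and_id)  # index order: 2, 1, 5, 4, 3, 0, as in Out[7] above
```

If the multi-key result ever comes back with the `a` rows first, the categorical ordering is being ignored, which is the behaviour this issue reported.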
