Fix for "TypeError: unorderable types" when using set_index with multiple column names #22072
dickreuter commented on Jul 27, 2018 (edited)
- closes "TypeError: unorderable types" in Python3 when column for MultiIndex contains tuple and int #15457
Going to need tests for this as well.
Thanks for your attempt! But is this really the bug you're after?! (feel free to edit the description)
Codecov Report
@@ Coverage Diff @@
## master #22072 +/- ##
=======================================
Coverage 92.08% 92.08%
=======================================
Files 169 169
Lines 50694 50694
=======================================
Hits 46682 46682
Misses 4012 4012
Continue to review full report at Codecov.
The fix is good, some comments.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -430,7 +430,7 @@ Bug Fixes
Categorical
^^^^^^^^^^^

-
- Fixes an issue with creating multiple indices with for example .set_index([a,b]) giving an error "TypeError: unorderable types" (:issue:'15457')
This is not telling the user what the problem really was (that example given works fine in 99% of cases). Moreover, the bug (in the sense of API misbehavior) is unrelated to categorical
Consider e.g. Fix issue when creating :class:`MultiIndex` from mixed type arrays including tuples (:issue:'15457')
pandas/core/arrays/categorical.py
Outdated
@@ -2520,7 +2520,7 @@ def _factorize_from_iterable(values):
                              ordered=values.ordered)
        codes = values.codes
    else:
-        cat = Categorical(values, ordered=True)
+        cat = Categorical(values, ordered=False)
Smart fix... but not really simple to understand. Please add a comment stating that the value of `ordered` is irrelevant since we don't use `cat` as such, but only the resulting categories, the order of which is independent from `ordered`.
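To illustrate why an ordering flag need not matter for this step, here is a minimal pure-Python sketch. It is not pandas' implementation: `factorize_in_order` is a hypothetical helper that only shows that integer codes and categories can be derived from hashable values without ever comparing them with `<`.

```python
def factorize_in_order(values):
    """Hypothetical helper: map each value to an integer code.

    Categories are recorded in order of first appearance, so the values
    only need to be hashable, never comparable with ``<``.
    """
    categories = []   # distinct values, in order of first appearance
    positions = {}    # value -> index into ``categories``
    codes = []        # one integer code per input value
    for value in values:
        if value not in positions:
            positions[value] = len(categories)
            categories.append(value)
        codes.append(positions[value])
    return codes, categories


# An int and a tuple cannot be sorted together in Python 3,
# but they can still be factorized this way.
codes, categories = factorize_in_order([1, (1, 2), 1])
print(codes)       # [0, 1, 0]
print(categories)  # [1, (1, 2)]
```

Because nothing here sorts the values, a mixed int/tuple column poses no problem; only the codes and the list of categories are needed to build an index level.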
always write tests first - how else can you tell if you fixed something?
Hello @dickreuter! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on August 11, 2018 at 12:07 Hours UTC
Would like to check it locally but building pandas seems impossible (cl.exe failed with exit status 2). Is there no additional info on exactly which version of the Visual Studio dev tools is needed? https://stackoverflow.com/questions/51566895/unable-to-build-pandas-from-source-cl-exe-failed-with-exit-status-2
I don't see how the error on the CI is related to my change (especially since the tests passed yesterday).
At least one is explained by
(see also the automatic comment by @pep8speaks)
Still getting an error, and I can't see how it's related to this: BufferError: could not get scalar buffer information. Running this test locally passes fine.
doc/source/whatsnew/v0.23.4.txt
Outdated
@@ -51,7 +51,7 @@ Bug Fixes

**Categorical**

-
- Fix issue when creating :class:`MultiIndex` from mixed type arrays including tuples (:issue:'15457')
Can you be even more explicit here? A user won't understand this.
@@ -2520,7 +2520,10 @@ def _factorize_from_iterable(values):
                              ordered=values.ordered)
        codes = values.codes
    else:
-        cat = Categorical(values, ordered=True)
+        # The value of ordered is irrelevant since we don't use cat as such,
can you add the issue reference
@@ -121,3 +121,11 @@ def test_get_indexer_non_unique(self, idx_values, key_values, key_class):

        tm.assert_numpy_array_equal(expected, result)
        tm.assert_numpy_array_equal(exp_miss, res_miss)


class TestSetIndex(object):
Put this in pandas/tests/frame/test_alter_axes.py.
Do a full comparison on the resulting frame and use assert_frame_equal.
df = DataFrame([[2, 1, 2], [4, (1, 2), 3]]).set_index([0, 1])
gives me
Out[97]:
          2
0 1
2 1       2
4 (1, 2)  3
How do I construct this directly from the DataFrame constructor? The first row seems empty?
df.to_dict() gives {2: {(2, 1): 2, (4, (1, 2)): 3}}
I tried:
DataFrame(['',2,3],
columns=['2'],
index=[['0','2','4'], ['1','1','(1,2)']])
but it's not equal (even though it looks equal), but
DataFrame shape mismatch
[left]: (2, 1)
[right]: (3, 1)
Implemented with dict instead:
assert df.to_dict() == {2: {(2, 1): 2, (4, (1, 2)): 3}}
> but it's not equal (even though it looks equal), but
Yes, they look equal, but you can see from how you built them that they are not. [0, 1] are the index names, not index values.
pd.DataFrame([2,3], columns=['2'],
index=pd.MultiIndex.from_arrays([['2','4'], ['1','(1,2)']], names=[0, 1]))
is what you're looking for.
df = DataFrame([[2, 1, 2], [4, (1, 2), 3]]).set_index([0, 1])
assert_frame_equal(df, pd.DataFrame([2, 3],
columns=['2'],
index=pd.MultiIndex.
from_arrays([['2', '4'],
['1', '(1,2)']],
names=[0, 1])))
MultiIndex level [0] classes are not equivalent
[left]: Int64Index([2, 4], dtype='int64', name=0)
[right]: Index(['2', '4'], dtype='object', name=0)
I also tried
df = DataFrame([[2, 1, 2], [4, (1, 2), 3]]).set_index([0, 1])
assert_frame_equal(df, pd.DataFrame([2, 3],
columns=['2'],
index=pd.MultiIndex.
from_arrays([[2, 4],
['1', '(1,2)']],
names=[0, 1])))
Attribute "inferred_type" are different
[left]: mixed-integer
[right]: string
Maybe the dictionary is a more feasible solution.
> I also tried
The original `df` has no strings at all, just integers. Why are you putting strings in the other (e.g. `'2'`, `'(1,2)'`)?!
Removing the two strings I get ValueError: setting an array element with a sequence.
> Removing the two strings I get ValueError: setting an array element with a sequence.
OK, pd.MultiIndex.from_tuples([(2, 1), (4, (1,2))], names=[0, 1])
doc/source/whatsnew/v0.23.4.txt
Outdated
@@ -51,7 +51,7 @@ Bug Fixes

**Categorical**

-
- Fix issue when creating :class:`MultiIndex` from mixed type arrays including tuples (:issue:'15457'). In some cases df.set_index(['A','B']) raised a TypeError unorderable types when trying to set multiple columns as indices.
move to 0.24
@@ -1186,3 +1186,11 @@ def test_set_axis_prior_to_deprecation_signature(self):
            with tm.assert_produces_warning(FutureWarning):
                result = df.set_axis(axis, list('abc'), inplace=False)
            tm.assert_frame_equal(result, expected[axis])

don't add a new class, rather put this test next to the other set_index tests
    def test_set_index_tuple(self):
        # test for fix of issue #15457, where the following raised a TypeError
        df = DataFrame([[2, 1, 2], [4, (1, 2), 3]]).set_index([0, 1])
        assert df.to_dict() == {2: {(2, 1): 2, (4, (1, 2)): 3}}
Use assert_frame_equal; IOW you must construct the expected frame. Use `result=` and `expected=` here.
@jreback @dickreuter since the bug is in `MultiIndex.from_arrays`, I think that only reconstructing the index (as I suggested above, comparing `from_arrays` with `from_tuples`, which already works) rather than the full `DataFrame` - and moving the test to pandas/tests/indexes/multi/test_constructor.py - is OK (actually even better)
@toobaz Comparing from_arrays with from_tuples? Not sure I understand. As you said, this already works.
@dickreuter `from_tuples` already works, `from_arrays` doesn't. With your patch, both do, so you can compare one to the other. Something like (untested):
result = pd.MultiIndex.from_arrays([[2, 4], [1, (1, 2)]])
expected = pd.MultiIndex.from_tuples([(2, 1), (4, (1,2))])
...
@toobaz I reverted to the test via dict as otherwise line 908 of cast.py causes an error when trying to create an array from [1, (1, 2)], raising {ValueError}setting an array element with a sequence. This may be an unrelated issue but not sure the test can be constructed in that way at the moment.
@dickreuter this has to be an assert on a frame, otherwise this test is not testing what we need here
> {ValueError}setting an array element with a sequence
@dickreuter You are right, sorry. This should work instead:
expected = pd.DataFrame([[2, 1], [4, (1, 2)]]).set_index([0, 1]).index
result = pd.MultiIndex.from_tuples([(2, 1), (4, (1,2))], names=(0, 1))
@jreback see my comment above, the bug is unrelated to DataFrame
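For reference, the index comparison discussed above can be written as a standalone check. This is a sketch only: it assumes a pandas version that already includes this fix, the column names `"a"` and `"b"` are illustrative, and it is not necessarily the test that ended up in the pandas test suite.

```python
import pandas as pd

# Column "b" mixes an int and a tuple, the case reported in issue #15457.
frame = pd.DataFrame([[2, 1], [4, (1, 2)]], columns=["a", "b"])

# Path that used to raise: set_index builds the MultiIndex from column arrays.
result = frame.set_index(["a", "b"]).index

# Path that already worked: build the same index directly from tuples.
expected = pd.MultiIndex.from_tuples([(2, 1), (4, (1, 2))], names=["a", "b"])

# Raises AssertionError if the two indexes differ.
pd.testing.assert_index_equal(result, expected)
```

Comparing only the index keeps the check focused on index construction, which is where the reviewers located the bug, rather than on the rest of the frame.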
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -430,7 +430,7 @@ Bug Fixes
Categorical
^^^^^^^^^^^

-
- Fix issue when creating :class:`MultiIndex` from mixed type arrays including tuples (:issue:'15457'). In some cases df.set_index(['A','B']) raised a TypeError unorderable types when trying to set multiple columns as indices.
use double backticks around TypeError, list the issue number here as well. Move this to MultiIndex section.
You want the same issue number mentioned twice? I'm not aware of any second issue number.
@toobaz: I'm getting
@toobaz I reverted to the test via dict as otherwise line 908 of cast.py causes an error when trying to create an array from [1, (1, 2)], raising {ValueError}setting an array element with a sequence.
https://travis-ci.org/pandas-dev/pandas/jobs/413344917#L3557 @dickreuter : You have linter failures. I would fix those first before pinging.
rebased @toobaz merge when satisfied
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -626,7 +626,7 @@ MultiIndex

- Removed compatibility for :class:`MultiIndex` pickles prior to version 0.8.0; compatibility with :class:`MultiIndex` pickles from version 0.13 forward is maintained (:issue:`21654`)
- :meth:`MultiIndex.get_loc_level` (and as a consequence, ``.loc`` on a :class:``MultiIndex``ed object) will now raise a ``KeyError``, rather than returning an empty ``slice``, if asked a label which is present in the ``levels`` but is unused (:issue:`22221`)
-
- Fix `TypeError` in Python 3 when creating `MultiIndex` in which some columns are of object type, e.g. when labels are tuples (:issue:'15457')
Apart from my comment on "`object` type" ( #22072 ), you have lost the ":class:" before "`MultiIndex`", and you still have quotes rather than backticks around the issue number
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -626,7 +626,7 @@ MultiIndex

- Removed compatibility for :class:`MultiIndex` pickles prior to version 0.8.0; compatibility with :class:`MultiIndex` pickles from version 0.13 forward is maintained (:issue:`21654`)
- :meth:`MultiIndex.get_loc_level` (and as a consequence, ``.loc`` on a :class:``MultiIndex``ed object) will now raise a ``KeyError``, rather than returning an empty ``slice``, if asked a label which is present in the ``levels`` but is unused (:issue:`22221`)
-
- Fix `TypeError` in Python 3 when creating :class: `MultiIndex` in which some columns are of object type, e.g. when labels are tuples (:issue:`15457`)
@dickreuter OK, I think I see your point. What I'm saying is "not all `object` dtypes give problems"; what you are saying is "other `object` dtypes than tuples give problems". I think a more precise description then is "in which some levels have mixed types", as I guess happens in your case (no error comes if all labels are ctype items, right?).
Then I think we're ready.
@toobaz I believe that in the example where I first hit the error there were 3 columns. All of them were dtype and all the rows were homogeneous, of the same type as well.
@toobaz When you say mixed type, is the column name also relevant for that for each for? If yes, then I assume it’s a mixed type because the column names were strings in my case.
Your new version is almost ready (see my last comment), but for the record:
> I believe in the example I first hit the error there where 3 columns. All of them were dtype and all the rows were homogenous of the same type as well.
The error only comes out when two elements (in the same column) cannot be ordered, which in Python 3 happens when they are of non-comparable types. If you got an error with columns that have homogeneous types, I doubt it was the same error.
Notice that I'm referring to (Python) types, not dtypes. If you put mixed types in a single column, then its dtype will be just `object`, but it is still a mixed type column.
> When you say mixed type, is the column name also relevant for that for each for?
No, column names are stored separately and should be unrelated to this bug and to any issue with mixed type columns/levels.
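The distinction between orderable and non-comparable Python types can be shown without pandas. The sketch below uses only built-ins; the variable names are illustrative. Python 3.5 and earlier word this error as "unorderable types", while later versions report that `<` is not supported between the two types.

```python
# Elements of a single Python type can be ordered: tuples sort
# lexicographically.
same_type = [(1, 2), (0, 3)]
print(sorted(same_type))  # [(0, 3), (1, 2)]

# An int and a tuple have no defined ordering in Python 3, so sorting
# a list that mixes them raises TypeError.
mixed = [1, (1, 2)]
try:
    sorted(mixed)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
```

Both lists would be stored with `object` dtype in a column, which is why the dtype alone does not tell you whether sorting will fail.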
I assume in my case the elements in the same column were unorderable not because they had multiple non-comparable types but because they contained just references to memory of python ctype objects.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -626,7 +626,7 @@ MultiIndex

- Removed compatibility for :class:`MultiIndex` pickles prior to version 0.8.0; compatibility with :class:`MultiIndex` pickles from version 0.13 forward is maintained (:issue:`21654`)
- :meth:`MultiIndex.get_loc_level` (and as a consequence, ``.loc`` on a :class:``MultiIndex``ed object) will now raise a ``KeyError``, rather than returning an empty ``slice``, if asked a label which is present in the ``levels`` but is unused (:issue:`22221`)
-
- Fix `TypeError` in Python 3 when creating :class: `MultiIndex` in which some levels have mixed types, e.g. when labels are tuples (:issue:`15457`)
Almost there... "when some labels are tuples". If (for instance) all labels are tuples of ints, there is no problem.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -626,7 +626,7 @@ MultiIndex

- Removed compatibility for :class:`MultiIndex` pickles prior to version 0.8.0; compatibility with :class:`MultiIndex` pickles from version 0.13 forward is maintained (:issue:`21654`)
- :meth:`MultiIndex.get_loc_level` (and as a consequence, ``.loc`` on a :class:``MultiIndex``ed object) will now raise a ``KeyError``, rather than returning an empty ``slice``, if asked a label which is present in the ``levels`` but is unused (:issue:`22221`)
-
- Fix `TypeError` in Python 3 when creating :class: `MultiIndex` in which some levels have mixed types. If (for instance) all labels are tuples of ints, there is no problem (:issue:`15457`)
@dickreuter the previous one was almost OK, this one is not... in the whatsnew, we don't need examples in which "there is no problem" :-)
Please revert to the previous one, just adding "some", that is
Fix `TypeError` in Python 3 when creating :class: `MultiIndex` in which some levels have mixed types, e.g. when some labels are tuples (:issue:`15457`)
This reverts commit b645286
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -626,7 +626,7 @@ MultiIndex

- Removed compatibility for :class:`MultiIndex` pickles prior to version 0.8.0; compatibility with :class:`MultiIndex` pickles from version 0.13 forward is maintained (:issue:`21654`)
- :meth:`MultiIndex.get_loc_level` (and as a consequence, ``.loc`` on a :class:``MultiIndex``ed object) will now raise a ``KeyError``, rather than returning an empty ``slice``, if asked a label which is present in the ``levels`` but is unused (:issue:`22221`)
-
- Fix `TypeError` in Python 3 when creating :class: `MultiIndex` in which some levels have mixed types, e.g. when some labels are tuples (:issue:`15457`)
Double backticks around `TypeError`, no space after `:class:`
@toobaz Why does travis fail with "Unable to download 3.5 archive. The archive may not exist. Please consider a different version"?
Thanks @dickreuter !
@dickreuter : FYI, those Travis failures that you saw earlier are spurious. I restarted the builds to get them passing and this PR merged 🙂
…with mixed dtypes (pandas-dev#22072) closes pandas-dev#15457