Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maintain Dict Ordering with Concat #21512

Closed
wants to merge 9 commits into from
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -110,3 +110,4 @@ doc/source/styled.xlsx
doc/source/templates/
env/
doc/source/savefig/
*my-dev-test.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this from?

2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,6 @@ Other
^^^^^

- :meth: `~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`)
-
- Bug in :meth:`concat` should maintain dict order when :meth:`concat` is called (:issue:`2151`)
-
-
7 changes: 5 additions & 2 deletions pandas/core/reshape/concat.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
"""
concat routines
"""

from collections import OrderedDict
import numpy as np
from pandas import compat, DataFrame, Series, Index, MultiIndex
from pandas.core.index import (_get_objs_combined_axis,
Expand Down Expand Up @@ -250,7 +250,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None,

if isinstance(objs, dict):
if keys is None:
keys = sorted(objs)
if not isinstance(objs, OrderedDict):
keys = sorted(objs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normal dicts are ordered too in python3.6+, that must be checked too. There is a PY36 constant somewhere, that you can use for this check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we already took a stance on this as a project, but the orderedness of dicts in 3.6 is considered an implementation detail of cpython, while it will be a language feature starting from 3.7 . So in principle in 3.6 users should not rely on it, and we should not assume they do.

@topper-123 's comment still holds I think, but for 3.7

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we've already relied on that implementation detail in other changes (see #19830, #19859, #19884) so I'd be +1 on relying on that here as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, given that Python 3.6 is at 3.6.5 now, and its's being formalised in 3.7, I think this is quite safe. They're not going to change implementation now...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very simliar to in structure to pandas.core.common._dict_keys_to_ordered_list. structure like this. see if we can consolidate all of these try/sorting routings.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback you mean use: keys = com._dict_keys_to_ordered_list(objs) ? If so, I'm afraid we'll need to refactor _dict_keys_to_ordered_list because it assumes that PY36 has sorted dict which I think its not always true, and will break test_concat_dict test case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's the point i am raising, I want a refactor here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, from _dict_keys_to_ordered_list method, i suggest change the conditionif PY36 or isinstance(mapping, OrderedDict): to if PY37 or isinstance(mapping, OrderedDict): then, remove test_constructor_dict_order_insertion and test_constructor_dict_order. What do you think about it?

else:
keys = objs
objs = [objs[k] for k in keys]
else:
objs = list(objs)
Expand Down
12 changes: 12 additions & 0 deletions pandas/tests/reshape/test_concat.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
from collections import OrderedDict
from warnings import catch_warnings
from itertools import combinations, product

Expand Down Expand Up @@ -1294,6 +1295,17 @@ def test_concat_rename_index(self):
tm.assert_frame_equal(result, exp)
assert result.index.names == exp.index.names

def test_concat_with_ordered_dict(self):
# GH 21510
result = pd.concat(OrderedDict([('First', pd.Series(range(3))),
('Another', pd.Series(range(4)))]))
index = MultiIndex(levels=[['First', 'Another'], [0, 1, 2, 3]],
labels=[[0, 0, 0, 1, 1, 1, 1],
[0, 1, 2, 0, 1, 2, 3]])
data = list(range(3)) + list(range(4))
expected = pd.Series(data, index=index)
tm.assert_series_equal(result, expected)

def test_crossed_dtypes_weird_corner(self):
columns = ['A', 'B', 'C', 'D']
df1 = DataFrame({'A': np.array([1, 2, 3, 4], dtype='f8'),
Expand Down