Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix concat not respecting order of OrderedDict #25224

Merged
merged 23 commits into from
Mar 13, 2019

Conversation

alexander-ponomaroff
Copy link
Contributor

@pep8speaks
Copy link

pep8speaks commented Feb 8, 2019

Hello @alexander-ponomaroff! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-03-12 16:12:36 UTC

@alexander-ponomaroff
Copy link
Contributor Author

@jreback @toobaz I've reviewed your comments in Pull Request #21512 that went stale, and created a fix based on what I observed. As @toobaz said, the ordering of dicts in 3.6 is an implementation detail of cpython, and users should not rely on it in 3.6. Based on this, I changed dict_keys_to_ordered_list to use PY37 instead of PY36 and changed the tests that were failing to use PY37 as well.

Please let me know if this is the wrong way of going about this. If my changes are acceptable, I will change the comments in the tests and dict_keys_to_ordered_list to use 3.7 instead of 3.6.

@codecov
Copy link

codecov bot commented Feb 8, 2019

Codecov Report

Merging #25224 into master will decrease coverage by 49.87%.
The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #25224       +/-   ##
===========================================
- Coverage   92.15%   42.28%   -49.88%     
===========================================
  Files         166      166               
  Lines       52294    52294               
===========================================
- Hits        48193    22111    -26082     
- Misses       4101    30183    +26082
Flag Coverage Δ
#multiple ?
#single 42.28% <66.66%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/reshape/concat.py 44% <0%> (-53.2%) ⬇️
pandas/core/common.py 68.81% <100%> (-29.57%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.35%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
... and 125 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0eddba8...d4e70ef. Read the comment docs.

@codecov
Copy link

codecov bot commented Feb 8, 2019

Codecov Report

Merging #25224 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #25224   +/-   ##
=======================================
  Coverage   91.28%   91.28%           
=======================================
  Files         173      173           
  Lines       52965    52965           
=======================================
  Hits        48348    48348           
  Misses       4617     4617
Flag Coverage Δ
#multiple 89.86% <100%> (ø) ⬆️
#single 41.74% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/groupby/generic.py 87% <100%> (ø) ⬆️
pandas/core/reshape/concat.py 97.2% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 26d991f...1557c21. Read the comment docs.

@WillAyd
Copy link
Member

WillAyd commented Feb 8, 2019

As @toobaz said, the ordering of dicts in 3.6 is an implementation detail of cpython, and users should not rely on it in 3.6

Might be out of the loop but wasn't it declared that at first but since then guaranteed as a feature? If so I don't think we need to make distinction between 3.6 and 3.7 here

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Bug labels Feb 9, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note to 0.25 in reshaping.

it may be an implementation detail in PY36 but we are consistently testing for this.

@alexander-ponomaroff
Copy link
Contributor Author

@jreback Thank you for the feedback, I will make changes tomorrow.

@alexander-ponomaroff
Copy link
Contributor Author

@WillAyd I get an error in one of the tests that you created. Would you have any suggestions for me? test_groupby_agg_ohlc_non_first() inside pandas/tests/groupby/test_groupby.py.

@WillAyd
Copy link
Member

WillAyd commented Feb 12, 2019

Looks like its the ordering of the columns in the resulting data frame. I think this is actually preferable because it respects the order supplied by the user, so I think OK to update test accordingly

@alexander-ponomaroff
Copy link
Contributor Author

@WillAyd Thank you for such a fast response. Just to confirm, how do you feel about the change I made?

@WillAyd
Copy link
Member

WillAyd commented Feb 13, 2019 via email

@alexander-ponomaroff
Copy link
Contributor Author

@WillAyd I will have to investigate this and get back to you. I'm not entirely too sure how my change affected test_groupby_agg_ohlc_non_first() because my changes deal with dicts and OrderedDicts in the concat function. So, concat must be used within one of the functions that's used inside the test, but the test doesn't seem to be dealing with any dicts. Please let me know if anything I'm saying is wrong, I recently started solving actual code bugs in Pandas and am still learning. I started my contributions with easy issues and now trying to move onto harder ones :)

@WillAyd
Copy link
Member

WillAyd commented Feb 13, 2019

Yea no worries - your contributions are greatly appreciated.

If it helps here is where the ohlc function is defined. If you trace the underlying function there back you'll see a concat call, hence why this has some impacts

def ohlc(self):

@alexander-ponomaroff
Copy link
Contributor Author

@WillAyd Thank you. I will be looking into it and will update when I find something.

@alexander-ponomaroff
Copy link
Contributor Author

@WillAyd I've been trying to figure out how to solve the problem with test_groupby_agg_ohlc_non_first() but I'm stuck. Python 3.6 and after, .agg(['sum', 'ohlc']) maintains the order "sum -> ohlc", .agg(['ohlc', 'sum']) maintains the order "ohlc -> sum". When doing .agg(['ohlc', 'sum']), it passes the test in all the versions. This is due to my recent concat change and I traced the call and get why it happens, but I don't know how to solve the issue in this test, without affecting other tests negatively.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, ping on green.

@jreback jreback added this to the 0.25.0 milestone Feb 16, 2019
@alexander-ponomaroff
Copy link
Contributor Author

@jreback Why am I failing these tests now? Does it have anything to do with changes happened in the last 2 days?

@jreback
Copy link
Contributor

jreback commented Feb 17, 2019

make sure to merge master
these were just fixed up today

@alexander-ponomaroff
Copy link
Contributor Author

@jreback I did merge master, right before I pushed my changes. It says that everything is up to date.

@jreback
Copy link
Contributor

jreback commented Feb 17, 2019

these look related to your changes
run the tests locally and see

@alexander-ponomaroff
Copy link
Contributor Author

alexander-ponomaroff commented Feb 19, 2019

@jreback All tests are green now.

Looks like the previous errors were resolved by themselves. When I ran the test locally, I did not get them.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls merge master as well

@alexander-ponomaroff
Copy link
Contributor Author

@jreback Fixed requested changes, all tests are green.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTm, aside from a small request in the whatsnew.

('foo', 'ohlc', 'open'), ('foo', 'ohlc', 'high'),
('foo', 'ohlc', 'low'), ('foo', 'ohlc', 'close'),
('foo', 'sum', 'foo'))), index=pd.date_range(
('foo', 'sum', 'foo'), ('foo', 'ohlc', 'open'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the only change in behavior is that the output order is now (consistently) the arguments passed to .agg?

Since we did ['sum', 'ohlc'] the output order is 'sum', 'open', 'high', 'low', 'close'?

If so, let's add an entry to the release notes, under the Groupby bug fix section, saying that .agg now respects the order of arguments when 'ohlc' is one of the aggfuncs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be rather difficult to express in a whatsnew since it's really just applicable to _aggregate_multiple_funcs here.

Perhaps a follow up to consistently use OrderedDict's in the groupby module is in order?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is _aggregate_multiple_funcs called by just by functions like ohlc that have multiple outputs per input? Or is it also called by .agg([funcs])?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter. So this has implications to ordering which previously would have been non-deterministic in Py35.

So I think it might make sense as a follow up to use a OrderedDict and make better guarantees about the returned column order, but don't think that should hold this one up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #25692 as follow up

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 12, 2019 via email

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@alexander-ponomaroff
Copy link
Contributor Author

Thank you @WillAyd and @TomAugspurger for the feedback and the help on this one. @jreback could you take a look if there are any additional changes that need to take place here?

@jreback
Copy link
Contributor

jreback commented Mar 13, 2019

merging if any additional whatsnew is needed, can do in #25692

@jreback jreback merged commit 86879ac into pandas-dev:master Mar 13, 2019
sighingnow added a commit to sighingnow/pandas that referenced this pull request Mar 14, 2019
* master: (22 commits)
  Fixturize tests/frame/test_operators.py (pandas-dev#25641)
  Update ValueError message in corr (pandas-dev#25729)
  DOC: fix some grammar and inconsistency issues in the User Guide (pandas-dev#25728)
  ENH: Add public start, stop, and step attributes to RangeIndex (pandas-dev#25720)
  Make Rolling.apply documentation clearer (pandas-dev#25712)
  pandas-dev#25707 - Fixed flakiness in stata write test (pandas-dev#25714)
  Json normalize nan support (pandas-dev#25619)
  TST: resolve issues with test_constructor_dtype_datetime64 (pandas-dev#24868)
  DEPR: Deprecate box kwarg for to_timedelta and to_datetime (pandas-dev#24486)
  BUG: Preserve name in DatetimeIndex.snap (pandas-dev#25585)
  Fix concat not respecting order of OrderedDict (pandas-dev#25224)
  CLN: remove pandas.core.categorical (pandas-dev#25655)
  TST/CLN: Remove more Panel tests (pandas-dev#25675)
  Pinned pycodestyle (pandas-dev#25701)
  DOC: update date of 0.24.2 release notes (pandas-dev#25699)
  BUG: Fix error in replace with strings that are large numbers (pandas-dev#25616) (pandas-dev#25644)
  BUG: fix usage of na_sentinel with sort=True in factorize() (pandas-dev#25592)
  BUG: Fix to_string output when using header (pandas-dev#16718) (pandas-dev#25602)
  CLN: Remove unused test code (pandas-dev#25670)
  CLN: remove Panel from concat error message (pandas-dev#25676)
  ...

# Conflicts:
#	doc/source/whatsnew/v0.25.0.rst
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concat does not respect order of an OrderedDict
5 participants