Fix concat not respecting order of OrderedDict #25224
Hello @alexander-ponomaroff! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-03-12 16:12:36 UTC |
@jreback @toobaz I've reviewed your comments in Pull Request #21512, which went stale, and created a fix based on what I observed. As @toobaz said, the ordering of dicts in 3.6 is an implementation detail of CPython, and users should not rely on it in 3.6. Based on this, I changed dict_keys_to_ordered_list to use PY37 instead of PY36 and changed the failing tests to use PY37 as well. Please let me know if this is the wrong way of going about this. If my changes are acceptable, I will change the comments in the tests and in dict_keys_to_ordered_list to refer to 3.7 instead of 3.6. |
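For readers following along, the version gate described above can be sketched roughly as follows. This is a minimal illustration, not the actual pandas source: the function name `keys_in_order` is hypothetical, and checking `sys.version_info` directly (rather than pandas' `compat.PY37` flag) is an assumption made to keep the example self-contained.

```python
import sys
from collections import OrderedDict


def keys_in_order(mapping):
    """Return the keys of ``mapping`` in the order they should be used.

    Hypothetical helper for illustration only.
    """
    # An OrderedDict always carries a meaningful order. A plain dict only
    # guarantees insertion order as a language feature from Python 3.7.
    if sys.version_info >= (3, 7) or isinstance(mapping, OrderedDict):
        return list(mapping.keys())
    # Otherwise fall back to a deterministic sorted order.
    return sorted(mapping)


ordered = OrderedDict([("b", 1), ("a", 2)])
print(keys_in_order(ordered))  # ['b', 'a'] on every Python version
```

The point of the gate is that an OrderedDict is honoured everywhere, while a plain dict is only trusted where the language guarantees its order.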
Codecov Report
@@             Coverage Diff             @@
##           master   #25224       +/-   ##
===========================================
- Coverage   92.15%   42.28%   -49.88%
===========================================
  Files         166      166
  Lines       52294    52294
===========================================
- Hits        48193    22111    -26082
- Misses       4101    30183    +26082
Continue to review full report at Codecov.
|
Codecov Report
@@           Coverage Diff           @@
##           master   #25224   +/-   ##
=======================================
  Coverage   91.28%   91.28%
=======================================
  Files         173      173
  Lines       52965    52965
=======================================
  Hits        48348    48348
  Misses       4617     4617
Continue to review full report at Codecov.
|
I might be out of the loop, but wasn't dict ordering declared an implementation detail at first and then later guaranteed as a feature? If so, I don't think we need to distinguish between 3.6 and 3.7 here. |
pls add a whatsnew note to 0.25 in reshaping.
it may be an implementation detail in PY36 but we are consistently testing for this.
@jreback Thank you for the feedback, I will make changes tomorrow. |
@WillAyd I get an error in one of the tests that you created. Would you have any suggestions for me? test_groupby_agg_ohlc_non_first() inside pandas/tests/groupby/test_groupby.py. |
Looks like it's the ordering of the columns in the resulting data frame. I think this is actually preferable because it respects the order supplied by the user, so I think it's OK to update the test accordingly |
…and remove an unwanted import
@WillAyd Thank you for such a fast response. Just to confirm, how do you feel about the change I made? |
Right - I’m asking if there’s a way to keep the behavior here consistent in the code base rather than setting different expectations in the test
… On Feb 12, 2019, at 2:40 PM, Alexander Ponomaroff ***@***.***> wrote:
@alexander-ponomaroff commented on this pull request.
In pandas/tests/groupby/test_groupby.py <#25224 (comment)>:
> @@ -1695,6 +1695,10 @@ def test_groupby_agg_ohlc_non_first():
('foo', 'sum', 'foo'))), index=pd.date_range(
'2018-01-01', periods=2, freq='D'))
+ if compat.PY36:
@WillAyd What do you mean by "Do you know the code responsible for this behavior?". Do I know what is responsible for different order of columns in the DataFrame in the particular case of this test?
|
@WillAyd I will have to investigate this and get back to you. I'm not entirely sure how my change affected test_groupby_agg_ohlc_non_first(), because my changes deal with dicts and OrderedDicts in the concat function. So concat must be used within one of the functions called inside the test, but the test doesn't seem to be dealing with any dicts. Please let me know if anything I'm saying is wrong; I recently started solving actual code bugs in pandas and am still learning. I started my contributions with easy issues and am now trying to move on to harder ones :) |
Yea no worries - your contributions are greatly appreciated. If it helps, here is where the ohlc function is defined. If you trace the underlying function there back you'll see a concat call, hence why this has some impact: pandas/pandas/core/groupby/groupby.py, line 1311 in 4d44a2a
|
@WillAyd Thank you. I will be looking into it and will update when I find something. |
@WillAyd I've been trying to figure out how to solve the problem with test_groupby_agg_ohlc_non_first(), but I'm stuck. On Python 3.6 and later, .agg(['sum', 'ohlc']) maintains the order "sum -> ohlc" and .agg(['ohlc', 'sum']) maintains the order "ohlc -> sum". With .agg(['ohlc', 'sum']), the test passes on all versions. This is due to my recent concat change; I traced the call and understand why it happens, but I don't know how to fix this test without negatively affecting other tests. |
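A toy model may help show why only the ['sum', 'ohlc'] ordering is affected. The sketch below is not pandas' implementation: `combine_columns` is a hypothetical function, and it assumes the earlier code path sorted the dictionary keys before concatenating, while the new path keeps the order in which the keys were inserted.

```python
from collections import OrderedDict


def combine_columns(results, sort_keys):
    """Flatten per-function results into column labels, in key order.

    ``results`` maps an aggregation name to the sub-columns it produces.
    Toy model for illustration only.
    """
    keys = sorted(results) if sort_keys else list(results)
    return [(key, sub) for key in keys for sub in results[key]]


OHLC = ["open", "high", "low", "close"]

# The user asked for ['sum', 'ohlc'].
requested = OrderedDict([("sum", ["foo"]), ("ohlc", OHLC)])
old_order = combine_columns(requested, sort_keys=True)
new_order = combine_columns(requested, sort_keys=False)
print(old_order[0])  # ('ohlc', 'open'): sorting puts 'ohlc' before 'sum'
print(new_order[0])  # ('sum', 'foo'): the requested order is kept

# The user asked for ['ohlc', 'sum']: already alphabetical, so both agree.
alphabetical = OrderedDict([("ohlc", OHLC), ("sum", ["foo"])])
print(combine_columns(alphabetical, True) == combine_columns(alphabetical, False))  # True
```

Under that assumption, a request that happens to be in alphabetical order looks the same before and after the change, which would match the observation that .agg(['ohlc', 'sum']) passes everywhere.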
small comments, ping on green.
@jreback Why am I failing these tests now? Does it have anything to do with changes that happened in the last 2 days? |
make sure to merge master |
@jreback I did merge master, right before I pushed my changes. It says that everything is up to date. |
these look related to your changes |
@jreback All tests are green now. Looks like the previous errors were resolved by themselves. When I ran the test locally, I did not get them. |
pls merge master as well
@jreback Fixed requested changes, all tests are green. |
LGTM, aside from a small request in the whatsnew.
- ('foo', 'ohlc', 'open'), ('foo', 'ohlc', 'high'),
- ('foo', 'ohlc', 'low'), ('foo', 'ohlc', 'close'),
- ('foo', 'sum', 'foo'))), index=pd.date_range(
+ ('foo', 'sum', 'foo'), ('foo', 'ohlc', 'open'),
So the only change in behavior is that the output order is now (consistently) the order of the arguments passed to .agg? Since we did ['sum', 'ohlc'] the output order is 'sum', 'open', 'high', 'low', 'close'?
If so, let's add an entry to the release notes, under the Groupby bug fix section, saying that .agg now respects the order of arguments when 'ohlc' is one of the aggfuncs.
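The behavior being asked about can be checked with a short script along these lines. It is a sketch modeled on test_groupby_agg_ohlc_non_first; the input frame (a single 'foo' column of ones over two days) is an assumption reconstructed from the fragments of the expected result shown in the diff, not copied from the test itself.

```python
import pandas as pd

# Assumed input: one column of ones, indexed by two consecutive days.
df = pd.DataFrame(
    [[1], [1]],
    columns=["foo"],
    index=pd.date_range("2018-01-01", periods=2, freq="D"),
)

# Aggregate each daily group with 'sum' first and 'ohlc' second.
result = df.groupby(pd.Grouper(freq="D")).agg(["sum", "ohlc"])

# The second level of the column MultiIndex records which aggfunc produced
# each column; with this fix it should follow the order passed to .agg.
print(list(result.columns.get_level_values(1).unique()))
```

Swapping the list to ["ohlc", "sum"] should move the four ohlc columns in front of the sum column.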
This might be rather difficult to express in a whatsnew since it's really just applicable to _aggregate_multiple_funcs here.
Perhaps a follow-up to consistently use OrderedDicts in the groupby module is in order?
Is _aggregate_multiple_funcs called just by functions like ohlc that have multiple outputs per input? Or is it also called by .agg([funcs])?
The latter. So this has implications for ordering, which previously would have been non-deterministic in Py35.
So I think it might make sense as a follow-up to use an OrderedDict and make better guarantees about the returned column order, but I don't think that should hold this one up
See #25692 as follow up
Sounds good.
|
lgtm
Thank you @WillAyd and @TomAugspurger for the feedback and the help on this one. @jreback could you take a look if there are any additional changes that need to take place here? |
Merging. If any additional whatsnew is needed, we can do it in #25692 |
* master: (22 commits)
  Fixturize tests/frame/test_operators.py (pandas-dev#25641)
  Update ValueError message in corr (pandas-dev#25729)
  DOC: fix some grammar and inconsistency issues in the User Guide (pandas-dev#25728)
  ENH: Add public start, stop, and step attributes to RangeIndex (pandas-dev#25720)
  Make Rolling.apply documentation clearer (pandas-dev#25712)
  pandas-dev#25707 - Fixed flakiness in stata write test (pandas-dev#25714)
  Json normalize nan support (pandas-dev#25619)
  TST: resolve issues with test_constructor_dtype_datetime64 (pandas-dev#24868)
  DEPR: Deprecate box kwarg for to_timedelta and to_datetime (pandas-dev#24486)
  BUG: Preserve name in DatetimeIndex.snap (pandas-dev#25585)
  Fix concat not respecting order of OrderedDict (pandas-dev#25224)
  CLN: remove pandas.core.categorical (pandas-dev#25655)
  TST/CLN: Remove more Panel tests (pandas-dev#25675)
  Pinned pycodestyle (pandas-dev#25701)
  DOC: update date of 0.24.2 release notes (pandas-dev#25699)
  BUG: Fix error in replace with strings that are large numbers (pandas-dev#25616) (pandas-dev#25644)
  BUG: fix usage of na_sentinel with sort=True in factorize() (pandas-dev#25592)
  BUG: Fix to_string output when using header (pandas-dev#16718) (pandas-dev#25602)
  CLN: Remove unused test code (pandas-dev#25670)
  CLN: remove Panel from concat error message (pandas-dev#25676)
  ...
# Conflicts:
#   doc/source/whatsnew/v0.25.0.rst
git diff upstream/master -u -- "*.py" | flake8 --diff