CLN: use dispatch_to_series where possible #22534

jbrockmendel · 2018-08-29T17:42:00Z

A bunch of PRs touching DataFrame ops have gone through recently. This does some follow-up cleanup to unify the way things are done across a few different methods.
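The idea behind `dispatch_to_series` is to evaluate a DataFrame op column by column, passing each column to the Series implementation so that dtype-specific logic lives in one place. `dispatch_to_series` itself is a private helper in `pandas/core/ops.py`, and its signature has changed between versions, so the sketch below does not call it. `columnwise_op` is a hypothetical stand-in that covers only the `axis=0` Series case and assumes unique row and column labels.

```python
import operator

import numpy as np
import pandas as pd


def columnwise_op(df, other, op):
    """Hypothetical stand-in for column-by-column dispatch.

    Applies ``op(column, other)`` to every column of ``df``, where ``other``
    is a Series aligned to the frame's row index (the ``axis=0`` case).
    Assumes unique column labels and a unique row index.
    """
    # Align the Series to the frame's rows once, up front.
    other = other.reindex(df.index)
    # Each column is a 1-D Series, so the Series-level op picks the
    # result dtype for that column on its own.
    result = {col: op(df[col], other) for col in df.columns}
    return pd.DataFrame(result, index=df.index, columns=df.columns)


# A small frame with a different dtype in each column.
df = pd.DataFrame({
    "f8": np.arange(5, dtype="f8"),
    "i8": np.arange(5, dtype="i8"),
    "u4": np.arange(5, dtype="u4"),
})
ser = pd.Series(np.linspace(0.5, 2.5, 5))

got = columnwise_op(df, ser, operator.add)
expected = df.add(ser, axis=0)
pd.testing.assert_frame_equal(got, expected)
```

Dispatching per column means the frame-level method never has to reason about mixed-dtype blocks. The cost is a Python-level loop over columns, which is why the few-column versus many-column timings later in this thread matter.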

pep8speaks · 2018-08-29T17:42:06Z

Hello @jbrockmendel! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/frame.py !
In the file pandas/core/ops.py, following are the PEP8 issues :

Line 1646:13: W504 line break after binary operator

Comment last updated on September 07, 2018 at 16:19 Hours UTC

jreback · 2018-08-31T10:08:03Z

can you check perf. rebase as well.

jbrockmendel · 2018-08-31T15:52:35Z

> can you check perf.

First attempt to check perf turned up a bug in master:

import pandas as pd

dti = pd.date_range('2016-01-01', periods=10000)
tdi = pd.timedelta_range('1', periods=10000)
tser = pd.Series(tdi)
df = pd.DataFrame({0: dti, 1: tdi})
df.add(tser, axis=0)

Expected (which the PR gets right):

                              0                      1
0 2016-01-01 00:00:00.000000001 0 days 00:00:00.000000
1 2016-01-03 00:00:00.000000001 2 days 00:00:00.000000
2 2016-01-05 00:00:00.000000001 4 days 00:00:00.000000

master raises

ValueError: operands could not be broadcast together with shapes (20000,) (10000,)

I'll add a test for this.
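A regression test for this case could take roughly the following form. This is a hypothetical pytest-style sketch, not the test that eventually landed in #22694. It uses a 10-row frame and builds the timedeltas with `pd.to_timedelta(..., unit="D")` instead of `pd.timedelta_range('1', ...)`, so the expected values are whole days. On a current pandas it is expected to pass.

```python
import numpy as np
import pandas as pd


def test_frame_add_timedelta_series_axis0():
    # Hypothetical regression test: one datetime64 column and one
    # timedelta64 column, combined row-wise with a timedelta Series.
    n = 10
    dti = pd.date_range("2016-01-01", periods=n)
    tdi = pd.to_timedelta(np.arange(n), unit="D")
    tser = pd.Series(tdi)
    df = pd.DataFrame({0: dti, 1: tdi})

    result = df.add(tser, axis=0)

    # Expected: the same op done column by column with plain Series arithmetic.
    expected = pd.DataFrame({0: df[0] + tser, 1: df[1] + tser})
    pd.testing.assert_frame_equal(result, expected)

    # Row 1: 2016-01-02 + 1 day, and 1 day + 1 day.
    assert result.loc[1, 0] == pd.Timestamp("2016-01-03")
    assert result.loc[1, 1] == pd.Timedelta(days=2)


test_frame_add_timedelta_series_axis0()
```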

Now the non-broken cases. First, a many-column case where we expect master to perform well:

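import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100000, 60))
# NB: df[10:20] slices rows 10-19, not columns (df.iloc[:, 10:20] would select
# columns), so these lines do not give columns 10-59 the listed dtypes.
df[10:20] = df[10:20].astype('f4')
df[20:30] = df[20:30].astype('i8')
df[30:40] = df[30:40].astype('i4')
df[40:50] = df[40:50].astype('u8')
df[50:60] = df[50:60].astype('u4')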

In [29]: %timeit out = df.add(df[0], axis=0)
100 loops, best of 3: 18.1 ms per loop   <-- master
100 loops, best of 3: 18.3 ms per loop  <-- PR

And a few-column case where we expect master to do poorly:

df = pd.DataFrame(np.random.randn(10000000, 6))
df[1] = df[1].astype('f4')
df[2] = df[2].astype('i8')
df[3] = df[3].astype('i4')
df[4] = df[4].astype('u8')
df[5] = df[5].astype('u4')

%timeit out = df.add(df[0], axis=0)
1 loop, best of 3: 903 ms per loop   <-- master
1 loop, best of 3: 582 ms per loop   <-- PR
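`%timeit` is an IPython magic. Outside IPython, the standard library's `timeit` module gives a comparable best-of-N figure. The sketch below rebuilds a few-column frame with one column per dtype, assigning each column by label as in the second setup. `make_mixed_frame` and `time_add_axis0` are hypothetical helpers. The values are drawn from [0, 100) instead of `randn` so that the casts to unsigned dtypes are well defined.

```python
import timeit

import numpy as np
import pandas as pd


def make_mixed_frame(nrows, dtypes, seed=0):
    """Hypothetical helper: one column per dtype in ``dtypes``.

    Values are drawn uniformly from [0, 100) so that casts to unsigned
    integer dtypes are well defined (integer columns hold truncated floats).
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.uniform(0, 100, size=(nrows, len(dtypes))))
    for i, dtype in enumerate(dtypes):
        # Assigning a whole column by label replaces it, so the new dtype sticks.
        df[i] = df[i].astype(dtype)
    return df


def time_add_axis0(df, number=5, repeat=3):
    """Best-of-``repeat`` seconds per call of ``df.add(df[0], axis=0)``."""
    timer = timeit.Timer(lambda: df.add(df[0], axis=0))
    return min(timer.repeat(repeat=repeat, number=number)) / number


df = make_mixed_frame(100_000, ["f8", "f4", "i8", "i4", "u8", "u4"])
print([str(t) for t in df.dtypes])
print(f"{time_add_axis0(df) * 1e3:.1f} ms per loop")
```

Absolute numbers depend on the machine and the pandas version. This harness is meant for comparing two branches on the same machine, not for reproducing the figures quoted above.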

As elsewhere, I expect perf to improve after #22284.

jreback · 2018-09-04T11:45:27Z

this duplicates #22572 a lot. is this the original?

jbrockmendel · 2018-09-04T13:50:56Z

> this duplicates #22572 a lot. is this the original?

This is the original. While profiling it, I found that it fixes a previously unknown bug, so it now needs tests, a whatsnew entry, etc. #22572 splits off the still-easy part of this.

jbrockmendel · 2018-09-07T03:16:42Z

Fully finishing this off will require a resolution to #22614.

codecov · 2018-09-07T16:19:21Z

Codecov Report

❗ No coverage uploaded for pull request base (master@5eb9988).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #22534   +/-   ##
=========================================
  Coverage          ?   92.05%           
=========================================
  Files             ?      169           
  Lines             ?    50787           
  Branches          ?        0           
=========================================
  Hits              ?    46753           
  Misses            ?     4034           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.46% <100%> (?)`
#single	`42.29% <0%> (?)`

Impacted Files	Coverage Δ
pandas/core/ops.py	`96.91% <100%> (ø)`
pandas/core/frame.py	`97.2% <100%> (ø)`

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5eb9988...8f0cdbc.

jreback · 2018-09-08T02:43:27Z

can you rebase

jbrockmendel · 2018-09-08T02:58:50Z

> can you rebase

Sure. Big evening for merging. Note comment above about #22614.

jbrockmendel · 2018-09-08T03:03:10Z

Heh, looks like the other PRs merged this evening already cover this. Closing. Will need to follow up with a test for the bug that was accidentally fixed.

jreback · 2018-09-08T03:04:13Z

great! I am not really sure about #22614, none of the options are really palatable.

jbrockmendel · 2018-09-08T03:11:02Z

> I am not really sure about #22614, none of the options are really palatable.

Well, we've de facto been going down the path of option 1. I actually prefer option 2 longer-term (better to discuss there), but for the time being correctness-first seems to favor option 1, and #22284 should take some of the pain out of it.

CLN: use dispatch_to_series where possible

bc95077

jbrockmendel added 2 commits August 29, 2018 10:43

flake8 fixup

0b5f86a

Merge branch 'master' of https://github.com/pandas-dev/pandas into sp…

e21afdd

…litops

jreback added the Numeric Operations (Arithmetic, Comparison, and Logical operations) and Clean labels Aug 31, 2018

jreback added this to the 0.24.0 milestone Aug 31, 2018

Merge branch 'master' of https://github.com/pandas-dev/pandas into sp…

80e445c

…litops

jbrockmendel mentioned this pull request Sep 1, 2018

Use dispatch_to_series where possible #22572

Merged

Merge branch 'master' of https://github.com/pandas-dev/pandas into sp…

f9e21ad

…litops

Merge branch 'master' of https://github.com/pandas-dev/pandas into sp…

8f0cdbc

…litops

jbrockmendel closed this Sep 8, 2018

jbrockmendel deleted the splitops branch September 8, 2018 03:11

jbrockmendel mentioned this pull request Sep 13, 2018

TST: Test for bug fixed during #22534 discussion #22694

Merged

jreback pushed a commit that referenced this pull request Sep 15, 2018

TST: Test for bug fixed during #22534 discussion (#22694)

45bbca0

aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018

TST: Test for bug fixed during pandas-dev#22534 discussion (pandas-de…

1761dbc

…v#22694)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

TST: Test for bug fixed during pandas-dev#22534 discussion (pandas-de…

a092be8

…v#22694)
