PERF: concat perf #23362

TomAugspurger · 2018-10-26T14:15:14Z

For Series[period], pd.concat is about 6x slower than PeriodArray._concat_same_type. There's always going to be some overhead, but I wonder how much we can narrow this.

In [1]: import numpy as np
   ...: import pandas as pd
   ...:
   ...: a = np.random.randint(2000, 2100, size=1000)
   ...: b = np.random.randint(2000, 2100, size=1000)
   ...:
   ...: x = pd.core.arrays.period_array(a, freq='B')
   ...: y = pd.core.arrays.period_array(b, freq='B')
   ...:
   ...: s = pd.Series(x)
   ...: t = pd.Series(y)


In [2]: %timeit pd.concat([s, t], ignore_index=True)
523 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: %timeit x._concat_same_type([x, y])
90.1 µs ± 948 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
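
For anyone wanting to rerun this outside IPython, here is a minimal standalone sketch of the same comparison using the stdlib `timeit` module. It is not the exact setup above: the arrays are built with the public `pd.period_range` instead of `pd.core.arrays.period_array`, the frequency is daily rather than business-day, and the `best_time` helper with its repeat counts is a hypothetical convenience. Absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd


def best_time(func, number=200, repeat=5):
    """Return the best observed per-call time in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


# Assumption: daily periods built through the public API, not the report's
# period_array(..., freq='B') call.
x = pd.period_range("2000-01-01", periods=1000, freq="D").array
y = pd.period_range("2010-01-01", periods=1000, freq="D").array

s = pd.Series(x)
t = pd.Series(y)

# Array-level concatenation versus the Series-level entry point.
array_time = best_time(lambda: x._concat_same_type([x, y]))
series_time = best_time(lambda: pd.concat([s, t], ignore_index=True))

combined = pd.concat([s, t], ignore_index=True)

print(f"_concat_same_type: {array_time * 1e6:.1f} µs per call")
print(f"pd.concat:         {series_time * 1e6:.1f} µs per call")
print(f"ratio:             {series_time / array_time:.1f}x")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; `%timeit` reports mean and standard deviation instead, so the two are not directly comparable.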


qwhelan · 2018-10-29T01:52:14Z

Based on my testing, this unfortunately isn't PeriodArray-specific. It's a more general performance issue caused by unnecessary DataFrame creation when calling pd.concat() on only Series objects. I have a simple fix for one hot spot and am chasing down another now.
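
One way to check that the overhead is not tied to the period dtype is to repeat the measurement with plain int64 Series and compare against `np.concatenate` on the underlying arrays. The sketch below does that; the `per_call` helper and the repeat counts are illustrative assumptions, and it makes no claim about what ratio to expect.

```python
import timeit

import numpy as np
import pandas as pd


def per_call(func, number=200, repeat=5):
    """Return the best observed per-call time in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


rng = np.random.default_rng(0)
a = rng.integers(2000, 2100, size=1000)
b = rng.integers(2000, 2100, size=1000)

# Plain integer Series: no PeriodArray involved anywhere.
s = pd.Series(a)
t = pd.Series(b)

numpy_time = per_call(lambda: np.concatenate([a, b]))
series_time = per_call(lambda: pd.concat([s, t], ignore_index=True))
ratio = series_time / numpy_time

combined = pd.concat([s, t], ignore_index=True)

print(f"np.concatenate: {numpy_time * 1e6:.1f} µs per call")
print(f"pd.concat:      {series_time * 1e6:.1f} µs per call")
print(f"ratio:          {ratio:.1f}x")
```

If the ratio here is also large, the cost sits in the Series-level machinery of `pd.concat` rather than in any one array type.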

TomAugspurger · 2018-11-02T19:43:21Z

With #23404 we're down to about 2x slower:

In [4]: %timeit x._concat_same_type([x, y])
99.8 µs ± 2.25 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit pd.concat([s, t], ignore_index=True, copy=False)
194 µs ± 3.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I looked very briefly at the snakeviz profile after that was merged. A decent chunk of it was in creating an Index. That's necessarily going to slow pd.concat down, but could perhaps be improved a bit.
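
snakeviz visualizes profiles collected with the stdlib `cProfile` module, so a comparable profile can be gathered and inspected as text. The following is a rough sketch under a few assumptions: the Series are built with `pd.period_range` at daily frequency, and the loop count and number of rows printed are arbitrary. Where the time goes will depend on the pandas version.

```python
import cProfile
import io
import pstats

import pandas as pd

# Assumption: daily-frequency period Series built through the public API.
s = pd.Series(pd.period_range("2000-01-01", periods=1000, freq="D"))
t = pd.Series(pd.period_range("2010-01-01", periods=1000, freq="D"))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(500):  # repeat so that per-call costs add up to something measurable
    pd.concat([s, t], ignore_index=True)
profiler.disable()

# Render the 15 entries with the largest cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(15)
report = buffer.getvalue()
print(report)
```

Replacing the `pstats` section with `profiler.dump_stats("concat.prof")` writes a file that snakeviz can open; the file name is only an example.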

arw2019 · 2020-09-24T03:11:56Z

The difference on 1.2 master is 10x, although it looks like that's because of a significant (10x) improvement in PeriodArray._concat_same_type that hasn't been matched by the speed-up in pd.concat:

In [2]: %timeit x._concat_same_type([x, y])
9.53 µs ± 29.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [3]: %timeit pd.concat([s, t], ignore_index=True)
100 µs ± 2.59 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

jbrockmendel · 2023-04-18T23:54:53Z

Still very close to 12x on main following #52290 and #52291, but the timings are now about 4 µs for PeriodArray._concat_same_type and 47 µs for pd.concat.

About 11% of the runtime of concat is in __finalize__, xref #51280. Other than that I'm out of ideas on optimizing this any further.
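
A share like this can be estimated from a `cProfile` run by summing the cumulative time of every function named `__finalize__` and dividing by the total profiled time. The sketch below is only an approximation under stated assumptions: profiler overhead inflates small Python-level calls, the Series setup (daily periods via `pd.period_range`) and loop count are illustrative, and the result will not necessarily match the 11% figure on other versions or machines.

```python
import cProfile
import pstats

import pandas as pd

# Assumption: daily-frequency period Series built through the public API.
s = pd.Series(pd.period_range("2000-01-01", periods=1000, freq="D"))
t = pd.Series(pd.period_range("2010-01-01", periods=1000, freq="D"))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(500):
    pd.concat([s, t], ignore_index=True)
profiler.disable()

stats = pstats.Stats(profiler)

# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, own time, cumulative time, callers).
finalize_time = sum(
    cumtime
    for (_, _, funcname), (_, _, _, cumtime, _) in stats.stats.items()
    if funcname == "__finalize__"
)

# total_tt is the sum of own times across all profiled functions.
share = finalize_time / stats.total_tt
print(f"__finalize__ share of profiled time: {share:.1%}")
```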

TomAugspurger added the Performance (memory or execution speed performance) and Reshaping (concat, merge/join, stack/unstack, explode) labels Oct 26, 2018

TomAugspurger added this to the Contributions Welcome milestone Oct 26, 2018

TomAugspurger mentioned this issue Oct 26, 2018

REF: Make PeriodArray an ExtensionArray #22862

Merged

qwhelan mentioned this issue Oct 29, 2018

PERF: speed up concat on Series by skipping unnecessary DataFrame creation #23404

Merged


mroeschke added the Period (Period data type) label Dec 23, 2018

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

This was referenced Mar 29, 2023

PERF: concat_same_type for PeriodDtype #52290

Merged

PERF: concat #52291

Merged

jbrockmendel added the Closing Candidate (may be closeable, needs more eyeballs) label May 3, 2023

jbrockmendel closed this as completed Jul 25, 2023
