RLS: 1.3.1 #42343

Closed
simonjayhawkins opened this issue Jul 2, 2021 · 16 comments

@simonjayhawkins
Member

simonjayhawkins commented Jul 2, 2021

Tracking issue for the 1.3.1 release.

https://github.com/pandas-dev/pandas/milestone/87

Currently scheduled for July 25, 2021 (date flexible on severity of regressions)

List of open regressions: https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3ARegression

significant performance regressions between 1.2.5 and 1.3.0

       before           after         ratio
     [7c48ff44]       [f00ed8f4]
     <v1.2.5^0>       <v1.3.0^0>
+     2.43±0.03ms         28.4±1ms    11.66  indexing.InsertColumns.time_assign_list_like_with_setitem
+     5.79±0.08ms       29.9±0.4ms     5.17  frame_methods.MaskBool.time_frame_mask_bools
+      88.7±0.9μs          453±5μs     5.11  inference.ToTimedelta.time_convert_int
+        82.6±2ms          386±1ms     4.67  frame_ctor.FromDicts.time_nested_dict_int64
+        363±10μs      1.26±0.01ms     3.48  stat_ops.FrameOps.time_op('prod', 'int', 1)
+         537±7μs      1.84±0.01ms     3.43  stat_ops.FrameOps.time_op('mean', 'int', 1)
+         341±3μs      1.09±0.01ms     3.21  stat_ops.FrameOps.time_op('sum', 'int', 1)
+        643±10μs      1.95±0.01ms     3.03  rolling.EWMMethods.time_ewm('Series', 1000, 'float', 'mean')
+         626±6μs      1.88±0.03ms     3.00  rolling.EWMMethods.time_ewm('Series', 10, 'float', 'mean')
+         678±9μs      2.00±0.04ms     2.95  rolling.EWMMethods.time_ewm('Series', 10, 'int', 'mean')
+        696±10μs      2.01±0.01ms     2.89  rolling.EWMMethods.time_ewm('Series', 1000, 'int', 'mean')
+         592±9μs      1.68±0.01ms     2.84  stat_ops.FrameOps.time_op('mean', 'int', 0)
+        758±10μs      2.11±0.02ms     2.79  rolling.EWMMethods.time_ewm('DataFrame', 10, 'float', 'mean')
+     1.34±0.01ms       3.73±0.1ms     2.78  stat_ops.FrameOps.time_op('std', 'int', 0)
+        804±10μs      2.21±0.03ms     2.75  rolling.EWMMethods.time_ewm('DataFrame', 10, 'int', 'mean')
+         479±4μs      1.32±0.04ms     2.75  stat_ops.FrameOps.time_op('sum', 'int', 0)
+         213±4ns          583±8ns     2.74  tslibs.timestamp.TimestampProperties.time_freqstr(None, 'B')
+         218±2ns          592±4ns     2.72  tslibs.timestamp.TimestampProperties.time_freqstr(<DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>, 'B')
+     1.32±0.03ms      3.58±0.04ms     2.70  stat_ops.FrameOps.time_op('var', 'int', 0)
+         221±2ns          595±9ns     2.69  tslibs.timestamp.TimestampProperties.time_freqstr(tzlocal(), 'B')
+         216±2ns          582±6ns     2.69  tslibs.timestamp.TimestampProperties.time_freqstr(datetime.timezone(datetime.timedelta(seconds=3600)), 'B')
+         212±4ns         567±10ns     2.68  tslibs.timestamp.TimestampProperties.time_freqstr(tzfile('/usr/share/zoneinfo/Asia/Tokyo'), 'B')
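The `ratio` column in the asv output above is the "after" timing divided by the "before" timing, so values above 1 are slowdowns. A minimal sketch of that arithmetic, assuming the cell format shown in the table (`value±error` followed by a unit); `parse_timing` and `ratio` are hypothetical helper names and are not part of asv:

```python
import re

# Multipliers to seconds for the units that appear in the table.
# Both the Greek mu and the micro sign are accepted for microseconds.
_UNITS = {"ns": 1e-9, "μs": 1e-6, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_CELL = re.compile(r"([\d.]+)(?:±[\d.]+)?(ns|μs|µs|us|ms|s)")


def parse_timing(cell: str) -> float:
    """Return the central value of a cell such as '2.43±0.03ms', in seconds."""
    match = _CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised timing: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def ratio(before: str, after: str) -> float:
    """Ratio of after to before; greater than 1 means a slowdown."""
    return parse_timing(after) / parse_timing(before)


# inference.ToTimedelta.time_convert_int from the table above
print(round(ratio("88.7±0.9μs", "453±5μs"), 2))  # 5.11
```

Ratios recomputed from the displayed cells can differ slightly from the printed column, since the displayed timings are already rounded.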

cc @pandas-dev/pandas-core @pandas-dev/pandas-triage

simonjayhawkins added this to the 1.3.1 milestone Jul 2, 2021

@jbrockmendel
Member

for morale purposes, any perf improvements?

@simonjayhawkins
Member Author

-      83.3±0.3ms      4.42±0.08ms     0.05  frame_methods.Fillna.time_frame_fillna(False, 'pad', 'Float64')
-        97.6±2ms       4.81±0.1ms     0.05  frame_methods.Fillna.time_frame_fillna(False, 'bfill', 'Int64')
-      95.3±0.3ms      4.43±0.07ms     0.05  frame_methods.Fillna.time_frame_fillna(True, 'pad', 'Int64')
-        97.0±1ms      4.48±0.09ms     0.05  frame_methods.Fillna.time_frame_fillna(False, 'pad', 'Int64')
-      27.7±0.2ms      1.27±0.01ms     0.05  rolling.GroupbyEWM.time_groupby_method('std')
-     17.9±0.09ms         786±10μs     0.04  arithmetic.Ops2.time_frame_float_div_by_zero
-         189±2ms      4.88±0.03ms     0.03  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-      62.8±0.3ms      1.05±0.01ms     0.02  reshape.ReshapeExtensionDtype.time_unstack_fast('datetime64[ns, US/Pacific]')
-     2.27±0.03ms       37.2±0.8μs     0.02  categoricals.Concat.time_append_overlapping_index
-       228±0.5ms      3.45±0.03ms     0.02  rolling.GroupbyEWM.time_groupby_method('corr')
-         224±2ms      3.26±0.02ms     0.01  rolling.GroupbyEWM.time_groupby_method('cov')
-         189±2ms      2.43±0.01ms     0.01  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-         189±2ms      2.30±0.01ms     0.01  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-      50.6±0.3μs         600±50ns     0.01  index_cached_properties.IndexCache.time_shape('RangeIndex')
-        51.4±2μs         600±50ns     0.01  index_cached_properties.IndexCache.time_shape('Int64Index')
-      69.6±0.6ms          455±6μs     0.01  indexing.DatetimeIndexIndexing.time_get_indexer_mismatched_tz
-      49.8±0.3ms          305±2μs     0.01  indexing.CategoricalIndexIndexing.time_get_indexer_list('monotonic_decr')
-      49.4±0.4ms          298±3μs     0.01  indexing.CategoricalIndexIndexing.time_get_indexer_list('monotonic_incr')
-         187±4ms      1.13±0.01ms     0.01  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-      49.6±0.3ms          290±2μs     0.01  indexing.CategoricalIndexIndexing.time_get_indexer_list('non_monotonic')
-         188±2ms         1.01±0ms     0.01  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
-         634±6μs      2.16±0.02μs     0.00  inference.ToNumericDowncast.time_downcast('datetime64', None)
-      55.1±0.1ms          155±2μs     0.00  reshape.ReshapeExtensionDtype.time_transpose('datetime64[ns, US/Pacific]')
-       205±0.4ms          223±4μs     0.00  hash_functions.UniqueForLargePyObjectInts.time_unique
-      3.02±0.1ms      1.08±0.02μs     0.00  categoricals.Indexing.time_unique

@jbrockmendel
Member

jbrockmendel commented Jul 4, 2021

significant performance regressions between 1.2.5 and 1.3.0

rolling.EWMMethods.time_ewm -> @mroeschke is there a known cause for these?

stat_ops.FrameOps.time_ops based on a quick look we're spending way more time inside ufunc _reduce. The affected cases are the tiny ones, so t

@simonjayhawkins
Member Author

stat_ops.FrameOps.time_ops based on a quick look we're spending way more time inside ufunc _reduce.

#38592?

@simonjayhawkins
Member Author

  • 11.66 indexing.InsertColumns.time_assign_list_like_with_setitem

#39510 (comment)

  • 5.17 frame_methods.MaskBool.time_frame_mask_bools

#38709 (comment)

The list posted in the OP is not filtered. I think all the relevant performance regressions in that list now have issues.

since the official benchmark machine only tracks master, I have unofficial results for 1.3.x at https://simonjayhawkins.github.io/fantastic-dollop/#regressions?sort=3&dir=desc&branch=1.3.x (where some are ignored)

@jbrockmendel
Member

got it. if my post is extraneous feel free to delete

@simonjayhawkins
Member Author

got it. if my post is extraneous feel free to delete

not at all.

@mroeschke
Member

mroeschke commented Jul 5, 2021

rolling.EWMMethods.time_ewm -> @mroeschke is there a known cause for these?

Some of the work in EWM was moved from the cython to the python space which probably caused the slowdown. Need to look back for the reason why things were moved.

Actually from the commit where the regression happened here, https://pandas.pydata.org/speed/pandas/#rolling.EWMMethods.time_ewm?python=3.8&Cython=0.29.21&p-constructor='DataFrame'&p-constructor='Series'&p-window=1000&p-dtype='float'&p-method='mean'&commits=ab687aec-f066cda6, the major thing was that we used to have 2 ewm algorithms that were essentially the same and they were combined, and the combined one does an extra power and div operations per value compared to the old implementation.

The performance was somewhat ameliorated in #40164.
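To illustrate the kind of per-value overhead described above, here is a rough pure-Python sketch of an adjusted EWM mean written two ways. It is not the pandas Cython code: the function names, the `deltas` argument, and the placement of the power operation are assumptions for illustration only, and NaN handling is omitted. The first version decays the old weight by a constant factor; the second raises that factor to a per-step power, which is the sort of extra work a single general-purpose routine might do on every value.

```python
def ewm_mean_fixed(values, com):
    """Adjusted EWM mean assuming evenly spaced observations (illustrative)."""
    if not values:
        return []
    alpha = 1.0 / (1.0 + com)
    old_wt_factor = 1.0 - alpha
    weighted_avg = values[0]
    old_wt = 1.0
    out = [weighted_avg]
    for cur in values[1:]:
        old_wt *= old_wt_factor  # one multiply per value
        weighted_avg = (old_wt * weighted_avg + cur) / (old_wt + 1.0)
        old_wt += 1.0
        out.append(weighted_avg)
    return out


def ewm_mean_general(values, com, deltas):
    """Same recurrence, but the decay is raised to a per-step power (illustrative)."""
    if not values:
        return []
    alpha = 1.0 / (1.0 + com)
    old_wt_factor = 1.0 - alpha
    weighted_avg = values[0]
    old_wt = 1.0
    out = [weighted_avg]
    for cur, delta in zip(values[1:], deltas):
        old_wt *= old_wt_factor ** delta  # extra power operation per value
        weighted_avg = (old_wt * weighted_avg + cur) / (old_wt + 1.0)
        old_wt += 1.0
        out.append(weighted_avg)
    return out


data = [1.0, 2.0, 3.0]
fixed = ewm_mean_fixed(data, com=1.0)
general = ewm_mean_general(data, com=1.0, deltas=[1.0, 1.0])
print(fixed)
print(general)
```

With unit deltas both functions return the same values, so the general version pays for the power without changing the result in the evenly spaced case.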

@simonjayhawkins
Member Author

@pandas-dev/pandas-core reminder that 1.3.1 is scheduled for the end of this week. It is a calendar release so will not block on open PRs. (Get them in early if they are to be included in 1.3.1.)

@simonjayhawkins
Member Author

will be moving open issues/ unfinished PRs on 1.3.1 milestone https://github.com/pandas-dev/pandas/milestone/87 to 1.3.2 milestone https://github.com/pandas-dev/pandas/milestone/88 tomorrow in readiness for release on Sunday.

@simonjayhawkins
Member Author

open issues, except #42387, now moved to 1.3.2

#42387 is not a blocker, but has an open PR, #42690

1.3.1 is a calendar release. 1.3.2 is scheduled for 3 weeks' time. If #42690 is not ready and merged before tomorrow, it will move to 1.3.2.

@simonjayhawkins
Member Author

simonjayhawkins commented Jul 25, 2021

starting release now. 1.3.1 milestone closed off.

@simonjayhawkins
Member Author

only linux_aarch64_numpy1.19python3.7.____73_pypy not yet built.

@simonjayhawkins
Member Author

only linux_aarch64_numpy1.19python3.7.____73_pypy not yet built.

this built fine on the PR https://cloud.drone.io/conda-forge/pandas-feedstock/129 but is timing out on the master build. Restarted several times yesterday and several times the day before. Have restarted again today to see if the situation is any better.

@simonjayhawkins
Member Author

linux_aarch64_numpy1.19python3.7.____73_pypy now built and available on conda-forge


3 participants