opt: performance regression on April 21st #64214

erikgrinaker · 2021-04-26T15:09:57Z

roachperf graphs are showing a performance regression across the board on April 21st, e.g.:

Verified this to be caused by #63807, benchmarks of 22e4d64 (old) vs df67aa9 (new):

name                           old ops/sec  new ops/sec  delta
kv95/enc=false/nodes=1/cpu=32   73.1k ± 3%   61.1k ± 3%  -16.30%  (p=0.000 n=9+8)

name                           old p50      new p50      delta
kv95/enc=false/nodes=1/cpu=32    0.70 ± 0%    0.70 ± 0%     ~     (all equal)

name                           old p95      new p95      delta
kv95/enc=false/nodes=1/cpu=32    1.80 ± 0%    2.91 ± 6%  +61.81%  (p=0.000 n=8+8)

name                           old p99      new p99      delta
kv95/enc=false/nodes=1/cpu=32    4.70 ± 0%    6.82 ± 4%  +45.21%  (p=0.000 n=8+8)

Notice how the new April 21st graph shows a clear improvement from the start, but then slows down after about a minute to a level significantly below the old April 19th baseline. Statistics collection would be a likely cause for the abrupt change.

/cc @cockroachdb/test-eng

The text was updated successfully, but these errors were encountered:

RaduBerinde · 2021-04-26T15:54:56Z

Wow.. thanks for tracking this down! I think we are missing special logic when the memo is determined to be "stale".

RaduBerinde · 2021-04-26T15:57:46Z

@erikgrinaker for how long did you run the benchmarks? It must have been more than a minute if it slows down after that?

tbg · 2021-04-26T15:58:32Z

roachtest usually runs 10 minute intervals, which is also what you can see in the above graph

erikgrinaker · 2021-04-26T15:59:27Z

Just did roachtest run --count 10 '^kv95/enc=false/nodes=1/cpu=32$', they're 10-minute runs plus possibly some ramp-up time.

RaduBerinde · 2021-04-26T17:30:26Z

Thanks. I understand the issue now. The problem is that we are using the statistics that are derived using placeholders, which are bogus. Once we have stats, our estimated row count is very large, and that affects the way the query is being executed.

planning time: 259µs
execution time: 4ms
distribution: local
vectorized: true
rows read from KV: 1 (0 B)
cumulative time spent in KV: 679µs
maximum memory usage: 90 KiB
network usage: 0 B (0 messages)

• scan
  columns: (k int, v bytes)
  cluster nodes: n1
  actual row count: 1
  vectorized batch count: 1
  KV rows read: 1
  KV bytes read: 0 B
  estimated row count: 20,653
  table: kv@primary
  spans: /-3812730450502136779/0

This change is best explained by this comment: ``` // We are dealing with a memo that still contains placeholders. The statistics // for such a memo can be wildly overestimated. Even if our plan is correct, // the estimated row count for a scan is passed to the execution engine which // uses it to make sizing decisions. Passing a very high count can affect // performance significantly (see cockroachdb#64214). So we only use the fast path if the // estimated row count is small; typically this will happen when we constrain // columns that form a key and we know there will be at most one row. ``` Fixes cockroachdb#64214. Release note (bug fix): fixed a performance regression for very simple queries.

64225: opt: make placeholder fast path conditional on the estimated row count r=RaduBerinde a=RaduBerinde #### opt: show statistics in placeholder-fast-path tests This highlights a problem with the fast path: we use statistics that are derived using placeholders, which are usually terrible. Release note: None #### opt: mark columns as constant for equality with placeholder This commit improves the FD generation for Select when we have filters like `x = $1`. Because these filters have placeholders, they do not generate constraints (which is the normal mechanism for detecting constant columns). This improves the cardinality property and row count estimate. The estimate will be used for the placeholder fast path. Release note: None #### opt: make placeholder fast path conditional on the estimated row count This change is best explained by this comment: ``` // We are dealing with a memo that still contains placeholders. The statistics // for such a memo can be wildly overestimated. Even if our plan is correct, // the estimated row count for a scan is passed to the execution engine which // uses it to make sizing decisions. Passing a very high count can affect // performance significantly (see #64214). So we only use the fast path if the // estimated row count is small; typically this will happen when we constrain // columns that form a key and we know there will be at most one row. ``` Fixes #64214. Release note (bug fix): fixed a performance regression for very simple queries. 64229: sql: add CREATE STATISTICS regression test r=mgartner a=mgartner This is a "forward-port" of regression tests from #64226. This commit contains no code changes because the bug was already been fixed on master by #59687. Release note: None 64235: roachtest: update version map and fixtures r=darinpp a=darinpp This commit adds the recently released 20.1.15 to the version map in PredecessorVersion. Release note: None (testing change) Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Darin Peshev <[email protected]>

erikgrinaker added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-optimizer SQL logical planning and optimizations. labels Apr 26, 2021

erikgrinaker assigned RaduBerinde Apr 26, 2021

tbg mentioned this issue Apr 26, 2021

roachtest: schemachange/index/tpcc/w=1000 failed #62320

Closed

RaduBerinde mentioned this issue Apr 26, 2021

opt: make placeholder fast path conditional on the estimated row count #64225

Merged

craig bot closed this as completed in 976118b Apr 27, 2021

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt: performance regression on April 21st #64214

opt: performance regression on April 21st #64214

erikgrinaker commented Apr 26, 2021

RaduBerinde commented Apr 26, 2021

RaduBerinde commented Apr 26, 2021

tbg commented Apr 26, 2021

erikgrinaker commented Apr 26, 2021 •

edited

Loading

RaduBerinde commented Apr 26, 2021

opt: performance regression on April 21st #64214

opt: performance regression on April 21st #64214

Comments

erikgrinaker commented Apr 26, 2021

RaduBerinde commented Apr 26, 2021

RaduBerinde commented Apr 26, 2021

tbg commented Apr 26, 2021

erikgrinaker commented Apr 26, 2021 • edited Loading

RaduBerinde commented Apr 26, 2021

erikgrinaker commented Apr 26, 2021 •

edited

Loading