
opt: add cost penalty for scans with large cardinality #66979

Merged · 2 commits · Jun 28, 2021

Conversation

@rytaft (Collaborator) commented Jun 28, 2021

opt: ensure we prefer a reverse scan to sorting a forward scan

This commit fixes an issue where in some edge cases the optimizer would
prefer sorting the output of a forward scan over performing a reverse scan
(when there is no need to sort the output of the reverse scan).

Release note (performance improvement): The optimizer now prefers a
reverse scan over a forward scan + sort when the reverse scan eliminates
the need for a sort and the plans are otherwise equivalent. This was
already the behavior in most cases, but some edge cases with a small
number of rows have been fixed.
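The edge case is easy to reproduce with a toy cost model. Everything below is an illustrative assumption, not CockroachDB's actual coster (which lives in pkg/sql/opt/xform/coster.go): with a tiny row count the sort term vanishes, so a naive model can rank forward scan + sort ahead of a reverse scan.

```go
package main

import (
	"fmt"
	"math"
)

// Toy cost model; the constants and formulas are assumptions chosen only
// to exhibit the small-row-count edge case this commit fixes.
const (
	seqIOCostFactor  = 1.0 // per-row forward scan cost (assumed)
	revScanSurcharge = 0.1 // per-row reverse scan surcharge (assumed)
)

func forwardScanPlusSort(rows float64) float64 {
	sortCost := 0.0
	if rows > 1 {
		sortCost = rows * math.Log2(rows) // comparison sort ~ n log n
	}
	return rows*seqIOCostFactor + sortCost
}

func reverseScan(rows float64) float64 {
	return rows * (seqIOCostFactor + revScanSurcharge)
}

func main() {
	// Many rows: the reverse scan wins comfortably because it avoids the sort.
	fmt.Println(reverseScan(1000) < forwardScanPlusSort(1000))
	// One row: the sort is free, so the naive model prefers forward + sort.
	// That is the surprising plan choice the commit removes.
	fmt.Println(reverseScan(1) < forwardScanPlusSort(1))
}
```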

opt: add cost penalty for scans with large cardinality

This commit adds a new cost function, largeCardinalityRowCountPenalty,
which calculates a penalty that should be added to the row count of scans.
It is non-zero for expressions with unbounded maximum cardinality or with
maximum cardinality exceeding the row count estimate. Adding a few rows'
worth of cost helps prevent surprising plans for very small tables or when
stats are stale.

Fixes #64570

Release note (performance improvement): When choosing between index
scans that are estimated to have the same number of rows, the optimizer
now prefers indexes for which it has higher certainty about the maximum
number of rows over indexes for which there is more uncertainty in the
estimated row count. This helps to avoid choosing suboptimal plans for
small tables or if the statistics are stale.
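A minimal sketch of how such a penalty can work. The function shape, the penalty value, and the unbounded-cardinality encoding below are all assumptions for illustration (the real largeCardinalityRowCountPenalty in pkg/sql/opt/xform/coster.go may compute the value differently); the point is that the penalty is zero only when the maximum cardinality is a reliable, tight bound.

```go
package main

import "fmt"

// Illustrative sketch only: the real largeCardinalityRowCountPenalty in
// pkg/sql/opt/xform/coster.go may differ in both shape and constants.
const (
	largeCardinalityPenalty = 10.0       // "a few rows' worth of cost" (assumed value)
	unboundedMax            = ^uint32(0) // stand-in for unbounded maximum cardinality
)

type cardinality struct {
	min, max uint32
}

// largeCardinalityRowCountPenalty returns extra rows of cost for scans whose
// maximum cardinality is unbounded or exceeds the estimated row count.
func largeCardinalityRowCountPenalty(card cardinality, rowCount float64) float64 {
	if card.max == unboundedMax {
		return largeCardinalityPenalty
	}
	if float64(card.max) > rowCount {
		// Scale by how far the bound exceeds the estimate, capped at the
		// full penalty, so a nearly tight bound is barely penalized.
		penalty := float64(card.max) - rowCount
		if penalty > largeCardinalityPenalty {
			penalty = largeCardinalityPenalty
		}
		return penalty
	}
	// The cardinality bound is at or below the estimate: no uncertainty.
	return 0
}

func main() {
	fmt.Println(largeCardinalityRowCountPenalty(cardinality{0, unboundedMax}, 1)) // full penalty
	fmt.Println(largeCardinalityRowCountPenalty(cardinality{0, 1}, 1))            // tight bound, no penalty
}
```

With this shape, two index scans that both estimate 1 row no longer tie: the scan whose constraint proves a maximum cardinality of 1 costs less than an unconstrained scan that merely estimates 1 row.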

@rytaft rytaft requested review from RaduBerinde, cucaroach and a team June 28, 2021 17:10

@RaduBerinde (Member) left a comment

:lgtm:

Thanks for this! Surprising (in a good way) that none of the "xform/external" plans are affected!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @rytaft)


pkg/sql/opt/xform/coster.go, line 146 at r2 (raw file):

	// unbounded maximum cardinality. This helps prevent surprising plans for very
	// small tables or for when stats are stale.
	unboundedMaxCardinalityScanRowCountPenalty = fullScanRowCountPenalty

For full table scans, this is added on top of the fullScanRowPenalty? That should be clarified here.


pkg/sql/opt/xform/testdata/coster/scan, line 300 at r2 (raw file):

----

exec-ddl

[nit] add a comment explaining what these stats are trying to achieve. We're trying to make the (y,v) index more appetizing for finding (10,10,10), yes? Surprised why the histogram on x doesn't contain 10, doesn't this make the index on (x,y) just as good (in that we expect 0 rows)? (maybe the multi-col stats override that? though the test would be less effective if we improved stats to consider the histogram on x too)


pkg/sql/opt/xform/testdata/coster/scan, line 423 at r2 (raw file):


opt
UPSERT INTO t64570 VALUES (10, 10, 10)

[nit] add a comment explaining what we're looking for - a scan of the PK with cardinality 1.

@cucaroach (Contributor) left a comment

Reviewed 5 of 5 files at r1, 12 of 41 files at r2, 3 of 3 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach and @rytaft)

@cucaroach (Contributor) left a comment

Reviewed 29 of 41 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

@rytaft (Collaborator, Author) left a comment

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach and @RaduBerinde)


pkg/sql/opt/xform/coster.go, line 146 at r2 (raw file):

Previously, RaduBerinde wrote…

For full table scans, this is added on top of the fullScanRowPenalty? That should be clarified here.

Done.


pkg/sql/opt/xform/testdata/coster/scan, line 300 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] add a comment explaining what these stats are trying to achieve. We're trying to make the (y,v) index more appetizing for finding (10,10,10), yes? Surprised why the histogram on x doesn't contain 10, doesn't this make the index on (x,y) just as good (in that we expect 0 rows)? (maybe the multi-col stats override that? though the test would be less effective if we improved stats to consider the histogram on x too)

These were the stats created from the example in the issue. The multi-column stats aren't important here, so I've removed them. The reason this test case works is that we don't use histograms when the cardinality is less than 100, so we estimate 1 row for the primary index scan based on the distinct count and cardinality. But we do use the histograms when the cardinality is unbounded, so we estimate 0 rows for the secondary index scan. Here's the comment from statistics_builder.go (I completely forgot that we did this...):

	// This is the minimum cardinality an expression should have in order to make
	// it worth adding the overhead of using a histogram.
	minCardinalityForHistogram = 100
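The estimation rule rytaft describes can be sketched as a simple predicate. minCardinalityForHistogram is the real constant from statistics_builder.go; shouldUseHistogram and the encoding of "unbounded" as a negative value are hypothetical simplifications for illustration.

```go
package main

import "fmt"

// Real constant name from statistics_builder.go; the helper below is a
// hypothetical simplification of the surrounding logic.
const minCardinalityForHistogram = 100

// shouldUseHistogram reports whether the row-count estimate for an
// expression should consult a histogram. Unbounded maximum cardinality is
// modeled here as maxCardinality < 0.
func shouldUseHistogram(maxCardinality int) bool {
	if maxCardinality < 0 {
		// Unbounded cardinality: the histogram overhead is worth it.
		return true
	}
	return maxCardinality >= minCardinalityForHistogram
}

func main() {
	// A constrained PK scan with cardinality 1 skips the histogram, so its
	// estimate falls back to distinct counts (1 row in the test case); an
	// unbounded secondary-index scan uses the histogram (0 rows).
	fmt.Println(shouldUseHistogram(1))
	fmt.Println(shouldUseHistogram(-1))
}
```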

pkg/sql/opt/xform/testdata/coster/scan, line 423 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] add a comment explaining what we're looking for - a scan of the PK with cardinality 1.

Done.

@RaduBerinde (Member) left a comment

Reviewed 1 of 5 files at r1, 39 of 41 files at r2, 3 of 3 files at r3, 2 of 2 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

@rytaft (Collaborator, Author) commented Jun 28, 2021

I'll let this bake for a week or so and then backport.

bors r+

@craig (bot) commented Jun 28, 2021

Build succeeded:

@craig craig bot merged commit 045d42e into cockroachdb:master Jun 28, 2021
@rytaft (Collaborator, Author) commented Jul 8, 2021

The backport to 20.2 is not clean, so I think I will only backport this to 21.1.

Successfully merging this pull request may close these issues.

opt: avoid choosing index with unreliable stats