-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: add cost penalty for scans with large cardinality #66979
Conversation
This commit fixes an issue where in some edge cases the optimizer would prefer sorting the output of a forward scan over performing a reverse scan (when there is no need to sort the output of the reverse scan). Release note (performance improvement): The optimizer now prefers performing a reverse scan over a forward scan + sort if the reverse scan eliminates the need for a sort and the plans are otherwise equivalent. This was the case before in most cases, but some edge cases with a small number of rows have been fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! Surprising (in a good way) that none of the "xform/external" plans are affected!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @rytaft)
pkg/sql/opt/xform/coster.go, line 146 at r2 (raw file):
// unbounded maximum cardinality. This helps prevent surprising plans for very // small tables or for when stats are stale. unboundedMaxCardinalityScanRowCountPenalty = fullScanRowCountPenalty
For full table scans, this is added on top of the fullScanRowPenalty
? That should be clarified here.
pkg/sql/opt/xform/testdata/coster/scan, line 300 at r2 (raw file):
---- exec-ddl
[nit] add a comment explaining what these stats are trying to achieve. We're trying to make the (y,v)
index more appetizing for finding (10,10,10), yes? Surprised why the histogram on x
doesn't contain 10, doesn't this make the index on (x,y)
just as good (in that we expect 0 rows)? (maybe the multi-col stats override that? though the test would be less effective if we improved stats to consider the histogram on x
too)
pkg/sql/opt/xform/testdata/coster/scan, line 423 at r2 (raw file):
opt UPSERT INTO t64570 VALUES (10, 10, 10)
[nit] add a comment explaining what we're looking for - a scan of the PK with cardinality 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r1, 12 of 41 files at r2, 3 of 3 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 29 of 41 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
This commit adds a new cost function, largeCardinalityRowCountPenalty, which calculates a penalty that should be added to the row count of scans. It is non-zero for expressions with unbounded maximum cardinality or with maximum cardinality exceeding the row count estimate. Adding a few rows worth of cost helps prevent surprising plans for very small tables or for when stats are stale. Fixes cockroachdb#64570 Release note (performance improvement): When choosing between index scans that are estimated to have the same number of rows, the optimizer now prefers indexes for which it has higher certainty about the maximum number of rows over indexes for which there is more uncertainty in the estimated row count. This helps to avoid choosing suboptimal plans for small tables or if the statistics are stale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach and @RaduBerinde)
pkg/sql/opt/xform/coster.go, line 146 at r2 (raw file):
Previously, RaduBerinde wrote…
For full table scans, this is added on top of the
fullScanRowPenalty
? That should be clarified here.
Done.
pkg/sql/opt/xform/testdata/coster/scan, line 300 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] add a comment explaining what these stats are trying to achieve. We're trying to make the
(y,v)
index more appetizing for finding (10,10,10), yes? Surprised why the histogram onx
doesn't contain 10, doesn't this make the index on(x,y)
just as good (in that we expect 0 rows)? (maybe the multi-col stats override that? though the test would be less effective if we improved stats to consider the histogram onx
too)
These were the stats created from the example in the issue. The multi-column stats aren't important here, so I've removed them. The reason this test case works is that we don't use histograms when the cardinality is less than 100, so we estimate 1 row for the primary index scan based on the distinct count and cardinality. But we do use the histograms when the cardinality is unbounded, so we estimate 0 rows for the secondary index scan. Here's the comment from statistics_builder.go
(I completely forgot that we did this...):
// This is the minimum cardinality an expression should have in order to make
// it worth adding the overhead of using a histogram.
minCardinalityForHistogram = 100
pkg/sql/opt/xform/testdata/coster/scan, line 423 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] add a comment explaining what we're looking for - a scan of the PK with cardinality 1.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r1, 39 of 41 files at r2, 3 of 3 files at r3, 2 of 2 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
I'll let this bake for a week-or-so and then backport bors r+ |
Build succeeded: |
The backport to 20.2 is not clean, so I think I will only backport this to 21.1. |
opt: ensure we prefer a reverse scan to sorting a forward scan
This commit fixes an issue where in some edge cases the optimizer would
prefer sorting the output of a forward scan over performing a reverse scan
(when there is no need to sort the output of the reverse scan).
Release note (performance improvement): The optimizer now prefers
performing a reverse scan over a forward scan + sort if the reverse
scan eliminates the need for a sort and the plans are otherwise
equivalent. This was the case before in most cases, but some edge
cases with a small number of rows have been fixed.
opt: add cost penalty for scans with large cardinality
This commit adds a new cost function,
largeCardinalityRowCountPenalty
,which calculates a penalty that should be added to the row count of scans.
It is non-zero for expressions with unbounded maximum cardinality or with
maximum cardinality exceeding the row count estimate. Adding a few rows
worth of cost helps prevent surprising plans for very small tables or for
when stats are stale.
Fixes #64570
Release note (performance improvement): When choosing between index
scans that are estimated to have the same number of rows, the optimizer
now prefers indexes for which it has higher certainty about the maximum
number of rows over indexes for which there is more uncertainty in the
estimated row count. This helps to avoid choosing suboptimal plans for
small tables or if the statistics are stale.