
opt: unbounded max cardinality penalty makes number of index columns dominate scan cost #68556

Closed
mgartner opened this issue Aug 6, 2021 · 1 comment · Fixed by #68676
Labels
A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@mgartner
Collaborator

mgartner commented Aug 6, 2021

In #66979 we introduced an unboundedMaxCardinalityScanRowCountPenalty, applied to constrained scans whose maximum cardinality is unbounded. This increased the likelihood that, with stale table statistics, the optimizer would choose a scan on a primary index or unique secondary index, where the number of potential rows scanned is bounded, over a scan on a non-unique secondary index, where it is unbounded.
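
For intuition, here is a simplified sketch of how such a row-count penalty might be applied (the constant's value, the function signature, and the helper names are illustrative assumptions, not the actual coster.go code):

// Illustrative sketch only; the constant's value and these names are assumptions.
const unboundedMaxCardinalityScanRowCountPenalty = 10.0

// penalizedScanRowCount bumps the estimated row count of a constrained scan
// whose maximum cardinality is unbounded, so that with stale statistics it
// costs more than a scan with a bounded number of potential rows.
func penalizedScanRowCount(estimatedRows float64, constrained, unboundedMaxCardinality bool) float64 {
	rowCount := estimatedRows
	if constrained && unboundedMaxCardinality {
		rowCount += unboundedMaxCardinalityScanRowCountPenalty
	}
	return rowCount
}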

This change had a negative side effect. With stale table statistics that make a query look extremely selective (e.g., filtering for a value greater than the maximum histogram upper bound), the number of columns in an index dominates the cost of a scan. This causes the optimizer to favor a scan of an index with fewer columns over a scan of an index with more columns, even if the former requires an additional filter and/or index join after the scan.
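
Roughly speaking (a simplified approximation, not the exact coster.go formula), a scan's cost is (rowCount + penalty) × perRowCost, and perRowCost grows with the number of columns the scan reads. When stale statistics drive rowCount toward zero (e.g., the rows=1e-09 estimate in the plan below), the cost degenerates to roughly penalty × perRowCost, so the index with fewer columns always looks cheaper, no matter what extra select or index join is required on top of it.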

The test below shows that the index with fewer columns is chosen even though an extra select and index join are needed. See the optstepsweb output too.

exec-ddl
CREATE TABLE t (
  k INT PRIMARY KEY,
  a INT,
  b INT,
  c INT,
  INDEX abc (a, b) STORING (c),
  INDEX bc (b) STORING (c)
)
----

# Inject stats for 100k rows where the upper bound of both a and b is 10.
exec-ddl
ALTER TABLE t INJECT STATISTICS '[
    {
        "columns": [
            "k"
        ],
        "created_at": "2021-06-28 14:20:18.273949",
        "distinct_count": 3,
        "histo_col_type": "INT8",
        "null_count": 0,
        "row_count": 100000
    },
    {
        "columns": [
            "a"
        ],
        "created_at": "2021-06-28 14:20:18.273949",
        "distinct_count": 100,
        "histo_buckets": [
            {"distinct_range": 0, "num_eq": 0, "num_range": 0, "upper_bound": "0"},
            {"distinct_range": 100, "num_eq": 0, "num_range": 100000, "upper_bound": "10"}
        ],
        "histo_col_type": "INT8",
        "null_count": 0,
        "row_count": 100000
    },
    {
        "columns": [
            "b"
        ],
        "created_at": "2021-06-28 14:20:18.273949",
        "distinct_count": 100,
        "histo_buckets": [
            {"distinct_range": 0, "num_eq": 0, "num_range": 0, "upper_bound": "0"},
            {"distinct_range": 100, "num_eq": 0, "num_range": 100000, "upper_bound": "10"}
        ],
        "histo_col_type": "INT8",
        "null_count": 0,
        "row_count": 100000
    }
]'
----

# Query for values of a and b that are outside the histogram buckets.
# A scan on the bc index with an additional filter and index join is chosen over
# a scan on the abc index with no additional filter and no index join.
opt format=(show-cost,show-stats)
SELECT * FROM t WHERE a = 11 AND b = 11
----
select
 ├── columns: k:1!null a:2!null b:3!null c:4
 ├── stats: [rows=1e-09, distinct(2)=1e-09, null(2)=0, distinct(3)=1e-09, null(3)=0, distinct(2,3)=1e-09, null(2,3)=0]
 ├── cost: 14.6400007
 ├── index-join t
 │    ├── columns: k:1!null a:2 b:3 c:4
 │    ├── stats: [rows=1e-07]
 │    ├── cost: 14.6200007
 │    └── scan t@bc
 │         ├── columns: k:1!null b:3!null c:4
 │         ├── constraint: /3/1: [/11 - /11]
 │         ├── stats: [rows=1e-07, distinct(3)=1e-07, null(3)=0]
 │         └── cost: 14.6100001
 └── filters
      └── a:2 = 11
@mgartner mgartner added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. labels Aug 6, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Aug 6, 2021
@mgartner mgartner self-assigned this Aug 6, 2021
@mgartner mgartner changed the title opt: unbounded max cardinality makes number of index column dominate scan cost opt: unbounded max cardinality penalty makes number of index column dominate scan cost Aug 6, 2021
@mgartner mgartner changed the title opt: unbounded max cardinality penalty makes number of index column dominate scan cost opt: unbounded max cardinality penalty makes number of index columns dominate scan cost Aug 6, 2021
@mgartner
Collaborator Author

mgartner commented Aug 7, 2021

I think the solution might be something like normalizing the large cardinality row count penalty by the perRowCost:

diff --git a/pkg/sql/opt/xform/coster.go b/pkg/sql/opt/xform/coster.go
index 084d02337e..3ae0340e89 100644
--- a/pkg/sql/opt/xform/coster.go
+++ b/pkg/sql/opt/xform/coster.go
@@ -656,7 +656,8 @@ func (c *coster) computeScanCost(scan *memo.ScanExpr, required *physical.Require
        // Add a penalty if the cardinality exceeds the row count estimate. Adding a
        // few rows worth of cost helps prevent surprising plans for very small tables
        // or for when stats are stale.
-       rowCount += c.largeCardinalityRowCountPenalty(scan.Relational().Cardinality, rowCount)
+       rowCount += c.largeCardinalityRowCountPenalty(scan.Relational().Cardinality, rowCount) /
+               float64(perRowCost)

        if required.LimitHint != 0 {
                rowCount = math.Min(rowCount, required.LimitHint)

This prevents the number of columns in the index from dominating the cost, since dividing the penalty by perRowCost roughly cancels the later multiplication by the per-row cost, leaving the penalty as a near-constant contribution to the final cost. However, when I tried this, quite a few plans in the tests changed, so there's likely more to this.

rytaft added a commit to rytaft/cockroach that referenced this issue Aug 10, 2021
This commit tweaks the application of the unbounded cardinality penalty
in the coster to add it directly to the cost of scans and zigzag joins
rather than to the row count. This helps prevent an issue in which the
number of index columns could dominate the scan cost and result in
suboptimal plans.

Fixes cockroachdb#68556

Release note (bug fix): Fixed a regression in the optimizer's cost model
that could cause it to choose suboptimal plans when choosing between two
non-unique index scans with different numbers of columns per index.
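
As a rough sketch of the difference between the two approaches (illustrative only; these function and constant names are assumptions, not the actual coster.go code), the adopted fix adds the penalty directly to the cost instead of to the row count, so it is no longer multiplied by the per-row cost:

// Illustrative sketch only; names and values are assumptions.
const unboundedCardinalityCostPenalty = 10.0

// Before: the penalty inflates the row count, so it is later multiplied by the
// per-row cost, which grows with the number of columns the scan reads.
func scanCostWithRowCountPenalty(rowCount, perRowCost float64, unboundedMaxCardinality bool) float64 {
	if unboundedMaxCardinality {
		rowCount += unboundedCardinalityCostPenalty
	}
	return rowCount * perRowCost
}

// After: the penalty is added directly to the cost, contributing the same flat
// amount regardless of how many columns the index stores.
func scanCostWithCostPenalty(rowCount, perRowCost float64, unboundedMaxCardinality bool) float64 {
	cost := rowCount * perRowCost
	if unboundedMaxCardinality {
		cost += unboundedCardinalityCostPenalty
	}
	return cost
}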
craig bot pushed a commit that referenced this issue Aug 11, 2021
68676: opt: adjust cost of scan with unbounded cardinality to avoid bad plans r=rytaft a=rytaft

Co-authored-by: Rebecca Taft <[email protected]>
@craig craig bot closed this as completed in 9a3d1ab Aug 11, 2021
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023