opt: avoid choosing index with unreliable stats #64570
Given that statistics are always out of date, and they are an estimate, would it be reasonable to put a limit on how low statistics can push an estimated row count? For example, what if statistics could lower the row count estimate to no less than 1? Only contradictions are guaranteed to make a row count 0. In that case it might have to be a lower limit of 1 plus some epsilon, so that the cost doesn't match the primary index lookup.
It would be nice to get to this in the 21.2 release. cc @kevin-v-ngo @awoods187 for visibility.
I'm pro doing this in 21.2, provided the opportunity cost isn't too high. Let's discuss the level of effort during the milestone planning meeting.
Saw another instance of this in the wild. The use case involved a "status" column and a partial index that restricts the status to "pending". The vast majority of rows have "done" status, with occasionally a few thousand rows showing as "pending". Automatic stats can happen to see zero "pending" rows or a few thousand of them, depending on timing. When stats show zero rows, the partial index is estimated to be empty and becomes an enticing index for the optimizer. In this case, the reliable plan was to use the PK, which guaranteed that we scan at most one row.
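A minimal SQL sketch of that scenario (the table, column values, and index names are mine, not from the report):

```sql
CREATE TABLE tasks (
  id INT PRIMARY KEY,
  status STRING NOT NULL
);

-- Partial index covering only the transient "pending" rows.
CREATE INDEX pending_idx ON tasks (id) WHERE status = 'pending';

-- If stats happened to be collected while zero rows were 'pending', the
-- partial index is estimated to be empty and looks enticing to the
-- optimizer, even though the primary key lookup is guaranteed to scan
-- at most one row.
SELECT * FROM tasks WHERE id = 42 AND status = 'pending';
```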
This commit adds a new cost function, `largeCardinalityRowCountPenalty`, which calculates a penalty that should be added to the row count of scans. It is non-zero for expressions with unbounded maximum cardinality or with maximum cardinality exceeding the row count estimate. Adding a few rows' worth of cost helps prevent surprising plans for very small tables or for when stats are stale. Fixes cockroachdb#64570. Release note (performance improvement): When choosing between index scans that are estimated to have the same number of rows, the optimizer now prefers indexes for which it has higher certainty about the maximum number of rows over indexes for which there is more uncertainty in the estimated row count. This helps to avoid choosing suboptimal plans for small tables or when the statistics are stale.
66973: ui: surface the transaction restarts chart r=matthewtodd a=matthewtodd

Resolves #65856

Release note (ui change): The KV transaction restarts chart was moved from the "distributed" metrics page to the "sql" metrics page so as to be close to the SQL transactions chart, for more prominent visibility.

66979: opt: add cost penalty for scans with large cardinality r=rytaft a=rytaft

**opt: ensure we prefer a reverse scan to sorting a forward scan**

This commit fixes an issue where, in some edge cases, the optimizer would prefer sorting the output of a forward scan over performing a reverse scan (when there is no need to sort the output of the reverse scan).

Release note (performance improvement): The optimizer now prefers performing a reverse scan over a forward scan + sort if the reverse scan eliminates the need for a sort and the plans are otherwise equivalent. This was already the case in most situations, but some edge cases with a small number of rows have been fixed.

**opt: add cost penalty for scans with large cardinality**

This commit adds a new cost function, `largeCardinalityRowCountPenalty`, which calculates a penalty that should be added to the row count of scans. It is non-zero for expressions with unbounded maximum cardinality or with maximum cardinality exceeding the row count estimate. Adding a few rows' worth of cost helps prevent surprising plans for very small tables or for when stats are stale.

Fixes #64570

Release note (performance improvement): When choosing between index scans that are estimated to have the same number of rows, the optimizer now prefers indexes for which it has higher certainty about the maximum number of rows over indexes for which there is more uncertainty in the estimated row count. This helps to avoid choosing suboptimal plans for small tables or when the statistics are stale.

Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
Unfortunately, it seems the cost-penalty fix is not robust enough to handle some variations of this issue. For example, if the alternative plan has cardinality 10, we may not choose it. This will require some more thought, so I'll reopen this issue for now.
Just saw another variation on this. It's similar to the example from Radu above, but slightly different: there is an index whose histograms suggest that the relevant value matches no rows, so the optimizer estimates the scan as empty. The better plan would be to scan a different index, and unfortunately a cardinality bound cannot help us here.

I've run into problems like this running Postgres in the past. Our solution was to make automated stats collection much more aggressive. If a table gets very large, automatic stats collection is very unlikely to run if it is triggered by some percentage of rows being mutated. Imagine a table that gets 100k inserts per day. It's been around for 1,000 days, so it now has 100m rows; with a trigger threshold of 20% of rows being stale, stats would only be refreshed roughly every 200 days. To make automatic stats much more aggressive, you can lower `sql.stats.automatic_collection.fraction_stale_rows` so that collection is triggered by far fewer changed rows.

Postgres has the ability to adjust these stats knobs at the table level. I don't believe we have that ability yet, but it would be useful for this; a user needs the ability to tune these knobs per table based on the workload. Here's how to tune auto-stats collection in Postgres for a specific table:
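Something along these lines (the table name is a placeholder; `autovacuum_analyze_scale_factor` and `autovacuum_analyze_threshold` are Postgres's standard per-table autovacuum storage parameters):

```sql
-- Re-analyze after a fixed number of changed rows rather than a fraction
-- of the table, so stats stay fresh even as the table grows.
ALTER TABLE my_big_table SET (
  autovacuum_analyze_scale_factor = 0.0,  -- ignore the table's size
  autovacuum_analyze_threshold = 50000    -- analyze after ~50k changed rows
);
```

With these settings, autoanalyze fires once roughly 50k rows have changed, no matter how large the table has grown.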
Ran into this on the DRT cluster.
Hit it on another cluster as well.
Informs cockroachdb#64570 Informs cockroachdb#130201 Release note (sql change): The `optimizer_min_row_count` session setting has been added, which sets a lower bound on row count estimates for relational expressions during query planning. A value of zero, which is the default, indicates no lower bound. Note that if this is set to a value greater than zero, a row count of zero can still be estimated for expressions with a cardinality of zero, e.g., for a contradictory filter. Setting this to a value higher than 0, such as 1, may yield better query plans in some cases, such as when statistics are frequently stale and inaccurate.
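For reference, opting in is a one-line session change (a usage sketch based on the release note above):

```sql
-- Never let a stale histogram push an estimate below one row; only
-- contradictions (cardinality zero) can still produce a zero estimate.
SET optimizer_min_row_count = 1;
```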
139256: sql/rowenc: reduce index key prefix calls r=annrpom a=annrpom

This patch removes redundant calls to `MakeIndexKeyPrefix` during the construction of `IndexEntry`s by saving each first-time call in a map that we can later look up. Previously, we would make this call for each row; however, as the prefix (table id + index id) for a particular index remains the same, we do not need to do any recomputation. Epic: CRDB-42901. Fixes: #137798. Release note: None.

**sql/rowexec: run BenchmarkIndexBackfill on-disk**

Epic: none. Release note: None.

139955: crosscluster/logical: use large test pool r=rickystewart a=msbutler

This package spins up several TestServers. We've also seen many tests time out, likely due to resource exhaustion. Informs #138277. Informs #139673. Release note: none.

140057: sqlsmith: randomly generate auto partial stats in sqlsmith tests r=rytaft a=rytaft

Epic: None. Release note: None.

140065: opt: add `optimizer_min_row_count` session setting r=mgartner a=mgartner

Informs #64570. Informs #130201.

Release note (sql change): The `optimizer_min_row_count` session setting has been added, which sets a lower bound on row count estimates for relational expressions during query planning. A value of zero, which is the default, indicates no lower bound. Note that if this is set to a value greater than zero, a row count of zero can still be estimated for expressions with a cardinality of zero, e.g., for a contradictory filter. Setting this to a value higher than 0, such as 1, may yield better query plans in some cases, such as when statistics are frequently stale and inaccurate.

140203: server: split join_list to its own file r=RaduBerinde a=andrewbaptist

Previously, StoreSpec and JoinListType were in the same file. This commit moves JoinListType and its test to a separate file. Epic: CRDB-41111. Release note: None.

Co-authored-by: Annie Pompa <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
A customer ran into a case where they were doing a single-row UPSERT. Instead of choosing the primary index (which would scan at most 1 row), the optimizer chose a secondary index. That index was chosen because, according to the histogram, the relevant value would have no rows. But the stats were stale, and the query actually read through 100k+ rows from the index.
We have discussed augmenting the cost value with an "uncertainty range", which would address this problem (the primary index has <=1 row with 100% certainty; the secondary index has an expected 0 rows but no upper bound). That would be a big change, but I believe we can also consider a more targeted fix: e.g. we could give a heavy cost "discount" to scan operators that have a known cardinality bound (or a penalty to scans with no cardinality bound).
Below is an illustration. The `t_y_v_idx` index could in principle return any number of rows, whereas the correct plan, a constrained scan of the primary index, would touch at most one row.
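A minimal sketch of the shape of the problem, reconstructed from the index name above (the schema and query are my own, not the original reproduction):

```sql
CREATE TABLE t (
  k INT PRIMARY KEY,
  y INT,
  v INT,
  INDEX t_y_v_idx (y, v)
);

-- Both indexes can serve this query: the primary index is constrained to
-- k = 1 and returns at most one row, while t_y_v_idx is constrained to
-- (y, v) = (2, 3) with no bound on how many rows it may return. If a stale
-- histogram claims no rows match y = 2, the t_y_v_idx scan is costed as
-- nearly free and wins, even though the primary index plan is reliably cheap.
SELECT * FROM t WHERE k = 1 AND y = 2 AND v = 3;
```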
gz#9150
Jira issue: CRDB-7134
Jira issue: CRDB-13889
gz#16142
gz#18109