
opt: avoid choosing index with unreliable stats #64570

Open
RaduBerinde opened this issue May 3, 2021 · 39 comments · Fixed by #66979
Labels
A-sql-optimizer SQL logical planning and optimizations. A-sql-table-stats Table statistics (and their automatic refresh). C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster T-sql-queries SQL Queries Team

Comments

@RaduBerinde
Member

RaduBerinde commented May 3, 2021

A customer ran into a case where a single-row UPSERT, instead of using the primary index (which would scan at most one row), chose a secondary index. The optimizer picked that index because, according to the histogram, the relevant value would match no rows. But the stats were stale, and the query actually read through 100k+ rows of the index.

We have discussed augmenting the cost value with an "uncertainty range", which would address this problem (the primary index returns <=1 row with 100% certainty, while the secondary index has an expected 0 rows but no upper bound). That would be a big change, but I believe we can also consider a more targeted fix, e.g. giving a heavy cost "discount" to scan operators that have a known cardinality bound (or a penalty to scans with no cardinality bound).
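As a diagnostic aid (not part of the proposed fix), CockroachDB's EXPLAIN (OPT, VERBOSE) variant prints the optimizer's internal cost estimates, which makes it possible to confirm why the secondary index looked cheaper. A sketch against the example table below:

```sql
-- Inspect the optimizer's memo and cost estimates for the chosen plan:
EXPLAIN (OPT, VERBOSE) UPSERT INTO t VALUES (10, 10, 10);
```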

Below is an illustration. The t_y_v_idx index could in principle return any number of rows.

> SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;
> create table t (x int, y int, v int, index (y, v), primary key (x,y));
> insert into t values (1, 1, 1), (2, 2, 2), (3, 3, 3);
> create statistics foo from t;
> explain upsert into t values (10, 10, 10);
                                             info
----------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • upsert
  │ into: t(x, y, v)
  │ auto commit
  │ arbiter indexes: primary
  │
  └── • cross join (left outer)
      │ estimated row count: 1
      │
      ├── • values
      │     size: 3 columns, 1 row
      │
      └── • filter
          │ estimated row count: 1
          │ filter: x = 10
          │
          └── • scan
                estimated row count: 0 (<0.01% of the table; stats collected 32 seconds ago)
                table: t@t_y_v_idx
                spans: [/10 - /10]

The correct plan would be:

[email protected]:26257/defaultdb> explain upsert into t values (1, 1, 1);
                                         info
---------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • upsert
  │ into: t(x, y, v)
  │ auto commit
  │ arbiter indexes: primary
  │
  └── • cross join (left outer)
      │ estimated row count: 1
      │
      ├── • values
      │     size: 3 columns, 1 row
      │
      └── • scan
            estimated row count: 1 (31% of the table; stats collected 15 seconds ago)
            table: t@primary
            spans: [/1/1 - /1/1]
            locking strength: for update

gz#9150

Jira issue: CRDB-7134

Jira issue: CRDB-13889

gz#16142

gz#18109

@RaduBerinde RaduBerinde added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label May 3, 2021
@mgartner
Collaborator

mgartner commented May 4, 2021

Given that statistics are always somewhat out of date, and are only an estimate, would it be reasonable to put a floor on how far statistics can reduce an estimated row count? For example, what if statistics could lower the row count estimate to no less than 1, since only contradictions are guaranteed to produce a row count of 0? In this case it might have to be a floor of 1 plus some epsilon, so that the cost doesn't exactly match the primary index lookup.

@mgartner
Collaborator

mgartner commented May 4, 2021

It would be nice to get to this in the 21.2 release. cc @kevin-v-ngo @awoods187 for visibility.

@awoods187
Contributor

I'm pro doing this in 21.2 provided the opportunity cost isn't too high. Let's discuss the level of effort during the milestone planning meeting.

@RaduBerinde
Member Author

Saw another instance of this in the wild. The use case involved a "status" column and a partial index that restricts the status to "pending". The vast majority of rows have "done" status, with occasionally a few thousand rows showing as "pending". The automatic stats can happen to see zero "pending" rows or see a few thousand of them, depending on timing. When stats show zero rows, the partial index is estimated to be empty and becomes an enticing index for the optimizer. In this case, the reliable plan was to use the PK which guaranteed that we scan at most one row.
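For reference, the shape of that schema looks roughly like this (hypothetical names; a sketch, not the customer's actual DDL):

```sql
CREATE TABLE tasks (
  id INT PRIMARY KEY,
  status STRING NOT NULL,
  -- Partial index covering only the rare "pending" rows; when stats happen
  -- to see zero pending rows, this index looks free to scan:
  INDEX pending_idx (id) WHERE status = 'pending'
);
```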

@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@rytaft rytaft self-assigned this Jun 28, 2021
rytaft added a commit to rytaft/cockroach that referenced this issue Jun 28, 2021
This commit adds a new cost function, largeCardinalityRowCountPenalty,
which calculates a penalty that should be added to the row count of scans.
It is non-zero for expressions with unbounded maximum cardinality or with
maximum cardinality exceeding the row count estimate. Adding a few rows
worth of cost helps prevent surprising plans for very small tables or for
when stats are stale.

Fixes cockroachdb#64570

Release note (performance improvement): When choosing between index
scans that are estimated to have the same number of rows, the optimizer
now prefers indexes for which it has higher certainty about the maximum
number of rows over indexes for which there is more uncertainty in the
estimated row count. This helps to avoid choosing suboptimal plans for
small tables or if the statistics are stale.
rytaft added a commit to rytaft/cockroach that referenced this issue Jun 28, 2021
craig bot pushed a commit that referenced this issue Jun 28, 2021

66979: opt: add cost penalty for scans with large cardinality r=rytaft a=rytaft

**opt: ensure we prefer a reverse scan to sorting a forward scan**

This commit fixes an issue where in some edge cases the optimizer would
prefer sorting the output of a forward scan over performing a reverse scan
(when there is no need to sort the output of the reverse scan).

Release note (performance improvement): The optimizer now prefers
performing a reverse scan over a forward scan + sort if the reverse
scan eliminates the need for a sort and the plans are otherwise
equivalent. This was the case before in most cases, but some edge
cases with a small number of rows have been fixed.

**opt: add cost penalty for scans with large cardinality**

Fixes #64570

Co-authored-by: Matthew Todd <[email protected]>
Co-authored-by: Rebecca Taft <[email protected]>
@craig craig bot closed this as completed in 215cdcb Jun 28, 2021
rytaft added a commit to rytaft/cockroach that referenced this issue Jul 8, 2021
@rytaft
Collaborator

rytaft commented Jul 9, 2021

Unfortunately it seems like the fix with the cost penalty is not robust enough to handle some variations on this issue. For example, if the alternative plan has cardinality 10, we may not choose it.

This is going to require some more thought, so I'll reopen this issue for now.

@rytaft rytaft reopened this Jul 9, 2021
@mgartner
Collaborator

mgartner commented Jul 21, 2021

Just saw another variation on this. It's similar to the example from Radu above:

Saw another instance of this in the wild. The use case involved a "status" column and a partial index that restricts the status to "pending". The vast majority of rows have "done" status, with occasionally a few thousand rows showing as "pending". The automatic stats can happen to see zero "pending" rows or see a few thousand of them, depending on timing. When stats show zero rows, the partial index is estimated to be empty and becomes an enticing index for the optimizer. In this case, the reliable plan was to use the PK which guaranteed that we scan at most one row.

However, it's slightly different. There is an updated_at column and an FK-like column other_id. The query fetches recently updated rows with a specific other_id, e.g. WHERE other_id = 1234 AND updated_at > now() - '1 hour'. There's a hash-sharded index with 12 buckets on updated_at and a regular secondary index on other_id.

The histograms for the updated_at column do not include the most recently updated values, so the optimizer prefers to scan the index on updated_at. It estimates that it will only scan 6 rows, but in reality it scans over 100k.

The better plan would be to scan the index on other_id which is estimated to scan only about 150 rows, and in reality only scans 2 rows.

Unfortunately, cardinality cannot help us here because other_id is not unique.
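Until the costing improves, one workaround in cases like this is to pin the index with a hint. A sketch with hypothetical table and index names:

```sql
-- Force the scan of the secondary index on other_id, bypassing the
-- stale-histogram-driven choice of the updated_at index:
SELECT *
FROM tbl@{FORCE_INDEX=tbl_other_id_idx}
WHERE other_id = 1234 AND updated_at > now() - INTERVAL '1 hour';
```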

I've run into problems like this running Postgres in the past. Our solution was to make automated stats collection much more aggressive. When a table gets very large, automatic stats collection triggered by a fixed percentage of mutated rows runs very rarely.

Imagine a table that gets 100k inserts per day and has been around for 1000 days, so it now has 100m rows. With our default sql.stats.automatic_collection.fraction_stale_rows at 0.2, 25m rows need to be inserted to trigger automatic stats (25m stale rows / 125m total rows = 0.2), which won't happen for another 250 days. That's 25m rows and 250 days' worth of values missing from the histograms.

To make automatic stats much more aggressive, you can set sql.stats.automatic_collection.fraction_stale_rows to 0 and set sql.stats.automatic_collection.min_stale_rows to some number, say 10k. This ensures that no matter how big the table is, you're guaranteed to collect stats for the most recent rows every so often.
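Assuming the 10k threshold suggested above, that tuning looks like:

```sql
SET CLUSTER SETTING sql.stats.automatic_collection.fraction_stale_rows = 0;
SET CLUSTER SETTING sql.stats.automatic_collection.min_stale_rows = 10000;
```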

Postgres has the ability to adjust these stats knobs at the table level. I don't believe we have that ability yet, but it would be useful here; a user needs to be able to tune these knobs per table based on the workload.

Here's how to tune auto-analyze (stats) collection in Postgres for a specific table:

ALTER TABLE t SET (autovacuum_analyze_scale_factor = 0);
ALTER TABLE t SET (autovacuum_analyze_threshold = 10000);

@mgartner mgartner reopened this Jul 21, 2021
@mgartner mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023
@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Feb 1, 2024
@itsbilal itsbilal added the O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster label Apr 30, 2024
@itsbilal
Contributor

Ran into this on the DRT drt-chaos cluster, so adding O-testcluster. Internal conversation

@ajstorm
Collaborator

ajstorm commented May 2, 2024

Hit it on drt-large too, which resulted in this duplicate issue.

mgartner added a commit to mgartner/cockroach that referenced this issue Jan 29, 2025
Informs cockroachdb#64570
Informs cockroachdb#130201

Release note (sql change): The `optimizer_min_row_count` session setting
has been added which sets a lower bound on row count estimates for
relational expressions during query planning. A value of zero, which is
the default, indicates no lower bound. Note that if this is set to a
value greater than zero, a row count of zero can still be estimated for
expressions with a cardinality of zero, e.g., for a contradictory
filter. Setting this to a value higher than 0, such as 1, may yield
better query plans in some cases, such as when statistics are frequently
stale and inaccurate.
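A minimal usage sketch of the setting described in this release note:

```sql
-- Clamp row-count estimates to at least 1 for this session:
SET optimizer_min_row_count = 1;
-- A contradictory filter (cardinality zero) may still be estimated at 0 rows:
EXPLAIN SELECT * FROM t WHERE x = 1 AND x = 2;
```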
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 30, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
craig bot pushed a commit that referenced this issue Jan 31, 2025
140065: opt: add `optimizer_min_row_count` session setting r=mgartner a=mgartner

Informs #64570
Informs #130201

Co-authored-by: Marcus Gartner <[email protected]>
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
blathers-crl bot pushed a commit that referenced this issue Jan 31, 2025
blathers-crl bot pushed a commit that referenced this issue Jan 31, 2025
blathers-crl bot pushed a commit that referenced this issue Jan 31, 2025
blathers-crl bot pushed a commit that referenced this issue Jan 31, 2025
blathers-crl bot pushed a commit that referenced this issue Jan 31, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
mgartner added a commit that referenced this issue Jan 31, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 31, 2025
mgartner added a commit that referenced this issue Jan 31, 2025