sql: always include rows with minimum/maximum histogram values in statistics samples #83730

mgartner · 2022-07-01T21:21:28Z (edited by vy-ton)

To collect histograms for columns in a table, we sample up to 10k rows at random. For large tables, this means there is a significant chance that the histograms will not cover the range of values near a column's minimum or maximum. As a result, queries for values outside the histogram's minimum and maximum bounds often get poor query plans (see #64570 and #83431).
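To put a rough number on "significant chance", the small Go calculation below estimates how likely a uniform sample of k rows from n rows is to miss all of the m rows holding a column's largest values. It is a sketch under stated assumptions: it treats the sample as uniform sampling without replacement, the 100-million-row table size is a hypothetical example, and `probMissAll` is an illustrative helper, not CockroachDB code.

```go
package main

import "fmt"

// probMissAll returns the probability that a uniform random sample of k rows,
// drawn without replacement from n rows, contains none of m specific rows
// (for example, the m rows holding a column's largest values).
// It evaluates C(n-m, k) / C(n, k), which simplifies to the product
// over j = 0..m-1 of (n-k-j) / (n-j).
func probMissAll(n, k, m int) float64 {
	if k+m > n {
		// Fewer than k rows remain once the m rows are excluded, so at least
		// one of them is always sampled.
		return 0
	}
	p := 1.0
	for j := 0; j < m; j++ {
		p *= float64(n-k-j) / float64(n-j)
	}
	return p
}

func main() {
	const n = 100_000_000 // hypothetical table size
	const k = 10_000      // sample size mentioned in the issue
	for _, m := range []int{1, 100, 1000} {
		fmt.Printf("P(none of the top %d rows sampled) = %.4f\n", m, probMissAll(n, k, m))
	}
}
```

With k = 10,000 and n = 100 million, the chance of missing the single maximum row is 1 - k/n = 0.9999, and even across the 1,000 largest values the chance that none of them is sampled is still about 0.90.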

Ideally, we would always retain in the sample any row that contains the minimum or maximum value of a histogram column. This would require the sampler to track the minimum and maximum values seen for each column and to assign such rows a low rank when they are added, so that they are kept rather than dropped. However, the ranking mechanism alone doesn't seem appropriate: every row that was a minimum or maximum at the time it was added would keep its low rank, so we'd likely end up with a sample containing only rows with values near the minimum and maximum. So, when we find a row with a new minimum or maximum value, we need to evict the row holding the previous minimum or maximum from the sample (or assign it a random rank so that it can be evicted at random). Rows that contain the minimum or maximum of multiple columns complicate this further.
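The eviction idea can be sketched in Go for a single column. This is a minimal sketch, not CockroachDB's sampler: `Row`, `ExtremeSampler`, `offer`, and the heap type are hypothetical names, and it assumes a bottom-k scheme in which the sampler keeps the k rows with the lowest random ranks. Instead of giving extreme rows a low rank, it pins the rows holding the current minimum and maximum outside the reservoir. When a new extreme arrives, the displaced row is demoted by offering it to the reservoir with the random rank it drew on arrival, which corresponds to the "assign it a random rank" option.

```go
package main

import (
	"container/heap"
	"fmt"
	"math/rand"
)

// Row is a hypothetical row reduced to an ID and one histogram column value.
type Row struct {
	ID    int
	Value int64
}

// rankedRow pairs a row with the random rank it was assigned on arrival.
type rankedRow struct {
	rank uint64
	row  Row
}

// maxRankHeap puts the highest-ranked row (next to be evicted) at index 0.
type maxRankHeap []rankedRow
func (h maxRankHeap) Len() int           { return len(h) }
func (h maxRankHeap) Less(i, j int) bool { return h[i].rank > h[j].rank }
func (h maxRankHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *maxRankHeap) Push(x any)        { *h = append(*h, x.(rankedRow)) }
func (h *maxRankHeap) Pop() any {
	x := (*h)[len(*h)-1]
	*h = (*h)[:len(*h)-1]
	return x
}

// ExtremeSampler (hypothetical) keeps the k lowest-ranked rows plus the rows
// that currently hold the column's minimum and maximum values.
type ExtremeSampler struct {
	k              int
	rng            *rand.Rand
	res            maxRankHeap
	minRow, maxRow *rankedRow
}

func NewExtremeSampler(k int, seed int64) *ExtremeSampler {
	return &ExtremeSampler{k: k, rng: rand.New(rand.NewSource(seed))}
}

// offer adds r to the reservoir, evicting the highest-ranked row when full.
func (s *ExtremeSampler) offer(r rankedRow) {
	if len(s.res) < s.k {
		heap.Push(&s.res, r)
	} else if s.k > 0 && r.rank < s.res[0].rank {
		s.res[0] = r
		heap.Fix(&s.res, 0)
	}
}

// Add pins row if it holds a new minimum or maximum. Any row left unpinned (the
// new row itself, or a displaced extreme) is offered with its arrival rank.
func (s *ExtremeSampler) Add(row Row) {
	r := &rankedRow{rank: s.rng.Uint64(), row: row}
	oldMin, oldMax := s.minRow, s.maxRow
	if oldMin == nil || row.Value < oldMin.row.Value {
		s.minRow = r
	}
	if oldMax == nil || row.Value > oldMax.row.Value {
		s.maxRow = r
	}
	for _, c := range []*rankedRow{r, oldMin, oldMax} {
		// A row listed twice held both extremes, and one Add can displace at
		// most one of them, so no row is offered twice.
		if c != nil && c != s.minRow && c != s.maxRow {
			s.offer(*c)
		}
	}
}

// Sample returns the reservoir rows followed by the pinned extreme rows.
func (s *ExtremeSampler) Sample() []Row {
	out := make([]Row, 0, len(s.res)+2)
	for _, r := range s.res {
		out = append(out, r.row)
	}
	if s.minRow != nil {
		out = append(out, s.minRow.row)
	}
	if s.maxRow != s.minRow { // both are set on the first Add
		out = append(out, s.maxRow.row)
	}
	return out
}

func main() {
	s := NewExtremeSampler(100, 7)
	for id, v := range rand.New(rand.NewSource(42)).Perm(100_000) {
		s.Add(Row{ID: id, Value: int64(v)})
	}
	sample := s.Sample()
	// The last two entries are the pinned minimum and maximum rows.
	fmt.Println(len(sample), sample[len(sample)-2].Value, sample[len(sample)-1].Value)
}
```

Running `main` prints `102 0 99999`: 100 reservoir rows plus the pinned minimum and maximum. Because a bottom-k reservoir keeps the k lowest ranks regardless of the order in which rows are offered, demoting a row later with its original rank leaves the reservoir a uniform sample of the unpinned rows. Extending this to several columns would need one pinned pair per column, and a single row could be pinned by more than one column, which is the complication noted above.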

Epic: CRDB-16930

Jira issue: CRDB-17227

michae2 · 2022-08-04T03:55:42Z

This would also let us delete the code causing #76887.

github-actions · 2024-01-29T11:04:37Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 1, 2022

exalate-issue-sync bot added the T-sql-queries SQL Queries Team label Aug 3, 2022

michae2 added the A-sql-table-stats Table statistics (and their automatic refresh). label Aug 4, 2022

michae2 mentioned this issue Sep 28, 2022: TestExecBuild_forecast1401 fails on ARM64 #88893 (Closed)

michae2 mentioned this issue Dec 6, 2022: sql/stats: addOuterBuckets logic breaks CREATE STATISTICS USING EXTREMES #93094 (Closed)

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Backlog (DO NOT ADD NEW ISSUES) in SQL Queries Jul 24, 2023

github-actions bot added no-issue-activity and removed no-issue-activity labels Jan 29, 2024

mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Jan 30, 2024
