sql: always include rows with minimum/maximum histogram values in statistics samples #83730

Open
mgartner opened this issue Jul 1, 2022 · 2 comments
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@mgartner
Collaborator

mgartner commented Jul 1, 2022

To collect histograms for columns in a table, we sample up to 10k rows at random. For large tables, this means there is a significant chance that the sample misses the rows holding a column's minimum or maximum values, so the histogram does not cover the low or high end of the column's range. Queries for values outside the histogram's minimum and maximum bounds often get poor query plans as a result (see #64570 and #83431).
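
As a rough illustration of how large that chance can be, the sketch below estimates the probability that a random sample contains no row from the top (or bottom) fraction of a column's values. It assumes the sample behaves like independent uniform draws, which is a reasonable approximation only when the table is much larger than the sample. The function name and the example fraction are hypothetical; only the 10k sample size comes from this issue.

```python
def miss_probability(tail_fraction, sample_size=10_000):
    """Approximate chance that a uniform random sample of `sample_size` rows
    contains no row from a tail holding `tail_fraction` of the table's rows.

    Assumption: draws are treated as independent (sampling with replacement),
    which is close to the truth when the table is far larger than the sample.
    """
    if not 0.0 <= tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be between 0 and 1")
    return (1.0 - tail_fraction) ** sample_size


# Example: the top 0.01% of rows (100,000 rows in a 1-billion-row table).
p_miss = miss_probability(1e-4)
```

Under these assumptions `p_miss` comes out at a little over one third, so a histogram built from the sample would frequently stop short of the highest 0.01% of values.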

Ideally, the sample would always retain the rows that contain the minimum or maximum value of each histogram column. This would require the sampler to track the minimum and maximum values seen so far for each column and to assign a low rank to a row holding one of them when it is added, so that the row is kept in the sample rather than dropped. However, the ranking mechanism alone doesn't seem appropriate, because we'd likely end up with a sample containing only rows with values near the minimum and maximum. So, when we find a row with a new minimum or maximum value, we need to evict the previous minimum or maximum row from the sample (or assign it a random rank so that it can be evicted at random). Rows that contain the minimum or maximum of more than one column complicate this further.
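
One possible shape for this is sketched below. It is a simplified illustration and not the existing sampler code: the class and method names are hypothetical, rows are plain tuples of mutually comparable, non-NULL values, and eviction is a linear scan rather than a heap. Every row gets a random rank on insertion; a row that currently holds the minimum or maximum of any column is pinned and skipped during eviction. When a new extreme arrives, the previous holder loses that role and, once it holds no roles at all, competes again on its random rank. Tracking a set of roles per row is one way to handle rows that hold extremes for several columns.

```python
import random


class ExtremaPreservingSampler:
    """Rank-based reservoir sample that pins each column's min and max rows.

    Hypothetical sketch: names and structure are assumptions for illustration.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.rows = {}     # row id -> (random rank, row tuple)
        self.roles = {}    # row id -> set of (column, "min" | "max") it holds
        self.holders = {}  # (column, "min" | "max") -> row id of current holder
        self.next_id = 0

    def add(self, row):
        row_id = self.next_id
        self.next_id += 1
        self.rows[row_id] = (self.rng.random(), row)
        self.roles[row_id] = set()

        for col, value in enumerate(row):
            for kind in ("min", "max"):
                holder = self.holders.get((col, kind))
                if holder is not None:
                    current = self.rows[holder][1][col]
                    is_new = value < current if kind == "min" else value > current
                    if not is_new:
                        continue
                    # The old holder loses this role. It stays pinned only if
                    # it still holds an extreme for some other column.
                    self.roles[holder].discard((col, kind))
                self.holders[(col, kind)] = row_id
                self.roles[row_id].add((col, kind))

        self._evict()

    def _evict(self):
        # Drop the highest-ranked unpinned rows until we are back at capacity.
        while len(self.rows) > self.capacity:
            unpinned = [rid for rid, held in self.roles.items() if not held]
            if not unpinned:
                break  # capacity is smaller than the number of pinned rows
            victim = max(unpinned, key=lambda rid: self.rows[rid][0])
            del self.rows[victim]
            del self.roles[victim]

    def sample(self):
        return [row for _, row in self.rows.values()]
```

With `c` columns at most `2 * c` rows are pinned at any time, so the rest of the sample is still chosen by random rank. The pinned rows themselves are not a random selection, so any estimate derived from the sample would likely need to account for them separately.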

Epic: CRDB-16930

Jira issue: CRDB-17227

@mgartner mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 1, 2022
@exalate-issue-sync exalate-issue-sync bot added the T-sql-queries SQL Queries Team label Aug 3, 2022
@michae2 michae2 added the A-sql-table-stats Table statistics (and their automatic refresh). label Aug 4, 2022
@michae2
Collaborator

michae2 commented Aug 4, 2022

This would also let us delete the code causing #76887.

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Jan 30, 2024