-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: generate stats for multi-column inverted index columns #59687
sql: generate stats for multi-column inverted index columns #59687
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] in the commit/PR message: "fixes a bug the caused histograms" -> "fixes a bug that caused histograms"
Reviewed 3 of 3 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/create_stats.go, line 382 at r1 (raw file):
for j, n := 0, idx.NumColumns(); j < n; j++ { // Only the last column of an inverted index is an inverted column. isInverted := idx.GetType() == descpb.IndexDescriptor_INVERTED && j == n-1
wouldn't it be better to check if idx.GetColumnID(j) == idx.InvertedColumnID()
?
pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):
[SHOW STATISTICS FOR TABLE multi_col] ORDER BY column_names::STRING, created
Did this test and the one above fail before your change?
This commit allows stats to be created for columns in multi-column inverted indexes. Also, it fixes a bug that caused histograms to not be collected for non-inverted columns referenced in partial inverted index predicates. Release note (bug fix): Statistics are now correctly generated for columns in multi-column inverted indexes and columns referenced in partial inverted index predicates.
26db058
to
59ca9e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the commit message.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/create_stats.go, line 382 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
wouldn't it be better to check if
idx.GetColumnID(j) == idx.InvertedColumnID()
?
Totes. Done.
pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Did this test and the one above fail before your change?
Yes. Without the changes in create_stats.go
we don't created histograms for all columns we should be. In the first we get:
statistics_name column_names row_count null_count has_histogram
s {a} 3 0 true
s {b} 3 0 true
s {c} 3 0 true
s {d} 3 0 false
s {j} 3 0 true
s {rowid} 3 0 true
and in the second we get:
column_names has_histogram
{id} true
{j} true
{s} false
This is because we were passing in isInverted=true
to addIndexColumnStatsIfNotExists
for non-inverted prefix columns and non-inverted columns in partial inverted index predicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)
pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Yes. Without the changes in
create_stats.go
we don't created histograms for all columns we should be. In the first we get:statistics_name column_names row_count null_count has_histogram s {a} 3 0 true s {b} 3 0 true s {c} 3 0 true s {d} 3 0 false s {j} 3 0 true s {rowid} 3 0 true
and in the second we get:
column_names has_histogram {id} true {j} true {s} false
This is because we were passing in
isInverted=true
toaddIndexColumnStatsIfNotExists
for non-inverted prefix columns and non-inverted columns in partial inverted index predicates.
Also, without the change in distsql_plan_stats.go
, CREATE STATISTICS
on a table with a multi-column inverted index panics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained
pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Also, without the change in
distsql_plan_stats.go
,CREATE STATISTICS
on a table with a multi-column inverted index panics.
Cool - thanks for the explanation
TFTR! bors r+ |
Build succeeded: |
This is a "forward-port" of regression tests from cockroachdb#64226. This commit contains no code changes because the bug was already been fixed on master by cockroachdb#59687. Release note: None
This is a "forward-port" of regression tests from cockroachdb#64226. This commit contains no code changes because the bug was already been fixed on master by cockroachdb#59687. Release note: None
64225: opt: make placeholder fast path conditional on the estimated row count r=RaduBerinde a=RaduBerinde #### opt: show statistics in placeholder-fast-path tests This highlights a problem with the fast path: we use statistics that are derived using placeholders, which are usually terrible. Release note: None #### opt: mark columns as constant for equality with placeholder This commit improves the FD generation for Select when we have filters like `x = $1`. Because these filters have placeholders, they do not generate constraints (which is the normal mechanism for detecting constant columns). This improves the cardinality property and row count estimate. The estimate will be used for the placeholder fast path. Release note: None #### opt: make placeholder fast path conditional on the estimated row count This change is best explained by this comment: ``` // We are dealing with a memo that still contains placeholders. The statistics // for such a memo can be wildly overestimated. Even if our plan is correct, // the estimated row count for a scan is passed to the execution engine which // uses it to make sizing decisions. Passing a very high count can affect // performance significantly (see #64214). So we only use the fast path if the // estimated row count is small; typically this will happen when we constrain // columns that form a key and we know there will be at most one row. ``` Fixes #64214. Release note (bug fix): fixed a performance regression for very simple queries. 64229: sql: add CREATE STATISTICS regression test r=mgartner a=mgartner This is a "forward-port" of regression tests from #64226. This commit contains no code changes because the bug was already been fixed on master by #59687. Release note: None 64235: roachtest: update version map and fixtures r=darinpp a=darinpp This commit adds the recently released 20.1.15 to the version map in PredecessorVersion. Release note: None (testing change) Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Darin Peshev <[email protected]>
This commit allows stats to be created for columns in multi-column
inverted indexes.
Also, it fixes a bug that caused histograms to not be collected for
non-inverted columns referenced in partial inverted index predicates.
Release note (bug fix): Statistics are now correctly generated for
columns in multi-column inverted indexes and columns referenced in
partial inverted index predicates.