sql: generate stats for multi-column inverted index columns #59687

mgartner · 2021-02-02T00:57:31Z

This commit allows stats to be created for columns in multi-column
inverted indexes.

Also, it fixes a bug that caused histograms to not be collected for
non-inverted columns referenced in partial inverted index predicates.

Release note (bug fix): Statistics are now correctly generated for
columns in multi-column inverted indexes and columns referenced in
partial inverted index predicates.

cockroach-teamcity · 2021-02-02T00:57:38Z

This change is

rytaft

[nit] in the commit/PR message: "fixes a bug the caused histograms" -> "fixes a bug that caused histograms"

Reviewed 3 of 3 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

pkg/sql/create_stats.go, line 382 at r1 (raw file):

		for j, n := 0, idx.NumColumns(); j < n; j++ {
			// Only the last column of an inverted index is an inverted column.
			isInverted := idx.GetType() == descpb.IndexDescriptor_INVERTED && j == n-1

wouldn't it be better to check if idx.GetColumnID(j) == idx.InvertedColumnID()?

pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):

  [SHOW STATISTICS FOR TABLE multi_col]
ORDER BY
  column_names::STRING, created

Did this test and the one above fail before your change?

This commit allows stats to be created for columns in multi-column inverted indexes. Also, it fixes a bug that caused histograms to not be collected for non-inverted columns referenced in partial inverted index predicates. Release note (bug fix): Statistics are now correctly generated for columns in multi-column inverted indexes and columns referenced in partial inverted index predicates.

mgartner

Fixed the commit message.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

pkg/sql/create_stats.go, line 382 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

wouldn't it be better to check if idx.GetColumnID(j) == idx.InvertedColumnID()?

Totes. Done.

pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Did this test and the one above fail before your change?

Yes. Without the changes in create_stats.go we don't created histograms for all columns we should be. In the first we get:

statistics_name  column_names  row_count  null_count  has_histogram
s                {a}           3          0           true
s                {b}           3          0           true
s                {c}           3          0           true
s                {d}           3          0           false
s                {j}           3          0           true
s                {rowid}       3          0           true

and in the second we get:

column_names  has_histogram
{id}          true
{j}           true
{s}           false

This is because we were passing in isInverted=true to addIndexColumnStatsIfNotExists for non-inverted prefix columns and non-inverted columns in partial inverted index predicates.

mgartner

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)

pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Yes. Without the changes in create_stats.go we don't created histograms for all columns we should be. In the first we get:
statistics_name  column_names  row_count  null_count  has_histogram
s                {a}           3          0           true
s                {b}           3          0           true
s                {c}           3          0           true
s                {d}           3          0           false
s                {j}           3          0           true
s                {rowid}       3          0           true
and in the second we get:
column_names  has_histogram
{id}          true
{j}           true
{s}           false
This is because we were passing in isInverted=true to addIndexColumnStatsIfNotExists for non-inverted prefix columns and non-inverted columns in partial inverted index predicates.

Also, without the change in distsql_plan_stats.go, CREATE STATISTICS on a table with a multi-column inverted index panics.

rytaft

Reviewed 1 of 1 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained

pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Also, without the change in distsql_plan_stats.go, CREATE STATISTICS on a table with a multi-column inverted index panics.

Cool - thanks for the explanation

mgartner · 2021-02-04T21:47:43Z

TFTR!

bors r+

craig · 2021-02-04T22:42:57Z

Build succeeded:

GitHub CI (Cockroach)

This is a "forward-port" of regression tests from cockroachdb#64226. This commit contains no code changes because the bug was already been fixed on master by cockroachdb#59687. Release note: None

64225: opt: make placeholder fast path conditional on the estimated row count r=RaduBerinde a=RaduBerinde #### opt: show statistics in placeholder-fast-path tests This highlights a problem with the fast path: we use statistics that are derived using placeholders, which are usually terrible. Release note: None #### opt: mark columns as constant for equality with placeholder This commit improves the FD generation for Select when we have filters like `x = $1`. Because these filters have placeholders, they do not generate constraints (which is the normal mechanism for detecting constant columns). This improves the cardinality property and row count estimate. The estimate will be used for the placeholder fast path. Release note: None #### opt: make placeholder fast path conditional on the estimated row count This change is best explained by this comment: ``` // We are dealing with a memo that still contains placeholders. The statistics // for such a memo can be wildly overestimated. Even if our plan is correct, // the estimated row count for a scan is passed to the execution engine which // uses it to make sizing decisions. Passing a very high count can affect // performance significantly (see #64214). So we only use the fast path if the // estimated row count is small; typically this will happen when we constrain // columns that form a key and we know there will be at most one row. ``` Fixes #64214. Release note (bug fix): fixed a performance regression for very simple queries. 64229: sql: add CREATE STATISTICS regression test r=mgartner a=mgartner This is a "forward-port" of regression tests from #64226. This commit contains no code changes because the bug was already been fixed on master by #59687. Release note: None 64235: roachtest: update version map and fixtures r=darinpp a=darinpp This commit adds the recently released 20.1.15 to the version map in PredecessorVersion. Release note: None (testing change) Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Darin Peshev <[email protected]>

mgartner requested a review from rytaft February 2, 2021 00:57

rytaft approved these changes Feb 3, 2021

View reviewed changes

mgartner force-pushed the multi-column-inverted-idx-stats branch from 26db058 to 59ca9e7 Compare February 4, 2021 20:51

mgartner commented Feb 4, 2021

View reviewed changes

rytaft approved these changes Feb 4, 2021

View reviewed changes

craig bot merged commit 64612f9 into cockroachdb:master Feb 4, 2021

mgartner deleted the multi-column-inverted-idx-stats branch February 4, 2021 22:46

mgartner mentioned this pull request Apr 26, 2021

sql: add CREATE STATISTICS regression test #64229

Merged

mgartner mentioned this pull request Apr 26, 2021

release-21.1: sql: add CREATE STATISTICS regression test #64231

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: generate stats for multi-column inverted index columns #59687

sql: generate stats for multi-column inverted index columns #59687

mgartner commented Feb 2, 2021 •

edited

Loading

cockroach-teamcity commented Feb 2, 2021

rytaft left a comment

mgartner left a comment

mgartner left a comment

rytaft left a comment

mgartner commented Feb 4, 2021

craig bot commented Feb 4, 2021

sql: generate stats for multi-column inverted index columns #59687

sql: generate stats for multi-column inverted index columns #59687

Conversation

mgartner commented Feb 2, 2021 • edited Loading

cockroach-teamcity commented Feb 2, 2021

rytaft left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

rytaft left a comment

Choose a reason for hiding this comment

mgartner commented Feb 4, 2021

craig bot commented Feb 4, 2021

mgartner commented Feb 2, 2021 •

edited

Loading