Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: generate stats for multi-column inverted index columns #59687

Merged
merged 1 commit into from
Feb 4, 2021

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Feb 2, 2021

This commit allows stats to be created for columns in multi-column
inverted indexes.

Also, it fixes a bug that caused histograms to not be collected for
non-inverted columns referenced in partial inverted index predicates.

Release note (bug fix): Statistics are now correctly generated for
columns in multi-column inverted indexes and columns referenced in
partial inverted index predicates.

@mgartner mgartner requested a review from rytaft February 2, 2021 00:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

[nit] in the commit/PR message: "fixes a bug the caused histograms" -> "fixes a bug that caused histograms"

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/create_stats.go, line 382 at r1 (raw file):

		for j, n := 0, idx.NumColumns(); j < n; j++ {
			// Only the last column of an inverted index is an inverted column.
			isInverted := idx.GetType() == descpb.IndexDescriptor_INVERTED && j == n-1

wouldn't it be better to check if idx.GetColumnID(j) == idx.InvertedColumnID()?


pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):

  [SHOW STATISTICS FOR TABLE multi_col]
ORDER BY
  column_names::STRING, created

Did this test and the one above fail before your change?

This commit allows stats to be created for columns in multi-column
inverted indexes.

Also, it fixes a bug that caused histograms to not be collected for
non-inverted columns referenced in partial inverted index predicates.

Release note (bug fix): Statistics are now correctly generated for
columns in multi-column inverted indexes and columns referenced in
partial inverted index predicates.
@mgartner mgartner force-pushed the multi-column-inverted-idx-stats branch from 26db058 to 59ca9e7 Compare February 4, 2021 20:51
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the commit message.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)


pkg/sql/create_stats.go, line 382 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

wouldn't it be better to check if idx.GetColumnID(j) == idx.InvertedColumnID()?

Totes. Done.


pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Did this test and the one above fail before your change?

Yes. Without the changes in create_stats.go we don't created histograms for all columns we should be. In the first we get:

statistics_name  column_names  row_count  null_count  has_histogram
s                {a}           3          0           true
s                {b}           3          0           true
s                {c}           3          0           true
s                {d}           3          0           false
s                {j}           3          0           true
s                {rowid}       3          0           true

and in the second we get:

column_names  has_histogram
{id}          true
{j}           true
{s}           false

This is because we were passing in isInverted=true to addIndexColumnStatsIfNotExists for non-inverted prefix columns and non-inverted columns in partial inverted index predicates.

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft)


pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Yes. Without the changes in create_stats.go we don't created histograms for all columns we should be. In the first we get:

statistics_name  column_names  row_count  null_count  has_histogram
s                {a}           3          0           true
s                {b}           3          0           true
s                {c}           3          0           true
s                {d}           3          0           false
s                {j}           3          0           true
s                {rowid}       3          0           true

and in the second we get:

column_names  has_histogram
{id}          true
{j}           true
{s}           false

This is because we were passing in isInverted=true to addIndexColumnStatsIfNotExists for non-inverted prefix columns and non-inverted columns in partial inverted index predicates.

Also, without the change in distsql_plan_stats.go, CREATE STATISTICS on a table with a multi-column inverted index panics.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/logictest/testdata/logic_test/distsql_stats, line 991 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Also, without the change in distsql_plan_stats.go, CREATE STATISTICS on a table with a multi-column inverted index panics.

Cool - thanks for the explanation

@mgartner
Copy link
Collaborator Author

mgartner commented Feb 4, 2021

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 4, 2021

Build succeeded:

@craig craig bot merged commit 64612f9 into cockroachdb:master Feb 4, 2021
@mgartner mgartner deleted the multi-column-inverted-idx-stats branch February 4, 2021 22:46
mgartner added a commit to mgartner/cockroach that referenced this pull request Apr 26, 2021
This is a "forward-port" of regression tests from cockroachdb#64226. This commit
contains no code changes because the bug was already been fixed on
master by cockroachdb#59687.

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this pull request Apr 26, 2021
This is a "forward-port" of regression tests from cockroachdb#64226. This commit
contains no code changes because the bug was already been fixed on
master by cockroachdb#59687.

Release note: None
craig bot pushed a commit that referenced this pull request Apr 27, 2021
64225: opt: make placeholder fast path conditional on the estimated row count r=RaduBerinde a=RaduBerinde

#### opt: show statistics in placeholder-fast-path tests

This highlights a problem with the fast path: we use statistics that
are derived using placeholders, which are usually terrible.

Release note: None

#### opt: mark columns as constant for equality with placeholder

This commit improves the FD generation for Select when we have filters
like `x = $1`. Because these filters have placeholders, they do not
generate constraints (which is the normal mechanism for detecting
constant columns).

This improves the cardinality property and row count estimate. The
estimate will be used for the placeholder fast path.

Release note: None

#### opt: make placeholder fast path conditional on the estimated row count

This change is best explained by this comment:

```
// We are dealing with a memo that still contains placeholders. The statistics
// for such a memo can be wildly overestimated. Even if our plan is correct,
// the estimated row count for a scan is passed to the execution engine which
// uses it to make sizing decisions. Passing a very high count can affect
// performance significantly (see #64214). So we only use the fast path if the
// estimated row count is small; typically this will happen when we constrain
// columns that form a key and we know there will be at most one row.
```

Fixes #64214.

Release note (bug fix): fixed a performance regression for very simple
queries.

64229: sql: add CREATE STATISTICS regression test r=mgartner a=mgartner

This is a "forward-port" of regression tests from #64226. This commit
contains no code changes because the bug was already been fixed on
master by #59687.

Release note: None

64235: roachtest: update version map and fixtures r=darinpp a=darinpp

This commit adds the recently released 20.1.15
to the version map in PredecessorVersion.

Release note: None (testing change)

Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Darin Peshev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants