Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: add rule to push down agg DISTINCT to input #46899

Closed
andy-kimball opened this issue Apr 1, 2020 · 2 comments · Fixed by #47589
Closed

opt: add rule to push down agg DISTINCT to input #46899

andy-kimball opened this issue Apr 1, 2020 · 2 comments · Fixed by #47589
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@andy-kimball
Copy link
Contributor

We should add a rule to the optimizer which rewrites queries like this:

CREATE TABLE xy (x INT PRIMARY KEY, y INT, INDEX (y));
INSERT INTO xy SELECT x, x % 100 FROM generate_series(1,100000) t(x);
SELECT COUNT(DISTINCT y) FROM xy;

to this:

SELECT COUNT(*) FROM (SELECT DISTINCT y FROM xy);

The first runs in ~80ms on my laptop. The second runs in ~55ms. The biggest reason for the drop is that the second formulation can take advantage of our streaming group-by execution operator, since the y index can be used.

@jordanlewis
Copy link
Member

I think this is only possible when the query has no grouping columns, and all of the aggregates functions are distinct on the same column (or there's just one distinct aggregate function).

If the query has grouping columns, the distinct has to come after the grouping. And if the query has more than one aggregation function, and the functions aren't all distinct on the same column, each function would need to see a different subset of the input - so you couldn't just distinctify the whole input stream.

@andy-kimball
Copy link
Contributor Author

Right this only applies for the ScalarGroupBy operator, which always has exactly 1 group (i.e. grouping column set is empty).

@awoods187 awoods187 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 2, 2020
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Apr 17, 2020
Previously, the optimizer could not take advantage of an index on
a variable with a command like the following:

SELECT COUNT(DISTINCT y) FROM xy;

To address this, PushAggDistinctIntoScalarGroupBy pushes the
distinct operation from the aggregate function and into the
input of the ScalarGroupBy.

Fixes cockroachdb#46899

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Apr 17, 2020
Previously, the optimizer could not take advantage of an index on
a variable with a command like the following:

SELECT COUNT(DISTINCT y) FROM xy;

To address this, PushAggDistinctIntoScalarGroupBy pushes the
distinct operation from the aggregate function and into the
input of the ScalarGroupBy.

Fixes cockroachdb#46899

Release note: None
craig bot pushed a commit that referenced this issue Apr 17, 2020
47589: sql: add a rule to push a distinct modifier into a scalargroupby r=andy-kimball a=DrewKimball

Previously, the optimizer could not take advantage of an index on
a variable with a command like the following:

SELECT COUNT(DISTINCT y) FROM xy;

To address this, PushAggDistinctIntoScalarGroupBy pushes the
distinct operation from the aggregate function and into the
input of the ScalarGroupBy.

Fixes #46899

Release note: None

Co-authored-by: Drew Kimball <[email protected]>
@craig craig bot closed this as completed in fd1e57d Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants