opt: add rule to push down agg DISTINCT to input #46899
Comments
I think this is only possible when the query has no grouping columns and all of the aggregate functions are distinct on the same column (or there's just one distinct aggregate function). If the query has grouping columns, the distinct has to come after the grouping. And if the query has more than one aggregate function, and the functions aren't all distinct on the same column, each function would need to see a different subset of the input, so you couldn't just distinctify the whole input stream.
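To make the two failure cases above concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine (an assumption; it is not CockroachDB, and the optimizer rule itself is not involved). The table contents and the "naive" rewrites are hypothetical, chosen only to show that distinctifying the whole input on y changes the results once a grouping column or a second, non-distinct aggregate is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: x is NOT unique here so that it can serve as a grouping column.
conn.execute("CREATE TABLE xy (x INT, y INT)")
conn.executemany("INSERT INTO xy VALUES (?, ?)", [(1, 10), (1, 10), (2, 10), (2, 20)])

# Stand-in for "distinctify the whole input stream on y": keep one row per y value.
distinct_input = "(SELECT MIN(x) AS x, y FROM xy GROUP BY y)"

# Case 1: grouping column. DISTINCT must be applied per group, after grouping.
grouped_before = conn.execute(
    "SELECT x, COUNT(DISTINCT y) FROM xy GROUP BY x ORDER BY x"
).fetchall()
grouped_naive = conn.execute(
    f"SELECT x, COUNT(y) FROM {distinct_input} GROUP BY x ORDER BY x"
).fetchall()

# Case 2: a second aggregate that is not distinct on y needs to see every row.
mixed_before = conn.execute("SELECT COUNT(DISTINCT y), SUM(x) FROM xy").fetchone()
mixed_naive = conn.execute(f"SELECT COUNT(y), SUM(x) FROM {distinct_input}").fetchone()

print(grouped_before, grouped_naive)  # the per-group counts differ
print(mixed_before, mixed_naive)      # the SUM(x) values differ
```

In both cases the naive version silently returns different numbers, which is why the rule has to be restricted to scalar aggregations whose aggregates are all distinct on the same column.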
Right this only applies for the |
Previously, the optimizer could not take advantage of an index on a variable with a command like the following:
SELECT COUNT(DISTINCT y) FROM xy;
To address this, PushAggDistinctIntoScalarGroupBy pushes the distinct operation from the aggregate function and into the input of the ScalarGroupBy.
Fixes cockroachdb#46899
Release note: None
47589: sql: add a rule to push a distinct modifier into a scalargroupby r=andy-kimball a=DrewKimball
Previously, the optimizer could not take advantage of an index on a variable with a command like the following:
SELECT COUNT(DISTINCT y) FROM xy;
To address this, PushAggDistinctIntoScalarGroupBy pushes the distinct operation from the aggregate function and into the input of the ScalarGroupBy.
Fixes #46899
Release note: None
Co-authored-by: Drew Kimball
We should add a rule to the optimizer which rewrites queries like this:
to this:
The first runs in ~80ms on my laptop. The second runs in ~55ms. The biggest reason for the drop is that the second formulation can take advantage of our streaming group-by execution operator, since the y index can be used.
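The two queries from the description did not survive in this copy of the issue. Based on the query quoted in the linked commit message, the rewrite presumably has roughly the following shape; this is a sketch under that assumption, run against SQLite through Python's sqlite3 module rather than CockroachDB, with a hypothetical index name and made-up data. It only checks that the two forms agree, including on NULLs, and makes no claim about timings or about the plans CockroachDB produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xy (x INTEGER PRIMARY KEY, y INTEGER)")
conn.execute("CREATE INDEX xy_y_idx ON xy (y)")  # hypothetical index name
rows = [(i, i % 7) for i in range(1, 1001)]      # y takes the 7 values 0..6
rows.append((1001, None))                        # one NULL in y
conn.executemany("INSERT INTO xy VALUES (?, ?)", rows)

# Original form, as quoted in the commit message.
original = "SELECT COUNT(DISTINCT y) FROM xy"
# Assumed rewritten form: the DISTINCT moves into the input of the scalar aggregation.
rewritten = "SELECT COUNT(y) FROM (SELECT DISTINCT y FROM xy)"
# Tempting but not equivalent: COUNT(*) would also count the NULL row that
# SELECT DISTINCT keeps, whereas COUNT(DISTINCT y) ignores NULLs.
naive = "SELECT COUNT(*) FROM (SELECT DISTINCT y FROM xy)"

original_count = conn.execute(original).fetchone()[0]
rewritten_count = conn.execute(rewritten).fetchone()[0]
naive_count = conn.execute(naive).fetchone()[0]

print(original_count, rewritten_count, naive_count)
```

The outer aggregate has to stay COUNT(y) rather than COUNT(*) so that the NULL handling of the original query is preserved after the DISTINCT is pushed down.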