opt: add rule to push down agg DISTINCT to input #46899

andy-kimball · 2020-04-01T22:40:47Z

We should add a rule to the optimizer which rewrites queries like this:

```sql
CREATE TABLE xy (x INT PRIMARY KEY, y INT, INDEX (y));
INSERT INTO xy SELECT x, x % 100 FROM generate_series(1,100000) t(x);
SELECT COUNT(DISTINCT y) FROM xy;
```

to this:

```sql
SELECT COUNT(*) FROM (SELECT DISTINCT y FROM xy);
```

The first runs in ~80ms on my laptop. The second runs in ~55ms. The main reason for the speedup is that the second formulation can use our streaming group-by execution operator, because the index on y supplies the rows already ordered by y.
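One caveat about the rewrite above: `COUNT(DISTINCT y)` ignores NULLs, while `SELECT DISTINCT y` keeps NULL as a row of its own, so `COUNT(*)` over the distinct input matches only when y has no NULLs (true for the data above, since y = x % 100). Counting the column instead of `*` preserves the semantics in general. The sketch below uses a hypothetical table `t_nulls` and should behave the same in CockroachDB and PostgreSQL.

```sql
-- Hypothetical demo table with a duplicate value and a NULL.
CREATE TABLE t_nulls (y INT);
INSERT INTO t_nulls VALUES (1), (1), (2), (NULL);

-- Original form: NULLs are ignored, so this returns 2.
SELECT count(DISTINCT y) FROM t_nulls;

-- COUNT(*) over the distinct input also counts the NULL row: returns 3.
SELECT count(*) FROM (SELECT DISTINCT y FROM t_nulls) AS d;

-- COUNT(y) over the distinct input matches the original form: returns 2.
SELECT count(y) FROM (SELECT DISTINCT y FROM t_nulls) AS d;
```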


jordanlewis · 2020-04-01T22:48:54Z

I think this is only possible when the query has no grouping columns and all of the aggregate functions are distinct on the same column (or there's just one distinct aggregate function).

If the query has grouping columns, the distinct has to come after the grouping. And if the query has more than one aggregate function and the functions aren't all distinct on the same column, each function would need to see a different subset of the input, so you couldn't just distinctify the whole input stream.
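A small illustration of the second condition, using a hypothetical table `t_multi`: when every aggregate is DISTINCT on the same column, all of them can run over one de-duplicated input, but a non-DISTINCT aggregate such as `count(*)` needs to see every input row.

```sql
-- Hypothetical demo table: y has one duplicated value.
CREATE TABLE t_multi (x INT, y INT);
INSERT INTO t_multi VALUES (1, 10), (2, 10), (3, 20);

-- All aggregates are DISTINCT on y: both queries return count 2 and sum 30.
SELECT count(DISTINCT y), sum(DISTINCT y) FROM t_multi;
SELECT count(y), sum(y) FROM (SELECT DISTINCT y FROM t_multi) AS d;

-- Mixing in a non-DISTINCT aggregate breaks the rewrite:
-- the first query returns (2, 3), the second returns (2, 2).
SELECT count(DISTINCT y), count(*) FROM t_multi;
SELECT count(y), count(*) FROM (SELECT DISTINCT y FROM t_multi) AS d;
```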

andy-kimball · 2020-04-01T23:08:15Z

Right, this only applies to the ScalarGroupBy operator, which always has exactly one group (i.e., the grouping column set is empty).
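The one-group property is what distinguishes a scalar aggregation (no GROUP BY) from a grouped one: the scalar form returns exactly one row even over empty input, while the grouped form returns one row per group. A sketch with a hypothetical table `t_empty`; since the rewrite keeps the ScalarGroupBy on top and only changes its input, the pushed-down form should still return one row.

```sql
-- Hypothetical demo table; the WHERE clauses below filter out every row.
CREATE TABLE t_empty (g INT, y INT);
INSERT INTO t_empty VALUES (1, 10), (1, 10), (2, 20);

-- Scalar aggregation: one row containing 0, even though no rows match.
SELECT count(DISTINCT y) FROM t_empty WHERE y > 100;

-- Grouped aggregation: no groups exist, so zero rows are returned.
SELECT g, count(DISTINCT y) FROM t_empty WHERE y > 100 GROUP BY g;

-- Pushed-down form: the scalar aggregate is still on top, so one row containing 0.
SELECT count(y) FROM (SELECT DISTINCT y FROM t_empty WHERE y > 100) AS d;
```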

Previously, the optimizer could not take advantage of an index on a column in a query like the following: `SELECT COUNT(DISTINCT y) FROM xy;`

To address this, PushAggDistinctIntoScalarGroupBy pushes the distinct operation out of the aggregate function and into the input of the ScalarGroupBy.

Fixes cockroachdb#46899

Release note: None

47589: sql: add a rule to push a distinct modifier into a scalargroupby r=andy-kimball a=DrewKimball

Co-authored-by: Drew Kimball

andy-kimball assigned RaduBerinde Apr 1, 2020

jordanlewis mentioned this issue Apr 2, 2020: exec: add distinct aggregation support #39242 (Closed)

awoods187 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 2, 2020

DrewKimball mentioned this issue Apr 17, 2020: sql: add a rule to push a distinct modifier into a scalargroupby #47589 (Merged)

craig bot closed this as completed in fd1e57d Apr 17, 2020
