feat(stream,agg): add distinct deduplicater (#7797)
This PR adds a `DistinctDeduplicater` to the streaming backend to support distinct aggregation in HashAgg and GlobalSimpleAgg. It depends on the state tables inferred in the frontend, with one state table for each distinct column. The dedup table schema looks like this:

```
group key | distinct key | count for agg call 1 | count for agg call 2 | ...
```

Let me explain with an example:

```sql
select
    count(*),                                  -- count star, no need for a dedup table
    count(distinct a),                         -- agg call `W`, shares a dedup table for distinct column `a`
    count(distinct a) filter (where c > 1000), -- agg call `X`, shares a dedup table for distinct column `a`
    count(distinct b),                         -- agg call `Y`, shares a dedup table for distinct column `b`
    count(distinct b) filter (where c > 1000)  -- agg call `Z`, shares a dedup table for distinct column `b`
from t
group by d;
```

There will be two dedup tables:

- Dedup table for column `a`:
  ```
  d | a | count_for_W | count_for_X
  ```
- Dedup table for column `b`:
  ```
  d | b | count_for_Y | count_for_Z
  ```

Each aggregation group has a `DistinctDeduplicater`, which counts the occurrences of each distinct key for each agg call according to the `visibility` (with the agg filter and group filter already applied). For every duplicate item/row, `DistinctDeduplicater` hides it in the returned `visibility`.

---

Dedup state table cache is not supported yet due to concerns about memory consumption; it may be introduced in a later PR.

Distinct agg support is not enabled yet (`DistinctAggRule` still rewrites distinct agg calls into 2-phase agg); it may be enabled in a later PR.

Approved-By: soundOfDestiny
Approved-By: st1page
Approved-By: kwannoel
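The counting scheme described above can be sketched in Rust. This is a minimal, hypothetical sketch, not the actual `DistinctDeduplicater`: a `HashMap` stands in for one dedup state table, group and distinct keys are simplified to `i64`, and `Op`, `DedupTable` and `dedup` are illustrative names. It also assumes that deletes are handled symmetrically to inserts (a row stays visible only when a key's count goes 0 -> 1 on insert or 1 -> 0 on delete), which the description above does not spell out.

```rust
use std::collections::HashMap;

/// Change type of a row in a stream chunk (simplified stand-in; assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    Insert,
    Delete,
}

/// Hypothetical in-memory stand-in for one dedup state table:
/// `(group key, distinct key) -> [count for agg call 1, count for agg call 2, ...]`.
pub struct DedupTable {
    num_agg_calls: usize,
    counts: HashMap<(i64, i64), Vec<u64>>,
}

impl DedupTable {
    pub fn new(num_agg_calls: usize) -> Self {
        Self { num_agg_calls, counts: HashMap::new() }
    }

    /// Refines the visibility of agg call `call_idx` (agg/group filters already
    /// applied) so that a row stays visible only when its distinct key appears
    /// for the first time (insert: count 0 -> 1) or disappears for the last
    /// time (delete: count 1 -> 0). Every duplicate row is hidden.
    pub fn dedup(
        &mut self,
        call_idx: usize,
        ops: &[Op],
        group_keys: &[i64],
        distinct_keys: &[i64],
        visibility: &[bool],
    ) -> Vec<bool> {
        let n = self.num_agg_calls;
        let mut result = Vec::with_capacity(ops.len());
        for i in 0..ops.len() {
            if !visibility[i] {
                // Already filtered out for this agg call: never counted.
                result.push(false);
                continue;
            }
            let key = (group_keys[i], distinct_keys[i]);
            let counts = self.counts.entry(key).or_insert_with(|| vec![0; n]);
            let visible = match ops[i] {
                Op::Insert => {
                    counts[call_idx] += 1;
                    counts[call_idx] == 1
                }
                Op::Delete => {
                    counts[call_idx] = counts[call_idx]
                        .checked_sub(1)
                        .expect("delete of a distinct key that was never inserted");
                    counts[call_idx] == 0
                }
            };
            // Drop the row once no agg call sharing this table counts the key anymore.
            if counts.iter().all(|&c| c == 0) {
                self.counts.remove(&key);
            }
            result.push(visible);
        }
        result
    }
}
```

For example, with `DedupTable::new(2)` standing in for the table of column `a` (calls `W` and `X`), inserting distinct keys `5, 5, 7` in group `1` with visibility `[true, true, true]` for `W` yields `[true, false, true]`; the same rows with visibility `[false, true, true]` for `X` yield `[false, true, true]`, because each agg call keeps its own count column.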
Showing 23 changed files with 1,038 additions and 46 deletions.