-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stream,agg): add distinct deduplicater #7797
Conversation
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
This reverts commit 578812a.
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #7797 +/- ##
==========================================
+ Coverage 71.71% 71.78% +0.07%
==========================================
Files 1113 1114 +1
Lines 177694 178366 +672
==========================================
+ Hits 127425 128033 +608
- Misses 50269 50333 +64
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
794bf4b
to
51ce3d5
Compare
Previously in #7797, distinct agg support is added (without cache) but not enabled. This PR enables it by disable 2-phase rewrite rule for streaming distinct agg calls, and also adds an LRU cache in the deduplicater. This will close #7682, and possibly resolve or at least mitigate the performance issue in #7350 and #7271. Approved-By: st1page
Signed-off-by: Richard Chien [email protected]
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR adds a
DistinctDeduplicater
in streaming backend, to support distinct agg in HashAgg and GlobalSimpleAgg. It depends on the state tables inferred in frontend, with one state table for each distinct column. The dedup table schema is like:Let me explain by an example:
There'll be two dedup tables:
a
:b
:Each aggregation group has a
DistinctDeduplicater
, which counts the occurrence of each distinct key for different agg calls according thevisibility
(already applied agg filter and group filter). For every duplicate item/row,DistinctDeduplicater
hide it in the returnedvisibility
.Dedup state table cache is not supported yet due to possible concern for memory consumption, may introduce in later PR.
The distinct agg support is not enabled yet (
DistinctAggRule
is still rewriting distinct agg calls to 2-phase agg), may enable in later PR.Checklist
./risedev check
(or alias,./risedev c
)Refer to a related PR or issue link (optional)
#7682