-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(hash join): trivial interval join (close #9228) #9229
Conversation
c51020f
to
4ec45f6
Compare
d46a3d1
to
201e125
Compare
Codecov Report
@@ Coverage Diff @@
## main #9229 +/- ##
==========================================
- Coverage 70.82% 70.78% -0.05%
==========================================
Files 1218 1218
Lines 202489 202886 +397
==========================================
+ Hits 143419 143617 +198
- Misses 59070 59269 +199
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 7 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
micro benchmark result: (nexmark q7) PR:
main:
|
393eaf1
to
6090a30
Compare
src/stream/src/executor/hash_join.rs
Outdated
// TODO: We can use binary search to start matching with | ||
// `match_band_res==true`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this PR changes the join state's pk encoding, but it doesn't seem to utilize it properly. In my mind, we should use the range()
interface of the BTreeMap
to fetch the exact data according to the band condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this PR changes the join state's pk encoding, but it doesn't seem to utilize it properly. In my mind, we should use the
range()
interface of theBTreeMap
to fetch the exact data according to the band condition.
fixed ugly.
} else if self.clean_left_state_conjunction_idx.is_some() | ||
&& self.clean_right_state_conjunction_idx.is_some() | ||
{ | ||
} else if self.band_condition.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the user's perspective, StreamIntervalJoin
means it can utilize the input's watermark by band conditions. If there are no watermarks, no difference for users, so let's change it back to StreamHashJoin
. BandJoin
is just an optimization for our implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the user's perspective,
StreamIntervalJoin
means it can utilize the input's watermark by band conditions. If there are no watermarks, no difference for users, so let's change it back toStreamHashJoin
.BandJoin
is just an optimization for our implementation.
fixed.
@@ -306,6 +313,7 @@ message HashJoinNode { | |||
// Whether to optimize for append only stream. | |||
// It is true when the input is append-only | |||
bool is_append_only = 14; | |||
BandJoinCondition band_condition = 15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have another field inequality_pairs
which has similar functionality to band_condition
. Is it possible to merge them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have another field
inequality_pairs
which has similar functionality toband_condition
. Is it possible to merge them?
too hard for me 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😭 Will take a look in detail later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😭 Will take a look in detail later.
better not. The code is too ugly 😭
d0c2ea8
to
692d9b8
Compare
Actually, I haven't seen a performance improvement in the nexmark q7. Maybe we need another case to make this feature more convincing. |
The equal condition in q7 is |
e6257e1
to
3fd9be3
Compare
closed due to conversation with chenzl25 and the talk in all-hand meeting |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Leverage band conditions to speed up interval join.
View risingwavelabs/rfcs#32 for details.
Refactor pk of state table and degree table in hash join:
join key || band key || input pk
We can then skip rows in join matching process in
HashJoinExecutor
.Checklist For Contributors
./risedev check
(or alias,./risedev c
)Checklist For Reviewers
Documentation
Click here for Documentation
Types of user-facing changes
Please keep the types that apply to your changes, and remove the others.
Release note