-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[multistage] [feature] Support Right join and Full join and inEqui mix. #9907
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9907 +/- ##
============================================
+ Coverage 64.23% 70.29% +6.06%
- Complexity 5127 5682 +555
============================================
Files 1934 1991 +57
Lines 104748 107286 +2538
Branches 15998 16315 +317
============================================
+ Hits 67281 75417 +8136
+ Misses 32597 26586 -6011
- Partials 4870 5283 +413
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please rebase and trigger CI?
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/TransferableBlock.java
Outdated
Show resolved
Hide resolved
public BroadcastRows(List<Object[]> rows, boolean hasMatch) { | ||
_rows = rows; | ||
_hasMatch = hasMatch; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure what's the goal here. but higher level question is do we plan to support inequality outer join conditions.
- if we do not (e.g. only equality outer join): then there's no need to encapsulate these, simply create another hashmap of HashMap<Key, Boolean> might be better.
- if the join type is LEFT, there' no need to maintain this boxed object and the other HashMap<Key, Boolean> will be empty
- if the join type is RIGHT/FULL the sole addition is on the HashMap
- if we want to support inequality join condition, then the hasMap should be attached to every row otherwise the results wont be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively we can also maintain 2 hashmaps
private final HashMap<Key, List<Object[]>> _unmatchedHashTable;
private final HashMap<Key, List<Object[]>> _matchedHashTable;
and for the hash join, simply loop through both
- for LEFT and INNER join, remain the same logic (the _matchHashTable will always be empty and thus no extra cost)
- for RIGHT and FULL join, move a row from _unmatch to _match onc the equality and inequality conditions are applied successfully.
- at the very end when a EOS block is reached on the left. simply output the right _umatched rows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work since left join has a bug when there is inequi condition. we have to record _unmatchedLeftRows for left join and full join.
For right join and full join, we have to remember matched rows to output unmatched ones.
I used HashMap<Key, HashSet> _matchedRightRows because we have to use this to look up unmatched rows.
PTAL at the new implementation.
"Join type: " + joinType + " is not supported!"); | ||
_leftKeySelector = joinKeys.getLeftJoinKeySelector(); | ||
_rightKeySelector = joinKeys.getRightJoinKeySelector(); | ||
DataSchema leftSchema, JoinNode node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not necessary. but I made the change to make the ctor simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 got it
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java
Outdated
Show resolved
Hide resolved
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java
Outdated
Show resolved
Hide resolved
return leftBlock; | ||
} | ||
if (leftBlock.isSuccessfulEndOfStreamBlock()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: suggest putting the logic of the end of stream handling into a separate function for better readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a TODO to follow up.
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java
Show resolved
Hide resolved
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java
Outdated
Show resolved
Hide resolved
@@ -18,12 +20,13 @@ | |||
*/ | |||
package org.apache.pinot.query.runtime.operator; | |||
|
|||
import com.google.common.base.Preconditions; | |||
import com.clearspring.analytics.util.Preconditions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not change this. we widely use guava Precondition checks already. why do we need to include another dependency that does exact same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I think it is an accidental change... I am not sure how it happens. updated.
return leftBlock; | ||
} | ||
// TODO: Moved to a different function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 thanks for recording this
if (matchedRightRows == null) { | ||
if (needUnmatchedLeftRows()) { | ||
rows.add(joinRow(leftRow, null)); | ||
} | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest clean up for the logic. no need to add join rows separately for equality or inequality in 2 places
if (matchedRightRows == null) { | |
if (needUnmatchedLeftRows()) { | |
rows.add(joinRow(leftRow, null)); | |
} | |
continue; | |
} | |
boolean hasMatchForLeftRow = false; | |
if (matchedRightRows != null) { | |
// ... | |
} | |
if (!hasMatchForLeftRow && needUnmatchedLeftRows()) { | |
rows.add(joinRow(leftRow, null)); | |
} |
private Object[] joinRow(@Nullable Object[] leftRow, @Nullable Object[] rightRow) { | ||
Object[] resultRow = new Object[_resultRowSize]; | ||
int idx = 0; | ||
for (Object obj : leftRow) { | ||
resultRow[idx++] = obj; | ||
if (leftRow != null) { | ||
for (Object obj : leftRow) { | ||
resultRow[idx++] = obj; | ||
} | ||
} | ||
// This is needed since left row can be null and we need to advance the idx to the beginning of right row. | ||
idx = _leftRowSize; | ||
if (rightRow != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest making this into 3 utils. but can be done in follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. other than the import change the rest of the comments can be address separately.
thank you for addressing the performance issues
merged. please follow up with the perf improvement comments in separate PR. thank you @61yao |
Add full join and right join support.
This PR assumes that the broadcast partitioned right join table can fit in memory.
This PR also modifies the test framework to allow specifying empty output to skip h2 comparison. (needed since h2 doesn't support full outer join)
This PR also fixes the bug for left join and inequi join:
#9986