Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multistage] [feature] Support Right join and Full join and inEqui mix. #9907

Merged
merged 10 commits into from
Dec 16, 2022

Conversation

61yao
Copy link
Contributor

@61yao 61yao commented Dec 4, 2022

Add full join and right join support.

This PR assumes that the broadcast partitioned right join table can fit in memory.
This PR also modifies the test framework to allow specifying empty output to skip h2 comparison. (needed since h2 doesn't support full outer join)

This PR also fixes the bug for left join and inequi join:
#9986

@codecov-commenter
Copy link

codecov-commenter commented Dec 4, 2022

Codecov Report

Merging #9907 (243a9c5) into master (e092140) will increase coverage by 6.06%.
The diff coverage is 95.23%.

@@             Coverage Diff              @@
##             master    #9907      +/-   ##
============================================
+ Coverage     64.23%   70.29%   +6.06%     
- Complexity     5127     5682     +555     
============================================
  Files          1934     1991      +57     
  Lines        104748   107286    +2538     
  Branches      15998    16315     +317     
============================================
+ Hits          67281    75417    +8136     
+ Misses        32597    26586    -6011     
- Partials       4870     5283     +413     
Flag Coverage Δ
integration1 25.00% <0.00%> (?)
integration2 24.36% <0.00%> (?)
unittests1 67.87% <95.23%> (+<0.01%) ⬆️
unittests2 13.60% <0.00%> (-2.19%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rg/apache/pinot/query/service/QueryDispatcher.java 84.37% <ø> (ø)
...pinot/query/runtime/operator/HashJoinOperator.java 92.79% <95.08%> (-0.63%) ⬇️
.../pinot/query/runtime/blocks/TransferableBlock.java 72.91% <100.00%> (+0.57%) ⬆️
.../pinot/query/runtime/plan/PhysicalPlanVisitor.java 97.05% <100.00%> (-0.09%) ⬇️
.../mailbox/channel/MailboxContentStreamObserver.java 61.40% <0.00%> (-8.60%) ⬇️
...operator/blocks/results/ExceptionResultsBlock.java 66.66% <0.00%> (-8.34%) ⬇️
...e/operator/blocks/results/ExplainResultsBlock.java 70.37% <0.00%> (-2.71%) ⬇️
...pache/pinot/spi/env/CommonsConfigurationUtils.java 73.07% <0.00%> (-1.93%) ⬇️
.../aggregation/function/ModeAggregationFunction.java 86.75% <0.00%> (-1.90%) ⬇️
.../columnminmaxvalue/ColumnMinMaxValueGenerator.java 72.60% <0.00%> (-1.09%) ⬇️
... and 461 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@walterddr walterddr added the multi-stage Related to the multi-stage query engine label Dec 8, 2022
@61yao 61yao changed the title [multistage] [feature] Support Full join [multistage] [feature] Support Right join and Full join Dec 9, 2022
Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please rebase and trigger CI?

Comment on lines 57 to 60
public BroadcastRows(List<Object[]> rows, boolean hasMatch) {
_rows = rows;
_hasMatch = hasMatch;
}
Copy link
Contributor

@walterddr walterddr Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure what's the goal here. but higher level question is do we plan to support inequality outer join conditions.

  1. if we do not (e.g. only equality outer join): then there's no need to encapsulate these, simply create another hashmap of HashMap<Key, Boolean> might be better.
    • if the join type is LEFT, there' no need to maintain this boxed object and the other HashMap<Key, Boolean> will be empty
    • if the join type is RIGHT/FULL the sole addition is on the HashMap
  2. if we want to support inequality join condition, then the hasMap should be attached to every row otherwise the results wont be correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively we can also maintain 2 hashmaps

private final HashMap<Key, List<Object[]>> _unmatchedHashTable;
private final HashMap<Key, List<Object[]>> _matchedHashTable;

and for the hash join, simply loop through both

  1. for LEFT and INNER join, remain the same logic (the _matchHashTable will always be empty and thus no extra cost)
  2. for RIGHT and FULL join, move a row from _unmatch to _match onc the equality and inequality conditions are applied successfully.
    • at the very end when a EOS block is reached on the left. simply output the right _umatched rows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work since left join has a bug when there is inequi condition. we have to record _unmatchedLeftRows for left join and full join.

For right join and full join, we have to remember matched rows to output unmatched ones.
I used HashMap<Key, HashSet> _matchedRightRows because we have to use this to look up unmatched rows.

PTAL at the new implementation.

"Join type: " + joinType + " is not supported!");
_leftKeySelector = joinKeys.getLeftJoinKeySelector();
_rightKeySelector = joinKeys.getRightJoinKeySelector();
DataSchema leftSchema, JoinNode node) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary. but I made the change to make the ctor simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 got it

@61yao 61yao changed the title [multistage] [feature] Support Right join and Full join [multistage] [feature] Support Right join and Full join and inEqui mix. Dec 14, 2022
return leftBlock;
}
if (leftBlock.isSuccessfulEndOfStreamBlock()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: suggest putting the logic of the end of stream handling into a separate function for better readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a TODO to follow up.

@61yao 61yao requested a review from walterddr December 14, 2022 21:50
@@ -18,12 +20,13 @@
*/
package org.apache.pinot.query.runtime.operator;

import com.google.common.base.Preconditions;
import com.clearspring.analytics.util.Preconditions;
Copy link
Contributor

@walterddr walterddr Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not change this. we widely use guava Precondition checks already. why do we need to include another dependency that does exact same thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think it is an accidental change... I am not sure how it happens. updated.

return leftBlock;
}
// TODO: Moved to a different function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 thanks for recording this

Comment on lines +211 to +216
if (matchedRightRows == null) {
if (needUnmatchedLeftRows()) {
rows.add(joinRow(leftRow, null));
}
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest clean up for the logic. no need to add join rows separately for equality or inequality in 2 places

Suggested change
if (matchedRightRows == null) {
if (needUnmatchedLeftRows()) {
rows.add(joinRow(leftRow, null));
}
continue;
}
boolean hasMatchForLeftRow = false;
if (matchedRightRows != null) {
// ...
}
if (!hasMatchForLeftRow && needUnmatchedLeftRows()) {
rows.add(joinRow(leftRow, null));
}

Comment on lines +240 to 250
private Object[] joinRow(@Nullable Object[] leftRow, @Nullable Object[] rightRow) {
Object[] resultRow = new Object[_resultRowSize];
int idx = 0;
for (Object obj : leftRow) {
resultRow[idx++] = obj;
if (leftRow != null) {
for (Object obj : leftRow) {
resultRow[idx++] = obj;
}
}
// This is needed since left row can be null and we need to advance the idx to the beginning of right row.
idx = _leftRowSize;
if (rightRow != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest making this into 3 utils. but can be done in follow up

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. other than the import change the rest of the comments can be address separately.

thank you for addressing the performance issues

@walterddr walterddr merged commit 54046e1 into apache:master Dec 16, 2022
@walterddr
Copy link
Contributor

merged. please follow up with the perf improvement comments in separate PR. thank you @61yao

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-stage Related to the multi-stage query engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants