executor: fix correctness of hash join v2 when there are multiple var-length keys #55080
Signed-off-by: xufei <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #55080 +/- ##
================================================
+ Coverage 72.8125% 73.8921% +1.0795%
================================================
Files 1561 1568 +7
Lines 438809 447834 +9025
================================================
+ Hits 319508 330914 +11406
+ Misses 99605 96605 -3000
- Partials 19696 20315 +619
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: wshwsh12, XuHuaiyu.
What problem does this PR solve?
Issue Number: close #55016
Problem Summary:
In HashJoinV2, all join keys are first packed into a byte array named `serializedKey`. During the probe stage, the `serializedKey` values are compared directly: if they match, the join treats the original keys as matching.
`codec.SerializeKeys` is used to generate `serializedKey`. For a var-length column, if `SerializeMode` is not `keepStringLength`, the string length is not recorded in `serializedKey`.
Currently, the string length is recorded only when a string key is inlined.
The problem is that if the join key consists of two strings, as in `a = a and b = b`, and the string length is not kept in `serializedKey`, then the following two pairs of keys generate the same `serializedKey`:
{a: `aa`, b: `a`}
{a: `a`, b: `aa`}
This is not expected.
In this PR, the var-length column length is always recorded in the serialized key, except when all of the following conditions are true:
In the above case, since there is only one var-length column, it is OK to discard the var-column length in `serializedKey`, because the total length of `serializedKey` is recorded.
What changed and how does it work?
Check List
Tests
Side effects
Documentation
Release note