executor: fix correctness of hash join v2 when there are multiple var-length keys #55080

windtalker
Contributor

What problem does this PR solve?

Issue Number: close #55016

Problem Summary:

HashJoinV2 first packs all the join keys into a byte array named serializedKey. During the probe stage it compares serializedKey directly: if the serializedKey values match, the join treats the original keys as matching. serializedKey is generated by codec.SerializeKeys; for a var-length column, if SerializeMode is not keepStringLength, the string length is not recorded in serializedKey.
Currently, the string length is recorded only when a string key is inlined.
The problem arises when the join key consists of two strings, such as a = a and b = b: if the string lengths are not kept in serializedKey, the following two pairs of keys generate the same serializedKey:
{a: aa, b: a}
{a: a, b: aa}
which is not expected.
This PR always records the var-length column's length in the serialized key, unless both of the following conditions are true:

  • the join key only contains 1 var length column
  • the join key is not inlined

In that case, since there is only one var-length column, it is safe to omit the column length from serializedKey, because the total length of serializedKey is recorded.

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: xufei <[email protected]>
@ti-chi-bot ti-chi-bot bot added do-not-merge/invalid-title release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 31, 2024
tiprow bot commented Jul 31, 2024

Hi @windtalker. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov bot commented Jul 31, 2024

Codecov Report

Attention: Patch coverage is 63.46154% with 19 lines in your changes missing coverage. Please review.

Project coverage is 73.8921%. Comparing base (0ad2ca6) to head (220a896).
Report is 25 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #55080        +/-   ##
================================================
+ Coverage   72.8125%   73.8921%   +1.0795%     
================================================
  Files          1561       1568         +7     
  Lines        438809     447834      +9025     
================================================
+ Hits         319508     330914     +11406     
+ Misses        99605      96605      -3000     
- Partials      19696      20315       +619     
Flag Coverage Δ
integration 70.7090% <63.4615%> (?)
unit 71.6845% <63.4615%> (-0.1270%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9567% <ø> (ø)
parser ∅ <ø> (∅)
br 52.5488% <ø> (+6.6521%) ⬆️

@windtalker windtalker changed the title fix correctness of hash join v2 when there are multiple var-length keys executor: fix correctness of hash join v2 when there are multiple var-length keys Jul 31, 2024
@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 31, 2024
ti-chi-bot bot commented Aug 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wshwsh12, XuHuaiyu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Aug 2, 2024
ti-chi-bot bot commented Aug 2, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-31 05:01:04.906815747 +0000 UTC m=+331981.186863818: ☑️ agreed by wshwsh12.
  • 2024-08-02 03:37:32.69615505 +0000 UTC m=+60649.914916658: ☑️ agreed by XuHuaiyu.

@ti-chi-bot ti-chi-bot bot merged commit f85273b into pingcap:master Aug 2, 2024
23 checks passed
@windtalker windtalker deleted the record_var_column_length_for_hash_join_v2_when_there_are_multiple_var_length_key branch September 5, 2024 05:34
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

character set correctness
3 participants