global sort: check duplicate key at client side #59659

Merged
merged 3 commits into from
Feb 27, 2025

lance6716
Contributor

What problem does this PR solve?

Issue Number: close #59650

Problem Summary:

What changed and how does it work?

The server side may return a duplicate key error when IMPORT INTO is used on a table with a unique key (UK), but it's better to detect duplicate keys on the client side.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 20, 2025
Signed-off-by: lance6716 <[email protected]>

codecov bot commented Feb 20, 2025

Codecov Report

Attention: Patch coverage is 23.07692% with 10 lines in your changes missing coverage. Please review.

Project coverage is 73.5826%. Comparing base (97d861e) to head (a43e6f7).
Report is 50 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #59659        +/-   ##
================================================
+ Coverage   72.9805%   73.5826%   +0.6021%     
================================================
  Files          1694       1729        +35     
  Lines        468596     481101     +12505     
================================================
+ Hits         341984     354007     +12023     
+ Misses       105568     105270       -298     
- Partials      21044      21824       +780     
Flag Coverage Δ
integration 45.5948% <0.0000%> (?)
unit 72.2194% <23.0769%> (+0.0318%) ⬆️

Components Coverage Δ
dumpling 52.6910% <ø> (ø)
parser ∅ <ø> (∅)
br 44.8555% <ø> (-0.2482%) ⬇️

Contributor

@D3Hunter D3Hunter left a comment

rest lgtm

Comment on lines 517 to 522
if needDupCheck {
	if lastKey4DupCheck != nil && bytes.Equal(lastKey4DupCheck, k) {
		return errors.Errorf("duplicate key found: %s", hex.EncodeToString(lastKey4DupCheck))
	}
	lastKey4DupCheck = k
}
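For readers who want to try the idea outside the PR, here is a minimal, self-contained sketch of the same adjacent-key check. It assumes the keys arrive already sorted in ascending order, so any duplicate must sit right next to its twin. The function name checkSortedNoDup and its signature are hypothetical and do not come from the PR; the sketch also uses fmt.Errorf from the standard library instead of the errors package used in the diff.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// checkSortedNoDup is a hypothetical helper (not from the PR). It walks keys
// that are assumed to be sorted in ascending order and returns an error for
// the first key that equals its predecessor.
func checkSortedNoDup(keys [][]byte) error {
	var last []byte
	for _, k := range keys {
		// In sorted input, equal keys are always adjacent, so comparing
		// against the previous key is sufficient.
		if last != nil && bytes.Equal(last, k) {
			return fmt.Errorf("duplicate key found: %s", hex.EncodeToString(k))
		}
		last = k
	}
	return nil
}

func main() {
	keys := [][]byte{[]byte("a"), []byte("b"), []byte("b"), []byte("c")}
	fmt.Println(checkSortedNoDup(keys))
}
```

The check costs one extra bytes.Equal per key and keeps only a reference to the previous key, which is why it fits naturally into a loop that already visits every key in order.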
Contributor

Maybe add the check in loadBatchRegionData, where there is a sort phase; it would be simpler to check there.

Contributor Author

After the sort in loadBatchRegionData there is no place that iterates over all keys, so I would need to add another iteration. I think it's better for performance to reuse the existing iteration here.

Contributor

@D3Hunter D3Hunter Feb 24, 2025

There seems to be no performance penalty: we already do a byte comparison, bytes.Compare(e.memKVsAndBuffers.keys[i], e.memKVsAndBuffers.keys[k]) < 0, when sorting. But I'm not sure, since sorty's less function takes 4 parameters.

Contributor Author

@lance6716 lance6716 Feb 24, 2025

sorty sorts concurrently in multiple goroutines, but we need a single-threaded iteration for the check.

Contributor

This seems to work:

func TestSorty(t *testing.T) {
	arr := make([]int, 0, 1<<20)
	arr = append(arr, 1)
	for i := 0; i < 1<<20; i++ {
		arr = append(arr, rand.Int())
	}
	arr = append(arr, 1)
	// The comparator may run in several goroutines, hence the atomic flag.
	var dupFound atomic.Bool
	sorty.Sort(len(arr), func(i, k, r, s int) bool {
		res := cmp.Compare(arr[i], arr[k])
		if res == 0 {
			dupFound.Store(true)
		}
		if res < 0 {
			if r != s {
				arr[r], arr[s] = arr[s], arr[r]
			}
			// Following sorty's documented less-and-swap pattern, report
			// true when arr[i] is less than arr[k].
			return true
		}
		return false
	})
	fmt.Println(dupFound.Load())
}
=== RUN   TestSorty
true
--- PASS: TestSorty (1.04s)

Contributor Author

I'll check what sorty guarantees. I'm not sure whether it may split [1,2,3,3,4,5] into [1,2,3] and [3,4,5]; if the duplicate values land in 2 different sorting groups, the sorting function can't find the duplication.

Contributor

I guess sorty still needs to compare or merge those 2 parts; otherwise, how would it keep the whole slice sorted?
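The two positions in this thread can be illustrated with a small model that uses only the standard library. It is a sketch of a generic "sort chunks independently, then merge" scheme and is not sorty's actual algorithm, which this thread does not confirm. The helper names sortChunk and merge are hypothetical. With the chunks [1,2,3] and [3,4,5], neither per-chunk comparator ever sees two equal values, but the merge step has to compare the two 3s.

```go
package main

import (
	"fmt"
	"sort"
)

// sortChunk (hypothetical helper) sorts s in place and reports whether the
// comparator was ever called on two distinct positions holding equal values.
func sortChunk(s []int) bool {
	dup := false
	sort.Slice(s, func(i, j int) bool {
		if i != j && s[i] == s[j] {
			dup = true
		}
		return s[i] < s[j]
	})
	return dup
}

// merge (hypothetical helper) merges two ascending slices and reports
// whether it compared two equal elements while doing so.
func merge(a, b []int) ([]int, bool) {
	out := make([]int, 0, len(a)+len(b))
	dup := false
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] == b[j] {
			dup = true
		}
		if a[i] <= b[j] {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out, dup
}

func main() {
	left := []int{3, 1, 2}
	right := []int{5, 3, 4}
	// Each chunk is sorted on its own; the duplicate 3s are in different chunks.
	fmt.Println(sortChunk(left), sortChunk(right))
	merged, dup := merge(left, right)
	fmt.Println(merged, dup)
}
```

In this model the duplicate is invisible during the per-chunk phase and only surfaces when the chunks are combined, so a comparator-based check is only reliable if every phase that orders elements goes through the instrumented comparison.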

Signed-off-by: lance6716 <[email protected]>
@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 26, 2025
Contributor

@GMHDBJD GMHDBJD left a comment

LGTM


ti-chi-bot bot commented Feb 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, GMHDBJD

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 27, 2025

ti-chi-bot bot commented Feb 27, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-02-26 02:18:14.068359717 +0000 UTC m=+408642.021517979: ☑️ agreed by D3Hunter.
  • 2025-02-27 10:30:28.108825085 +0000 UTC m=+524576.061983352: ☑️ agreed by GMHDBJD.

@ti-chi-bot ti-chi-bot bot merged commit da70281 into pingcap:master Feb 27, 2025
20 of 25 checks passed
@ti-chi-bot ti-chi-bot bot added needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. labels Feb 27, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #59823.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #59824.

Labels
approved lgtm needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

IMPORT INTO + global sort didn't handle UK conflict correctly
4 participants