ddl: Fix issue with concurrent update getting reverted by BackfillData #58229
Conversation
Hi @mjonss. Thanks for your PR. PRs from untrusted users cannot be tested automatically until a trusted user verifies them. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Copilot reviewed 2 out of 2 changed files in this pull request and generated no suggestions.
Codecov Report
Additional details and impacted files:

@@             Coverage Diff              @@
##             master     #58229        +/-   ##
================================================
+ Coverage   73.1936%   77.5686%   +4.3750%
================================================
  Files          1681       1730        +49
  Lines        463050     503461     +40411
================================================
+ Hits         338923     390528     +51605
+ Misses       103344      90685     -12659
- Partials      20783      22248      +1465
Flags with carried forward coverage won't be shown.
/retest
@mjonss: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message. In response to this: /retest. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Is it true that the clustered index will not be duplicated during reorg, because the primary key check has already ensured this?
pkg/ddl/partition.go (outdated):
```go
		return errors.Trace(err)
	}
	// Also don't actually write it :)
	err = txn.Delete(w.oldKeys[i])
```
Is it a duplicate of L3878? Maybe keeping the assertion is enough.
I'm not sure how to force the transaction to fail only if a key already exists, regardless of settings like tidb_txn_assertion_level. By involving the key in the transaction, it will fail if another concurrent transaction has modified it (for example, inserted it due to an UPDATE). Just having SetAssertion() is not enough; we could have only txn.Delete() or txn.Set(), but I'm not sure whether Set+Assert+Delete is better. What are your thoughts?
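For illustration, a minimal sketch of the Set+Assert+Delete idea under discussion, assuming TiDB's pkg/kv Transaction API; touchOldKey, oldKey, and oldRowVal are hypothetical names, not code from this PR:

```go
package sketch

import (
	"github.com/pingcap/errors"
	"github.com/pingcap/tidb/pkg/kv"
)

// touchOldKey involves the old row key in the backfill transaction so that
// the commit conflicts with any concurrent write to that key, without
// leaving a net write behind.
func touchOldKey(txn kv.Transaction, oldKey kv.Key, oldRowVal []byte) error {
	// Buffer a write so commit-time conflict detection covers this key.
	if err := txn.Set(oldKey, oldRowVal); err != nil {
		return errors.Trace(err)
	}
	// Assert the key must already exist; a correctness check applied at
	// commit and subject to tidb_txn_assertion_level.
	if err := txn.SetAssertion(oldKey, kv.SetAssertExist); err != nil {
		return errors.Trace(err)
	}
	// Delete it again so no value is actually written.
	return errors.Trace(txn.Delete(oldKey))
}
```

The Set/Delete pair leaves no net write; its only effect is to pull the key into conflict detection, with the assertion as an extra safety check.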
I don't quite understand this code. Why does SetAssertNotExist succeed when txn.Set has written the key first?
I would expect that only lock calls are directly checked/forwarded to the KV store, while SetAssertion/Set/Delete are only applied during Commit.
If the goal is simply to prevent the row_key from being modified, using txn.LockKeys to apply a pessimistic lock on the corresponding row is the appropriate approach.
The purpose of Assertion is fundamentally different:
- It is designed to validate invariants or constraints to ensure correctness is not violated.
- It is not intended for concurrency control or preventing concurrent modifications.
Using LockKeys in a pessimistic transaction explicitly handles concurrency by preventing conflicting writes, while assertions act as safeguards to check assumptions about the state after operations are performed.
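A minimal sketch of that approach, assuming TiDB's pkg/kv Transaction API; lockOldRows and oldKeys are illustrative names:

```go
package sketch

import (
	"context"

	"github.com/pingcap/errors"
	"github.com/pingcap/tidb/pkg/kv"
)

// lockOldRows pessimistically locks the old row keys, so a concurrent DML
// statement writing the same rows either blocks or causes a write conflict,
// instead of being silently reverted by the backfill batch.
func lockOldRows(ctx context.Context, txn kv.Transaction, oldKeys []kv.Key) error {
	// Unlike Set/Delete/SetAssertion, which are buffered in the membuffer
	// until Commit, LockKeys is forwarded to the KV store immediately.
	return errors.Trace(txn.LockKeys(ctx, new(kv.LockCtx), oldKeys...))
}
```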
For the use of internal transactions during DDL task execution and their concurrency control with DML, it is recommended to consult colleagues working on DDL to confirm that the logic complies with the DDL constraint requirements. There should be similar code to reference in the regular DDL backfill.
/cc @wjhuang2016 @tangenta
The difference from regular DDL is that this uses the version/lock on one key to control another, unrelated key. It looks too strange here.
After changing to txn.LockKeys(), it is now the same as what (*addIndexTxnWorker).BackfillData() does here.
```go
}

// tablecodec.prefixLen is not exported, but is just TableSplitKeyLen + 2
```
I don't get the comment's point. Why the +2 here?
TableSplitKeyLen covers only t_<encoded tableID>, and we want to include the _r record prefix separator as well; that is where the +2 comes from.
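For reference, a small runnable example of the record key layout behind this arithmetic, assuming the pkg/tablecodec and pkg/kv packages from this repository; the table ID and handle are arbitrary:

```go
package main

import (
	"fmt"

	"github.com/pingcap/tidb/pkg/kv"
	"github.com/pingcap/tidb/pkg/tablecodec"
)

func main() {
	// A record key is laid out as: t | 8-byte encoded tableID | _r | handle.
	key := tablecodec.EncodeRowKeyWithHandle(42, kv.IntHandle(1))

	// TableSplitKeyLen (9) covers only t_<encoded tableID>; the full record
	// prefix additionally includes the 2-byte "_r" separator.
	recordPrefixLen := tablecodec.TableSplitKeyLen + 2
	fmt.Println(string(key[tablecodec.TableSplitKeyLen:recordPrefixLen])) // "_r"
}
```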
Use len(recordPrefixSep) instead of 2? Or prefixLen in tablecodec.go?
Hmm, all names that start with lower case are not exported :( I can change those names, but then the PR grows a bit with unrelated changes. Is it OK if I create a follow-up issue+PR for exporting PrefixLen/RecordPrefixSepLength later?
LGTM
/retest
@mjonss: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests. Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bb7133, Defined2014, tangenta. The full list of commands accepted by this bot can be found here. The pull request process is described here.
In response to a cherrypick label: a new pull request was created for the target branch.
What problem does this PR solve?
Issue Number: close #58226, close #58692
Problem Summary:
A concurrency test showed that when REORGANIZE PARTITION is copying non-clustered table rows in batches, and an UPDATE touches rows included in the current batch while it is running, the batch overwrites the update with the values it originally read.
What changed and how does it work?
Reverted the use of table.AddRecord() for non-clustered tables and added the old row's key to the batch transaction, so the transaction fails if the old row has been touched concurrently (if the row has already been copied/double-written and has not been touched since, it is skipped instead of being copied again).
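A condensed sketch of that batch flow, assuming TiDB's pkg/kv Transaction API; backfillBatch and its parameters are illustrative names, not the PR's actual code:

```go
package sketch

import (
	"context"

	"github.com/pingcap/errors"
	"github.com/pingcap/tidb/pkg/kv"
)

// backfillBatch copies one batch of rows into the new partition. Locking the
// old row keys makes a concurrent UPDATE on those rows conflict with this
// transaction instead of being silently overwritten by stale batch data.
func backfillBatch(ctx context.Context, txn kv.Transaction,
	oldKeys, newKeys []kv.Key, rows [][]byte) error {
	if err := txn.LockKeys(ctx, new(kv.LockCtx), oldKeys...); err != nil {
		return errors.Trace(err)
	}
	for i := range rows {
		// Skip rows already double-written to the new partition; their
		// current value is newer than what this batch originally read.
		if val, err := txn.Get(ctx, newKeys[i]); err == nil && len(val) > 0 {
			continue
		}
		if err := txn.Set(newKeys[i], rows[i]); err != nil {
			return errors.Trace(err)
		}
	}
	return nil
}
```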
Check List
Tests
Side effects
Documentation
Release note