Frequent errors when running tpc-c on six node cluster #34228
Labels
S-3-ux-surprise
Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.
Comments
I'm also seeing
frequently as well
Ran into this as well:
Update: this killed a node. #34241
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 5, 2019
Fixes cockroachdb#34025. Fixes cockroachdb#33624. Fixes cockroachdb#33335. Fixes cockroachdb#33151. Fixes cockroachdb#33149. Fixes cockroachdb#34159. Fixes cockroachdb#34293. Fixes cockroachdb#32813. Fixes cockroachdb#30886. Fixes cockroachdb#34228. Fixes cockroachdb#34321.

It is rare but possible for a replica to become a leaseholder but not learn about this until it applies a snapshot. Immediately upon the snapshot application's `ReplicaState` update, the replica will begin operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger in-memory side-effects to be performed. This could result in a regression in the new leaseholder's timestamp cache compared to the previous leaseholder, allowing write-skew like we saw in cockroachdb#34025. This could presumably result in other anomalies as well, because all of the steps in `leasePostApply` were skipped.

This PR fixes this bug by detecting lease updates when applying snapshots and making sure to react correctly to them. It also likely fixes the referenced issue. The new test demonstrated that without this fix, the serializable violation speculated about in the issue was possible.

Release note (bug fix): Fix bug where lease transfers passed through Snapshots could forget to update in-memory state on the new leaseholder, allowing write-skew between read-modify-write operations.
craig bot pushed a commit that referenced this issue Feb 5, 2019
34548: storage: apply lease change side-effects on snapshot recipients r=nvanbenschoten a=nvanbenschoten

Fixes #34025. Fixes #33624. Fixes #33335. Fixes #33151. Fixes #33149. Fixes #34159. Fixes #34293. Fixes #32813. Fixes #30886. Fixes #34228. Fixes #34321.

It is rare but possible for a replica to become a leaseholder but not learn about this until it applies a snapshot. Immediately upon the snapshot application's `ReplicaState` update, the replica will begin operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger in-memory side-effects to be performed. This could result in a regression in the new leaseholder's timestamp cache compared to the previous leaseholder's cache, allowing write-skew like we saw in #34025. This could presumably result in other anomalies as well, because all of the steps in `leasePostApply` were skipped (as theorized by #34025 (comment)).

This PR fixes this bug by detecting lease updates when applying snapshots and making sure to react correctly to them. It also likely fixes the referenced issue. The new test demonstrates that without this fix, the serializable violation speculated about in the issue was possible.

Co-authored-by: Nathan VanBenschoten <[email protected]>
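For readers following the commit message above, here is a rough Go sketch of the failure mode it describes. It is a toy model, not CockroachDB's code: the `Lease`, `Replica`, `applySnapshot`, and `canWriteAt` definitions and the `withFix` flag are all hypothetical, and only the name `leasePostApply` is borrowed from the commit message. The sketch assumes that the skipped side-effect is raising the timestamp cache's low-water mark to the lease start, which is a simplification of whatever the real function does.

```go
package main

// Timestamp is a simplified stand-in for a cluster timestamp.
type Timestamp int64

// Lease is a hypothetical, pared-down lease record.
type Lease struct {
	Holder   int       // store ID of the leaseholder
	Sequence int64     // changes whenever the lease changes hands
	Start    Timestamp // time at which the lease takes effect
}

// Replica models only the state relevant to this bug.
type Replica struct {
	StoreID    int
	lease      Lease
	tsCacheLow Timestamp // low-water mark of the in-memory timestamp cache
}

// leasePostApply installs the lease and performs the in-memory
// side-effects of a lease change (assumed here to be a single step).
func (r *Replica) leasePostApply(newLease Lease) {
	prev := r.lease
	r.lease = newLease
	if newLease.Holder == r.StoreID && newLease.Sequence != prev.Sequence {
		// The new leaseholder cannot know which reads the previous
		// holder served, so it conservatively treats every timestamp
		// up to the lease start as already read.
		if newLease.Start > r.tsCacheLow {
			r.tsCacheLow = newLease.Start
		}
	}
}

// applySnapshot installs the lease carried by a snapshot. With
// withFix=false it mimics the pre-fix behavior: the lease is
// updated but the in-memory side-effects are skipped.
func (r *Replica) applySnapshot(snapLease Lease, withFix bool) {
	if withFix {
		r.leasePostApply(snapLease)
		return
	}
	r.lease = snapLease
}

// canWriteAt reports whether a write at ts would be accepted without
// being pushed above the timestamp cache's low-water mark.
func (r *Replica) canWriteAt(ts Timestamp) bool {
	return r.lease.Holder == r.StoreID && ts > r.tsCacheLow
}
```

In this model, suppose the previous leaseholder served a read at timestamp 90 and store 2 then learns through a snapshot that it holds a lease starting at 100. Without the fix, `canWriteAt(80)` returns true on store 2, so a write can land beneath the earlier read, which is the kind of write-skew the commit message refers to. With the fix, the same write is rejected at that timestamp and would have to be pushed above 100.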
Describe the problem
While running tpc-c on six node clusters, I see repeated failures of:
Note, I've observed this on multiple separate nodes.
To Reproduce
Expected behavior
TPC-C completes without an error.
Environment:
v2.2.0-alpha.20181217-820-g645c0c9