storage: prevent crash migrating from 19.1-beta into 19.1-rcX #36714
Merged
5b56144 to 76d9e97
bdarnell approved these changes Apr 10, 2019
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)
pkg/settings/cluster/cockroach_versions.go, line 449 at r1 (raw file):
}, { // VersionSnapshotsWithoutLog is XXX.
Don't forget to fill this in.
When I landed the change to stop sending the Raft log in snapshots, I gated this on whether the truncated state had already been unreplicated for the range. However, this wasn't enough because older 19.1 betas knew about unreplicated truncated state and yet couldn't handle a regressing truncated state, which sending these snapshots could introduce. As a result, 19.1-beta nodes could crash while running mixed with 19.1-rcX. (Simply restarting those nodes with the upgraded binary should fix the problem.)

This PR breaks one of our rules around not introducing historical cluster versions, but in this case it's necessary and also shouldn't have any adverse effects.

See cockroachdb#36680.

Release note (bug fix): prevent a crash that could occur when running a cluster mixed between 19.1-beta and 19.1-rcX nodes. The crash would manifest with a fatal error stating "TruncatedState regressed". Moving all nodes to the new binary (19.1-rcX or newer) rectifies this situation. This wouldn't affect anyone migrating directly from 2.1.x into 19.1.x, as the majority of our users are expected to.
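For readers following along, the difference between the original gate and the version gate described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the `Version` type, the `omitLogOld` and `omitLogNew` helpers, and the concrete version numbers are all hypothetical stand-ins. Only the name `VersionSnapshotsWithoutLog` comes from the review comment on `cockroach_versions.go`.

```go
package main

import "fmt"

// Version is a simplified, hypothetical stand-in for a cluster version.
type Version struct {
	Major, Minor, Unstable int32
}

// Less reports whether v sorts strictly before o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Unstable < o.Unstable
}

// versionSnapshotsWithoutLog is named after the cluster version added in
// this PR. The numeric value here is a placeholder chosen for illustration.
var versionSnapshotsWithoutLog = Version{Major: 2, Minor: 1, Unstable: 11}

// omitLogOld models the original gate: the sender only checks whether the
// range's truncated state is already unreplicated.
func omitLogOld(truncStateUnreplicated bool) bool {
	return truncStateUnreplicated
}

// omitLogNew models the stricter gate: the active cluster version must also
// have reached versionSnapshotsWithoutLog, which is assumed to imply that no
// node still runs an older beta binary.
func omitLogNew(active Version, truncStateUnreplicated bool) bool {
	return truncStateUnreplicated && !active.Less(versionSnapshotsWithoutLog)
}

func main() {
	// Hypothetical active version for a cluster that still has beta nodes.
	mixed := Version{Major: 2, Minor: 1, Unstable: 6}

	fmt.Println("old gate, mixed cluster:", omitLogOld(true))
	fmt.Println("new gate, mixed cluster:", omitLogNew(mixed, true))
	fmt.Println("new gate, upgraded cluster:", omitLogNew(versionSnapshotsWithoutLog, true))
}
```

In this sketch the old gate allows log-less snapshots in the mixed cluster, while the new gate holds them back until the cluster version has been bumped.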
76d9e97 to 24347d6
Done, TFTR.
bors r=bdarnell
craig bot pushed a commit that referenced this pull request Apr 10, 2019
36714: storage: prevent crash migrating from 19.1-beta into 19.1-rcX r=bdarnell a=tbg
Co-authored-by: Tobias Schottdorf <[email protected]>
Build succeeded