release-20.2: backport fixes for "wedged replicas in need of snapshot" #62204

Merged: 4 commits into cockroachdb:release-20.2 from 20.2-backport-wedge on Mar 22, 2021

Conversation

andreimatei

Backport #55148 (the first commit) and #58722 (the next 3 commits).

#55148 had been previously backported and then reverted in #58500. #58722 is the fix.


…ts""

This reverts commit 49c8337.

This is a revert of a revert, reintroducing the backport of the main
commit from cockroachdb#55148. The backport was reverted because of a deadlock,
which is now being fixed by the next commit.

Release note (bug fix): A bug causing queries sent to a
freshly-restarted node to sometimes hang for a long time while the node
catches up with replication has been fixed.

This assertion had become nonsensical in #9915f0d0f104aa918c94340e9e47129b90421999.
It's from a time where the surrounding function was dealing with the
local store only.

Release note: None

Make checkCanReceiveLease() more nimble, anticipating broader use.

Release note: None

This patch backpedals a little bit on the logic introduced in cockroachdb#55148.
That patch said that, if a leader is known, every other replica refuses
to propose a lease acquisition. Instead, the replica in question
redirects whoever was triggering the lease acquisition to the leader,
thinking that the leader should take the lease.
That patch introduced a deadlock: some replicas refuse to take the lease
because they are not VOTER_FULL (see CheckCanReceiveLease()). To fix the
deadlock, this patch incorporates that check in the proposal buffer's
decision about whether or not to reject a proposal: if the leader is
believed to refuse to take the lease, then we again forward our own
lease request.

An edge case to the edge case is when the leader is not even part of the
proposer's range descriptor. This can happen if the proposer is far
behind. In this case, we assume that the leader is eligible. If it
isn't, the deadlock will resolve once the proposer catches up.
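
For readers skimming the thread, the decision described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the types and the functions `leaderEligibleForLease` and `shouldRejectLeaseRequest` are hypothetical names made up for the example, and the range descriptor is simplified to a map from replica ID to replica type.

```go
package main

import "fmt"

// ReplicaType is a hypothetical, simplified stand-in for the replica
// roles named in the commit message.
type ReplicaType int

const (
	VoterFull ReplicaType = iota
	VoterIncoming
	VoterDemoting
	Learner
)

// ReplicaID identifies a replica (hypothetical type for this sketch).
type ReplicaID int

// RangeDescriptor is a simplified stand-in for the proposer's view of
// the range: replica ID -> replica type.
type RangeDescriptor map[ReplicaID]ReplicaType

// leaderEligibleForLease reports whether the proposer believes the
// leader would accept the lease. If the leader is missing from the
// proposer's (possibly stale) descriptor, it is assumed eligible.
func leaderEligibleForLease(desc RangeDescriptor, leader ReplicaID) bool {
	typ, ok := desc[leader]
	if !ok {
		// Edge case: the proposer is far behind and does not know the
		// leader. Assume eligibility; this resolves once it catches up.
		return true
	}
	// Only VOTER_FULL replicas agree to take the lease in this sketch.
	return typ == VoterFull
}

// shouldRejectLeaseRequest reports whether a replica should refuse to
// propose its own lease request and redirect the caller to the leader.
func shouldRejectLeaseRequest(desc RangeDescriptor, self, leader ReplicaID, leaderKnown bool) bool {
	if !leaderKnown || leader == self {
		// No known leader, or we are the leader: propose ourselves.
		return false
	}
	// Reject only if the leader is believed to be able to take the lease;
	// otherwise forward our own request to avoid the deadlock.
	return leaderEligibleForLease(desc, leader)
}

func main() {
	desc := RangeDescriptor{1: VoterFull, 2: Learner, 3: VoterFull}
	fmt.Println("leader is VOTER_FULL:", shouldRejectLeaseRequest(desc, 3, 1, true))
	fmt.Println("leader is a learner:", shouldRejectLeaseRequest(desc, 3, 2, true))
	fmt.Println("leader not in descriptor:", shouldRejectLeaseRequest(desc, 3, 9, true))
}
```

The point of the sketch is that the rejection is conditional on the leader's believed eligibility, which is what breaks the cycle where every replica defers to a leader that will never take the lease.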

A future patch will relax the conditions under which a replica agrees to
take the lease. VOTER_INCOMING replicas should take the lease.
VOTER_DEMOTING replicas are more controversial.

Fixes cockroachdb#57798

Release note: None
@andreimatei andreimatei force-pushed the 20.2-backport-wedge branch from 3371b5d to 6fd5b71 Compare March 22, 2021 16:40
@andreimatei andreimatei merged commit 3a6b987 into cockroachdb:release-20.2 Mar 22, 2021
@andreimatei andreimatei deleted the 20.2-backport-wedge branch March 22, 2021 18:07