Skip to content

Commit

Permalink
raft: send MsgAppResp after applying snapshot to both current leader …
Browse files Browse the repository at this point in the history
…and original sender

Previously, in the special scenario of a leader sending a snapshot to a follower, followed by a leadership change, the receiver of the snapshot does not send MsgAppResp to the new leader. This can cause delay in the new leader catching up this follower.

This commit resolves that issue by having the follower(snapshot receiver) send MsgAppResp to both the original sender of the snapshot and the new leader.

References: #134257

Epic: None
Release note: None
  • Loading branch information
hakuuww committed Feb 20, 2025
1 parent 8033b96 commit c8c1b56
Show file tree
Hide file tree
Showing 2 changed files with 41 additions and 23 deletions.
32 changes: 32 additions & 0 deletions pkg/raft/raft.go
Original file line number Diff line number Diff line change
Expand Up @@ -2467,8 +2467,40 @@ func (r *raft) handleSnapshot(m pb.Message) {
if r.restore(s) {
r.logger.Infof("%x [commit: %d] restored snapshot [index: %d, term: %d]",
r.id, r.raftLog.committed, id.index, id.term)

// To send MsgAppResp to any leader, we must be sure that our log is
// consistent with that leader's log.
//
// After restore(s), our log ends at the entryID of this snapshot.
// The snapshot came from a node at m.Term (typically the leader).
// Everything in this snapshot has been committed during m.Term or earlier,
// and will be present in all future term logs. It is thus safe to send
// MsgAppResp to any leader at term >= m.Term.
//
// From section 5.4 of the Raft paper:
// if a log entry is committed in a given term, then that entry will be
// present in the logs of the leaders for all higher-numbered terms.
r.send(pb.Message{To: m.From, Type: pb.MsgAppResp, Index: r.raftLog.lastIndex(),
Commit: r.raftLog.committed})

// A leadership change may have happened while the snapshot was in flight.
// Therefore we need to send the MsgAppResp to the new leader as well.
//
// If we don't send response to the new leader, the new leader will not
// know that we have committed up to the snapshot index. This will cause
// delay for the new leader to transfer this follower to stateReplicate.
// (Since the new leader may send its own snapshot to this follower and
// wait for the MsgAppResp of that snapshot).
// Which may ultimately cause unwanted commit delay for client.
//
// Sending this response to the new leader allows the new leader to know
// this follower is caught up ASAP.
//
// TODO(pav-kv): consider if we can only send to r.lead.
if r.lead != None && r.lead != m.From {
r.send(pb.Message{To: r.lead, Type: pb.MsgAppResp, Index: r.raftLog.lastIndex(),
Commit: r.raftLog.committed})
}
} else {
r.logger.Infof("%x [commit: %d] ignored snapshot [index: %d, term: %d]",
r.id, r.raftLog.committed, id.index, id.term)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,12 @@
# 2. Leadership changes from Node 1 → Node 2.
# 3. Node 2 starts sending MsgApp to all followers, including Node 3.
# 4. Node 3, now aware of the leader change, responds with MsgAppResp to Node 2.
# A hint index is included in MsgAppResp to indicate node 3's progress
# 5. If Node 2 has already compacted past Node 3's hint index X, it sends a
# snapshot to Node 3.
# 5. Node 2 discovers it has already compacted past Node 3's match index, so it
# sends a new snapshot to Node 3.
# 6. Node 3 receives the snapshot from previous term leader first.
# It sends MsgAppResp to both Node 1 and Node 2.
# 7. Node 2 had previously marked Node 3 as StateSnapshot when sending its
# own snapshot 2->3. If Node 3 responds to Snapshot 1->3, Node 2 should revert
# own snapshot 2->3. If Node 3 responds to Snapshot 1->3, Node 2 transitions
# Node 3 back to StateReplicate.
# Even though its own snapshot is still inflight.

Expand Down Expand Up @@ -138,6 +137,7 @@ stabilize 3
Messages:
3->2 MsgAppResp Term:2 Log:1/15 Rejected (Hint: 11) Commit:11
3->1 MsgAppResp Term:2 Log:0/15 Commit:15
3->2 MsgAppResp Term:2 Log:0/15 Commit:15
> 3 receiving messages
2->3 MsgVote Term:2 Log:1/15
INFO 3 [logterm: 1, index: 15, vote: 0] rejected MsgVote from 2 [logterm: 1, index: 15] at term 2
Expand Down Expand Up @@ -177,34 +177,20 @@ stabilize 2
DEBUG 2 decreased progress of 3 to [StateProbe match=0 next=12 sentCommit=11 matchCommit=11]
DEBUG 2 [firstindex: 16, commit: 16] sent snapshot[index: 16, term: 2] to 3 [StateProbe match=0 next=12 sentCommit=11 matchCommit=11]
DEBUG 2 paused sending replication messages to 3 [StateSnapshot match=0 next=17 sentCommit=16 matchCommit=11 paused pendingSnap=16]
3->2 MsgAppResp Term:2 Log:0/15 Commit:15
DEBUG 2 recovered from needing snapshot, resumed sending replication messages to 3 [StateSnapshot match=15 next=17 sentCommit=16 matchCommit=15 paused pendingSnap=16]
3->2 MsgVoteResp Term:2 Log:0/0 Rejected (Hint: 0)
3->2 MsgFortifyLeaderResp Term:2 Log:0/0 LeadEpoch:1
> 2 handling Ready
Ready MustSync=false:
Messages:
2->3 MsgSnap Term:2 Log:0/0
Snapshot: Index:16 Term:2 ConfState:Voters:[1 2 3] VotersOutgoing:[] Learners:[] LearnersNext:[] AutoLeave:false
2->3 MsgApp Term:2 Log:1/15 Commit:16 Entries:[2/16 EntryNormal ""]
2->3 MsgApp Term:2 Log:2/16 Commit:16

status 2
----
1: StateReplicate match=16 next=17 sentCommit=16 matchCommit=16
2: StateReplicate match=16 next=17 sentCommit=15 matchCommit=15
3: StateSnapshot match=0 next=17 sentCommit=16 matchCommit=11 paused pendingSnap=16

deliver-msgs type=MsgSnap 3
----
2->3 MsgSnap Term:2 Log:0/0
Snapshot: Index:16 Term:2 ConfState:Voters:[1 2 3] VotersOutgoing:[] Learners:[] LearnersNext:[] AutoLeave:false
INFO log [committed=15, applied=15, applying=15, unstable.offset=16, unstable.offsetInProgress=16, len(unstable.Entries)=0] starts to restore snapshot [index: 16, term: 2]
INFO 3 switched to configuration voters=(1 2 3)
INFO 3 [commit: 16, lastindex: 16, lastterm: 2] restored snapshot [index: 16, term: 2]
INFO 3 [commit: 16] restored snapshot [index: 16, term: 2]

stabilize 3
----
> 3 handling Ready
Ready MustSync=true:
HardState Term:2 Commit:16 Lead:2 LeadEpoch:1
Snapshot Index:16 Term:2 ConfState:Voters:[1 2 3] VotersOutgoing:[] Learners:[] LearnersNext:[] AutoLeave:false
Messages:
3->2 MsgAppResp Term:2 Log:0/16 Commit:16
3: StateReplicate match=15 next=17 sentCommit=16 matchCommit=15 inflight=1

0 comments on commit c8c1b56

Please sign in to comment.