apply term should be assigned when applying snapshot #10225
Labels
affects-4.0: This bug affects 4.0.x versions.
affects-5.0: This bug affects 5.0.x versions.
affects-5.1: This bug affects 5.1.x versions.
severity/critical
sig/raft: Component: Raft, RaftStore, etc.
type/bug: The issue is confirmed as a bug.
Comments
/severity critical
BusyJay added a commit to BusyJay/tikv that referenced this issue on Jun 28, 2021:
After applying snapshot applied_term should also be updated, otherwise it can produce snapshot with wrong term and cause panic in follower. Close tikv#10225 Signed-off-by: Jay Lee <[email protected]>
ti-chi-bot added a commit that referenced this issue on Jun 29, 2021:
* add test case Signed-off-by: Jay Lee <[email protected]> * raftstore: update applied_term after snapshot After applying snapshot applied_term should also be updated, otherwise it can produce snapshot with wrong term and cause panic in follower. Close #10225 Signed-off-by: Jay Lee <[email protected]> Co-authored-by: Ti Chi Robot <[email protected]>
This was referenced Jun 29, 2021
tiancaiamao pushed a commit to tiancaiamao/tikv that referenced this issue on Aug 11, 2021:
* add test case Signed-off-by: Jay Lee <[email protected]> * raftstore: update applied_term after snapshot After applying snapshot applied_term should also be updated, otherwise it can produce snapshot with wrong term and cause panic in follower. Close tikv#10225 Signed-off-by: Jay Lee <[email protected]> Co-authored-by: Ti Chi Robot <[email protected]> Signed-off-by: tiancaiamao <[email protected]>
BusyJay added a commit to BusyJay/tikv that referenced this issue on Dec 1, 2021:
Ref tikv#10225 Signed-off-by: Jay Lee <[email protected]>
zhouqiang-cl pushed a commit that referenced this issue on Dec 1, 2021:
Ref #10225 Signed-off-by: Jay Lee <[email protected]>
Bug Report
What version of TiKV are you using?
v5.0.1, though v4.0.x should also be affected.
What operating system and CPU are you using?
Doesn't matter.
Steps to reproduce
Suppose a region has three replicas a, b and c, where a is the leader and c is isolated and has not been initialized. a and b decide to add a new replica d and promote it to voter before it is initialized. After a has applied up to its last log entry, it sends d a snapshot, initializing d with the latest data. a then transfers leadership to d. Right after d wins the election, and before d commits any entry, c recovers and requests a snapshot from d.
What did you expect?
c will be initialized properly.
What happened?
c panics with the following stack.
Since v4.0.0, snapshots are generated using the apply index and apply term held by the apply worker. The apply worker initializes these fields once the peer fsm has applied a snapshot. However, applying a snapshot updates only the apply index and leaves the apply term untouched, so the apply worker keeps a stale apply term until it applies the next entry. A snapshot generated during that window is stamped with the wrong log term.
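The difference between the buggy and the fixed update can be sketched with a toy model of the apply worker's state. This is a minimal illustration under stated assumptions: `ApplyState`, `SnapshotMeta` and the method names are hypothetical and are not TiKV's actual types; the peer is assumed to start with nothing applied (index 0, term 0), like the freshly created replica d.

```rust
// Hypothetical, simplified model of the apply worker's per-region state.
// Names are illustrative only; they are not TiKV's real types or fields.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SnapshotMeta {
    index: u64,
    term: u64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct ApplyState {
    applied_index: u64,
    applied_term: u64,
}

impl ApplyState {
    /// Behavior described in the report: only the index is updated.
    fn on_snapshot_applied_buggy(&mut self, snap: SnapshotMeta) {
        self.applied_index = snap.index;
    }

    /// Fixed behavior: the term is updated together with the index.
    fn on_snapshot_applied_fixed(&mut self, snap: SnapshotMeta) {
        self.applied_index = snap.index;
        self.applied_term = snap.term;
    }

    /// A snapshot generated now is stamped with the current apply state.
    fn gen_snapshot_meta(&self) -> SnapshotMeta {
        SnapshotMeta { index: self.applied_index, term: self.applied_term }
    }
}

fn main() {
    // Assumed incoming snapshot that initializes the new peer.
    let incoming = SnapshotMeta { index: 100, term: 6 };

    let mut buggy = ApplyState { applied_index: 0, applied_term: 0 };
    buggy.on_snapshot_applied_buggy(incoming);
    // Stamped with index 100 but the stale term 0.
    println!("buggy: {:?}", buggy.gen_snapshot_meta());

    let mut fixed = ApplyState { applied_index: 0, applied_term: 0 };
    fixed.on_snapshot_applied_fixed(incoming);
    // Stamped with index 100 and term 6.
    println!("fixed: {:?}", fixed.gen_snapshot_meta());
}
```

Until the next entry is applied, every snapshot generated from the buggy state carries the stale term, which is exactly the window the reproduction steps exploit.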
Because raft-rs returns 0 for any term query on an index beyond its log, the term check on the receiver can succeed: an uninitialized receiver such as c has an empty log, so a snapshot stamped with a stale term of 0 appears to match. The receiver therefore fast-forwards over the snapshot and commits its index, then panics because it has no log entry at that index at all. Normally this cannot happen: PD only promotes initialized learners to voters, so a learner must apply at least one log entry before it starts campaigning, which assigns the correct apply term before it sends any snapshot.
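Why a term-0 snapshot "matches" an empty log and then panics can be shown with a heavily simplified follower log. This is a hypothetical sketch, not raft-rs code: `RaftLog`, `handle_snapshot` and the panic message are illustrative; it only mimics the two behaviors described above (out-of-range term queries return 0, and committing past the last index is fatal).

```rust
use std::panic;

// Hypothetical, simplified follower log; not raft-rs code.
struct RaftLog {
    first_index: u64, // index of terms[0]; always >= 1
    terms: Vec<u64>,  // terms[i] is the term of entry first_index + i
    committed: u64,
}

impl RaftLog {
    fn last_index(&self) -> u64 {
        self.first_index - 1 + self.terms.len() as u64
    }

    /// Term of the entry at `idx`, or 0 if `idx` is outside the log.
    fn term(&self, idx: u64) -> u64 {
        if idx < self.first_index || idx > self.last_index() {
            return 0;
        }
        self.terms[(idx - self.first_index) as usize]
    }

    /// Advance the commit index; committing past the last entry is fatal.
    fn commit_to(&mut self, to_commit: u64) {
        if to_commit > self.last_index() {
            panic!("commit index {} is beyond last index {}", to_commit, self.last_index());
        }
        self.committed = self.committed.max(to_commit);
    }

    /// Returns true if the snapshot must be restored; false if the log seems
    /// to contain it already, in which case only the commit is fast-forwarded.
    fn handle_snapshot(&mut self, index: u64, term: u64) -> bool {
        if self.term(index) == term {
            self.commit_to(index);
            return false;
        }
        true
    }
}

fn main() {
    // c is uninitialized: its log is empty, so term(100) reports 0.
    let mut c = RaftLog { first_index: 1, terms: Vec::new(), committed: 0 };
    // A snapshot with the correct term does not match, so it would be restored.
    println!("correct-term snapshot restored: {}", c.handle_snapshot(100, 6));
    // A snapshot stamped with the stale term 0 "matches", so c tries to commit
    // index 100 without having that entry and panics.
    let result = panic::catch_unwind(move || c.handle_snapshot(100, 0));
    println!("stale-term snapshot panicked: {}", result.is_err());
}
```

The model makes the root cause visible: the receiver cannot distinguish "I already have entry 100 with term 0" from "I know nothing about entry 100", so the correctness of the snapshot term is what keeps it safe.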
However, if a follower that has been isolated for a long time becomes leader right after accepting a snapshot, it can still generate snapshots with wrong metadata. In this case the stale term may be non-zero, and the receiver can apply the data with the wrong metadata. The receiver should still detect a conflict when it accepts the leader's next empty entry and then request a new snapshot, so there seems to be no harm beyond the wrong metadata itself.
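The conflict detection mentioned above is the ordinary Raft log-matching check on the leader's next append. A minimal sketch, assuming the follower's log consists only of the restored snapshot and that the leader's own log holds the correct term for that index; `FollowerLog` and `accepts_append` are hypothetical names, and the check is reduced to the single case relevant here.

```rust
// Hypothetical sketch of the log-matching check; not raft-rs API.
struct FollowerLog {
    // Right after restoring a snapshot, the only position the follower knows
    // is the snapshot's (index, term).
    snap_index: u64,
    snap_term: u64,
}

impl FollowerLog {
    /// Accept an append only if the leader's (prev_index, prev_term) matches.
    fn accepts_append(&self, prev_index: u64, prev_term: u64) -> bool {
        prev_index == self.snap_index && prev_term == self.snap_term
    }
}

fn main() {
    // Assumed values: the snapshot was stamped with the stale term 5, while
    // entry 100 actually has term 6 in the leader's log.
    let follower = FollowerLog { snap_index: 100, snap_term: 5 };
    // The leader's next (empty) entry at index 101 carries prev = (100, 6).
    let accepted = follower.accepts_append(100, 6);
    // A rejection leads the leader to retry; per the report, the follower
    // ends up requesting a new snapshot.
    println!("append accepted: {}", accepted);
}
```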