raft: group gets stuck when leader is removed #11038

Closed
tbg opened this issue Aug 15, 2019 · 5 comments

@tbg
Contributor

tbg commented Aug 15, 2019

See the "document a problem" commit in #11037.

When a leader is removed via a conf change, it retains its leadership indefinitely and keeps heartbeating its followers. However, it no longer accepts incoming proposals, so the group cannot make progress.

@xiang90
Contributor

xiang90 commented Aug 15, 2019

What is the expected behavior? Should the removed leader stop heartbeating and stay silent?

@tbg
Contributor Author

tbg commented Aug 15, 2019

At the very least, the leader should immediately step down when it applies the configuration change. But I think even more so, we probably want to send an MsgTimeoutNow to one follower that has helped commit the config change (i.e. the one with the largest Match), which in the very common case would minimize the disruption.
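The target selection suggested above could be sketched roughly as follows. This is a toy model, not the raft package's code: `Progress` here is a stripped-down stand-in that only carries `Match`, and `pickTransferTarget` is a hypothetical helper name. It assumes the removed leader still knows each remaining voter's `Match` and wants the most caught-up one as the recipient of `MsgTimeoutNow`.

```go
package main

import "fmt"

// Progress is a toy stand-in for the per-follower replication state a raft
// leader tracks. Only Match (the highest log index known to be replicated
// on that follower) is modeled here.
type Progress struct {
	Match uint64
}

// pickTransferTarget (hypothetical helper) returns the voter with the largest
// Match, skipping the removed leader itself. ok is false when no other voter
// remains. Ties are broken by the lower ID so the result is deterministic.
func pickTransferTarget(self uint64, voters map[uint64]Progress) (target uint64, ok bool) {
	var best uint64
	for id, pr := range voters {
		if id == self {
			continue
		}
		if !ok || pr.Match > best || (pr.Match == best && id < target) {
			target, best, ok = id, pr.Match, true
		}
	}
	return target, ok
}

func main() {
	// Node 1 is the leader that has just been removed from the configuration.
	voters := map[uint64]Progress{2: {Match: 10}, 3: {Match: 12}}
	if target, ok := pickTransferTarget(1, voters); ok {
		fmt.Printf("send MsgTimeoutNow to %d\n", target)
	}
}
```

Picking the largest `Match` favors a follower whose log is already up to date, so it can campaign immediately without first catching up.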

@xiang90
Contributor

xiang90 commented Aug 15, 2019

@tbg

> send an MsgTimeoutNow to one follower that has helped commit the config change (i.e. the one with the largest Match), which in the very common case would minimize the disruption.

For etcd, we did this in the application layer I believe.

@jingyih
Contributor

jingyih commented Aug 15, 2019

Yes, this is handled in the application layer. When the leader is removed, it transfers leadership to the longest-connected voting member in the cluster as part of its stop process.

Ref:

etcd/etcdserver/server.go

Lines 1502 to 1503 in 9b29151

func (s *EtcdServer) Stop() {
	if err := s.TransferLeadership(); err != nil {
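For illustration only, the "longest connected voting member" choice described above might look something like the sketch below. It is not the etcd implementation: the `peer` type, its `ActiveSinceUnix` field, and `longestConnectedPeer` are all hypothetical names, and connection time is modeled as a plain Unix timestamp where 0 means "not connected".

```go
package main

import "fmt"

// peer is a hypothetical, simplified view of a voting member as seen by the
// departing leader's transport layer.
type peer struct {
	ID              uint64
	ActiveSinceUnix int64 // Unix seconds since the connection became active; 0 = disconnected
}

// longestConnectedPeer (hypothetical helper) returns the connected peer with
// the earliest ActiveSinceUnix, excluding the departing leader. found is
// false if no other peer is currently connected.
func longestConnectedPeer(self uint64, peers []peer) (target uint64, found bool) {
	var oldest int64
	for _, p := range peers {
		if p.ID == self || p.ActiveSinceUnix == 0 {
			continue
		}
		if !found || p.ActiveSinceUnix < oldest {
			target, oldest, found = p.ID, p.ActiveSinceUnix, true
		}
	}
	return target, found
}

func main() {
	peers := []peer{
		{ID: 1, ActiveSinceUnix: 1000}, // the leader being removed
		{ID: 2, ActiveSinceUnix: 5000},
		{ID: 3, ActiveSinceUnix: 3000},
		{ID: 4}, // currently disconnected
	}
	if id, ok := longestConnectedPeer(1, peers); ok {
		fmt.Printf("transfer leadership to %d before stopping\n", id)
	}
}
```

Compared with choosing by largest Match inside raft, this heuristic relies on information only the application has (connection history), which is one reason it could live outside the raft library.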

@stale

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot closed this as completed Apr 27, 2020