-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: send MsgAppResp to replica co-located with client #325
Comments
This seems risky when translated from Paxos to Raft. If a leader broadcasts MsgApp and then dies, but the MsgAppResps are sent to the other followers, then those followers may consider the entry committed even though it was not committed by the leader of that term. This may turn out to be safe (if a quorum have the new log entry then no node without it can be elected), but it's very subtle. |
I agree that it's subtle. Yet, as you say, if an entry is acknowledged by a majority of the nodes (within the same term - otherwise see raft.pdf, Figure 8), then, no matter what the leader does, that entry is guaranteed to be committed as nobody can be elected leader without it. So it should be sufficient to do the following:
Does that sound sound? |
It should be "once the counter is equal to the quorum size, commit the entry locally". Unless I'm missing something. |
Almost. Changed to "equals or exceeds". Since we're incrementing by two in one of the cases, we may not exactly hit the quorum. |
This seems sound, although figuring out which entries are covered by another node's |
I hadn't thought about that. The sending node is at the discretion to put whatever they want in that message, so in the worst case they could hash the data of the log entry (or whatever other unique identifier we can come up with) and send that along as an Entry when CC'ing. Also maybe the Furthermore, the CC'ed MsgAppResp shouldn't look exactly like the original MsgAppResp as there's a chance the recipient CC node will have turned into a leader in the meantime and doesn't know this isn't a real MsgAppResp to a MsgApp which the new leader may have sent (and while that shouldn't hinder correctness, it needs to be taken into account to avoid unwanted side effects). We could special-case
|
If the CC'd node has become a leader, then the term number will have increased, so the old MsgAppResp will be rejected on that basis. We don't need to make the CC messages look distinct for safety purposes, although doing so to pass along extra information might help. This looks like a lot of subtle work for a small gain, though. It only applies in cases where the client is talking to a follower which is proxying to the leader, and per discussion in #326 we want to minimize that behavior (if we do it at all). |
Yes, if we don't want that, let's not waste time with it. It does look pretty straightforward now though. CC @xiang90 in case someone wants it for upstream raft. |
@tschottdorf I will take a close look tonight. |
@tschottdorf @bdarnell |
@xiang90 don't sweat it, I just wanted you to be aware of the discussion we had about it in case you wanted something like that in |
Doesn't seem that this is anywhere on the roadmap, and if so, it would probably have to be in |
@spencerkimball mentioned this in #230:
The paper is Miguel Castro and Barbara Liskov. Practical byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems, 20(4):398–461, November 2002.
This is certainly something that would be useful.
The text was updated successfully, but these errors were encountered: