-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: remove apply pacing #133175
base: master
Are you sure you want to change the base?
raft: remove apply pacing #133175
Conversation
Epic: none Release note: none
Epic: none Release note: none
This commit removes the committed entries application pacing from the raftLog type. This is done for a few reasons: - CRDB does not use this capability. Entries are applied immediately in the Ready handler, so there are no times when two MsgStorageApply messages are in-flight. - If we re-introduce this pacing, it will be done outside the raft package. Instead of having the per-range in-flight limit, we are going to need a node/store level limit, to help avoid OOMs. - We are moving away from doing IO from within the RawNode. Fetching committed entries from log storage will be done via the LogSnapshot at some point. It will be natural for the caller to implement the size policies at its level. - This change separates the concerns of raftLog being a data structure, vs flow control / pacing. Before this change, the two were coupled in the raftLog type. Epic: none Release note: none
4cb98c4
to
7d26089
Compare
Epic: none Release note: none
7d26089
to
190baa0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this code that you deleted would've been useful only in cases in which entry application is delegated to an async thread, right? I'm surprised that you didn't need to remove/change any tests, meaning that there wasn't any coverage. Looks like most of the code you're removing was added in etcd-io/raft@2e0653d.
I agree that we're moving toward a world in which raft communicates what can be applied in principle (i.e. the commit index) rather than doling it out piecemeal, so I'm on board with stripping this functionality down to the bare minimum (don't emit entries to apply unless the previous batch is acked). Is there a reason you didn't strip it further to the point where entries have to be applied before acking the ready?
@@ -505,8 +505,6 @@ process-ready 1 2 3 | |||
HardState Term:1 Vote:1 Commit:13 Lead:1 LeadEpoch:1 | |||
Entries: | |||
1/15 EntryNormal "prop_4" | |||
CommittedEntries: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this test changing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to audit it and confirm, but I think: there can only be one MsgStorageApply
in flight now, so some commits migrated between Ready
s. The test probably exercised > 1 message in flight.
Yes. We don't do it today. If we did, we would have a risk of OOM because there is no limit on the total volume of entries in the apply thread's queue across all ranges. So if we ever want an asynchronous append thread, it should come with some store/node-scoped flow control outside
There was coverage. I had to fix ~5 unit tests which were exercising this flow control capability, and also in some cases assumed more than one
Just making a minimal change scoped to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM, thanks for the explanation. I missed the tests, but see them now.
storage: storage, | ||
unstable: newUnstable(last, logger), | ||
|
||
// Initialize our committed and applied pointers to the time of the last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏽
This commit removes the committed entries application pacing from the
raftLog
type. This is done for a few reasons:Ready
handler, so there are no times when twoMsgStorageApply
messages are in-flight.raft
package. Instead of having the per-range in-flight limit, we are going to need a node/store level limit, to help avoid OOMs.RawNode
. Fetching committed entries from log storage will be done via theLogSnapshot
at some point. It will be natural for the caller to implement the size policies at its level.raftLog
being a data structure, vs flow control / pacing policies. Before this change, the two were coupled in theraftLog
type.Related to #105850