Applier: Improve reconciler reschedule context to avoid deadlocking on full channel #932
Conversation
This fixes kube-rs#926, since we already run multiple reconcilers in parallel.
Signed-off-by: Teo Klestrup Röijezon <[email protected]>
Codecov Report
```
@@            Coverage Diff             @@
##           master     #932      +/-   ##
==========================================
+ Coverage   70.77%   72.15%   +1.37%
==========================================
  Files          64       64
  Lines        4332     4356      +24
==========================================
+ Hits         3066     3143      +77
+ Misses       1266     1213      -53
```
```diff
@@ -599,6 +599,6 @@ mod tests {
         struct FooSpec { foo: String }
     };
     let input = syn::parse2(input).unwrap();
-    let kube_attrs = KubeAttrs::from_derive_input(&input).unwrap();
+    let _kube_attrs = KubeAttrs::from_derive_input(&input).unwrap();
```
I think this is fixed in master.
Doesn't look like it; the only PR that wasn't included in this branch was #931, which didn't touch this.
Oh, my bad. It turns out I had a different fix for it (by testing more) in https://github.com/kube-rs/kube-rs/pull/924/files, but it hasn't been reviewed and thus hasn't made it into master.
Ahh
```diff
     let err_context = context.clone();
-    let (scheduler_tx, scheduler_rx) = channel::mpsc::unbounded::<ScheduleRequest<ReconcileRequest<K>>>();
+    let (scheduler_tx, scheduler_rx) =
+        channel::mpsc::channel::<ScheduleRequest<ReconcileRequest<K>>>(APPLIER_REQUEUE_BUF_SIZE);
```
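For context on the "full channel" in the PR title, here is a minimal standalone sketch (assumed shapes only, not applier's actual code) of the failure mode a bounded requeue channel can hit when the one task that drains it also awaits sends back into it. Running it hangs by design once the buffer fills up.

```rust
use futures::{channel::mpsc, executor::block_on, SinkExt, StreamExt};

fn main() {
    block_on(async {
        // Small bounded buffer, standing in for APPLIER_REQUEUE_BUF_SIZE.
        let (mut requeue_tx, mut requeue_rx) = mpsc::channel::<u32>(1);
        requeue_tx.send(0).await.unwrap();

        // This loop is the only consumer of the channel.
        while let Some(item) = requeue_rx.next().await {
            // Pretend every item immediately asks to be requeued twice.
            // Once the buffer is full, `send` waits for the consumer -- but the
            // consumer is this very loop, so nothing ever drains: the deadlock
            // on a full channel that this PR is about.
            requeue_tx.send(item + 1).await.unwrap();
            requeue_tx.send(item + 2).await.unwrap();
        }
    });
}
```

The change in this PR sidesteps that pattern by issuing the reschedule send from within the reconciler tasks rather than from the stream-polling context.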
Some points I wanted to raise regarding the unbounded channels here, because I'm not sure about the full reasoning. It feels partly defensible to have unbounded queues:

- There's generally a limit to how many items people put inside clusters, because large-scale one-to-many relations with common items are impractical and clusters have realistic bounds (pods generally do not grow forever; people often make new clusters at some point past 10k objects).
- Large numbers of objects make global reconcilers an O(n^2) problem. Constant requeuing and retriggering in such cases is also likely to waste resources (node-affecting controllers in controller-manager are huge IO hogs in particular).
- If we are in such a large-scale case where we have 10k+ items in our reflector cache and we want to reconcile all of them, we first need enough memory to hold 10k+ of these specs, and an unbounded queue would at most double that.

But on the other hand: if I understand this queue correctly, it also serves as a limiter on the amount of parallelism in a controller? In that case, limiting it actually makes a lot of sense, because 10k objects being reconciled at once might DoS a third-party service. Is that a correct understanding?
Not quite, there are actually four different "queues" going on in `applier`:

1. The executor itself: this has no hard limit, but is de-duped (so there will only be one task running at any one moment per K8s object).
2. The "pending reconciliations" queue (for objects that are already reconciling but should be retried as soon as their current job is done): this is also unbounded but deduped.
3. The scheduler (for objects that are scheduled to be reconciled again at some point in the future because they requested it via `Action`): this is also unbounded but deduped.
4. The "pending scheduling requests" queue, for objects that haven't been organized into one of the previous three queues yet.

This PR is only concerned with queue 4, where we have no practical way to implement deduping.
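As an aside, a minimal sketch of what "unbounded but deduped" buys queues 1-3 (an illustrative stand-in, not the actual scheduler code; `ObjKey` is a hypothetical placeholder for an object reference): the queue can never hold more entries than there are distinct objects, so it is effectively bounded by the size of the reflector cache.

```rust
use std::collections::{HashSet, VecDeque};

// Hypothetical key type standing in for a reference to a K8s object.
type ObjKey = String;

// "Unbounded but deduped": pushing the same key twice collapses into one
// entry, so the queue can never outgrow the set of distinct objects.
struct DedupedQueue {
    order: VecDeque<ObjKey>,
    pending: HashSet<ObjKey>,
}

impl DedupedQueue {
    fn new() -> Self {
        Self { order: VecDeque::new(), pending: HashSet::new() }
    }

    fn push(&mut self, key: ObjKey) {
        // Only enqueue if this object isn't already waiting.
        if self.pending.insert(key.clone()) {
            self.order.push_back(key);
        }
    }

    fn pop(&mut self) -> Option<ObjKey> {
        let key = self.order.pop_front()?;
        self.pending.remove(&key);
        Some(key)
    }
}
```

Queue 4, by contrast, carries requests that haven't been keyed into any of these structures yet, which is why it gets a buffer size rather than dedup.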
So if I understand it right:

- the pending reconciliations queue is `pending` in `Scheduler` (i.e. 2)
- the internal `DelayQueue` on `Scheduler` is 3
- 4 is effectively the queue stream merged with unclassified requeues (`scheduler_tx`)

and the executor is going to work at its regular pace... so this means it is actually possible to run 1000s of reconciles at the same time on re-lists currently? 😬 Is that something that is viable to bound at some point, somehow? In the `Runner`?
> So if I understand it right:

Yes.

> and the executor is going to work at its regular pace... so this means it is actually possible to run 1000s of reconciles at the same time on re-lists currently? 😬

Yes.

> Is that something that is viable to bound at some point, somehow? In the `Runner`?

Well, we could add additional constraints for when we actually start running a pending reconciliation. That's not implemented at the moment, on the assumption that you could "just" use a semaphore in your reconciler function.
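For illustration, a minimal sketch of that workaround (the types here are hypothetical stand-ins, not kube-runtime's actual reconciler signature): a shared `tokio::sync::Semaphore` in the context caps how many reconciliations touch the expensive backend at once.

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Hypothetical stand-ins for the controller's own types.
struct MyObject;
struct Action;
#[derive(Debug)]
struct Error;

// Shared context handed to every reconciliation.
struct Ctx {
    // Caps how many reconciliations may run the expensive section at once.
    limiter: Arc<Semaphore>,
}

async fn reconcile(_obj: Arc<MyObject>, ctx: Arc<Ctx>) -> Result<Action, Error> {
    // Wait for a permit before doing real work; even if thousands of objects
    // are queued after a re-list, at most the semaphore's permit count of them
    // hit the third-party service concurrently.
    let _permit = ctx.limiter.acquire().await.map_err(|_| Error)?;
    // ... actual reconciliation work goes here ...
    Ok(Action)
}
```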
Doing the semaphoring in the executor (which has other benefits, like not having to allocate the task before it actually has a semaphore permit) shouldn't be too difficult either; the main problem there is that `applier`'s configuration is already getting pretty unwieldy as it is.
Maybe it makes sense to refactor a lot of `applier`'s configuration into some kind of `ApplierParams` struct that can be heavily defaulted.
Probably, yeah..
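Something along these lines, perhaps (purely a hypothetical shape; neither `ApplierParams` nor these field names exist in kube-runtime):

```rust
// Hypothetical grouping of applier's knobs; none of these names are real API.
pub struct ApplierParams {
    /// Buffer size for the requeue channel (hardcoded as APPLIER_REQUEUE_BUF_SIZE today).
    pub requeue_buffer: usize,
    /// Optional cap on concurrently running reconciliations.
    pub max_concurrent_reconciles: Option<usize>,
}

impl Default for ApplierParams {
    fn default() -> Self {
        Self {
            requeue_buffer: 100,
            max_concurrent_reconciles: None,
        }
    }
}

// Callers would then override only what they care about:
// let params = ApplierParams { max_concurrent_reconciles: Some(16), ..Default::default() };
```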
really nice job getting the first
As suggested by @clux
Signed-off-by: Teo Klestrup Röijezon <[email protected]>
Ok, looking good to me now, will approve. Btw, can we maybe change the title of the PR slightly? It's not clear at a glance how this title is a fix for the buffer block unless you really dig into it.

How about "Reschedule reconciliations inside the reconciler tasks"? Either way this is a fairly insidery change; #925 already "fixed" the issue from the immediate user's perspective.

I would at least try to add something like "to avoid deadlocking on full channel". "Improve reconciler reschedule context to avoid deadlocking on full channel"?

Sure.
Fixes #926/#925, and adds a regression test.

This is slightly breaking, since it changes `error_policy` from a `FnMut` to a `Fn`. We could work around that by running it inside a `Mutex`, but I'd rather let clients decide whether that overhead is worth it.

/cc @moustafab
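To make the breakage concrete, here is a minimal sketch (the `Backoff` state and the error/action types are hypothetical, not kube-rs API) of how a client whose error policy previously relied on `FnMut`-style mutable captures can keep that state behind a `Mutex` so the closure satisfies `Fn`:

```rust
use std::sync::Mutex;

// Hypothetical stand-ins for the controller's own error and action types.
#[derive(Debug)]
struct Error;
struct Action;

// State that an FnMut error policy would previously have mutated directly.
struct Backoff {
    consecutive_failures: u32,
}

fn main() {
    let backoff = Mutex::new(Backoff { consecutive_failures: 0 });

    // This closure only needs shared access to its capture, so it is `Fn`;
    // the mutation happens through the Mutex's interior mutability instead.
    let error_policy = move |_err: &Error| -> Action {
        let mut state = backoff.lock().unwrap();
        state.consecutive_failures += 1;
        Action
    };

    // The policy can now be invoked from multiple parallel reconcile contexts.
    let _ = error_policy(&Error);
}
```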