Provision to specify the max Drain / Pod evictions attempts in Provisioner #3052

Closed
ra-grover opened this issue Dec 16, 2022 · 15 comments
Labels
deprovisioning (Issues related to node deprovisioning), feature (New feature or request)

Comments

@ra-grover
Contributor

ra-grover commented Dec 16, 2022

Tell us about your request

We have hundreds of nodes in our clusters that are hung in an unschedulable state because Karpenter is not able to drain them; the cause might be a bad application PDB or a pod on the node stuck in ImagePullBackOff that Karpenter tries forever to evict.
I think a flag in the consolidation stanza of the Provisioner spec, such as maxDrainAttempts/maxPodEvictions, would be a good option so that Karpenter does not get stuck indefinitely.
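
For illustration, a rough sketch of what such a knob could look like on the Provisioner (the field names and their placement under the consolidation stanza are hypothetical; maxDrainAttempts/maxPodEvictions do not exist in Karpenter today):

```yaml
# Hypothetical sketch of the proposed fields; they are NOT part of the
# current karpenter.sh/v1alpha5 Provisioner API.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  consolidation:
    enabled: true
    # Proposed: stop retrying after this many failed drain attempts per node
    maxDrainAttempts: 10
    # Proposed: cap eviction retries per pod before giving up on graceful eviction
    maxPodEvictions: 5
```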

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

The nodes are stuck in an unschedulable state, which is driving up costs.

Are you currently working around this issue?

We have manually written a cron job to delete the pods stuck in those states, since they prevent Karpenter from terminating the nodes gracefully.

Additional Context

No response

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@ra-grover ra-grover added the feature New feature or request label Dec 16, 2022
@ra-grover ra-grover changed the title Provision to specify the max Drain attempts in Provisioner Provision to specify the max Drain / Pod evictions attempts in Provisioner Dec 16, 2022
@ellistarn
Contributor

Hey @ra-grover. Not saying no to this request, but I'm curious why it isn't easier for you folks to fix the underlying problem of broken PDBs. If karpenter deprovisions nodes, it's just going to move around the misconfigured pods. I worry that violating fundamental mechanisms of kubernetes (e.g. pdbs) may have unforeseen side effects.

@ra-grover
Contributor Author

Hey @ellistarn, we have mostly fixed the bad PDBs by implementing Gatekeeper rules and keeping them clean. But there are still exceptions, such as pods stuck in ImagePullBackOff, on which Karpenter gets stuck.
The configurable attempts that I am proposing would at least help us keep the test environments cost-effective, and we can set high values for production to make sure that Karpenter has made enough attempts to evict gracefully.

If you still insist on not providing the option, for the right reasons you mentioned :), is there any chance for Karpenter to skip graceful eviction if any of the containers is stuck in ImagePullBackOff or is failing to start?

@ellistarn
Contributor

ellistarn commented Dec 16, 2022

Pods stuck in ImagePullBackOff on which Karpenter gets stuck.

Why doesn't karpenter evict these? Is there a PDB in the way?

@ra-grover
Contributor Author

Apologies for replying late, @ellistarn. I think the PDB gets in the way because the pod is already in a failed state (for whatever reason). I found this long-standing issue, which was resolved only recently, in kubernetes/kubernetes#113375.

I think the workaround I proposed, a configurable maximum number of eviction attempts, would be really helpful.

@ellistarn
Contributor

That upstream issue seems like a great fit. If you enabled that policy, would you still need the karpenter override?
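
For reference, assuming the upstream feature being discussed is the PodDisruptionBudget unhealthyPodEvictionPolicy field (alpha in Kubernetes 1.26 behind the PDBUnhealthyPodEvictionPolicy feature gate, beta in 1.27), enabling it per-PDB looks roughly like this; the PDB name and selector are placeholders:

```yaml
# Sketch assuming the upstream feature is the PDB unhealthyPodEvictionPolicy
# field; requires a cluster version where it is available (feature-gated on 1.26).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app            # placeholder name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
  # Allow the eviction API to evict running-but-unhealthy pods even when the
  # PDB budget would otherwise block the eviction.
  unhealthyPodEvictionPolicy: AlwaysAllow
```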

@ra-grover
Contributor Author

I think we would still need the Karpenter override. We are running 1.22 and won't be able to just upgrade to 1.26 (EKS supports up to 1.24) for that feature. Also, it's behind a feature gate. It would be really helpful to have this override in Karpenter.

@ellistarn
Contributor

Sure -- I'm thinking about the future state though. Do we need a long-term solution in Karpenter, or do we need a short-term fix to tide you over until 1.26? Once that feature is available, is there any value in Karpenter also having an override?

@ra-grover
Contributor Author

I think it's more of a short-term fix only. We won't need the Karpenter override past 1.26.

@ellistarn
Contributor

Let me think on what we can do to help in the near term. One idea:

  • We could ignore ImagePullBackOff pods in termination decisions.
  • It's possible the pod could get stuck evicting due to PDBs.
  • You could implement a cron job that gets and deletes pods that are in ImagePullBackOff (rough sketch below).

wdyt?
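
A minimal sketch of such a cron job, assuming an image that ships kubectl, a shell, and awk, plus a service account bound to list/delete permissions on pods; the name, schedule, and image are illustrative only:

```yaml
# Illustrative sketch: metadata, schedule, image, and service account are assumptions.
# Deletes pods whose reported STATUS is ImagePullBackOff so node drains can proceed.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: delete-imagepullbackoff-pods
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner   # needs list + delete on pods, cluster-wide
          restartPolicy: Never
          containers:
            - name: cleaner
              image: bitnami/kubectl:latest  # any image with kubectl, sh, and awk
              command:
                - /bin/sh
                - -c
                - |
                  # STATUS is the 4th column of the --no-headers pod listing
                  kubectl get pods --all-namespaces --no-headers \
                    | awk '$4 == "ImagePullBackOff" {print $1, $2}' \
                    | while read ns pod; do
                        kubectl delete pod -n "$ns" "$pod" --wait=false
                      done
```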

@mhulscher

In the case where nodes are removed for consolidation purposes, does Karpenter only select nodes for scale-down if it is sure that all pods on those nodes can be scheduled somewhere else? AFAIK that is the behavior of the cluster-autoscaler, which I personally think is an acceptable behavior.

Not saying no to this request, but I'm curious why it isn't easier for you folks to fix the underlying problem of broken PDBs.

In several companies I have seen scenarios where the (platform) team building the cluster(s) was not the same team deploying the workload(s); those were development teams. The platform team would be responsible for configuring scheduling of the nodes (with Karpenter, CA, whatever) and dev teams would deploy applications. Sometimes teams would deploy broken PDBs that did not allow disruptions, which is undesirable but not necessarily easy to prevent with policies (Kyverno, OPA, etc.).

Because of this, I personally don't think Karpenter can make the safe assumption that PDBs will eventually allow disruptions. Either it should not consider nodes for scale-down when PDBs don't allow disruptions, or it should forcefully evict pods, disregarding the PDB, if desired (or required after a certain time, to deal with EC2 shutdowns or spot interruptions).

@tzneal
Contributor

tzneal commented Dec 28, 2022

In the case where nodes are removed for consolidation purposes, does Karpenter only select nodes for scale-down if it is sure that all pods on those nodes can be scheduled somewhere else? AFAIK that is the behavior of the cluster-autoscaler, which I personally think is an acceptable behavior.

No, we attempt this first but if we can replace a node with a cheaper node, we will also launch a cheaper replacement and then drain/shutdown the more expensive node.

@njtran
Contributor

njtran commented Jan 3, 2023

@ra-grover looking at this issue again, is it feasible to do what Ellis has mentioned above here? #3052 (comment)

@ra-grover
Contributor Author

You could implement a cron job that gets and deletes pods that are in ImagePullBackOff

We can certainly implement this on our end, but since Karpenter is the one initiating the cordoning/pod eviction, it makes much more sense (at least to me) to have this in Karpenter.

@jonathan-innis
Contributor

Similar to kubernetes-sigs/karpenter#752 in its use-case

@billrayburn billrayburn added the deprovisioning Issues related to node deprovisioning label May 24, 2023
@njtran njtran removed their assignment Jun 27, 2023
@njtran
Contributor

njtran commented Aug 14, 2023

Per the comments above, it sounds like the feature from the linked upstream issue is now beta in 1.27. Considering EKS supports both 1.26 (where it's alpha) and 1.27 (where it's beta), I'll close this one. Feel free to re-open if you disagree.

@njtran njtran closed this as completed Aug 14, 2023

7 participants