Provision to specify the max drain / pod eviction attempts in Provisioner #3052
Comments
Hey @ra-grover. Not saying no to this request, but I'm curious why it isn't easier for you folks to fix the underlying problem of broken PDBs. If karpenter deprovisions nodes, it's just going to move around the misconfigured pods. I worry that violating fundamental mechanisms of kubernetes (e.g. pdbs) may have unforeseen side effects.
Hey @ellistarn, we have mostly fixed the bad PDBs by implementing Gatekeeper rules and keeping them clean, but there are still exceptions, such as pods stuck in `ImagePullBackOff`. If you still insist on not providing the option, for the right reasons you mentioned :) is there any chance for Karpenter to ignore graceful eviction if any of the containers is stuck in `ImagePullBackOff`?
Why doesn't karpenter evict these? Is there a PDB in the way?
Apologies for replying late @ellistarn. I think the PDB gets in the way because the pod is already in a failed state (for whatever reason). I found this long-standing issue, which was resolved just recently in kubernetes/kubernetes#113375. I think the workaround I proposed, specifying a configurable maximum number of eviction attempts, would be really helpful.
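For reference, the fix from kubernetes/kubernetes#113375 surfaces as an `unhealthyPodEvictionPolicy` field on the PodDisruptionBudget (alpha in 1.26 behind the `PDBUnhealthyPodEvictionPolicy` feature gate, beta in 1.27). A minimal sketch with placeholder names:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb        # placeholder name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app          # placeholder selector
  # Allow eviction of pods that are Running but not Ready, so a pod
  # stuck in ImagePullBackOff no longer blocks the drain forever.
  unhealthyPodEvictionPolicy: AlwaysAllow
```

With `AlwaysAllow`, pods that are Running but not Ready can be evicted even when the budget would otherwise block it, which is exactly the `ImagePullBackOff` case discussed above.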
That upstream issue seems like a great fit. If you enabled that policy, would you still need the karpenter override?
I think we would still need the Karpenter override; we are running 1.22 and won't be able to just upgrade to 1.26 for that feature (EKS is supported up to 1.24). Also, that's behind a feature gate. It would be really helpful to have this override in Karpenter.
Sure -- I'm thinking about the future state, though. Do we need a long-term solution in Karpenter, or do we need a short-term fix to tide you over until 1.26? Once that feature is available, is there any value in Karpenter also having an override?
I think it's more of a short-term fix only. We won't need the Karpenter override past 1.26.
Let me think about what we can do to help in the near term. One idea:
In the case where nodes are removed for consolidation purposes, does Karpenter only select nodes for scale-down if it is sure that all pods on those nodes can be scheduled somewhere else? AFAIK that is the behavior of the cluster-autoscaler, which I personally think is acceptable.
In several companies I have seen scenarios where the (platform) team building the cluster(s) was not the same team deploying the workload(s). The platform team would be responsible for configuring scheduling on the nodes (with Karpenter, CA, whatever) and dev teams would deploy applications. Sometimes teams would deploy broken PDBs that did not allow disruptions, which is undesirable but not necessarily easy to prevent with policies (Kyverno, OPA, etc.). Because of this, I personally don't think Karpenter can safely assume that PDBs will eventually allow disruptions. Either it should not consider nodes for scale-down when PDBs don't allow disruptions, or it should forcefully evict pods, disrespecting the PDB, if desired (or required after a certain time, to deal with EC2 shutdowns or spot interruptions).
No, we attempt this first, but if we can replace a node with a cheaper one, we will launch the cheaper replacement and then drain/shut down the more expensive node.
@ra-grover, looking at this issue again, is it feasible to do what Ellis mentioned above in #3052 (comment)?
We can implement this on our end for sure, but since Karpenter is the one initiating the cordoning/pod eviction, it makes much more sense (at least to me) to have this in Karpenter.
Similar to kubernetes-sigs/karpenter#752 in its use case.
Per the above comments, the linked upstream feature is beta in 1.27 now. Considering EKS supports both 1.26 and 1.27, which have the feature in alpha and beta respectively, I'll close this one. Feel free to re-open if you disagree.
Tell us about your request
We have hundreds of nodes in our clusters that are hung in an `Unschedulable` state because Karpenter is not able to drain them: there might be a bad application PDB, or a pod on the node stuck in `ImagePullBackOff` that Karpenter tries forever to evict. I think a flag in the `Consolidation` stanza of the provisioner spec would be a good place to specify `maxDrainAttempts`/`maxPodEvictions`, so that Karpenter does not get stuck indefinitely (see the sketch below).
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
The nodes are stuck in an unschedulable state, which impacts costs.
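To make the request above concrete, here is a hypothetical sketch of the proposed override. The `maxDrainAttempts` and `maxPodEvictions` fields are the names suggested in this issue and are not part of the actual Karpenter Provisioner API:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  consolidation:
    enabled: true
    # Hypothetical fields from this request, not an actual Karpenter API:
    # stop retrying a node drain after this many attempts ...
    maxDrainAttempts: 5
    # ... and give up on evicting a single pod after this many tries.
    maxPodEvictions: 10
```

Being per-provisioner, such an override could be tuned differently for workloads where forceful disruption is acceptable.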
Are you currently working around this issue?
We have written a cron job that deletes the pods stuck in those states, since they prevent Karpenter from terminating the nodes gracefully. A sketch of this kind of cleanup job is below.
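A rough sketch of what such a cleanup job can look like. The name, schedule, image, and service account are assumptions; it targets pods whose containers are waiting in `ImagePullBackOff`:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: stuck-pod-cleaner
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          # Assumes a ServiceAccount bound to RBAC that can list/delete pods.
          serviceAccountName: stuck-pod-cleaner
          restartPolicy: Never
          containers:
            - name: cleaner
              # Assumes an image that ships both kubectl and jq.
              image: my-registry/kubectl-jq:latest
              command:
                - /bin/sh
                - -c
                # Lists pods with a container waiting in ImagePullBackOff and
                # deletes them so Karpenter's node drain can proceed.
                - |
                  kubectl get pods -A -o json \
                    | jq -r '.items[] | select(any(.status.containerStatuses[]?; .state.waiting.reason == "ImagePullBackOff")) | .metadata.namespace + " " + .metadata.name' \
                    | while read -r ns name; do
                        kubectl delete pod -n "$ns" "$name" --wait=false
                      done
```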
Additional Context
No response
Attachments
No response