Cordon Drifted nodes before processing evictions #623
Hey @sidewinder12s, this is similar to #622. Would this help in the expiration case as well?
I think so. I'd generally consider this ask to be: if Karpenter knows it is removing a batch of nodes, it should cordon the whole batch before taking action, to ensure we don't reschedule workloads multiple times while it is performing the change.
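As a rough illustration of that ask (not Karpenter's actual implementation), here is a minimal client-go sketch. It assumes the batch of node names to be removed has already been computed, and `CordonBatch` is a hypothetical helper name: cordon every node in the batch first, and only then begin evictions.

```go
// Package cordonsketch is a hypothetical illustration, not Karpenter code.
package cordonsketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// CordonBatch marks every node in the batch unschedulable before any pod is
// evicted, so workloads evicted from one node in the batch cannot land on
// another node that is about to be removed by the same batch.
func CordonBatch(ctx context.Context, client kubernetes.Interface, nodeNames []string) error {
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	for _, name := range nodeNames {
		if _, err := client.CoreV1().Nodes().Patch(ctx, name, types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
			return fmt.Errorf("cordoning node %q: %w", name, err)
		}
	}
	// The per-node cordon-and-drain (evictions) would only start after every
	// node in the batch has been cordoned.
	return nil
}
```

Because the cordon is applied to the whole batch up front, pods evicted from the first node cannot be rescheduled onto a later node in the same batch and then evicted a second time.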
My read of that section is that it is not clear whether Karpenter will cordon all drifted nodes before processing, or only cordon each node as part of its cordon-and-drain action.
That makes sense. This was one of the first considerations we had in the deprovisioning logic a while ago. The idea was to not cordon the nodes, so the capacity could still be utilized in case the nodes couldn't be deprovisioned for an unknown length of time (e.g. …). All in all, I'm open to making this configurable. Can you share a bit of your requirements on expiry and drift for nodes? How important is deprovisioning to you for a node that is expired or drifted?
The one behavior we'd had issues with in a custom batch autoscaler was our preferred use of the … One of the worse uses was using it for workloads that cannot handle disruption, because it means we must take manual action to perform cluster maintenance. In general we're OK with letting things block actions, but we'd want something like MachineDisruptionGate to eventually force the controller to take action and bring the cluster back into conformance with its configuration.

We run many large clusters with many different users, and as we scale we're finding it difficult to coordinate action if we give users too many toggles to block maintenance. That basically forces us toward something like RDS maintenance windows, where we tell everyone: we'll do work during these hours if your workload cannot handle the normal action of the system.

Setting a TTL on nodes as policy is another step we're taking to stop letting the cluster keep nodes forever (that is, through policy and action, making it explicit to customers that they will need to deal with disruption).
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its inactivity rules. You can mark this issue as fresh with /remove-lifecycle stale, close it with /close, or offer to help out with issue triage. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to its inactivity rules. You can mark this issue as fresh with /remove-lifecycle rotten, close it with /close, or offer to help out with issue triage. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its inactivity rules. You can mark this issue as fresh with /remove-lifecycle stale, close it with /close, or offer to help out with issue triage. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
#1314 will address this issue, albeit through a slightly different mechanism. Rather than cordoning the nodes with a …
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to its inactivity rules. You can mark this issue as fresh with /remove-lifecycle rotten, close it with /close, or offer to help out with issue triage. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle stale
Going to freeze this issue. #1314 should address it, but there is a race condition that needs to be fixed first. Hopefully I can prioritize this in the next few weeks.
#1314 has now been closed without merging, so where does that leave this issue?
Tell us about your request
We currently use a tool that rolls all nodes in an ASG when the launch template configuration changes. This is usually in response to AMI upgrades.
One of the features of this tool is that, when it detects this change, it cordons all nodes in the ASG before it starts replacing them. This ensures that workloads on evicted nodes do not reschedule onto nodes that are about to be killed again (the pattern is sketched below).
Can Karpenter implement this for it's Drift calculations?
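For context, here is a hypothetical sketch of the pattern the tool described above follows (this is not the tool's actual code, and the label key used to identify the group is an assumption): list every node in the group being rolled, cordon them all, and only then start replacing nodes.

```go
// Package rollsketch is a hypothetical illustration of the ASG-roll pattern.
package rollsketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// CordonGroup cordons every node that belongs to the node group being rolled
// before any replacement starts, so evicted pods can only land on nodes that
// will survive the roll.
func CordonGroup(ctx context.Context, client kubernetes.Interface, groupLabelKey, groupName string) error {
	// List every node carrying the (assumed) group label.
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{
		LabelSelector: groupLabelKey + "=" + groupName,
	})
	if err != nil {
		return err
	}
	// Cordon all of them up front; terminating and draining nodes one at a
	// time only begins after this loop completes.
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	for _, node := range nodes.Items {
		if _, err := client.CoreV1().Nodes().Patch(ctx, node.Name, types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```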
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We have quite a few services that do not handle restarts (especially rapid restarts) well, and we would like to minimize how many times they get restarted.
Are you currently working around this issue?
No; there is no feature like this in Karpenter.
Additional Context
No response
Attachments
No response