
Cordon Drifted nodes before processing evictions #623

Open
sidewinder12s opened this issue Feb 28, 2023 · 15 comments
Labels
  • deprovisioning: Issues related to node deprovisioning
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • v1.x: Issues prioritized for post-1.0

Comments

@sidewinder12s

Tell us about your request

We currently use a tool that rolls all nodes in an ASG when the launch template configuration changes, usually in response to AMI upgrades.

One of the features of this tool is that, when it detects this change, it cordons all nodes in the ASG before it starts replacing them. This ensures that workloads evicted from one node do not reschedule onto another node that is about to be killed as well.

Can Karpenter implement this for its Drift calculations?
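
For illustration, here is a minimal client-go sketch of the behavior being requested: mark every node in the batch unschedulable (the equivalent of `kubectl cordon`) before any eviction starts. The node names, and the idea that drift detection has already produced that list, are assumptions for the example; this is not Karpenter's actual implementation.

```go
// A minimal sketch (not Karpenter's implementation) of the requested behavior:
// cordon every node in a batch before any eviction starts, so evicted pods
// cannot land on a node that is itself about to be replaced.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// cordonAll marks each named node unschedulable, the same effect as
// `kubectl cordon <node>`.
func cordonAll(ctx context.Context, client kubernetes.Interface, nodeNames []string) error {
	patch := []byte(`{"spec":{"unschedulable":true}}`)
	for _, name := range nodeNames {
		if _, err := client.CoreV1().Nodes().Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
			return fmt.Errorf("cordoning %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Hypothetical list of nodes that drift detection has flagged for replacement.
	drifted := []string{"node-a", "node-b", "node-c"}

	if err := cordonAll(context.TODO(), client, drifted); err != nil {
		panic(err)
	}
	// Only after the whole batch is cordoned would draining/eviction begin.
}
```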

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

We have quite a few services that do not handle restarts well (especially rapid restarts), and we would like to minimize how many times they get restarted.

Are you currently working around this issue?

No; there is no feature like this in Karpenter.

Additional Context

No response

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@sidewinder12s sidewinder12s added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 28, 2023
@sidewinder12s sidewinder12s changed the title from "Cordon Drifted nodes before shutdown" to "Cordon Drifted nodes before processing evictions" Feb 28, 2023
@njtran
Contributor

njtran commented Mar 2, 2023

Hey @sidewinder12s, this is similar to #622. Would this help in the expiration case as well?

@sidewinder12s
Author

I think so. I'd generally consider this ask to be: if Karpenter knows it is removing a batch of nodes, it should cordon the batch before taking action, to ensure we don't reschedule workloads multiple times while it is performing the change.

@sidewinder12s
Author

Isn't this already the case?

https://github.com/aws/karpenter/blob/39b48fecef170ee725a12031a20cf72da09de860/designs/node-upgrades.md?plain=1#L26-L28

My read of that section is that it is not clear whether Karpenter will cordon all drifted nodes before processing, or only cordon each node as part of a cordon-and-drain action.

@njtran
Contributor

njtran commented Mar 2, 2023

That makes sense. This was one of the first considerations we had in the deprovisioning logic a while ago. The idea was not to cordon the nodes, so the capacity could still be utilized in case the nodes couldn't be deprovisioned for an unknown length of time (e.g. do-not-evict pods, blocking PDBs, inability to create a replacement node). If a node could not be deprovisioned, keeping around capacity that couldn't be scheduled to could incur some unwanted cost.

All in all, I'm open to making this configurable. Can you share a bit about your requirements on expiry and drift for nodes? How important do you consider deprovisioning to be for a node that is expired or drifted?

@sidewinder12s
Author

The one behavior we'd had issues with in a custom batch autoscaler was that our preferred use of the do-not-evict annotation was to protect singleton pods that were in the middle of processing work; the pod would eventually finish on its own. If we do not cordon the node, it is likely the node will never empty out so that it can be replaced, because new work will keep coming in that can use it.

One of the worst uses was applying it to workloads that cannot handle disruption at all, because it means we must take manual action to perform cluster maintenance.

I think in general we're OK letting things block actions, but we'd want something like MachineDisruptionGate to eventually force the controller to take action and bring the cluster back into conformance with its config. We run many large clusters with many different users, and as we scale we're finding it difficult to coordinate action if we give users too many toggles to block maintenance; it basically forces us toward something like RDS maintenance windows, where we tell everyone we'll do work during these hours if their workload cannot handle the normal action of the system. Setting a TTL on nodes as policy is another action we're taking to stop letting the cluster keep nodes forever (so, through policy and action, making it explicit to customers that they will need to deal with disruption).
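
For context, a rough sketch of the do-not-evict pattern described above: the annotation protects a singleton batch pod while it runs, but without a cordon the node may never drain because new pods keep landing on it. The pod name, namespace, and image are placeholders; the annotation key is Karpenter's pre-v1 annotation (renamed `karpenter.sh/do-not-disrupt` in later versions).

```go
// Sketch of the pattern described above: a singleton batch pod carries the
// do-not-evict annotation so it is not evicted mid-run. Without cordoning the
// node, new pods can still schedule onto it, so the node may never drain.
// The pod name, namespace, and image are placeholders.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func singletonBatchPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "batch-worker",
			Namespace: "jobs",
			Annotations: map[string]string{
				// Tells Karpenter not to voluntarily evict this pod while it runs
				// (renamed karpenter.sh/do-not-disrupt in Karpenter v1).
				"karpenter.sh/do-not-evict": "true",
			},
		},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:  "worker",
				Image: "example.com/batch-worker:latest",
			}},
		},
	}
}

func main() {
	pod := singletonBatchPod()
	fmt.Printf("%s/%s annotations: %v\n", pod.Namespace, pod.Name, pod.Annotations)
}
```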

@ellistarn ellistarn added the v1 Issues requiring resolution by the v1 milestone label Mar 25, 2023
@njtran njtran added v1.x Issues prioritized for post-1.0 and removed v1 Issues requiring resolution by the v1 milestone labels Sep 6, 2023
@njtran njtran added the deprovisioning Issues related to node deprovisioning label Sep 20, 2023
@njtran njtran transferred this issue from aws/karpenter-provider-aws Oct 20, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 30, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 29, 2024
@jukie

jukie commented Mar 3, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 3, 2024
@jmdeal
Member

jmdeal commented Mar 22, 2024

/assign

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2024
@jmdeal
Member

jmdeal commented Jun 21, 2024

#1314 will address this issue, albeit through a slightly different mechanism. Rather than cordoning the nodes with a NoSchedule taint, Karpenter will apply a PreferNoSchedule taint to voluntary disruption candidates, and it will apply an in-memory NoSchedule taint to drifted nodes. We don't apply this in-memory taint to underutilized nodes, since additional pod scheduling is the signal we use to cancel a consolidation decision. The in-memory taint combined with the PreferNoSchedule taint ensures that Karpenter will pre-spin new capacity for drifted nodes, rather than evicting and rescheduling pods onto other drifted nodes.
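
For reference, a rough sketch of what the PreferNoSchedule half of that mechanism could look like when written against the API server. The taint key and value here are illustrative rather than Karpenter's actual key, and the in-memory NoSchedule taint described above lives only inside Karpenter's scheduling simulation, not on the node object.

```go
// Rough sketch of tainting a disruption candidate with PreferNoSchedule: the
// scheduler prefers other nodes but can still fall back to this one, so the
// capacity is not stranded if the disruption stalls. The taint key/value are
// illustrative, not Karpenter's actual key.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func taintCandidate(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// A real controller would check for an existing taint before appending.
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    "example.com/disruption-candidate", // illustrative key
		Value:  "drifted",
		Effect: corev1.TaintEffectPreferNoSchedule,
	})
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	if err := taintCandidate(context.TODO(), client, "node-a"); err != nil {
		panic(err)
	}
}
```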

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 21, 2024
@jmdeal
Member

jmdeal commented Jul 22, 2024

/remove-lifecycle stale
/lifecycle frozen

Going to freeze this issue. #1314 should address it, but there is a race condition that needs to be fixed first. Hopefully I'll be able to prioritize this in the next few weeks.

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jul 22, 2024
@andyspiers

#1314 has now been closed without merging, so where does that leave this issue?
