MinAge for NodeClaims #1030
Comments
Would minAge be specified on the NodePool in the disruption controls, or on the NodeClaim itself? This seems to suggest we would put it on the NodeClaim, but that would be rather cumbersome to manage. I can see value in the per-NodeClaim controls, but I would also like to manage this through the NodePool.
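As a sketch of the NodePool-level variant discussed here, the field might sit in the disruption controls so that every NodeClaim the pool launches inherits it. The `minAge` field and its placement are hypothetical; nothing below is part of the current Karpenter API:

```yaml
# Hypothetical sketch: minAge surfaced through the NodePool's disruption
# controls. The minAge field does not exist in Karpenter today.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    minAge: 15m   # hypothetical: no voluntary disruption before 15 minutes
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
```

A per-NodeClaim override could still be layered on top for the cases where finer granularity is needed.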
For us, Karpenter scales just too fast, and even with budgets it is hard to control. This feature would solve that for me and would be a really nice one!
@thomaspeitz based on your experience, what do you think a good default value for minAge should be?
Seems rather parallel to --scale-down-delay-after-add from the cluster autoscaler, where we don't want to get rid of any nodes post-scale-up. But minAge seems to solve the problem better, in the sense that only the new nodes are blocked from scale-down. That being said, the additional granularity comes at the cost of the configuration being more complex for the end user. I would love to build a DisruptionDefaults/ClusterDefaults CRD so that the end user doesn't have to configure the same thing throughout all of their NodePool CRs.
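A DisruptionDefaults/ClusterDefaults CRD, as floated above, might look roughly like the following. The kind, API group, and every field here are invented purely for illustration; no such CRD exists:

```yaml
# Hypothetical cluster-scoped defaults CRD; individual NodePools would
# override any of these values. None of this exists in Karpenter today.
apiVersion: karpenter.sh/v1alpha1
kind: DisruptionDefaults
metadata:
  name: cluster
spec:
  minAge: 10m            # default minimum NodeClaim age before disruption
  consolidateAfter: 5m   # default consolidation delay
  budgets:
    - nodes: "10%"       # default disruption budget
```

The trade-off raised later in the thread applies directly: a cluster-wide defaults object centralizes configuration, but the same consistency can be achieved with templating tooling over per-NodePool fields.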
Is anyone already tackling this, or can I attempt a design + implementation for this?
When we come to name the field, try to pick a name that helps people work out the behavior in different scenarios just from the field name.
So here is what I did to tune the minAge (kind of). In detail, you can see how bad it is:
This reduced the (wrong) scaling down a lot. Blue line: CPU allocatable; green line: requested cores. Regarding --scale-down-delay-after-add: I would suggest allowing a configurable scale-down delay after add. This would solve a lot of problems from my point of view. I could just block scale-down for 15m, which would mean less unneeded interruption, in combination with a minAge of a node of e.g. 15m. This would ensure:
The graph is from the production setup of a client of mine which has a lot of HPAs running. In the end, I want to avoid nodes getting scaled down just because some HPA scales down for a short amount of time.
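For the HPA-churn scenario described above, the closest existing knob (assuming Karpenter's v1 API) is `consolidateAfter` on the NodePool's disruption block. It delays consolidation rather than keying off node age, so it only approximates a true minAge, but it can damp the same kind of flapping:

```yaml
# Approximate workaround with existing fields (Karpenter v1 API). This
# delays consolidation decisions; it is not a true per-node minimum age.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: hpa-heavy
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 15m   # wait 15m before consolidating a candidate node
```

Unlike the proposed minAge, this does not distinguish new nodes from old ones; it slows all consolidation uniformly.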
One thing that we should consider here with the design is the overlap with #735. For you @thomaspeitz, I'm suspecting that either this issue or that issue could help solve your problem. The question that I want to answer here is: what use-cases does this feature solve that #735 does not, and vice versa?
I'm personally less of a fan of cluster-wide defaults in a separate CRD, since I think that any default you want to set on an individual resource could just be managed by your IaC, Kustomize, Helm, or GitOps tooling. @Bryce-Soghigian, have you heard from users that managing the defaults across resources is cumbersome to do with this tooling? We've tended to index toward taking away global configuration (the AWS provider removed global tagging in favor of NodeClass-based tagging) exactly for this reason: if you have the granularity, then you just need a templating engine to manage that consistency across resources.
For me, minAge would ensure that old nodes are terminated with a higher frequency than the newest ones. But that's what we have the TTL for anyway. So yeah, I am fine with that.
Someone just opened an issue requesting some flavor of this type of functionality; see here.
You are right, of course, that you can solve the problem with IaC solutions. I have also heard the customer narrative, when watching videos on Karpenter, describing it as harder to configure and understand than cluster autoscaler. This begs the question: why? CAS has 100 flags, so what makes it easier to understand? "Harder to understand" makes sense, since we are working with more dimensions via pod- plus node-level controls, node replacements plus disruptions, etc. There are pros to making this a bit more complex. With CAS, to configure it you go to a single place, the flags, and make all changes to autoscaling behavior through this one interface. Karpenter spreads configuration across multiple CRDs. Adding a fourth global configuration CRD may not make the configuration process easier, but I would like us to think about ways we can make configuration simpler, and to have observability so users can understand why Karpenter makes the decisions it does.
Interesting; I would love a link to the videos, just to hear directly the reasoning around why people think that we are more confusing. I agree that if this is the customer sentiment, we should drive to make our configuration simpler to understand.
+1 for this feature
Let me see if I can dig up some of those videos.
/lifecycle frozen |
Description
What problem are you trying to solve?
We are having some discussion around how to control deprovisioning here: aws/karpenter-provider-aws#1738. One mechanism for adding another layer of control is to allow users to set a minNodeLifetime or minAge on the NodeClaim spec, which would indicate to Karpenter that it can't begin to disrupt the node until after it is past its minimum node lifetime. This has been mentioned in a few places across a few different issues (like #1014 and #752) but hasn't been formalized into its own separate issue. This effectively allows users to say that they wouldn't like Karpenter to begin disrupting their nodes until after a fixed amount of time. The benefit of this is that Karpenter will not quickly launch and deprovision nodes (even if there are some cost savings), since that churn often means the application never gets to run.
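On the NodeClaim itself, the proposal might read something like the following. The `minAge` field is one of the two candidate names above and does not exist in the current API; the manifest is purely illustrative (NodeClaims are normally created by Karpenter from the NodePool template rather than authored by hand):

```yaml
# Hypothetical: a per-NodeClaim minimum lifetime. Karpenter would skip this
# node when computing voluntary disruptions until it is 30 minutes old.
apiVersion: karpenter.sh/v1
kind: NodeClaim
metadata:
  name: example
spec:
  minAge: 30m   # hypothetical field; alternative name: minNodeLifetime
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
```

Involuntary disruption (spot interruption, node failure) would presumably still proceed regardless of this field; it only gates Karpenter-initiated consolidation, drift, and expiration.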
The use-cases for this have some overlap with #752, #795, and #735.