Support for stacked/cooperative autoscaling or adoption of 3rd party created node claims #920

Open
gilqubex opened this issue Jan 3, 2024 · 14 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@gilqubex

gilqubex commented Jan 3, 2024

Description

What problem are you trying to solve?

(In continuation of a Slack conversation in #karpenter-dev on Kubernetes Slack.)

As a 3rd party independent software vendor, I would like Karpenter to allow for cooperative autoscaling - that is, two or more autoscalers sharing responsibility for the same cluster.

While having more than one autoscaler start instances is generally benign (at most, one ends up with temporary overcapacity), more than one autoscaler trying to downscale at the same time may very well cause self-induced unavailability.

In v1alpha5, the "linking" mechanism enabled Karpenter to "adopt" foreign instances and thus remain the sole cluster downscaler. This, however, was incidental: adoption was not the original purpose of the linking mechanism. Perhaps a proper feature aimed at this goal can now be designed.

Absent cooperative autoscaling, we have observed 3rd party vendors requiring customers to migrate off of Karpenter (or Cluster Autoscaler, for that matter), which (at least in our scenario as a specialized autoscaler) is undesirable.

As a 3rd party vendor, we plan to assume all integration responsibility ourselves, but the most basic supporting mechanism (e.g. not rejecting foreign NodeClaims) should hopefully exist.

How important is this feature to you?

This is very important to us as we would like to continue supporting cooperative autoscaling from v1beta1 onwards.

@gilqubex gilqubex added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 3, 2024
@Bryce-Soghigian
Member

Bryce-Soghigian commented Jan 3, 2024

Have you thought about natively implementing your instance launch in Karpenter via an implementation of the cloud provider interface? It seems that bringing your own VMs means you effectively take on, inside Karpenter, the responsibility for launching/creating instances (the responsibility of the cloud provider).

You can have cloudprovider.Create() wake up a hibernated instance, and cloudprovider.Delete() put it back to sleep.
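For illustration, a minimal sketch of that idea, assuming the v1beta1 CloudProvider interface shape from sigs.k8s.io/karpenter/pkg/cloudprovider and EC2 hibernation via the AWS SDK. The WarmPool helper and the provider-ID handling are hypothetical, and the remaining interface methods (Get, List, GetInstanceTypes, IsDrifted, Name, ...) are omitted:

```go
package warmpool

import (
	"context"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"

	"sigs.k8s.io/karpenter/pkg/apis/v1beta1"
)

// WarmPool is a hypothetical helper that tracks the vendor's hibernated
// instances and picks one compatible with a NodeClaim's requirements.
type WarmPool interface {
	Reserve(ctx context.Context, nodeClaim *v1beta1.NodeClaim) (instanceID string, err error)
}

type WarmPoolProvider struct {
	ec2Client *ec2.Client
	pool      WarmPool
}

// Create wakes a hibernated instance instead of launching a new one.
func (p *WarmPoolProvider) Create(ctx context.Context, nodeClaim *v1beta1.NodeClaim) (*v1beta1.NodeClaim, error) {
	instanceID, err := p.pool.Reserve(ctx, nodeClaim)
	if err != nil {
		return nil, err
	}
	if _, err := p.ec2Client.StartInstances(ctx, &ec2.StartInstancesInput{
		InstanceIds: []string{instanceID},
	}); err != nil {
		return nil, err
	}
	// Simplified: real provider IDs look like aws:///<az>/<instance-id>.
	nodeClaim.Status.ProviderID = "aws:///" + instanceID
	return nodeClaim, nil
}

// Delete hibernates the instance and returns it to the pool, rather than
// terminating it.
func (p *WarmPoolProvider) Delete(ctx context.Context, nodeClaim *v1beta1.NodeClaim) error {
	providerID := nodeClaim.Status.ProviderID
	instanceID := providerID[strings.LastIndex(providerID, "/")+1:]
	_, err := p.ec2Client.StopInstances(ctx, &ec2.StopInstancesInput{
		InstanceIds: []string{instanceID},
		Hibernate:   aws.Bool(true),
	})
	return err
}
```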


Link to original slack thread: https://kubernetes.slack.com/archives/C04JW2J5J5P/p1703276514443079

@gilqubex
Author

gilqubex commented Jan 3, 2024

> Have you thought about natively implementing your instance launch in Karpenter via an implementation of the cloud provider interface? It seems that bringing your own VMs means you effectively take on, inside Karpenter, the responsibility for launching/creating instances (the responsibility of the cloud provider).
>
> Link to original slack thread: https://kubernetes.slack.com/archives/C04JW2J5J5P/p1703276514443079

Before referring to the specific use case of our software, I have to say that I cannot confirm that this is a sufficient solution for all third party cooperative autoscalers. Even if we could "squeeze in", I believe this degree of freedom is very likely to be limiting for other use cases.

As for the specifics of our case as a third party: we use hibernated instances; in order for Karpenter to support our use case, warm pool functionality must be implemented.
This, however, is somewhat circular: we would still be replacing instances under Karpenter's watch, just through the warm pool. If Karpenter's warm pool implementation still used NodeClaims to reject foreign nodes, we would be back at the same starting point; and if it didn't, well, why not support foreign nodes to begin with?

Edit: another example of a missing feature is scale-in protection (i.e., this instance won't be scaled in for X minutes). Our current way of implementing this is to delay the transfer of responsibility to Karpenter for those X minutes.

@Bryce-Soghigian
Member

> Edit: another example of a missing feature is scale-in protection (i.e., this instance won't be scaled in for X minutes). Our current way of implementing this is to delay the transfer of responsibility to Karpenter for those X minutes.

Nodes can be opted out of consolidation disruption by setting the annotation karpenter.sh/do-not-disrupt: "true" on the node. Do you need more granular controls? Like do not disrupt until after x minutes?

```yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"
```

https://karpenter.sh/preview/concepts/disruption/#node-level-controls

@gilqubex
Author

gilqubex commented Jan 3, 2024

> Edit: another example of a missing feature is scale-in protection (i.e., this instance won't be scaled in for X minutes). Our current way of implementing this is to delay the transfer of responsibility to Karpenter for those X minutes.
>
> Nodes can be opted out of consolidation disruption by setting the annotation karpenter.sh/do-not-disrupt: "true" on the node. Do you need more granular controls? Like do not disrupt until after x minutes?
>
> ```yaml
> apiVersion: v1
> kind: Node
> metadata:
>   annotations:
>     karpenter.sh/do-not-disrupt: "true"
> ```
>
> https://karpenter.sh/preview/concepts/disruption/#node-level-controls

The mechanism itself is technically sufficient, and we actually plan to slightly improve our current scale-in protection implementation with it. Our concerns are:

(A) We have not been able to affirm that this applies to all forms of possible downscale: drift, emptiness, consolidation...
(B) We risk a race condition if the annotation is applied after the fact rather than during node object creation. Currently we plan to create the node object before the relevant machine object (see the sketch below).
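
For concreteness, here is a sketch of the plan in (B), using client-go (names here are illustrative, not our actual code): the Node object is pre-created with the annotation already present, so there is no window in which the node exists unprotected.

```go
package nodeprotect

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createProtectedNode pre-creates the Node object with the do-not-disrupt
// annotation already set; when the kubelet later registers, it merges into
// an object that was never eligible for disruption.
func createProtectedNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node := &corev1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name: nodeName,
			Annotations: map[string]string{
				"karpenter.sh/do-not-disrupt": "true",
			},
		},
	}
	_, err := client.CoreV1().Nodes().Create(ctx, node, metav1.CreateOptions{})
	return err
}
```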

@Bryce-Soghigian
Member

Bryce-Soghigian commented Jan 3, 2024

For (A):
All disruption methods are blocked via that annotation. Each disruption method calls disrupt, and in each disrupt call we call a function called GetCandidates to get its disruption candidates.

In this function, in the NewCandidate filter, we check for the disruption annotation. So it should cover all possible forms of automated scale-down.
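
Roughly, the shape of that check (a simplified sketch, not the verbatim upstream code):

```go
package disruption

import corev1 "k8s.io/api/core/v1"

// Simplified sketch: every disruption method (consolidation, drift,
// emptiness, expiration) draws from the same candidate set, and any node
// carrying the do-not-disrupt annotation never enters it.
func eligibleForDisruption(node *corev1.Node) bool {
	return node.Annotations["karpenter.sh/do-not-disrupt"] != "true"
}
```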

@jonathan-innis @njtran feel free to correct any mistakes in my explanation, I think the code here speaks for itself however.

@jonathan-innis jonathan-innis removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 3, 2024
@jonathan-innis
Member

> Like do not disrupt until after x minutes

Agree with @Bryce-Soghigian that a good way of doing this is to add the karpenter.sh/do-not-disrupt annotation to the NodePool spec.template.metadata so that it automatically propagates to all NodeClaims created by the NodePool. The trade-off (right now) is that we don't have a concept of a time window for which this annotation is respected. Realistically, we are trying to get #752 in so that we could have the concept of "respect this do-not-disrupt annotation up until the grace period (a set time past the creationTimestamp)", which I think would cover the use case you are trying to achieve here.
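
In other words, the semantics would be roughly this (a sketch of the proposed behavior, assuming #752 lands in this shape; not shipped behavior):

```go
package disruption

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// Sketch of the proposal's semantics: the do-not-disrupt annotation is
// honored only until a grace period past the node's creationTimestamp;
// after that, the node becomes disruptable again.
func doNotDisruptHonored(node *corev1.Node, gracePeriod time.Duration) bool {
	if node.Annotations["karpenter.sh/do-not-disrupt"] != "true" {
		return false
	}
	return time.Since(node.CreationTimestamp.Time) < gracePeriod
}
```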

> in order for Karpenter to support our use case, warm pool functionality must be implemented

I'd be interested to see if we can just knock out each of the current gaps that you are identifying in Karpenter today, rather than delegating Karpenter as purely a down-scaler. IMO, this breaks a lot of assumptions that we make around safety when we are walking through disrupting nodes, since we assume that we can launch replacements for these nodes and ensure that replacements are safely running and a node is fully drained before continuing on. With your current implementation with linking, how do you get around these assumptions that Karpenter makes around its ability to launch replacements? I'm assuming that Karpenter might be fighting with whichever up-scaling solution you are using when it tries to disrupt existing nodes with pods.

@gilqubex
Author

gilqubex commented Jan 3, 2024

> in order for Karpenter to support our use case, warm pool functionality must be implemented
>
> I'd be interested to see if we can just knock out each of the current gaps that you are identifying in Karpenter today, rather than delegating Karpenter as purely a down-scaler. IMO, this breaks a lot of assumptions that we make around safety when we are walking through disrupting nodes, since we assume that we can launch replacements for these nodes and ensure that replacements are safely running and a node is fully drained before continuing on. With your current implementation with linking, how do you get around these assumptions that Karpenter makes around its ability to launch replacements? I'm assuming that Karpenter might be fighting with whichever up-scaling solution you are using when it tries to disrupt existing nodes with pods.

When we request that Karpenter take responsibility for our nodes, we assign them to a provisioner, and we would be the ones at fault if they are not compatible (including from the customer's perspective). We take extreme care to verify compatibility: we monitor every Karpenter configuration change and adapt our configuration to match.

It works beautifully with v1alpha5: both our scaler and Karpenter start instances, and once "adopted", Karpenter processes those nodes successfully if and when it downscales them.

@garvinp-stripe
Contributor

Wonder if it's somewhat related to #749.

@sftim

sftim commented Jan 29, 2024

> When we request that Karpenter take responsibility for our nodes, we assign them to a provisioner, and we would be the ones at fault if they are not compatible (including from the customer's perspective). We take extreme care to verify compatibility: we monitor every Karpenter configuration change and adapt our configuration to match.

For v1 of NodePool and friends, we could aim to define the exact details of which party is responsible for what. This might be:

  • Karpenter doesn't support adoption; you can try it but if it breaks you get to keep both halves
  • Support for Karpenter node adoption is determined by the specific provider integration you use
  • Karpenter will adopt your nodes subject to [conditions], but may replace the node at will

etc.

@njtran njtran changed the title Minimal support for stacked/cooperative autoscaling Support for stacked/cooperative autoscaling or adoption of 3rd party created node claims Feb 7, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 7, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 6, 2024
@njtran
Contributor

njtran commented Jun 12, 2024

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jun 12, 2024
@arikkfir

arikkfir commented Oct 8, 2024

Thank you everyone for helping out on this 🙏
This is indeed still needed as of Karpenter v1.

Given aws/karpenter-provider-aws#6601 and this issue, what would be the best course of action for getting our nodes "adopted" by a v1 Karpenter? Even a workaround would work for us for now.

Things we've considered:

  1. Creating a new EC2 machine (as we do today) with appropriate tags: Karpenter seems to delete it immediately.
  2. Creating a new NodeClaim seems to get Karpenter to immediately create another EC2 machine, not allowing us the chance to update its status subresource to point to our instance.
  3. Any other suggestions that might work? Even in an interim until a full solution is in place.

@bartosz-qubex FYI :)

Thanks again for everyone in the thread!

@sftim

sftim commented Dec 24, 2024

@arikkfir you could send in a PR that proposes the definition you think people will agree to, and see if it gets consensus.
