Add support for scaling up with ZeroToMaxNodesScaling option #5826

hbostan · 2023-05-31T15:09:26Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds an AtomicScaleUp node group option that allows all-or-nothing scale-up of the node group.

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Add an AtomicScaleUp option that allows all-or-nothing scale-up of node groups.
In an atomic scale-up, the node group is scaled up to its max size (no partial scale-ups).
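A minimal sketch of the all-or-nothing sizing rule described in the release note, assuming the estimator has already computed how many nodes the pending pods need and that a separate resource-limit check reports how many nodes may still be added. `scaleUpDelta` and its parameters are hypothetical names for illustration, not the cluster-autoscaler API.

```go
package main

import "fmt"

// minInt returns the smaller of two ints (avoids relying on Go 1.21's min builtin).
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// scaleUpDelta is a hypothetical helper returning how many nodes to add.
//   current:      current node group size
//   maxSize:      node group max size
//   estimated:    nodes the estimator says the pending pods need
//   limit:        nodes still allowed by cluster-wide resource limits
//   allOrNothing: whether the atomic (zero-or-max) option is set on the group
func scaleUpDelta(current, maxSize, estimated, limit int, allOrNothing bool) int {
	room := maxSize - current
	if room <= 0 || estimated <= 0 || limit <= 0 {
		return 0 // nothing needed, or no capacity left
	}
	if allOrNothing {
		// Grow the whole group to max size, even if one node would be enough.
		// If limits only allow part of that, skip the group instead of
		// doing a partial scale-up.
		if limit < room {
			return 0
		}
		return room
	}
	// Regular groups add only what is needed, capped by max size and limits.
	return minInt(minInt(estimated, room), limit)
}

func main() {
	fmt.Println(scaleUpDelta(0, 10, 1, 100, false)) // regular group: 1
	fmt.Println(scaleUpDelta(0, 10, 1, 100, true))  // atomic group: 10
	fmt.Println(scaleUpDelta(0, 10, 1, 5, true))    // atomic group over limits: 0
}
```

Returning 0 instead of a smaller count is the choice that keeps such a group at either zero or max size in this sketch.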

hbostan · 2023-05-31T15:09:59Z

CC @x13n @kawych

Bryce-Soghigian · 2023-06-01T05:59:21Z

So the purpose of this feature is we just want to scale up to the max when the flag is set to true? How should scale down be defined for this feature? What is the condition for atomic scale up? Just for a normal scale operation but we scale the entire nodepool up to max count?

hbostan · 2023-06-01T08:45:08Z

> So the purpose of this feature is we just want to scale up to the max when the flag is set to true?

Correct. When the flag is set, we scale up to the max even if a single node would be sufficient for the pod.

> How should scale down be defined for this feature?

Please take a look at #5695

> What is the condition for atomic scale up? Just for a normal scale operation but we scale the entire nodepool up to max count?

I don't quite understand the question. It is like a normal scale-up operation; the only difference is that if the node group has the atomic option set, we scale up the whole group.

cluster-autoscaler/expander/expander.go

cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go

cluster-autoscaler/core/scaleup/orchestrator/orchestrator_test.go

cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go

MaciekPytel · 2023-06-29T12:20:03Z

Can we rename this mechanism? My expectation for 'atomic scaling' would be something like handling a set of pods (perhaps a job?) in a single scale-up operation on a single NodeGroup.

This is very far from what I'd expect, and I can easily imagine someone enabling this feature based on their intuition about the flag's name and being very surprised by a large bill after their node group scaled to max. Maybe something like MaxOrNothingScaling?

BigDarkClown · 2023-06-29T12:29:13Z

I like it, the solution is much more elegant now. Great work!
/lgtm

hbostan · 2023-06-29T14:28:15Z

> Can we rename this mechanism?

Renamed the mechanism to ZeroOrMaxNodeScaling. I think this describes what the mechanism does more clearly than 'atomic scaling'.

BigDarkClown · 2023-06-29T15:06:38Z

/lgtm

MaciekPytel · 2023-06-29T15:39:22Z

/approve
/hold
@hbostan Can you update the release note to match the new name of the feature? Once you do, please feel free to unhold.

k8s-ci-robot · 2023-06-29T15:39:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hbostan, MaciekPytel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/OWNERS~~ [MaciekPytel]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

BigDarkClown · 2023-06-30T08:56:05Z

/lgtm

BigDarkClown · 2023-06-30T09:04:04Z

/lgtm

…strator
* Started handling scale up options for ZeroToMaxNodeScaling with the existing estimator
* Skip setting similar node groups for the node groups that use ZeroToMaxNodeScaling
* Renamed the autoscaling option from "AtomicScaleUp" to "AtomicScaling"
* Merged multiple tests into one single table-driven test.
* Fixed some typos.
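The commit note about skipping similar node groups can be illustrated with a small sketch. When similar node groups are balanced, one scale-up may be split across several groups; for a zero-or-max group that split would leave each group only partially scaled, so the sketch leaves its similar-group list empty. The `expansionOption` type and `setSimilarGroups` function are hypothetical stand-ins, not the real orchestrator code.

```go
package main

import "fmt"

// expansionOption is a hypothetical, trimmed-down scale-up candidate.
type expansionOption struct {
	groupID       string
	zeroOrMax     bool     // all-or-nothing option set on this group
	nodeCount     int      // nodes to add
	similarGroups []string // groups the increase may be balanced across
}

// setSimilarGroups records the similar groups for balancing, except for
// zero-or-max groups, which are always scaled on their own.
func setSimilarGroups(opt *expansionOption, similar []string) {
	if opt.zeroOrMax {
		opt.similarGroups = nil
		return
	}
	opt.similarGroups = similar
}

func main() {
	regular := &expansionOption{groupID: "pool-a", nodeCount: 3}
	atomic := &expansionOption{groupID: "pool-b", zeroOrMax: true, nodeCount: 8}
	similar := []string{"pool-c", "pool-d"}

	setSimilarGroups(regular, similar)
	setSimilarGroups(atomic, similar)

	fmt.Println(regular.groupID, regular.similarGroups) // pool-a [pool-c pool-d]
	fmt.Println(atomic.groupID, atomic.similarGroups)   // pool-b []
}
```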

BigDarkClown · 2023-06-30T12:32:29Z

/lgtm

hbostan · 2023-06-30T12:58:00Z

/unhold

k8s-ci-robot added the kind/feature, cncf-cla: yes, and size/L labels May 31, 2023

k8s-ci-robot requested a review from feiskyer May 31, 2023 15:09

k8s-ci-robot added the area/cluster-autoscaler label May 31, 2023

k8s-ci-robot requested a review from x13n May 31, 2023 15:09

kawych reviewed Jun 1, 2023

olagacek reviewed Jun 21, 2023

hbostan force-pushed the master branch 2 times, most recently from e98c397 to 639253e June 22, 2023 10:45

BigDarkClown reviewed Jun 22, 2023

cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go (outdated)

BigDarkClown reviewed Jun 22, 2023

cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go (outdated)

BigDarkClown reviewed Jun 22, 2023

cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go (outdated)

BigDarkClown reviewed Jun 22, 2023

cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go (outdated)

hbostan force-pushed the master branch from 8a23bfd to 086c713 June 29, 2023 12:13

k8s-ci-robot assigned BigDarkClown Jun 29, 2023

k8s-ci-robot added and removed the lgtm label Jun 29, 2023

k8s-ci-robot added the lgtm label Jun 29, 2023

BigDarkClown mentioned this pull request Jun 29, 2023

Add BigDarkClown to Cluster Autoscaler approvers #5915

Merged

k8s-ci-robot added the do-not-merge/hold label Jun 29, 2023

k8s-ci-robot added the size/XL label and removed the lgtm and size/L labels Jun 30, 2023

hbostan force-pushed the master branch from d549569 to 2ac1e8f June 30, 2023 07:12

k8s-ci-robot added the size/L label and removed the size/XL label Jun 30, 2023

hbostan changed the title from "Add support atomic scale up for node groups" to "Add support for scaling up with ZeroToMaxNodesScaling option" Jun 30, 2023

k8s-ci-robot added the lgtm label Jun 30, 2023

hbostan force-pushed the master branch from fdfca86 to 5c4647d June 30, 2023 10:28

k8s-ci-robot removed the lgtm label Jun 30, 2023

hbostan force-pushed the master branch from 5c4647d to ebed4a6 June 30, 2023 10:35

hbostan added 6 commits June 30, 2023 11:17

Add support for scaling up ZeroToMaxNodesScaling node groups

7b8e0e6

Use appropriate logging levels

c255aaa

Remove unused field in expander and add comment about estimator

38d18c6

Merge tests for ZeroToMaxNodesScaling into one table-driven test.

79c611c

* Merged multiple tests into one single table-driven test.
* Fixed some typos.

Rename the autoscaling option

333a028

* Renamed the "AtomicScaling" autoscaling option to "ZeroOrMaxNodeScaling" to be more clear about the behavior.

hbostan force-pushed the master branch from ebed4a6 to 333a028 June 30, 2023 11:18

k8s-ci-robot added the lgtm label Jun 30, 2023

k8s-ci-robot removed the do-not-merge/hold label Jun 30, 2023

k8s-ci-robot merged commit e6397c6 into kubernetes:master Jun 30, 2023

apricote mentioned this pull request Apr 2, 2024

Scale-up broken for Cloud Providers not implementing NodeGroup.GetOptions() #6676

Closed
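The title of issue #6676 points at a failure mode worth guarding against: a scale-up path that reads per-group options has to cope with cloud providers that do not implement `NodeGroup.GetOptions()`. A minimal sketch of one defensive pattern, assuming the provider signals this with a sentinel "not implemented" error and that falling back to the cluster-wide defaults is acceptable. The sketch uses a local stand-in error rather than the real cloud provider one, all type and function names are hypothetical, and this is not the fix that was actually merged for that issue.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotImplemented is a local stand-in for a provider's "not implemented" sentinel.
var errNotImplemented = errors.New("not implemented")

// groupOptions is a hypothetical subset of per-node-group options.
type groupOptions struct {
	zeroOrMax bool
}

// optionsGetter is a hypothetical slice of a node group interface.
type optionsGetter interface {
	GetOptions(defaults groupOptions) (*groupOptions, error)
}

// zeroOrMaxEnabled treats "not implemented" (or a nil result) as "use the
// defaults" instead of failing, so providers without per-group options keep working.
func zeroOrMaxEnabled(g optionsGetter, defaults groupOptions) (bool, error) {
	opts, err := g.GetOptions(defaults)
	if errors.Is(err, errNotImplemented) {
		return defaults.zeroOrMax, nil
	}
	if err != nil {
		return false, err
	}
	if opts == nil {
		return defaults.zeroOrMax, nil
	}
	return opts.zeroOrMax, nil
}

// legacyProvider does not support per-group options.
type legacyProvider struct{}

func (legacyProvider) GetOptions(groupOptions) (*groupOptions, error) {
	return nil, errNotImplemented
}

// modernProvider overrides the zero-or-max setting for its group.
type modernProvider struct{ zeroOrMax bool }

func (p modernProvider) GetOptions(d groupOptions) (*groupOptions, error) {
	o := d
	o.zeroOrMax = p.zeroOrMax
	return &o, nil
}

func main() {
	defaults := groupOptions{zeroOrMax: false}
	for _, p := range []optionsGetter{legacyProvider{}, modernProvider{zeroOrMax: true}} {
		enabled, err := zeroOrMaxEnabled(p, defaults)
		fmt.Println(enabled, err) // false <nil>, then true <nil>
	}
}
```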
