
[KEP] Add resource policy plugin #594

Open
KunWuLuan wants to merge 5 commits into master from kep/resourcepolicy

Conversation

KunWuLuan
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 25, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 25, 2023
@k8s-ci-robot
Contributor

Hi @KunWuLuan. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 25, 2023
@KunWuLuan
Member Author

Related to #475

@KunWuLuan KunWuLuan force-pushed the kep/resourcepolicy branch from 595d47d to 1246664 on May 25, 2023 01:37
@ffromani
Contributor

/cc

@k8s-ci-robot k8s-ci-robot requested a review from ffromani May 26, 2023 15:17
@KunWuLuan
Member Author

@denkensk @Huang-Wei Hi, if you have time, could you help review the KEP? Thanks very much. 😆

@ffromani
Contributor

ffromani commented Jun 6, 2023

/ok-to-test

It's a bit of a crazy time for me, but I'll try to also add my (non-binding) review.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 6, 2023
@Huang-Wei
Contributor

I will try to review it by this week.

@KunWuLuan
Member Author

Hi @Huang-Wei, this PR is ready for review. PTAL. Thanks for your time. :)

@KunWuLuan
Member Author

There is a known issue: when the number of nodes in the cluster is larger than 100, the scheduler will not find all feasible nodes unless percentageOfNodesToScore is set to 100. This can make the resource policy ineffective.
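For reference, percentageOfNodesToScore is a standard kube-scheduler configuration field; a minimal sketch of a scheduler configuration that works around the issue above (the file path and any other settings are assumptions) could look like this:

```yaml
# Sketch only: how a cluster operator might force the scheduler to consider all
# nodes so the resource policy sees every feasible node. Only
# percentageOfNodesToScore is the point here; everything else is assumed.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
# 100 means "score all feasible nodes"; this trades scheduling latency on large
# clusters for complete coverage by the resource policy.
percentageOfNodesToScore: 100
```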

Contributor

@ffromani ffromani left a comment

Initial pass (sorry for the long delay). At a glance it looks sensible; I will be asking questions to fully grasp the proposal.

kep/594-resourcepolicy/README.md Outdated
kep/594-resourcepolicy/README.md Outdated
@netlify

netlify bot commented Sep 12, 2023

Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5062e58 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-scheduler-plugins/deploys/6603b72d49bb010008f983aa |

@KunWuLuan KunWuLuan requested a review from ffromani October 23, 2023 12:10
@KunWuLuan
Member Author

@ffromani @Huang-Wei Hi, do you have any other questions?

@ffromani
Contributor

The main blocker atm is that the KEP template is not fully filled in (e.g. PostFilter: do we need it? Do we have Graduation criteria? Do we need Production Readiness? [probably not] We probably need a few words on Feature enablement and rollback, though).

From what I've read so far I have no major objections. On the architecture and the fit of this plugin into the existing ecosystem I'd have to defer to someone more experienced in scheduling.

@KunWuLuan
Member Author

The main blocker atm is that the KEP template is not fully filled in (e.g. PostFilter: do we need it? Do we have Graduation criteria? Do we need Production Readiness? [probably not] We probably need a few words on Feature enablement and rollback, though).

From what I've read so far I have no major objections. On the architecture and the fit of this plugin into the existing ecosystem I'd have to defer to someone more experienced in scheduling.

Thanks! I will add these parts.

@KunWuLuan KunWuLuan changed the title [proposal] add resource policy plugin [WIP] Add resource policy plugin Jan 16, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 16, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 25, 2024
@KunWuLuan
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 25, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 23, 2024
@KunWuLuan
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 24, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 23, 2024
@KunWuLuan
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 24, 2024
consumption for workload on differnet resources.

## Motivation
The machines in a Kubernetes cluster are typically heterogeneous, with varying CPU, memory, GPU, and pricing. To
Contributor

Suggested change
- The machines in a Kubernetes cluster are typically heterogeneous, with varying CPU, memory, GPU, and pricing. To
+ A Kubernetes cluster typically consists of heterogeneous machines, with varying SKUs on CPU, memory, GPU, and pricing.


### Use Cases

1. As a user of cloud services, there are some stable but expensive ECS instances and some unstable but cheaper Spot
Contributor

Maybe use general terms to replace ECS: (and also replace stable/unstable with static/dynamic)

... there are some static but expensive VM instances... and some dynamic ...


1. As a user of cloud services, there are some stable but expensive ECS instances and some unstable but cheaper Spot
instances in my cluster. I hope that my workload can be deployed first on stable ECS instances, and during business peak
periods, the Pods that are scaled out are deployed on Spot instances. At the end of the business peak, the Pods on Spot
Contributor

usually we use the term "scale up/down" (not scale out/in)

Comment on lines 26 to 27
This proposal introduces a plugin to allow users to specify the priority of different resources and max resource
consumption for workload on differnet resources.
Contributor

Suggested change
- This proposal introduces a plugin to allow users to specify the priority of different resources and max resource
- consumption for workload on differnet resources.
+ This proposal introduces a plugin that enables users to set priorities for various resources and define maximum resource consumption limits for workloads across different resources.


### Goals

1. Develop a filter plugin to restrict the resource consumption on each unit for different workloads.
Contributor

could you elaborate on "unit"? do you mean one kind of resource?


1. Modify the workload controller to support deletion costs. If the workload don't support deletion costs, scaling in
sequence will be random.
2. When creating a ResourcePolicy, if the number of Pods has already violated the quantity constraint of the
Contributor

could you reword this to be more readable? basically, just use a positive tone to describe it as a goal that is out of scope of this KEP.
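For context, the "deletion costs" referenced in the excerpt above are commonly expressed with the upstream controller.kubernetes.io/pod-deletion-cost annotation honored by the ReplicaSet controller; a minimal illustration (the pod name, image, and cost value are made up):

```yaml
# Illustration only: this shows the standard pod-deletion-cost mechanism, not an
# API defined by this KEP. Name, image, and cost value are arbitrary examples.
apiVersion: v1
kind: Pod
metadata:
  name: web-on-spot-node
  annotations:
    # Pods with a lower deletion cost are preferred for removal when the owning
    # ReplicaSet scales down, which is how "remove pods on cheaper nodes first"
    # could be expressed by a controller that supports deletion costs.
    controller.kubernetes.io/pod-deletion-cost: "-100"
spec:
  containers:
  - name: web
    image: nginx
```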

```yaml
key1: value3
```

`Priority` define the priority of each unit. Pods will be scheduled on units with a higher priority.
Contributor

Is Priority under units[*]? If so, it'd read better to use a bullet to explain it under a top-level units API field.

Contributor

I think we missed a detailed explanation for units.

Member Author

Updated the KEP. Thanks.

If all units have the same priority, resourcepolicy will only limit the max pod on these units.

`Strategy` indicate how we treat the nodes doesn't match any unit.
If strategy is `required`, the pod can only be scheduled on nodes that match the units in resource policy.
Contributor

In a multi-tenant cluster, ResourcePolicy is a namespace-scoped CR, but nodes/node pools might be shared across tenants. Does that mean that if one tenant sets a high-priority policy with strategy required, other tenants who set no policy or a low-priority policy get fewer chances to have their workloads scheduled?

Member Author

I think ResourcePolicy should only be set by the cluster administrator. It is meant to be a way to reduce cost without changing the YAML of workloads. In a multi-tenant cluster it can be used to limit each tenant's resource consumption on different resources, and it should likewise not be set by the tenants themselves.
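To make the discussion above concrete, here is a hypothetical sketch of a ResourcePolicy as described in this thread; podSelector, strategy, units, and priority come from the KEP discussion, while the apiVersion and the maxCount and per-unit nodeSelector field names are assumptions for illustration, not the proposed API:

```yaml
# Hypothetical sketch only. apiVersion, maxCount, and the per-unit nodeSelector
# field names are assumed; the rest reflects the fields discussed above.
apiVersion: scheduling.x-k8s.io/v1alpha1   # assumed group/version
kind: ResourcePolicy
metadata:
  name: prefer-stable-nodes
  namespace: team-a                        # namespace-scoped, set by the cluster admin
spec:
  podSelector:
    matchLabels:
      app: web
  strategy: required                       # pods may only land on nodes matched by some unit
  units:
  - priority: 10                           # higher-priority units are tried first
    maxCount: 5                            # assumed name for the per-unit pod limit
    nodeSelector:
      node.kubernetes.io/instance-type: stable
  - priority: 5
    nodeSelector:
      node.kubernetes.io/instance-type: spot
```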


`matchPolicy` indicate if we should ignore some kind pods when calculate pods in certain unit.

If `forceMaxNum` is set `true`, we will not try the next units when one unit is not full, this property have no effect
Contributor

Could you reword this sentence a bit?

Member Author

Rewrote this sentence. Pods are matched by a ResourcePolicy in the same namespace when they match its .spec.podSelector. And if .spec.matchPolicy.ignoreTerminatingPod is true, pods with a non-zero .spec.deletionTimestamp will be ignored.
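A small fragment illustrating that behavior; .spec.podSelector and .spec.matchPolicy.ignoreTerminatingPod are taken from the reply above, everything else is an assumption:

```yaml
# Fragment of a hypothetical ResourcePolicy spec; only the two fields named in
# the reply above are taken from the KEP discussion.
spec:
  podSelector:                 # pods in the same namespace matching this selector are counted
    matchLabels:
      app: web
  matchPolicy:
    ignoreTerminatingPod: true # pods with a non-zero .spec.deletionTimestamp are ignored
```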


For each unit, we will record which pods were scheduled on it to prevent too many pods scheduled on it.

##### PreFilter
Contributor

Let’s avoid using ##### for paragraph indentation. The maximum indentation level should be #### in most cases.

Member Author

Ok, no problem.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: KunWuLuan
Once this PR has been reviewed and has the lgtm label, please assign ffromani for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


netlify bot commented Jan 24, 2025

Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | eba9758 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-scheduler-plugins/deploys/67934ad1c795620008481cee |

@KunWuLuan KunWuLuan force-pushed the kep/resourcepolicy branch 5 times, most recently from 8394acb to 27081bb on January 24, 2025 07:55
@k8s-ci-robot
Contributor

@KunWuLuan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-scheduler-plugins-verify | eba9758 | link | true | /test pull-scheduler-plugins-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@Huang-Wei
Contributor

@KunWuLuan I guess the toc needs a refresh. Could you re-run hack/update-toc.sh?

@13567436138

Where is the implementation? It has been a year.
