[KEP] Add resource policy plugin #594
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hi @KunWuLuan. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Related to #475
595d47d to 1246664
/cc |
@denkensk @Huang-Wei Hi, if you have time, could you help review the KEP? Thanks very much. 😆
/ok-to-test It's a bit of a crazy time for me but I'll try to also add my (non-binding) review |
I will try to review it by this week. |
Hi, @Huang-Wei , This PR is ready for review. PTAL. Thanks for your time. : ) |
There is a known issue: when the number of nodes in the cluster is larger than 100, the scheduler will not find all feasible nodes unless |
Initial pass (sorry for the long delay).
At a glance it looks sensible; I will be asking questions to fully grasp the proposal.
1246664 to 52b1408
✅ Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.
@ffromani @Huang-Wei Hi, do you have any other questions? |
The main blocker atm is that the KEP template is not fully filled in (e.g. PostFilter: do we need it? Do we have Graduation criteria? Do we need Production Readiness? [probably not] We probably need a few words in Feature enablement and rollback though). From what I've read so far I have no major objections. About the architecture and the fitness of this plugin in the existing ecosystem, I'd have to defer to someone more experienced in scheduling.
Thanks! I will add these parts.
52b1408 to be8f5e8
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
kep/594-resourcepolicy/README.md
Outdated
consumption for workload on differnet resources.

## Motivation
The machines in a Kubernetes cluster are typically heterogeneous, with varying CPU, memory, GPU, and pricing. To
- The machines in a Kubernetes cluster are typically heterogeneous, with varying CPU, memory, GPU, and pricing. To
+ A Kubernetes cluster typically consists of heterogeneous machines, with varying SKUs on CPU, memory, GPU, and pricing.
kep/594-resourcepolicy/README.md
Outdated
### Use Cases

1. As a user of cloud services, there are some stable but expensive ECS instances and some unstable but cheaper Spot
Maybe use general terms to replace ECS: (and also replace stable/unstable with static/dynamic)
... there are some static but expensive VM instances... and some dynamic ...
kep/594-resourcepolicy/README.md
Outdated
1. As a user of cloud services, there are some stable but expensive ECS instances and some unstable but cheaper Spot
instances in my cluster. I hope that my workload can be deployed first on stable ECS instances, and during business peak
periods, the Pods that are scaled out are deployed on Spot instances. At the end of the business peak, the Pods on Spot
usually we use the term "scale up/down" (not scale out/in)
kep/594-resourcepolicy/README.md
Outdated
This proposal introduces a plugin to allow users to specify the priority of different resources and max resource
consumption for workload on differnet resources.
- This proposal introduces a plugin to allow users to specify the priority of different resources and max resource
- consumption for workload on differnet resources.
+ This proposal introduces a plugin that enables users to set priorities for various resources and define maximum resource consumption limits for workloads across different resources.
kep/594-resourcepolicy/README.md
Outdated
### Goals

1. Develop a filter plugin to restrict the resource consumption on each unit for different workloads.
could you elaborate on "unit"? do you mean one kind of resource?
kep/594-resourcepolicy/README.md
Outdated
1. Modify the workload controller to support deletion costs. If the workload don't support deletion costs, scaling in
sequence will be random.
2. When creating a ResourcePolicy, if the number of Pods has already violated the quantity constraint of the
could you reword this to be more readable? basically, just use a positive tone to describe it as a goal that is out of scope of this KEP.
kep/594-resourcepolicy/README.md
Outdated
key1: value3
```

`Priority` define the priority of each unit. Pods will be scheduled on units with a higher priority.
Is `Priority` under `units[*]`? If so, it'd read better to use a bullet to explain it under a top-level `units` API field.
I think we missed a detailed explanation for `units`.
Updated the kep. Thanks
If all units have the same priority, resourcepolicy will only limit the max pod on these units.

`Strategy` indicate how we treat the nodes doesn't match any unit.
If strategy is `required`, the pod can only be scheduled on nodes that match the units in resource policy.
In a multi-tenant cluster, ResourcePolicy is a namespace-scoped CR, but nodes and node pools might be shared across tenants. Does this mean that if one tenant sets a high-priority policy with strategy `required`, other tenants who set no policy or a low-priority one get fewer chances to have their workloads scheduled?
I think the ResourcePolicy should only be set by the cluster administrator. It is meant as a way to reduce cost without changing the YAML of workloads. In a multi-tenant cluster, it can be used to limit each tenant's resource consumption on different resources, and it should likewise not be set by the tenants themselves.
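To make the semantics discussed in this thread concrete, here is a minimal sketch of how priority, the per-unit pod cap, and the `required` strategy could interact. It is written in Python only for brevity and is not the plugin's implementation. The names `Unit`, `max_pods`, `node_is_feasible`, and `pick_unit` are hypothetical, and it assumes that a larger number means a higher priority and that any strategy other than `required` lets unmatched nodes through.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Unit:
    """Hypothetical model of one entry under `units` in a ResourcePolicy."""
    name: str
    priority: int                    # assumption: larger value = higher priority
    node_selector: Dict[str, str]    # labels a node must carry to belong to the unit
    max_pods: Optional[int] = None   # hypothetical field name; None means no cap


def unit_matches_node(unit: Unit, node_labels: Dict[str, str]) -> bool:
    """A node belongs to a unit when it carries every label in the unit's selector."""
    return all(node_labels.get(k) == v for k, v in unit.node_selector.items())


def has_room(unit: Unit, scheduled: Dict[str, int]) -> bool:
    """True while the unit holds fewer recorded pods than its cap."""
    return unit.max_pods is None or scheduled.get(unit.name, 0) < unit.max_pods


def node_is_feasible(node_labels: Dict[str, str], units: List[Unit],
                     scheduled: Dict[str, int], strategy: str) -> bool:
    """Filter step: reject nodes outside all units (if required) or in full units."""
    matching = [u for u in units if unit_matches_node(u, node_labels)]
    if not matching:
        # With strategy "required", nodes that match no unit are rejected.
        return strategy != "required"
    return any(has_room(u, scheduled) for u in matching)


def pick_unit(units: List[Unit], scheduled: Dict[str, int]) -> Optional[Unit]:
    """Return the highest-priority unit that still has room, or None if all are full."""
    open_units = [u for u in units if has_room(u, scheduled)]
    return max(open_units, key=lambda u: u.priority, default=None)
```

With a capped high-priority unit for static instances and an uncapped lower-priority unit for Spot instances, `pick_unit` returns the static unit until its cap is reached and the Spot unit afterwards, which mirrors the scale-up use case in the KEP.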
kep/594-resourcepolicy/README.md
Outdated
`matchPolicy` indicate if we should ignore some kind pods when calculate pods in certain unit.

If `forceMaxNum` is set `true`, we will not try the next units when one unit is not full, this property have no effect
Could you reword this sentence a bit?
Rewrote this sentence. Pods are matched by a ResourcePolicy in the same namespace when they match its `.spec.podSelector`. If `.spec.matchPolicy.ignoreTerminatingPod` is `true`, pods with a non-zero `.metadata.deletionTimestamp` are ignored.
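As an illustration of the matching rule described in this reply, the following sketch counts the pods that a policy would take into account. It is Python pseudomodeling, not the plugin's code: `Pod`, `ResourcePolicy`, `matches`, and `counted_pods` are hypothetical names, and the selector is simplified to plain label equality (an empty selector matches every pod in the namespace here, which is an assumption).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Pod:
    """Minimal stand-in for a Kubernetes Pod (hypothetical, not the real API type)."""
    name: str
    namespace: str
    labels: Dict[str, str] = field(default_factory=dict)
    deletion_timestamp: Optional[str] = None  # set once the pod is terminating


@dataclass
class ResourcePolicy:
    """Hypothetical model of the fields mentioned in this thread."""
    namespace: str
    pod_selector: Dict[str, str]          # stands in for .spec.podSelector
    ignore_terminating_pod: bool = False  # stands in for .spec.matchPolicy.ignoreTerminatingPod


def matches(policy: ResourcePolicy, pod: Pod) -> bool:
    """A pod matches when it shares the policy's namespace and carries every selector label."""
    if pod.namespace != policy.namespace:
        return False
    return all(pod.labels.get(k) == v for k, v in policy.pod_selector.items())


def counted_pods(policy: ResourcePolicy, pods: List[Pod]) -> List[Pod]:
    """Return the pods that count toward the policy's limits."""
    result = []
    for pod in pods:
        if not matches(policy, pod):
            continue
        if policy.ignore_terminating_pod and pod.deletion_timestamp is not None:
            continue  # terminating pods are skipped when the flag is set
        result.append(pod)
    return result
```

Skipping terminating pods means a unit frees its slot as soon as deletion starts rather than when the pod is fully gone, which is presumably why the option exists.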
kep/594-resourcepolicy/README.md
Outdated
For each unit, we will record which pods were scheduled on it to prevent too many pods scheduled on it.

##### PreFilter
Let’s avoid using `#####` for paragraph indentation. The maximum indentation level should be `####` in most cases.
Ok, no problem.
Signed-off-by: KunWuLuan <[email protected]>
5062e58 to 0e43567
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: KunWuLuan The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8394acb to 27081bb
27081bb to eba9758
@KunWuLuan: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@KunWuLuan I guess the toc needs a refresh. Could you re-run |
Where is the implementation? It has been a year.
/kind feature
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?