[MVP] add resourcequota plugin in scheduler-estimator: add resourcequota plugin #4566

wengyao04 · 2024-01-19T18:36:16Z

What type of PR is this?
This is the follow-up PR to #4534.

This PR includes the following changes.

Add the resourceQuotaEstimator plugin

It is an alpha feature and defaults to false in the feature gates.
The resourceQuotaEstimator only supports:
- compute resources (cpu/memory), extended resources (such as gpu), and storage resources (ephemeral-storage)
- the priorityClass scope selector

Add two optional fields, Namespace and PriorityClassName, in pkg/estimator/pb/types.go:

// ReplicaRequirements represents the requirements required by each replica.
type ReplicaRequirements struct {
    // NodeClaim represents the NodeAffinity, NodeSelector and Tolerations required by each replica.
    // +optional
    NodeClaim *NodeClaim `json:"nodeClaim,omitempty" protobuf:"bytes,1,opt,name=nodeClaim"`
    // ResourceRequest represents the resources required by each replica.
    // +optional
    ResourceRequest corev1.ResourceList `json:"resourceRequest,omitempty" protobuf:"bytes,2,rep,name=resourceRequest,casttype=k8s.io/api/core/v1.ResourceList,castkey=k8s.io/api/core/v1.ResourceName"`
    // +optional
    Namespace string `json:"namespace,omitempty" protobuf:"bytes,3,opt,name=namespace"`
    // +optional
    PriorityClassName string `json:"priorityClassName,omitempty" protobuf:"bytes,4,opt,name=priorityClassName"`
}

Add two optional fields, Namespace and PriorityClassName, in pkg/apis/work/v1alpha2/binding_types.go:

// ReplicaRequirements represents the requirements required by each replica.
type ReplicaRequirements struct {
    // NodeClaim represents the node claim HardNodeAffinity, NodeSelector and Tolerations required by each replica.
    // +optional
    NodeClaim *NodeClaim `json:"nodeClaim,omitempty"`

    // ResourceRequest represents the resources required by each replica.
    // +optional
    ResourceRequest corev1.ResourceList `json:"resourceRequest,omitempty"`

    // Namespace represents the resources namespaces
    // +optional
    Namespace string `json:"namespace,omitempty"`

    // PriorityClassName represents the resources priorityClassName
    // +optional
    PriorityClassName string `json:"priorityClassName,omitempty"`
}
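
Because both new fields carry `omitempty`, objects that never set them serialize exactly as before. A minimal sketch of that behavior follows; `replicaRequirements` here is a trimmed, hypothetical copy that keeps only the two new fields, and `encode` is a helper introduced for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// replicaRequirements is a trimmed, hypothetical copy of the struct above
// that keeps only the two new optional string fields.
type replicaRequirements struct {
	Namespace         string `json:"namespace,omitempty"`
	PriorityClassName string `json:"priorityClassName,omitempty"`
}

// encode marshals the requirements to JSON and panics on error, which
// cannot happen for a struct made only of strings.
func encode(r replicaRequirements) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(replicaRequirements{}))                     // {}
	fmt.Println(encode(replicaRequirements{Namespace: "default"})) // {"namespace":"default"}
	fmt.Println(encode(replicaRequirements{Namespace: "default", PriorityClassName: "high"}))
}
```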

We discussed whether we need to add a duplicate namespace field to workv1alpha2.ReplicaRequirements. I added it because the resource interpreter also needs it: https://github.com/karmada-io/karmada/blob/master/pkg/resourceinterpreter/interpreter.go#L46-L47
If we don't include the namespace in workv1alpha2.ReplicaRequirements, I would have to change this interface so that it returns the namespace.

// GetReplicas returns the desired replicas of the object as well as the requirements of each replica.
GetReplicas(object *unstructured.Unstructured) (replica int32, replicaRequires *workv1alpha2.ReplicaRequirements, err error)

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #4369

Special notes for your reviewer:
@RainbowMango, @Garrybest, @chaosi-zju could you help check? Thanks.

Does this PR introduce a user-facing change?:

codecov-commenter · 2024-01-19T18:48:34Z

Codecov Report

Attention: 49 lines in your changes are missing coverage. Please review.

Comparison is base (3cf27cc) 51.71% compared to head (e0b3e83) 51.94%.

Files	Patch %	Lines
...r/framework/plugins/resourcequota/resourcequota.go	79.42%	34 Missing and 9 partials ⚠️
pkg/util/helper/binding.go	25.00%	5 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4566      +/-   ##
==========================================
+ Coverage   51.71%   51.94%   +0.22%     
==========================================
  Files         247      248       +1     
  Lines       24419    24634     +215     
==========================================
+ Hits        12629    12795     +166     
- Misses      11103    11142      +39     
- Partials      687      697      +10

Flag	Coverage Δ
unittests	`51.94% <77.41%> (+0.22%)`	⬆️

Garrybest

BTW, I am not very familiar with resource quota, so I kindly invite @RainbowMango and @XiShanYongYe-Chang to take a look as well.

Garrybest · 2024-01-21T03:50:00Z

pkg/apis/work/v1alpha2/binding_types.go

+
+	// Namespace represents the resources namespaces
+	// +optional
+	Namespace string `json:"namespace,omitempty"`


Duplicated with ObjectReference.Namespace, why not share? Did I miss something?

That is because the ReplicaEstimator interface shares the same ReplicaRequirements struct with ResourceBinding:

karmada/pkg/estimator/client/interface.go

Line 39 in f054313

MaxAvailableReplicas(ctx context.Context, clusters []*clusterv1alpha1.Cluster, replicaRequirements *workv1alpha2.ReplicaRequirements) ([]workv1alpha2.TargetCluster, error)

I also noticed this issue at #4534 (comment).

Another reason just as mentioned in PR description:

We discussed whether we need to add a duplicate namespace field to workv1alpha2.ReplicaRequirements. I added it because the resource interpreter also needs it: https://github.com/karmada-io/karmada/blob/master/pkg/resourceinterpreter/interpreter.go#L46-L47
If we don't include the namespace in workv1alpha2.ReplicaRequirements, I would have to change this interface so that it returns the namespace.

This issue is now tracked by #4578.

Garrybest · 2024-01-21T04:05:53Z

pkg/util/helper/binding.go

+			NodeClaim:         nodeClaim,
+			ResourceRequest:   resourceRequest,
+			Namespace:         podTemplate.Namespace,
+			PriorityClassName: podTemplate.Spec.PriorityClassName,


As I recall, the priority class name can be left empty in the pod template, and Kubernetes pod admission then sets the default priority class name for the pod. So this name could be empty.

Do we handle this default value in the plugin?

In this MVP, probably not; otherwise I would need to list-watch PriorityClass in the Karmada controller. Besides that, I think the client should explicitly specify the priorityClass, because the default global priorityClass might differ among managed clusters.

priorityClassName will be empty

Then the plugin will be skipped for those without priorityClassName, right?
That's kind of a limitation, but we can accept it for this MVP.

It will be handled by the matchScope function https://github.com/karmada-io/karmada/pull/4566/files#diff-e489afa02e13c3ea92d007b4a8575045b8695f34cd31539769f2e28c3c0cb322R240-R261

Besides that, I think the client should explicitly specify the priorityClass, because the default global priorityClass might differ among managed clusters.

But we can't control what users do; this field is filled in by users.

You could try watching PriorityClass in the estimator instead. If a request arrives with an empty PriorityClass field, the estimator could look up the default value for its cluster and then run this plugin.

@Garrybest I think the client should explicitly specify the priorityClass, because the default global priorityClass might differ among managed clusters.

I guess you mean that the administrator might set a different default priority class for each cluster, for example through a PriorityClass marked as the global default.

So, the default priority in Karmada might not be accurate.
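
To illustrate the limitation discussed in this thread, here is a simplified sketch of how a priorityClass scope selector could be evaluated against a replica's priorityClassName. It is not the plugin's matchScope function; `scopeRequirement`, `matchesPriorityClass`, and `contains` are hypothetical names, and treating an empty name as "no priority class set" is an assumption of this sketch.

```go
package main

import "fmt"

// Operator names mirror the Kubernetes scope selector operators.
const (
	opIn           = "In"
	opNotIn        = "NotIn"
	opExists       = "Exists"
	opDoesNotExist = "DoesNotExist"
)

// scopeRequirement is a hypothetical, simplified PriorityClass scope selector.
type scopeRequirement struct {
	Operator string
	Values   []string
}

// matchesPriorityClass reports whether a replica with the given
// priorityClassName falls under a quota scoped by req.
// An empty name is treated as "no priority class set".
func matchesPriorityClass(req scopeRequirement, priorityClassName string) bool {
	switch req.Operator {
	case opExists:
		return priorityClassName != ""
	case opDoesNotExist:
		return priorityClassName == ""
	case opIn:
		return contains(req.Values, priorityClassName)
	case opNotIn:
		return !contains(req.Values, priorityClassName)
	default:
		return false // unknown operators never match in this sketch
	}
}

// contains reports whether s is one of values.
func contains(values []string, s string) bool {
	for _, v := range values {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	high := scopeRequirement{Operator: opIn, Values: []string{"high"}}
	fmt.Println(matchesPriorityClass(high, "high")) // true
	fmt.Println(matchesPriorityClass(high, ""))     // false: quota is not applied
}
```

With an `In` selector, a request that leaves priorityClassName empty never matches, so a quota scoped this way would simply not constrain the estimate for that workload.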

pkg/estimator/pb/types.go

pkg/estimator/server/framework/plugins/resourcequota/resourcequota.go

RainbowMango · 2024-01-21T09:48:44Z

pkg/estimator/server/framework/plugins/registry.go

+	registry := runtime.Registry{
+		resourcequota.Name: resourcequota.New,
+	}


Suggested change

-	registry := runtime.Registry{
-		resourcequota.Name: resourcequota.New,
-	}
+	registry := runtime.Registry{}
+	if features.FeatureGate.Enabled(features.ResourceQuotaEstimate) {
+		registry.Register(resourcequota.Name, resourcequota.New) // TODO: we might need to deal with the unhandled error
+	}

Is this better? This way we can skip the effort of checking the feature gate in multiple places when implementing the plugin.
In addition, a feature gate is eventually removed once the feature becomes stable, so we won't need to touch the plugin logic when that happens.

Instead of controlling this in the registry, do you think we could put the feature-gate check in each plugin's New function, like https://github.com/karmada-io/karmada/pull/4566/files#diff-e489afa02e13c3ea92d007b4a8575045b8695f34cd31539769f2e28c3c0cb322R72-R77 ?

I'm only suggesting this because I see two places in the plugin where we need to check the feature gate. :)
I'm not insisting on it; both will work. No big deal.
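
For comparison, here is a self-contained sketch of the registry-level gating idea that also returns the registration error instead of dropping it. The `registry` and `pluginFactory` types, the `gates` map, and the `pluginName` and `gateName` values are hypothetical stand-ins and not the estimator framework's actual API.

```go
package main

import "fmt"

// Hypothetical names used only in this sketch.
const (
	pluginName = "example-resourcequota"
	gateName   = "ResourceQuotaEstimate"
)

// pluginFactory and registry are simplified stand-ins for the framework's
// plugin factory and Registry types.
type pluginFactory func() (string, error)

type registry map[string]pluginFactory

// register adds a factory under name and fails if the name is already taken.
func (r registry) register(name string, f pluginFactory) error {
	if _, exists := r[name]; exists {
		return fmt.Errorf("plugin %q is already registered", name)
	}
	r[name] = f
	return nil
}

// newPlugin is a placeholder factory that just returns the plugin name.
func newPlugin() (string, error) { return pluginName, nil }

// newInTreeRegistry builds the registry, adding the gated plugin only when
// its feature gate is enabled, and propagates any registration error.
func newInTreeRegistry(gates map[string]bool) (registry, error) {
	r := registry{}
	if gates[gateName] {
		if err := r.register(pluginName, newPlugin); err != nil {
			return nil, err
		}
	}
	return r, nil
}

func main() {
	off, _ := newInTreeRegistry(map[string]bool{})
	on, _ := newInTreeRegistry(map[string]bool{gateName: true})
	fmt.Println(len(off), len(on)) // 0 1
}
```

Gating at registration keeps the plugin body free of feature-gate checks, which matches the motivation given above; the trade-off is that the registry constructor has to know about every gated plugin.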

pkg/estimator/server/framework/plugins/resourcequota/resourcequota.go

wengyao04 · 2024-01-25T03:22:00Z

I created an umbrella issue: #4578

pkg/estimator/server/framework/plugins/resourcequota/resourcequota.go

RainbowMango

Generally looks good to me.

RainbowMango · 2024-01-25T09:26:05Z

pkg/util/helper/binding.go

@@ -399,6 +399,10 @@ func GenerateReplicaRequirements(podTemplate *corev1.PodTemplateSpec) *workv1alp
 		return &workv1alpha2.ReplicaRequirements{
 			NodeClaim:       nodeClaim,
 			ResourceRequest: resourceRequest,
+			Namespace:       podTemplate.Namespace,


Shall we add a feature gate check here? The two fields are unnecessary for users who do not enable this feature.

By the way, are we sure we always get the chance to set PriorityClassName inside the if nodeClaim != nil || resourceRequest != nil branch? That holds because resourceRequest cannot be nil, right?

Hi, I am not quite sure I understand this question correctly. I think we handle a nil replicaRequirements in the estimator here:

The default pb.ReplicaRequirements is empty; it is populated only if replicaRequirements is not nil:

req := &pb.MaxAvailableReplicasRequest{
    Cluster:             cluster,
    ReplicaRequirements: pb.ReplicaRequirements{},
}
if replicaRequirements != nil {
    req.ReplicaRequirements.ResourceRequest = replicaRequirements.ResourceRequest
    if replicaRequirements.NodeClaim != nil {
        req.ReplicaRequirements.NodeClaim = &pb.NodeClaim{
            NodeAffinity: replicaRequirements.NodeClaim.HardNodeAffinity,
            NodeSelector: replicaRequirements.NodeClaim.NodeSelector,
            Tolerations:  replicaRequirements.NodeClaim.Tolerations,
        }
    }
}

If pb.ReplicaRequirements is empty, which means an empty ResourceList, the current logic still calculates the replicas:

func (es *AccurateSchedulerEstimatorServer) nodeMaxAvailableReplica(node *framework.NodeInfo, rl corev1.ResourceList) int32 {
    rest := node.Allocatable.Clone().SubResource(node.Requested)
    // The number of pods in a node is a kind of resource in node allocatable resources.
    // However, total requested resources of all pods on this node, i.e. `node.Requested`,
    // do not contain pod resources. So after subtraction, we should cope with allowed pod
    // number manually which is the upper bound of this node available replicas.
    rest.AllowedPodNumber = util.MaxInt64(rest.AllowedPodNumber-int64(len(node.Pods)), 0)
    return int32(rest.MaxDivided(rl))
}

Garrybest · 2024-01-26T02:42:59Z

/lgtm

RainbowMango · 2024-01-27T07:29:44Z

Just helped rebase the code.
Looks good to me now.

RainbowMango

/lgtm
/approve

karmada-bot · 2024-01-28T00:58:14Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [RainbowMango]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot requested review from Garrybest, jrkeen, jwcesign and whitewindmills January 19, 2024 18:36

karmada-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 19, 2024

Garrybest reviewed Jan 21, 2024

RainbowMango reviewed Jan 21, 2024

RainbowMango reviewed Jan 25, 2024

RainbowMango reviewed Jan 25, 2024

RainbowMango reviewed Jan 25, 2024

karmada-bot assigned Garrybest Jan 26, 2024

karmada-bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Jan 26, 2024

[MVP] add resourcequota plugin in scheduler-estimator: add resourcequ…

e0b3e83

…ota plugin Signed-off-by: yweng14 <[email protected]>

RainbowMango approved these changes Jan 28, 2024

karmada-bot assigned RainbowMango Jan 28, 2024

karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 28, 2024

karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 28, 2024

karmada-bot merged commit fa4d6d3 into karmada-io:master Jan 28, 2024
13 checks passed

RainbowMango mentioned this pull request Jan 30, 2024

[umbrella] enhance scheduler-estimator #4578

Open

6 tasks
