
How to handle dynamic parameters that need to be changed every time before generating works for member clusters? #3436

Open
mkloveyy opened this issue Apr 20, 2023 · 20 comments
Labels: kind/question (Indicates an issue that is a support question.)

Comments

@mkloveyy (Author) commented Apr 20, 2023:

Below is a basic scenario in our release process:

[figure: release process diagram]

A fort pod is the first instance deployed in a cluster. During the fort (bastion) deployment phase, no traffic is routed to it; during the canary deployment phase, traffic is routed to it.

We've implemented a deployment CRD called Rollout, based on OpenKruise, to handle the whole process. Part of the Rollout definition for the above case is as follows:

apiVersion:
kind: Rollout
metadata:
  name:
  namespace:
spec:
  fortPull: false
  replicas: 4
  selector:
    matchLabels:
      app: 
  strategy:
    canary:
      steps:
      - properties:  # fort
          delay: "-1"
          pull: "false"
        type: Fort
      - properties:  # canary
          delay: "-1"
          pull: "true"
        type: Fort
      - properties:  # first batch of rolling
          delay: "-1"
          maxUnavailable: "0"
          replicas: 50%
          timeout: "5"
        type: Batch
      - properties:  # second batch of rolling
          delay: "-1"
          maxUnavailable: "0"
          replicas: 100%
          timeout: "5"
        type: Batch
  subsetType: CloneSet
  template:
    ...

So in the multi-cluster case, every time a user creates a release, we need to:

  1. Generate canary.steps for all member clusters.
  2. Allocate a fort pod to one of the member clusters (the values in steps differ between the fort cluster and the non-fort clusters).

For example, the above Rollout needs to be updated in two clusters, cluster-x and cluster-z. The differences between the two clusters are as follows:

# cluster-x.yaml, fort pod in this cluster
strategy:
    canary:
      steps:
      - properties:  # fort
          delay: "-1"
          pull: "false"  # pull out fort
        type: Fort
      - properties:  # canary
          delay: "-1"
          pull: "true"  # pull out fort
        type: Fort
      ...

# cluster-z.yaml, no fort pod
strategy:
    canary:
      steps:
      - properties:  # fort
          delay: "-1"
          pull: nil  # skip
        type: Fort
      - properties:  # canary
          delay: "-1"
          pull: nil  # skip
        type: Fort
      ...

The fort pod is allocated according to the following rules (a sketch follows this list):

  1. If the app has no fort pod yet, allocate it to the cluster with the higher weight based on the ResourceBinding. If the weights are equal, allocate it to the first cluster in the ResourceBinding's cluster list by default.
  2. If the app already has a fort pod, keep it in the cluster where it currently resides.
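
A minimal sketch of that allocation rule, using hypothetical Go types (TargetCluster, currentFortCluster) rather than the real Karmada/ResourceBinding APIs:

// TargetCluster is a hypothetical stand-in for a scheduled cluster entry in the ResourceBinding.
type TargetCluster struct {
	Name   string
	Weight int64
}

// pickFortCluster keeps the fort pod where it already is; otherwise it picks the
// highest-weight target cluster, falling back to the first cluster on a tie.
func pickFortCluster(currentFortCluster string, targets []TargetCluster) string {
	if currentFortCluster != "" {
		return currentFortCluster // rule 2: keep the fort pod in its current cluster
	}
	if len(targets) == 0 {
		return ""
	}
	chosen := targets[0]
	for _, t := range targets[1:] {
		if t.Weight > chosen.Weight {
			chosen = t // rule 1: higher weight wins; on equal weights the first cluster is kept
		}
	}
	return chosen.Name
}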

Currently, we are trying to put this code in the reviseReplicas hook (#3375), but we encountered some issues during development, most of them related to the fort pod:

  1. If replicas is 0, the reviseReplicas hook will not be executed.
     We allow users to start a release with replicas set to 0, but in that case we cannot update the workload because the reviseReplicas hook is never reached:
// pkg/controllers/binding/common.go
func needReviseReplicas(replicas int32, placement *policyv1alpha1.Placement) bool {
	// The replicas > 0 guard means the ReviseReplica hook is skipped entirely
	// when the resource template declares zero replicas.
	return replicas > 0 && placement != nil && placement.ReplicaSchedulingType() == policyv1alpha1.ReplicaSchedulingTypeDivided
}
  2. We cannot get the cluster name in the reviseReplicas hook, so we cannot decide whether to place the fort pod.
     Usually we place the fort pod in the cluster with the higher weight. If the weights are evenly distributed, we select the first cluster in the ResourceBinding by default. But we cannot know which cluster we are revising for inside this hook, because the cluster name is not passed in:
// pkg/controllers/binding/common.go
for i := range targetClusters {
	...
	if needReviseReplicas(replicas, placement) {
		if resourceInterpreter.HookEnabled(clonedWorkload.GroupVersionKind(), configv1alpha1.InterpreterOperationReviseReplica) {
			clonedWorkload, err = resourceInterpreter.ReviseReplica(clonedWorkload, int64(targetCluster.Replicas))
			if err != nil {
				klog.Errorf("Failed to revise replica for %s/%s/%s in cluster %s, err is: %v",
					workload.GetKind(), workload.GetNamespace(), workload.GetName(), targetCluster.Name, err)
				return err
			}
		}
	}
	...
}

So, is there a suitable way to implement this kind of differentiated configuration, and where would be the appropriate place to put the code? Maybe expanding reviseReplica by adding the cluster name and removing the replicas > 0 check, or introducing a new hook point?
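
For illustration, one possible shape of such an expanded hook (a hypothetical signature only, not the current Karmada interpreter interface) could be:

// Hypothetical: ReviseReplica additionally receives the target cluster name so that
// per-cluster decisions (such as fort-pod placement) can be made inside the hook.
// *unstructured.Unstructured comes from k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.
type ClusterAwareReplicaReviser interface {
	ReviseReplica(object *unstructured.Unstructured, replica int64, clusterName string) (*unstructured.Unstructured, error)
}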

Environment:

  • Karmada version: v1.4.1-with-lua
  • Kubernetes version: 1.19
  • Others:
mkloveyy added the kind/question label on Apr 20, 2023
@XiShanYongYe-Chang (Member) commented:

/cc @RainbowMango

@chaunceyjiang (Member) commented:

For example, the above Rollout needs to be updated in two clusters, cluster-x and cluster-z. The differences between the two clusters are as follows:

apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
spec:
  overrideRules:
    - targetCluster:
        clusterNames:
          - cluster-x
      overriders:
        plaintext:
          - path: /spec/strategy/canary/steps/0/pull
            operator: replace
            value:  "false"
          - path: /spec/strategy/canary/steps/1/pull
            operator: replace
            value:  "true"
    - targetCluster:
        clusterNames:
          - cluster-z
      overriders:
        plaintext:
          - path: /spec/strategy/canary/steps/0/pull
            operator: replace
            value:  nil
          - path: /spec/strategy/canary/steps/1/pull
            operator: replace
            value:  nil

Can this help you?

@mkloveyy (Author) commented Apr 20, 2023:

For example, the above Rollout needs to be updated in two clusters, cluster-x and cluster-z. The differences between the two clusters are as follows:

[OverridePolicy example quoted above]

Can this help you?

Thanks for your reply!

Currently we do not use an OverridePolicy to carry the canary.steps config; instead we implement the generating code in the reviseReplicas hook, and the result is applied directly to the works. But:

1. If replicas is 0, the reviseReplicas hook will not be executed.
2. We cannot get the cluster name in the reviseReplicas hook, so we cannot decide whether to place the fort pod.

So we still cannot decide which cluster to place the fort pod in, because of the issues above.

@chaunceyjiang (Member) commented:

Currently we do not use an OverridePolicy to carry the canary.steps config; instead we implement the generating code in the reviseReplicas hook, and the result is applied directly to the works.

I'm sorry, I don't understand: why do you want to use reviseReplicas instead of OverridePolicy?

@mkloveyy (Author) commented:

I'm sorry, I don't understand: why do you want to use reviseReplicas instead of OverridePolicy?

Mainly for two reasons:

  1. We need to regenerate /spec/strategy/canary/steps every time a user creates a release. If we put this dynamic config in an OverridePolicy, we need to ensure the OverridePolicy is updated before the user updates the global resource; I don't think we have that ordering guarantee today.
  2. We consider OverridePolicy not very suitable for dynamic configs. Currently we only put static config in an OverridePolicy:
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  annotations:
    versionid: "47677"
  name: 
  namespace: 
spec:
  resourceSelectors:
    - apiVersion: 
      kind: Rollout
      name:
      namespace:
  overrideRules:
    - targetCluster:
        clusterNames:
          - cluster-x
      overriders:
        plaintext:
          - path: /metadata/annotations/cdos~1az
            operator: replace
            value: X
          - path: /spec/template/spec/containers/0/env/11
            operator: replace
            value: 
              name: CDOS_K8S_CLUSTER
              value: cluster-x
    - targetCluster:
        clusterNames:
          - cluster-z
      overriders:
        plaintext:
          - path: /metadata/annotations/cdos~1az
            operator: replace
            value: Z
          - path: /spec/template/spec/containers/0/env/11
            operator: replace
            value: 
              name: CDOS_K8S_CLUSTER
              value: cluster-z

@Poor12 (Member) commented Apr 20, 2023:

ReviseReplicas is basically for dividing replicas when the replica scheduling policy is Divided. I don't think your scenario is suitable for it, because you are not changing the replicas in the resource template but changing its strategy per cluster.

@mkloveyy (Author) commented:

ReviseReplicas is basically for dividing replicas when the replica scheduling policy is Divided. I don't think your scenario is suitable for it, because you are not changing the replicas in the resource template but changing its strategy per cluster.

Thanks for your reply!

Where do you think would be the appropriate place to put this piece of logic? Or, to expand the scope a bit, how could we better implement such scenarios?

@chaunceyjiang (Member) commented:

We cannot get the cluster name in the reviseReplicas hook, so we cannot decide whether to place the fort pod.

Similar problem.
#3159

I'm thinking about whether we should introduce cluster info in the ReviseReplica step.

@mkloveyy (Author) commented:

We cannot get the cluster name in the reviseReplicas hook, so we cannot decide whether to place the fort pod.

Similar problem. #3159

I'm thinking about whether we should introduce cluster info in the ReviseReplica step.

Agreed. If we do so, we can implement custom logic that needs to be based on cluster info.

@RainbowMango (Member) commented:

If replicas is 0, the reviseReplicas hook will not be executed.

Do you mean you want to revise the Rollout by reviseReplicas hook point? Have you implemented InterpretReplica?

We need to regenerate /spec/strategy/canary/steps every time when user creates a release.

Does "creates a release" mean creating a Rollout?

I talked to @snowplayfire yesterday about this issue. Technically, we can do that by extending the resource interpreter, but it seems to belong essentially to the override area. Not sure yet; maybe we can have a chat at the community meeting.

@mkloveyy (Author) commented Apr 23, 2023:

Do you mean you want to revise the Rollout by reviseReplicas hook point? Have you implemented InterpretReplica?

Yes, we have implemented InterpretReplica. However, as we need the newest ResourceBinding, we need to revise the Rollout between BuildResourceBinding and ensureWork.

We need to regenerate /spec/strategy/canary/steps every time when user creates a release.

Does "creates a release" mean creating a Rollout?

Roughly, yes. A release here means a deployment action that creates or updates a Rollout.

I talked to @snowplayfire yesterday about this issue. Technically, we can do that by extending the resource interpreter, but it seems to belong essentially to the override area.

My pleasure. Today we had a meeting with @snowplayfire and I heard about this from her, and we've also communicated with @XiShanYongYe-Chang and @chaunceyjiang. Currently I think both solutions (expanding reviseReplica by adding the cluster name, and introducing a new hook point) have broader value and can solve our problems.

Not sure yet; maybe we can have a chat at the community meeting.

Agreed. If possible, we hope to have some concrete discussions and reach some conclusions at next Tuesday's meeting.
Anyway, thank you very much for your assistance!😊

@CharlesQQ (Member) commented:

Does this issue meet the requirements? #1567

@CharlesQQ (Member) commented:

Or add a pause field for PP (PropagationPolicy), as in issue #517.

@mkloveyy (Author) commented Apr 25, 2023:

Does this issue meet the requirements? #1567

Maybe this plan is more suitable for us. However, since OverridePolicy seems to be intended for static parameters, I am wondering whether dynamic parameters could be applied to the Work directly.

@mkloveyy (Author) commented Apr 26, 2023:

@RainbowMango I think there is another option: we could introduce a new hook point, perhaps called reviseObject, which runs after OverridePolicies have been applied to the works.

cops, ops, err := overrideManager.ApplyOverridePolicies(clonedWorkload, targetCluster.Name)
if err != nil {
	klog.Errorf("Failed to apply overrides for %s/%s/%s, err is: %v", clonedWorkload.GetKind(), clonedWorkload.GetNamespace(), clonedWorkload.GetName(), err)
	return err
}

if resourceInterpreter.HookEnabled(clonedWorkload.GroupVersionKind(), configv1alpha1.InterpreterOperationReviseObject) {
	clonedWorkload, err = resourceInterpreter.ReviseObject(clonedWorkload, targetCluster.Name, int64(targetCluster.Replicas))
	if err != nil {
		klog.Errorf("Failed to revise object for %s/%s/%s in cluster %s, err is: %v", workload.GetKind(), workload.GetNamespace(), workload.GetName(), targetCluster.Name, err)
		return err
	}
}

The purpose of this hook would be to allow users to directly modify the work for each member cluster.
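
As a rough sketch, the proposed hook could look like this on the interpreter side (names and signature are only illustrative, not an existing Karmada API):

// Hypothetical interface for the proposed hook: the interpreter receives the cluster
// name (and desired replicas) so users can modify the work for a specific member cluster
// after OverridePolicies have been applied.
// *unstructured.Unstructured comes from k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.
type ObjectReviser interface {
	ReviseObject(object *unstructured.Unstructured, clusterName string, replica int64) (*unstructured.Unstructured, error)
}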

@XiShanYongYe-Chang (Member) commented:

I think there is another option: we could introduce a new hook point, perhaps called reviseObject, which runs after OverridePolicies have been applied to the works.

Asking a few folks to help take a look: @RainbowMango @chaunceyjiang @CharlesQQ @Poor12

@mkloveyy (Author) commented:

@shiyan2016 @snowplayfire @lxtywypc
We can have a look together.

@chaunceyjiang (Member) commented:

cops, ops, err := overrideManager.ApplyOverridePolicies(clonedWorkload, targetCluster.Name)
if err != nil {
	klog.Errorf("Failed to apply overrides for %s/%s/%s, err is: %v", clonedWorkload.GetKind(), clonedWorkload.GetNamespace(), clonedWorkload.GetName(), err)
	return err
}

if resourceInterpreter.HookEnabled(clonedWorkload.GroupVersionKind(), configv1alpha1.InterpreterOperationReviseObject) {
	clonedWorkload, err = resourceInterpreter.ReviseObject(workload, clonedWorkload, targetCluster.Name)
	if err != nil {
		klog.Errorf("Failed to revise object for %s/%s/%s in cluster %s, err is: %v", workload.GetKind(), workload.GetNamespace(), workload.GetName(), targetCluster.Name, err)
		return err
	}
}

My suggestion is resourceInterpreter.ReviseObject(workload, clonedWorkload, targetCluster.Name), i.e. also pass in the original workload.

@CharlesQQ (Member) commented Apr 28, 2023:

I think there is another option: we could introduce a new hook point, perhaps called reviseObject, which runs after OverridePolicies have been applied to the works.

Might reviseObject lead to the user not knowing which cluster's work object is being revised?

@mkloveyy (Author) commented Apr 28, 2023:

Might reviseObject lead to the user not knowing which cluster's work object is being revised?

That's why we need to pass targetCluster.Name to it.
