The replicas of the deployment are incorrect when the related HPA is abnormal #4109

Closed
Rains6 opened this issue Oct 9, 2023 · 12 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.


@Rains6
Contributor

Rains6 commented Oct 9, 2023

What happened:
The hpaReplicasSyncer controller is enabled. When the HPA delivered to the member cluster is abnormal, its desiredReplicas is 0. In this case, the replicas synchronized back to the control-plane Deployment are incorrect. currentReplicas is expected to be used instead of desiredReplicas as the calculated value when the HPA is abnormal.

What you expected to happen:
currentReplicas is expected to be used instead of desiredReplicas as the calculated value when the HPA is abnormal.
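A minimal Go sketch of the behaviour expected here (illustrative only, not the actual hpaReplicasSyncer code; the function name replicasToSync is made up):

package sketch

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
)

// replicasToSync returns the replica count that should be written back to the
// control-plane workload.
func replicasToSync(hpa *autoscalingv2.HorizontalPodAutoscaler) int32 {
	desired := hpa.Status.DesiredReplicas
	current := hpa.Status.CurrentReplicas

	// When the HPA is abnormal (e.g. metrics unavailable), desiredReplicas can
	// be reported as 0 even though pods are still running. Fall back to
	// currentReplicas instead of scaling the control-plane workload to 0.
	if desired == 0 && current > 0 {
		return current
	}
	return desired
}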

How to reproduce it (as minimally and precisely as possible):
1. The hpaReplicasSyncer controller is enabled. Deliver a Deployment and an HPA to member cluster A.
2. The HPA in cluster A becomes abnormal. In this case, the desiredReplicas of the HPA is 0.
3. On the control plane, the replicas of the deployment becomes 0, while the expected value is 1.

Anything else we need to know?:

Environment:

  • Karmada version: v1.7.0.alpha3
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
  • Others:
@chaunceyjiang
Member

/assign

@chaunceyjiang
Member

chaunceyjiang commented Oct 25, 2023

The root cause of this issue is the instability of HPA.

The current implementation of #4072 relies heavily on HPA. If HPA runs into an exception, the number of replicas synchronized from the member cluster to the control plane becomes meaningless. And since HPA itself also depends heavily on the stability of metrics-server, HPA becomes even more unstable.

There are two failures that can occur with HPA:

  1. Exception with metrics-server.
  2. Accidental deletion of HPA.

Therefore, we are trying to introduce a new mechanism to avoid a strong dependency on HPA:

Solution 1:
Directly query the scale sub-resource of workloads in the member cluster. This accurately obtains the number of replicas of a workload, but it cannot be used in Karmada's PULL mode.

Solution 2:
Aggregate the status of workloads in the control plane. However, for some custom resources there may be no replica-related info in their status field.
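A rough Go sketch of what Solution 1 could look like using client-go's scale client (illustrative only; the scale client for the member cluster and the function name getWorkloadReplicas are assumptions):

package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/scale"
)

// getWorkloadReplicas reads the replica count from the scale subresource of a
// workload in a member cluster. The scale client must talk to the member
// cluster's API server directly, which is why this approach cannot work in
// PULL mode.
func getWorkloadReplicas(ctx context.Context, scaleClient scale.ScalesGetter, namespace, name string) (int32, error) {
	gr := schema.GroupResource{Group: "apps", Resource: "deployments"}
	s, err := scaleClient.Scales(namespace).Get(ctx, gr, name, metav1.GetOptions{})
	if err != nil {
		return 0, fmt.Errorf("failed to get scale of %s/%s: %w", namespace, name, err)
	}
	return s.Status.Replicas, nil
}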

@chaunceyjiang
Member

@XiShanYongYe-Chang @jwcesign @lxtywypc @RainbowMango Do you have any other solutions?

@XiShanYongYe-Chang
Member

Let's look for more people's ideas.
/cc @GitHubxsy

@lxtywypc
Contributor

Hmm... In fact, we chose solution 2 for our own implementation. We introduced some 'parsers' to tell what the replicas are for each kind of workload.

We are also considering whether it is necessary to expand the InterpretStatus in resource-interpreter, or to introduce a new InterpreterOperation, to interpret some replica-related info into the status of the Work. We believe this info could help us do more in the future.

@XiShanYongYe-Chang
Member

We introduced some 'parsers' to tell what the replicas are for each kind of workload.

Doesn't this require a new component?

We are also considering whether it is necessary to expand the InterpretStatus in resource-interpreter, or to introduce a new InterpreterOperation, to interpret some replica-related info into the status of the Work. We believe this info could help us do more in the future.

Can you expand on what's relevant to the current issue? And we can start a new issue to talk about the rest.

@lxtywypc
Contributor

lxtywypc commented Nov 1, 2023

Doesn't this require a new component?

We hard-coded some parsers in our own project.

Can you expand on what's relevant to the current issue? And we can start a new issue to talk about the rest.

I mean that if we could introduce a new hook point to interpret the actual replica-related info from each member cluster into the status of the Work, we could use this info directly in hpaReplicasSyncer.

Like this:

apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: workload-example
  namespace: karmada-es-cluster1
spec:
  workload:
    # ...
status:
  manifestStatuses:
  - status:
      # ...
    replicas: 1      # new replica-related field, could be used in hpaReplicasSyncer
    readyReplicas: 1 # new replica-related field

@chaunceyjiang
Member

I mean that if we could introduce a new hook point to interpret the actual replica-related info from each member cluster into the status of the Work, we could use this info directly in hpaReplicasSyncer.

I think this is a good idea.

@RainbowMango
Member

I mean that if we could introduce a new hook point to interpret the actual replica-related info from each member cluster into the status of the Work, we could use this info directly in hpaReplicasSyncer.

I get it.
The first thing we need to do is extend ReflectStatus to get the replica-related info, such as replicas (the desired replicas) and readyReplicas (the currently ready replicas).

After that, we need to extend the Work API to record this info, which will then be used by hpaReplicasSyncer.
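Just as a sketch (the field names and their placement on the per-manifest status are assumptions, not a final API design), the Work extension could look like:

package sketch

import (
	"k8s.io/apimachinery/pkg/runtime"
)

// ManifestStatus is a hypothetical extension of the Work API's per-manifest
// status; only the replica-related fields are new, and their exact placement
// is still an open question.
type ManifestStatus struct {
	// Status is the resource status collected from the member cluster.
	// +optional
	Status *runtime.RawExtension `json:"status,omitempty"`

	// Replicas is the desired replica count reflected from the member cluster
	// via the extended ReflectStatus hook.
	// +optional
	Replicas *int32 `json:"replicas,omitempty"`

	// ReadyReplicas is the number of ready replicas in the member cluster.
	// +optional
	ReadyReplicas *int32 `json:"readyReplicas,omitempty"`
}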

All this work seems dedicated to hpaReplicasSyncer; can this info be used in other scenarios? I'm wondering whether it's worth doing it this way.

@lxtywypc
Contributor

All this work seems dedicated to hpaReplicasSyncer; can this info be used in other scenarios? I'm wondering whether it's worth doing it this way.

For now it seems dedicated to hpaReplicasSyncer, but I believe the replica-related info could help us do more in the future, especially in scheduling.

Maybe we could invite more people to share their thoughts.

@XiShanYongYe-Chang
Member

/close

@karmada-bot
Collaborator

@XiShanYongYe-Chang: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
