What's the general idea for the enhancement?
Currently, both alerts (KubeCPUQuotaOvercommit and KubeMemoryQuotaOvercommit) compute the total available capacity from the current number of nodes in the cluster.
However, these alerts can become very noisy when the current number of nodes is below the maximum number of nodes available.
Today, many k8s clusters are deployed with the cluster autoscaler or come with it built in (AKS, GKE...).
We should have a way to make those alerts use the maximum number of nodes instead of the current one.
For example, this cluster autoscaler metric could be helpful: https://github.com/kubernetes/autoscaler/blob/213a8595ea2bddf433dd56e50c31ca868ef1da80/cluster-autoscaler/metrics/metrics.go#L157-L163
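To illustrate why the alerts get noisy, here is a rough Python sketch of the arithmetic, not the actual alert expression. It assumes homogeneous nodes, an overcommit threshold of 1.5, and made-up numbers for the quota total, the per-node allocatable CPU, and the node counts; the function name `overcommit_ratio` is hypothetical.

```python
def overcommit_ratio(quota_hard_total, allocatable_per_node, node_count):
    """Ratio of the summed hard quota to the capacity of `node_count` nodes.

    Assumes every node exposes the same allocatable amount (a simplification).
    """
    capacity = allocatable_per_node * node_count
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return quota_hard_total / capacity


# All values below are illustrative assumptions, not taken from a real cluster.
THRESHOLD = 1.5        # assumed overcommit factor above which the alert fires
quota_cpu = 64.0       # sum of hard CPU quotas across namespaces (cores)
per_node_cpu = 8.0     # allocatable CPU per node (cores)
current_nodes = 3      # nodes currently running
max_nodes = 10         # maximum nodes the autoscaler may provision

# Denominator based on the current node count: what the alerts do today.
current_ratio = overcommit_ratio(quota_cpu, per_node_cpu, current_nodes)

# Denominator based on the maximum node count: what this issue proposes.
max_ratio = overcommit_ratio(quota_cpu, per_node_cpu, max_nodes)

print(f"current nodes: ratio={current_ratio:.2f}, fires={current_ratio > THRESHOLD}")
print(f"max nodes:     ratio={max_ratio:.2f}, fires={max_ratio > THRESHOLD}")
```

With these assumed numbers, the alert would fire when capacity is derived from the current node count, but not when it is derived from the maximum the autoscaler can reach.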
Please provide any helpful snippets.
No response
What parts of the codebase does the enhancement target?
Alerts
Anything else relevant to the enhancement that would help with the triage process?
No response