metrics-server reporting inconsistent numbers of control plane nodes #803
Could you provide more information about the raw API result (as in issue #792) and the logs of metrics-server?
Here are the logs from metrics-server when running the specific query:
Request logs
And the output of the command:
output
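For anyone reproducing this, the raw API result can typically be pulled straight from the metrics API, and the metrics-server logs from its deployment. A minimal sketch, assuming the stock deployment name and namespace from the official manifest:

```sh
# Raw NodeMetrics list from the metrics API (the same data kubectl top nodes uses)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .

# Recent metrics-server logs (assumes the default deploy/metrics-server in kube-system)
kubectl -n kube-system logs deploy/metrics-server --tail=200
```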
I ran the following code on a metrics-server node to get this output from the control plane node:
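The exact commands are collapsed above, but one common way to reproduce the scrape that metrics-server performs is to hit the kubelet's resource-metrics endpoint through the API server proxy. A sketch; <node-name> is a placeholder for the node being checked:

```sh
# Scrape one node's kubelet resource metrics via the API server proxy
# (roughly what metrics-server collects from each node on every cycle)
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/resource"
```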
Just poking in to see if anything's going on. I'm a bit flummoxed by this issue.
I don't see any problems from the metrics above, but I noticed that the startup time of the node test-us-west-1-control-plane-577lp is
The three nodes have been running normally, as far as I can tell. I'm not sure why there was a 46-second difference in the timestamps of those two. Moreover, I'm not sure why it's just an issue with the control-plane nodes. I have looked at this dozens, if not hundreds, of times over the past several weeks, and not once have I seen any of the three worker nodes fail to show up. And again, metrics-server is always getting 200s when pulling the data from the nodes, whether control plane or worker. Is there a reason why a node wouldn't show up in the metrics-server in-memory store even after metrics-server got the data?
Yeah, I can't figure out the reason, but we did find a 46-second difference in the timestamps of those two.
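To make that kind of skew visible, the timestamp and window fields of the raw NodeMetrics objects can be compared side by side. A small sketch using jq:

```sh
# Print each node's metric timestamp and window; inconsistent timestamps
# across nodes (e.g. a ~46s gap) show up directly in the second column
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" \
  | jq -r '.items[] | "\(.metadata.name)\t\(.timestamp)\t\(.window)"'
```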
FYI, we don't support Bitnami images, as we don't even know which MS version they use or whether they make any code changes. Please confirm whether I understood the problem: the kubelet reports an invalid node start time for control plane nodes, resulting in MS sometimes not reporting node metrics for those nodes?
I agree with @serathius; the reason is
This means that v0.5.0 should not use Kubelet start time. I think we should fix this and release v0.5.1. @yangjunmyfm192085 what do you think? |
ping @yangjunmyfm192085 |
OK, let me prepare for it.
The fix was implemented and released in v0.5.1.
@techstep Please confirm if that fixes the issue for you. |
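For anyone landing here later, a quick way to check the running version and move to the fixed release. A sketch that assumes the stock components.yaml install; Helm or kustomize installs should be upgraded through their own tooling:

```sh
# Show which metrics-server image the cluster is actually running
kubectl -n kube-system get deploy metrics-server \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

# Upgrade to v0.5.1 using the official release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.1/components.yaml
```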
What happened:
When I run kubectl top nodes or kubectl get nodemetrics on a k8s cluster with metrics-server, I almost always have at least one control-plane node unaccounted for. The missing control plane node(s) change every minute with every run. All three control plane nodes are up and healthy, and the worker nodes show up all the time.

What you expected to happen:
I expected to see all three worker nodes, and all three control plane nodes.
Anything else we need to know?:
I have looked through the metrics-server logs, and found that the requests to the nodes, control plane and worker, received 200 responses; moreover, manually making those requests returned metrics I was expecting to see.
While the control planes flicker in and out of existence on the aforementioned commands, the actual number and type of pods remains consistent, and the metrics for the pods look completely fine.
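A simple way to observe the flapping described above is to poll the metrics API once a minute and compare the node list between runs. A rough sketch:

```sh
# Every minute, list which nodes currently have metrics; with the bug present,
# one or more control-plane nodes drop out of this list between runs
while true; do
  date
  kubectl get nodemetrics -o name | sort
  echo "---"
  sleep 60
done
```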
The problem persists whether I am running on one or two replicas.
We are running metrics-server on the control plane, because we could not get metrics for pods running on the control plane otherwise.
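For reference, running metrics-server on control-plane nodes mostly comes down to tolerating the control-plane taint. A sketch of that kind of patch; the taint key is node-role.kubernetes.io/master on older clusters, and note that a strategic merge replaces the whole tolerations list rather than appending to it:

```sh
# Add a toleration so metrics-server pods can be scheduled onto control-plane nodes
# (assumes the default deploy/metrics-server in kube-system; adjust the taint key as needed)
kubectl -n kube-system patch deployment metrics-server --type=strategic -p '{
  "spec": {"template": {"spec": {"tolerations": [
    {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists", "effect": "NoSchedule"}
  ]}}}}'
```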
Environment:
Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): kubeadm on top of OpenStack using ClusterAPI
Container Network Setup (flannel, calico, etc.): calico
Kubernetes version (use kubectl version): 1.21 (client), 1.20 (server)
spoiler for Metrics Server manifest:
spoiler for Kubelet config:
spoiler for Status of Metrics API:
/kind bug