metrics-server reporting inconsistent numbers of control plane nodes #803

techstep · 2021-07-27T20:39:19Z

What happened:

When I run kubectl top nodes or kubectl get nodemetrics on a k8s cluster with metrics-server, I almost always have at least one control-plane node unaccounted for. Which control plane nodes are missing changes from run to run, roughly every minute. All three control plane nodes are up and healthy, and the worker nodes show up every time.

What you expected to happen:

I expected to see all three worker nodes, and all three control plane nodes.

Anything else we need to know?:

I have looked through the metrics-server logs, and found that the requests to the nodes, control plane and worker, received 200 responses; moreover, manually making those requests returned metrics I was expecting to see.
While the control planes flicker in and out of existence on the aforementioned commands, the actual number and type of pods remains consistent, and the metrics for the pods look completely fine.
The problem persists whether I am running on one or two replicas.
We are running metrics-server on the control plane, because we could not get metrics for pods running on the control plane otherwise.

Environment:

Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.): kubeadm on top of OpenStack using ClusterAPI
Container Network Setup (flannel, calico, etc.): calico
Kubernetes version (use kubectl version): 1.21 (client), 1.20 (server)
Metrics Server manifest

spoiler for Metrics Server manifest:


apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "26"
      meta.helm.sh/release-name: metrics-server
      meta.helm.sh/release-namespace: metrics-server
    creationTimestamp: "2021-07-13T18:41:53Z"
    generation: 26
    labels:
      app.kubernetes.io/instance: metrics-server
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: metrics-server
      helm.sh/chart: metrics-server-5.8.14
    name: metrics-server
    namespace: metrics-server
    resourceVersion: "11957101"
    uid: `[redacted]`
  spec:
    progressDeadlineSeconds: 600
    replicas: 2
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        app.kubernetes.io/instance: metrics-server
        app.kubernetes.io/name: metrics-server
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        annotations:
          ad.datadoghq.com/nginx-ingress-controller.check_names: '["kube_metrics_server"]'
          ad.datadoghq.com/nginx-ingress-controller.init_configs: '[{}]'
          ad.datadoghq.com/nginx-ingress-controller.instances: |
            [
              {
                "prometheus_url": "https://%%host%%:443/metrics"
              }
            ]
          enable.version-checker.io/metrics-server: "true"
          override-url.version-checker.io/metrics-server: bitnami/metrics-server
        creationTimestamp: null
        labels:
          app.kubernetes.io/instance: metrics-server
          app.kubernetes.io/managed-by: Helm
          app.kubernetes.io/name: metrics-server
          helm.sh/chart: metrics-server-5.8.14
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: node-role.kubernetes.io/master
                  operator: Exists
        containers:
        - command:
          - /pod_nanny
          - --config-dir=/etc/config
          - --cpu=100m
          - --extra-cpu=7m
          - --memory=300Mi
          - --extra-memory=3Mi
          - --threshold=10
          - --deployment=metrics-server
          - --container=metrics-server
          env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          - name: ADDON_NAME
            value: metrics
          image: [image_mirror]/k8s.gcr.io/addon-resizer:1.8.11
          imagePullPolicy: IfNotPresent
          name: pod-nanny
          resources:
            limits:
              cpu: 100m
              memory: 20Mi
            requests:
              cpu: 100m
              memory: 20Mi
          securityContext:
            runAsGroup: 65534
            runAsUser: 65534
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /etc/config
            name: nanny-config-volume
        - args:
          - --secure-port=8443
          - --cert-dir=/tmp
          - --kubelet-insecure-tls=true
          - --kubelet-preferred-address-types=[InternalDNS,InternalIP,ExternalDNS,ExternalIP]
          - --profiling=true
          command:
          - metrics-server
          image: [image_mirror]/bitnami/metrics-server:0.5.0
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /livez
              port: https
              scheme: HTTPS
            initialDelaySeconds: 40
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: metrics-server
          ports:
          - containerPort: 8443
            hostPort: 8443
            name: https
            protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readyz
              port: https
              scheme: HTTPS
            initialDelaySeconds: 40
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: 142m
              memory: 318Mi
            requests:
              cpu: 142m
              memory: 318Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: true
            runAsGroup: 10001
            runAsNonRoot: true
            runAsUser: 10001
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /etc/config
            name: nanny-config-volume
          - mountPath: /tmp
            name: tmpdir
        dnsPolicy: ClusterFirst
        hostNetwork: true
        imagePullSecrets:
        - name: regcred-pseudo
        priorityClassName: highest-platform
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: metrics-server
        serviceAccountName: metrics-server
        terminationGracePeriodSeconds: 30
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
        volumes:
        - configMap:
            defaultMode: 420
            name: nanny-config-metrics-server
          name: nanny-config-volume
        - emptyDir: {}
          name: tmpdir
  status:
    availableReplicas: 2
    conditions:
    - lastTransitionTime: "2021-07-27T20:09:32Z"
      lastUpdateTime: "2021-07-27T20:09:32Z"
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    - lastTransitionTime: "2021-07-13T18:41:54Z"
      lastUpdateTime: "2021-07-27T20:10:01Z"
      message: ReplicaSet "metrics-server-[redacted]" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
    observedGeneration: 26
    readyReplicas: 2
    replicas: 2
    updatedReplicas: 2
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Kubelet config:

spoiler for Kubelet config:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: [redacted]
    server: https://[redacted]:6443
  name: default-cluster
contexts:
- context:
    cluster: default-cluster
    namespace: default
    user: default-auth
  name: default-context
current-context: default-context
kind: Config
preferences: {}
users:
- name: default-auth
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

Status of Metrics API:

spoiler for Status of Metrics API:

kubectl describe apiservice v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       app.kubernetes.io/instance=metrics-server
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=metrics-server
              helm.sh/chart=metrics-server-5.8.14
Annotations:  meta.helm.sh/release-name: metrics-server
              meta.helm.sh/release-namespace: metrics-server
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2021-07-13T18:57:27Z
  Resource Version:    11462943
  UID:                 86dd3191-802e-4695-996a-017984296eff
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       metrics-server
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2021-07-25T06:47:48Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>

/kind bug


yangjunmyfm192085 · 2021-07-28T06:32:40Z

Could you provide the raw API results (as in issue #792) and the metrics-server logs?
Let's also analyse the data from the kubelet.

techstep · 2021-08-03T19:06:33Z

Here are the logs from metrics-server for the specific query (metrics-server was running with -v=8). The request logs suggest that everything is fine and every request returns 200, but kubectl top nodes in this case returns metrics for only one control plane node, not all three.

Request logs

metrics-server I0803 18:58:17.923289       1 server.go:136] "Scraping metrics"
metrics-server I0803 18:58:17.923369       1 scraper.go:114] "Scraping metrics from nodes" nodeCount=6
metrics-server I0803 18:58:17.927587       1 scraper.go:136] "Scraping node" node="test-us-west-1-md-0-dmqmp"
metrics-server I0803 18:58:17.927811       1 round_trippers.go:432] GET https://100.113.136.117:10250/stats/summary?only_cpu_and_memory=true
metrics-server I0803 18:58:17.927825       1 round_trippers.go:438] Request Headers:
metrics-server I0803 18:58:17.927836       1 round_trippers.go:442]     Authorization: Bearer <masked>
metrics-server I0803 18:58:17.927842       1 round_trippers.go:442]     User-Agent: metrics-server/v0.5.0 (linux/amd64) kubernetes/d766094
metrics-server I0803 18:58:17.941558       1 scraper.go:136] "Scraping node" node="test-us-west-1-control-plane-577lp"
metrics-server I0803 18:58:17.941628       1 round_trippers.go:432] GET https://100.113.137.187:10250/stats/summary?only_cpu_and_memory=true
metrics-server I0803 18:58:17.941636       1 round_trippers.go:438] Request Headers:
metrics-server I0803 18:58:17.941643       1 round_trippers.go:442]     User-Agent: metrics-server/v0.5.0 (linux/amd64) kubernetes/d766094
metrics-server I0803 18:58:17.941651       1 round_trippers.go:442]     Authorization: Bearer <masked>
metrics-server I0803 18:58:17.949539       1 scraper.go:136] "Scraping node" node="test-us-west-1-control-plane-7hkxd"
metrics-server I0803 18:58:17.949590       1 round_trippers.go:432] GET https://100.113.137.251:10250/stats/summary?only_cpu_and_memory=true
metrics-server I0803 18:58:17.949612       1 round_trippers.go:438] Request Headers:
metrics-server I0803 18:58:17.949620       1 round_trippers.go:442]     User-Agent: metrics-server/v0.5.0 (linux/amd64) kubernetes/d766094
metrics-server I0803 18:58:17.949659       1 round_trippers.go:442]     Authorization: Bearer <masked>
metrics-server I0803 18:58:17.954348       1 round_trippers.go:457] Response Status: 200 OK in 26 milliseconds
metrics-server I0803 18:58:17.954364       1 round_trippers.go:460] Response Headers:
metrics-server I0803 18:58:17.954373       1 round_trippers.go:463]     Content-Type: application/json
metrics-server I0803 18:58:17.954378       1 round_trippers.go:463]     Date: Tue, 03 Aug 2021 18:58:17 GMT
metrics-server I0803 18:58:17.954475       1 scraper.go:136] "Scraping node" node="test-us-west-1-md-0-zm24b"
metrics-server I0803 18:58:17.954550       1 round_trippers.go:432] GET https://100.113.136.132:10250/stats/summary?only_cpu_and_memory=true
metrics-server I0803 18:58:17.954562       1 round_trippers.go:438] Request Headers:
metrics-server I0803 18:58:17.954569       1 round_trippers.go:442]     User-Agent: metrics-server/v0.5.0 (linux/amd64) kubernetes/d766094
metrics-server I0803 18:58:17.954596       1 round_trippers.go:442]     Authorization: Bearer <masked>
metrics-server I0803 18:58:17.961507       1 scraper.go:136] "Scraping node" node="test-us-west-1-control-plane-qkcnm"
metrics-server I0803 18:58:17.961577       1 round_trippers.go:432] GET https://100.113.137.63:10250/stats/summary?only_cpu_and_memory=true
metrics-server I0803 18:58:17.961590       1 round_trippers.go:438] Request Headers:
metrics-server I0803 18:58:17.961596       1 round_trippers.go:442]     User-Agent: metrics-server/v0.5.0 (linux/amd64) kubernetes/d766094
metrics-server I0803 18:58:17.961603       1 round_trippers.go:442]     Authorization: Bearer <masked>
metrics-server I0803 18:58:17.966518       1 scraper.go:136] "Scraping node" node="test-us-west-1-md-0-9shcw"
metrics-server I0803 18:58:17.966592       1 round_trippers.go:432] GET https://100.113.136.191:10250/stats/summary?only_cpu_and_memory=true
metrics-server I0803 18:58:17.966603       1 round_trippers.go:438] Request Headers:
metrics-server I0803 18:58:17.966609       1 round_trippers.go:442]     User-Agent: metrics-server/v0.5.0 (linux/amd64) kubernetes/d766094
metrics-server I0803 18:58:17.966616       1 round_trippers.go:442]     Authorization: Bearer <masked>
metrics-server I0803 18:58:17.981479       1 round_trippers.go:457] Response Status: 200 OK in 14 milliseconds
metrics-server I0803 18:58:17.981498       1 round_trippers.go:460] Response Headers:
metrics-server I0803 18:58:17.981507       1 round_trippers.go:463]     Date: Tue, 03 Aug 2021 18:58:17 GMT
metrics-server I0803 18:58:17.981512       1 round_trippers.go:463]     Content-Type: application/json
metrics-server I0803 18:58:17.990917       1 round_trippers.go:457] Response Status: 200 OK in 36 milliseconds
metrics-server I0803 18:58:17.990934       1 round_trippers.go:460] Response Headers:
metrics-server I0803 18:58:17.990940       1 round_trippers.go:463]     Content-Type: application/json
metrics-server I0803 18:58:17.990945       1 round_trippers.go:463]     Date: Tue, 03 Aug 2021 18:58:17 GMT
metrics-server I0803 18:58:18.008613       1 round_trippers.go:457] Response Status: 200 OK in 66 milliseconds
metrics-server I0803 18:58:18.008626       1 round_trippers.go:460] Response Headers:
metrics-server I0803 18:58:18.008631       1 round_trippers.go:463]     Content-Type: application/json
metrics-server I0803 18:58:18.008663       1 round_trippers.go:463]     Date: Tue, 03 Aug 2021 18:58:18 GMT
metrics-server I0803 18:58:18.042276       1 round_trippers.go:457] Response Status: 200 OK in 80 milliseconds
metrics-server I0803 18:58:18.042293       1 round_trippers.go:460] Response Headers:
metrics-server I0803 18:58:18.042301       1 round_trippers.go:463]     Content-Type: application/json
metrics-server I0803 18:58:18.042306       1 round_trippers.go:463]     Date: Tue, 03 Aug 2021 18:58:18 GMT
metrics-server I0803 18:58:18.052463       1 round_trippers.go:457] Response Status: 200 OK in 102 milliseconds
metrics-server I0803 18:58:18.052490       1 round_trippers.go:460] Response Headers:
metrics-server I0803 18:58:18.052502       1 round_trippers.go:463]     Content-Type: application/json
metrics-server I0803 18:58:18.052511       1 round_trippers.go:463]     Date: Tue, 03 Aug 2021 18:58:18 GMT
metrics-server I0803 18:58:18.052921       1 scraper.go:157] "Scrape finished" duration="129.533693ms" nodeCount=6 podCount=81
metrics-server I0803 18:58:18.052930       1 server.go:139] "Storing metrics"
metrics-server I0803 18:58:18.053128       1 server.go:144] "Scraping cycle complete"
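
To check a longer log for the same pattern, the scrape and response lines can be tallied mechanically. The following is a minimal sketch, assuming the klog line format shown above; the function name `tally_scrape_log` and the shortened `SAMPLE_LOG` are illustrative only and not part of metrics-server.

```python
import re
from typing import Dict


def tally_scrape_log(log_text: str) -> Dict[str, int]:
    """Count scraped nodes and HTTP responses in metrics-server -v=8 log text."""
    counts = {"nodes_scraped": 0, "responses": 0, "responses_200": 0}
    for line in log_text.splitlines():
        # scraper.go logs one "Scraping node" line per node it contacts
        if '"Scraping node"' in line:
            counts["nodes_scraped"] += 1
        # round_trippers.go logs one "Response Status" line per finished request
        match = re.search(r"Response Status: (\d{3})", line)
        if match:
            counts["responses"] += 1
            if match.group(1) == "200":
                counts["responses_200"] += 1
    return counts


# Shortened excerpt of the log above, used only as sample input
SAMPLE_LOG = """\
metrics-server I0803 18:58:17.927587       1 scraper.go:136] "Scraping node" node="test-us-west-1-md-0-dmqmp"
metrics-server I0803 18:58:17.954348       1 round_trippers.go:457] Response Status: 200 OK in 26 milliseconds
"""

print(tally_scrape_log(SAMPLE_LOG))
```

Applied to the full excerpt above, the three counters should all come out as six, which matches nodeCount=6 in the "Scrape finished" line.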

And the output of kubectl top nodes --use-protocol-buffers:

command output

NAME                                 CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
test-us-west-1-control-plane-7hkxd   380m         9%          2824Mi          73%
test-us-west-1-md-0-9shcw            418m         10%         2798Mi          47%
test-us-west-1-md-0-dmqmp            312m         7%          2723Mi          46%
test-us-west-1-md-0-zm24b            301m         7%          2749Mi          47%
test-us-west-1-control-plane-577lp   <unknown>    <unknown>   <unknown>       <unknown>
test-us-west-1-control-plane-qkcnm   <unknown>    <unknown>   <unknown>       <unknown>
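
When this is watched in a loop, it can help to pull out just the nodes that came back as <unknown>. A minimal sketch, assuming the column layout shown above; `nodes_without_metrics` and `SAMPLE` are hypothetical names used only for illustration.

```python
from typing import List


def nodes_without_metrics(top_output: str) -> List[str]:
    """Return node names whose `kubectl top nodes` row contains <unknown>."""
    missing = []
    for line in top_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # the first field is the node name, the rest are metric columns
        if fields and "<unknown>" in fields[1:]:
            missing.append(fields[0])
    return missing


# Shortened copy of the output above, used only as sample input
SAMPLE = """\
NAME                                 CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
test-us-west-1-control-plane-7hkxd   380m         9%          2824Mi          73%
test-us-west-1-md-0-9shcw            418m         10%         2798Mi          47%
test-us-west-1-control-plane-577lp   <unknown>    <unknown>   <unknown>       <unknown>
"""

print(nodes_without_metrics(SAMPLE))
```

Logging this list together with a timestamp on every run would show which control plane nodes drop out and when.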

techstep · 2021-08-03T20:06:16Z

I ran the following code:

while true; do
  for i in 187 251 63; do
    curl -ik -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
      "https://100.113.137.$i:10250/stats/summary?only_cpu_and_memory=true" \
      -o "control-plane-$i-$(date +"%Y%m%d-%H%M%S").json"
  done
  sleep 10
done

on a metrics-server node to run these stats/summary?only_cpu_and_memory=true queries. In particular, I ran these queries before, during, and after the time I was getting missing nodes from the output of kubectl top nodes, which was running on a 10-second watch loop.
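
One caveat when post-processing these files: curl -i writes the response headers into the output file ahead of the body, so a file saved exactly as the loop writes it is not pure JSON. A minimal sketch for splitting the two, assuming a single header block followed by a blank line; `split_curl_include_output` is a hypothetical helper and the sample response is made up for illustration.

```python
import json
from typing import Tuple


def split_curl_include_output(raw: str) -> Tuple[str, dict]:
    """Split text saved by `curl -i -o FILE` into (status line, parsed JSON body)."""
    normalized = raw.replace("\r\n", "\n")
    # headers and body are separated by the first blank line
    headers, _, body = normalized.partition("\n\n")
    status_line = headers.splitlines()[0]
    return status_line, json.loads(body)


# Made-up response in the shape curl -i would save it
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"node": {"nodeName": "example-node"}}\n'
)
status, summary = split_curl_include_output(raw)
print(status, summary["node"]["nodeName"])
```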

output from control plane node

{
 "node": {
  "nodeName": "test-us-west-1-control-plane-577lp",
  "systemContainers": [
   {
    "name": "pods",
    "startTime": "2021-08-03T19:33:46Z",
    "cpu": {
     "time": "2021-08-03T19:34:36Z",
     "usageNanoCores": 270639291,
     "usageCoreNanoSeconds": 1171995323473710
    },
    "memory": {
     "time": "2021-08-03T19:34:36Z",
     "availableBytes": 2518097920,
     "usageBytes": 1873137664,
     "workingSetBytes": 1608773632,
     "rssBytes": 1586286592,
     "pageFaults": 0,
     "majorPageFaults": 0
    }
   },
   {
    "name": "kubelet",
    "startTime": "2021-05-25T05:37:19Z",
    "cpu": {
     "time": "2021-08-03T19:34:26Z",
     "usageNanoCores": 37437406,
     "usageCoreNanoSeconds": 172566180358587
    },
    "memory": {
     "time": "2021-08-03T19:34:26Z",
     "usageBytes": 102686720,
     "workingSetBytes": 80879616,
     "rssBytes": 64901120,
     "pageFaults": 2426534022,
     "majorPageFaults": 34419
    }
   }
  ],
  "startTime": "2021-08-03T19:33:50Z",
  "cpu": {
   "time": "2021-08-03T19:34:36Z",
   "usageNanoCores": 364450073,
   "usageCoreNanoSeconds": 1547019621854448
  },
  "memory": {
   "time": "2021-08-03T19:34:36Z",
   "availableBytes": 1233833984,
   "usageBytes": 3665260544,
   "workingSetBytes": 2893037568,
   "rssBytes": 1818714112,
   "pageFaults": 48411,
   "majorPageFaults": 99
  }
 },
 "pods": [
  {
   "podRef": {
    "name": "metrics-server-695c48797c-tptnf",
    "namespace": "metrics-server",
    "uid": "bf93b656-1582-49cf-bc10-b9620a3555a0"
   },
   "startTime": "2021-07-28T17:50:27Z",
   "containers": [
    {
     "name": "pod-nanny",
     "startTime": "2021-07-28T17:50:27Z",
     "cpu": {
      "time": "2021-08-03T19:34:32Z",
      "usageNanoCores": 180233,
      "usageCoreNanoSeconds": 100820965334
     },
     "memory": {
      "time": "2021-08-03T19:34:32Z",
      "availableBytes": 10649600,
      "usageBytes": 13307904,
      "workingSetBytes": 10321920,
      "rssBytes": 7180288,
      "pageFaults": 11616,
      "majorPageFaults": 1881
     }
    },
    {
     "name": "metrics-server",
     "startTime": "2021-07-28T17:50:28Z",
     "cpu": {
      "time": "2021-08-03T19:34:38Z",
      "usageNanoCores": 3390945,
      "usageCoreNanoSeconds": 2049563278770
     },
     "memory": {
      "time": "2021-08-03T19:34:38Z",
      "availableBytes": 301834240,
      "usageBytes": 44208128,
      "workingSetBytes": 31612928,
      "rssBytes": 30502912,
      "pageFaults": 56298,
      "majorPageFaults": 4257
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:37Z",
    "usageNanoCores": 4143885,
    "usageCoreNanoSeconds": 2150410328900
   },
   "memory": {
    "time": "2021-08-03T19:34:37Z",
    "availableBytes": 311881728,
    "usageBytes": 58118144,
    "workingSetBytes": 42536960,
    "rssBytes": 37556224,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "calico-node-pmg6h",
    "namespace": "kube-system",
    "uid": "986e7980-868f-45c6-8073-4a3416125e08"
   },
   "startTime": "2021-06-08T12:07:49Z",
   "containers": [
    {
     "name": "calico-node",
     "startTime": "2021-06-08T12:07:54Z",
     "cpu": {
      "time": "2021-08-03T19:34:33Z",
      "usageNanoCores": 19360202,
      "usageCoreNanoSeconds": 107480343458651
     },
     "memory": {
      "time": "2021-08-03T19:34:33Z",
      "usageBytes": 113590272,
      "workingSetBytes": 108130304,
      "rssBytes": 61505536,
      "pageFaults": 4223765865,
      "majorPageFaults": 23133
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:39Z",
    "usageNanoCores": 22898353,
    "usageCoreNanoSeconds": 107480929830994
   },
   "memory": {
    "time": "2021-08-03T19:34:39Z",
    "usageBytes": 114692096,
    "workingSetBytes": 109232128,
    "rssBytes": 61288448,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "test-g8psk",
    "namespace": "default",
    "uid": "15e3279a-b266-4e36-a461-80eca8ef0a7b"
   },
   "startTime": "2021-06-03T21:41:57Z",
   "containers": [
    {
     "name": "shell",
     "startTime": "2021-06-17T00:09:42Z",
     "cpu": {
      "time": "2021-08-03T19:34:39Z",
      "usageNanoCores": 0,
      "usageCoreNanoSeconds": 220740717
     },
     "memory": {
      "time": "2021-08-03T19:34:39Z",
      "usageBytes": 2985984,
      "workingSetBytes": 2732032,
      "rssBytes": 24576,
      "pageFaults": 10923,
      "majorPageFaults": 132
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:40Z",
    "usageNanoCores": 0,
    "usageCoreNanoSeconds": 256198188
   },
   "memory": {
    "time": "2021-08-03T19:34:40Z",
    "usageBytes": 3596288,
    "workingSetBytes": 3342336,
    "rssBytes": 0,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "falco-frdvw",
    "namespace": "falco",
    "uid": "67a68e22-a866-42c4-be68-7a028f81835f"
   },
   "startTime": "2021-07-12T19:17:55Z",
   "containers": [
    {
     "name": "falco",
     "startTime": "2021-07-12T19:18:00Z",
     "cpu": {
      "time": "2021-08-03T19:34:26Z",
      "usageNanoCores": 18779041,
      "usageCoreNanoSeconds": 44824479061195
     },
     "memory": {
      "time": "2021-08-03T19:34:26Z",
      "availableBytes": 1012006912,
      "usageBytes": 61906944,
      "workingSetBytes": 61734912,
      "rssBytes": 58675200,
      "pageFaults": 310299,
      "majorPageFaults": 297
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:36Z",
    "usageNanoCores": 24610354,
    "usageCoreNanoSeconds": 44824825915501
   },
   "memory": {
    "time": "2021-08-03T19:34:36Z",
    "availableBytes": 1011412992,
    "usageBytes": 62636032,
    "workingSetBytes": 62328832,
    "rssBytes": 58556416,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "goldpinger-9h28f",
    "namespace": "goldpinger",
    "uid": "de9ed254-5677-43c2-9dbc-bd43e6e0bacf"
   },
   "startTime": "2021-06-17T22:16:13Z",
   "containers": [
    {
     "name": "goldpinger",
     "startTime": "2021-06-17T22:16:18Z",
     "cpu": {
      "time": "2021-08-03T19:34:38Z",
      "usageNanoCores": 656481,
      "usageCoreNanoSeconds": 3607231811312
     },
     "memory": {
      "time": "2021-08-03T19:34:38Z",
      "availableBytes": 63569920,
      "usageBytes": 27750400,
      "workingSetBytes": 20316160,
      "rssBytes": 20336640,
      "pageFaults": 243903,
      "majorPageFaults": 11715
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:34Z",
    "usageNanoCores": 719180,
    "usageCoreNanoSeconds": 3607272866898
   },
   "memory": {
    "time": "2021-08-03T19:34:34Z",
    "availableBytes": 62742528,
    "usageBytes": 28577792,
    "workingSetBytes": 21143552,
    "rssBytes": 20054016,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "kube-proxy-qthj7",
    "namespace": "kube-system",
    "uid": "b97a8af3-a80b-4926-bf32-8e444a43825c"
   },
   "startTime": "2021-05-25T05:37:23Z",
   "containers": [
    {
     "name": "kube-proxy",
     "startTime": "2021-05-25T05:37:23Z",
     "cpu": {
      "time": "2021-08-03T19:34:34Z",
      "usageNanoCores": 7087456,
      "usageCoreNanoSeconds": 17138928259050
     },
     "memory": {
      "time": "2021-08-03T19:34:34Z",
      "usageBytes": 34574336,
      "workingSetBytes": 27643904,
      "rssBytes": 19759104,
      "pageFaults": 1027524432,
      "majorPageFaults": 17985
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:40Z",
    "usageNanoCores": 4594782,
    "usageCoreNanoSeconds": 17138941972237
   },
   "memory": {
    "time": "2021-08-03T19:34:40Z",
    "usageBytes": 35254272,
    "workingSetBytes": 28323840,
    "rssBytes": 19701760,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "kube-scheduler-test-us-west-1-control-plane-577lp",
    "namespace": "kube-system",
    "uid": "9be8cb4627e7e5ad4c3f8acabd4b49b3"
   },
   "startTime": "2021-05-25T05:37:24Z",
   "containers": [
    {
     "name": "kube-scheduler",
     "startTime": "2021-05-25T05:37:25Z",
     "cpu": {
      "time": "2021-08-03T19:34:42Z",
      "usageNanoCores": 3353050,
      "usageCoreNanoSeconds": 13818935012985
     },
     "memory": {
      "time": "2021-08-03T19:34:42Z",
      "usageBytes": 45481984,
      "workingSetBytes": 40767488,
      "rssBytes": 33591296,
      "pageFaults": 62337,
      "majorPageFaults": 13332
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:27Z",
    "usageNanoCores": 2089896,
    "usageCoreNanoSeconds": 13818925016594
   },
   "memory": {
    "time": "2021-08-03T19:34:27Z",
    "usageBytes": 46161920,
    "workingSetBytes": 41447424,
    "rssBytes": 33570816,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "node-problem-detector-t6vms",
    "namespace": "node-problem-detector",
    "uid": "a60918ff-1b57-46f3-863e-c9a9e6efd363"
   },
   "startTime": "2021-06-21T20:59:47Z",
   "containers": [
    {
     "name": "node-problem-detector",
     "startTime": "2021-06-21T20:59:51Z",
     "cpu": {
      "time": "2021-08-03T19:34:33Z",
      "usageNanoCores": 326959,
      "usageCoreNanoSeconds": 1567543118508
     },
     "memory": {
      "time": "2021-08-03T19:34:33Z",
      "availableBytes": 52670464,
      "usageBytes": 19095552,
      "workingSetBytes": 14438400,
      "rssBytes": 13729792,
      "pageFaults": 259380,
      "majorPageFaults": 4620
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:33Z",
    "usageNanoCores": 318479,
    "usageCoreNanoSeconds": 1567558263836
   },
   "memory": {
    "time": "2021-08-03T19:34:33Z",
    "availableBytes": 52084736,
    "usageBytes": 19681280,
    "workingSetBytes": 15024128,
    "rssBytes": 13643776,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "datadog-v8x66",
    "namespace": "datadog",
    "uid": "e0047b06-4ed1-4cca-97d8-11edf720d102"
   },
   "startTime": "2021-07-13T18:52:57Z",
   "containers": [
    {
     "name": "process-agent",
     "startTime": "2021-07-13T18:53:02Z",
     "cpu": {
      "time": "2021-08-03T19:34:36Z",
      "usageNanoCores": 3032691,
      "usageCoreNanoSeconds": 7315957210925
     },
     "memory": {
      "time": "2021-08-03T19:34:36Z",
      "availableBytes": 370860032,
      "usageBytes": 53485568,
      "workingSetBytes": 48570368,
      "rssBytes": 43458560,
      "pageFaults": 272184,
      "majorPageFaults": 5973
     }
    },
    {
     "name": "agent",
     "startTime": "2021-07-13T18:53:02Z",
     "cpu": {
      "time": "2021-08-03T19:34:39Z",
      "usageNanoCores": 125317680,
      "usageCoreNanoSeconds": 226387905934640
     },
     "memory": {
      "time": "2021-08-03T19:34:39Z",
      "availableBytes": 75259904,
      "usageBytes": 367538176,
      "workingSetBytes": 327393280,
      "rssBytes": 343683072,
      "pageFaults": 71382069,
      "majorPageFaults": 11121
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:33Z",
    "usageNanoCores": 129694728,
    "usageCoreNanoSeconds": 233703275027230
   },
   "memory": {
    "time": "2021-08-03T19:34:33Z",
    "availableBytes": 443330560,
    "usageBytes": 423813120,
    "workingSetBytes": 378753024,
    "rssBytes": 389124096,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "kube-controller-manager-test-us-west-1-control-plane-577lp",
    "namespace": "kube-system",
    "uid": "a40fb931ece5fcc5db1085981df97fea"
   },
   "startTime": "2021-05-25T05:37:24Z",
   "containers": [
    {
     "name": "kube-controller-manager",
     "startTime": "2021-05-25T05:37:25Z",
     "cpu": {
      "time": "2021-08-03T19:34:29Z",
      "usageNanoCores": 15165979,
      "usageCoreNanoSeconds": 106956688334361
     },
     "memory": {
      "time": "2021-08-03T19:34:29Z",
      "usageBytes": 122839040,
      "workingSetBytes": 105414656,
      "rssBytes": 98041856,
      "pageFaults": 119559,
      "majorPageFaults": 28347
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:31Z",
    "usageNanoCores": 13252067,
    "usageCoreNanoSeconds": 106956756198148
   },
   "memory": {
    "time": "2021-08-03T19:34:31Z",
    "usageBytes": 123641856,
    "workingSetBytes": 106217472,
    "rssBytes": 97947648,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "etcd-test-us-west-1-control-plane-577lp",
    "namespace": "kube-system",
    "uid": "d6ac6e5189a596324d657fb2283dc044"
   },
   "startTime": "2021-05-25T05:37:24Z",
   "containers": [
    {
     "name": "etcd",
     "startTime": "2021-05-25T05:37:26Z",
     "cpu": {
      "time": "2021-08-03T19:34:26Z",
      "usageNanoCores": 25744216,
      "usageCoreNanoSeconds": 146208956956459
     },
     "memory": {
      "time": "2021-08-03T19:34:26Z",
      "usageBytes": 142880768,
      "workingSetBytes": 108838912,
      "rssBytes": 107782144,
      "pageFaults": 1905585,
      "majorPageFaults": 100749
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:35Z",
    "usageNanoCores": 21348447,
    "usageCoreNanoSeconds": 146209178208128
   },
   "memory": {
    "time": "2021-08-03T19:34:35Z",
    "usageBytes": 143663104,
    "workingSetBytes": 109621248,
    "rssBytes": 107687936,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  },
  {
   "podRef": {
    "name": "kube-apiserver-test-us-west-1-control-plane-577lp",
    "namespace": "kube-system",
    "uid": "adfe87522ebbb0293cc2814e0806dc5f"
   },
   "startTime": "2021-05-25T05:37:24Z",
   "containers": [
    {
     "name": "kube-apiserver",
     "startTime": "2021-05-25T05:37:25Z",
     "cpu": {
      "time": "2021-08-03T19:34:37Z",
      "usageNanoCores": 52532418,
      "usageCoreNanoSeconds": 349209320675957
     },
     "memory": {
      "time": "2021-08-03T19:34:37Z",
      "usageBytes": 785354752,
      "workingSetBytes": 663539712,
      "rssBytes": 747573248,
      "pageFaults": 4107840,
      "majorPageFaults": 35904
     }
    }
   ],
   "cpu": {
    "time": "2021-08-03T19:34:42Z",
    "usageNanoCores": 48563154,
    "usageCoreNanoSeconds": 349209562111403
   },
   "memory": {
    "time": "2021-08-03T19:34:42Z",
    "usageBytes": 785985536,
    "workingSetBytes": 664170496,
    "rssBytes": 747425792,
    "pageFaults": 0,
    "majorPageFaults": 0
   }
  }
 ]
 }

techstep · 2021-08-10T15:38:49Z

Just poking in to see if anything's going on. I'm a bit flummoxed with this issue.

yangjunmyfm192085 · 2021-08-10T16:52:36Z

I don't see any problems in the metrics above, but I noticed that the start time of the node test-us-west-1-control-plane-577lp is "startTime": "2021-08-03T19:33:50Z", while the timestamp reported by the metrics is "2021-08-03T19:34:36Z". Is the node running normally?
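
The gap between the two timestamps quoted above can be checked with a short calculation. This is only an illustrative sketch: the helper names `parse_rfc3339` and `gap_since_start` are made up for this example and are not part of metrics-server or kubectl.

```python
from datetime import datetime, timedelta


def parse_rfc3339(value: str) -> datetime:
    """Parse a UTC timestamp of the form 'YYYY-MM-DDTHH:MM:SSZ', as seen in the summary output."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def gap_since_start(start_time: str, sample_time: str) -> timedelta:
    """Return how long after the reported start time the sample was taken."""
    return parse_rfc3339(sample_time) - parse_rfc3339(start_time)


# Values copied from the comment above.
gap = gap_since_start("2021-08-03T19:33:50Z", "2021-08-03T19:34:36Z")
print(gap.total_seconds())  # 46.0
```

A node that has been up for weeks would normally show a much larger gap, which is why a start time only 46 seconds before the sample looks suspicious.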

techstep · 2021-08-11T16:44:27Z

The three nodes have been running normally, as far as I can tell. I'm not sure why there was a 46-second difference between those two timestamps.

Moreover, I'm not sure why it's only an issue with the control-plane nodes. I have looked at this dozens, if not hundreds, of times in the past several weeks, and not once have I seen any of the three worker nodes fail to show up. And again, metrics-server is always getting 200s when pulling the data from the nodes, whether control plane or worker.

Is there a reason why a node wouldn't show up in the metrics-server in-memory store even after metrics-server got the data?

yangjunmyfm192085 · 2021-08-19T07:27:38Z

Yes, I can't work out the reason, but we did find a 46-second difference between those two timestamps.
We need at least two cycles of data after a node starts before exposing its nodeMetrics.
So could we try to get help from the node team?
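
To illustrate why two cycles matter, here is a minimal sketch of a store that keeps the two most recent samples per node and only reports a CPU rate when both exist. It is an assumption-laden simplification, not the metrics-server source: `Sample`, `TwoSampleStore`, and the rule "a later start time discards the previous sample" are hypothetical and chosen only to show how a shifting start time could hide a node.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Sample:
    start_time: float        # reported start time, seconds since epoch
    timestamp: float         # when the sample was taken, seconds since epoch
    cpu_core_seconds: float  # cumulative CPU usage


class TwoSampleStore:
    """Hypothetical store that keeps the two most recent samples per node."""

    def __init__(self) -> None:
        self._prev: Dict[str, Sample] = {}
        self._last: Dict[str, Sample] = {}

    def store(self, node: str, new: Sample) -> None:
        last = self._last.get(node)
        if last is None:
            self._last[node] = new
            return
        if new.timestamp <= last.timestamp:
            return  # ignore stale or duplicate samples
        if new.start_time > last.start_time:
            # Assumed rule: a later start time looks like a restart,
            # so the older sample is no longer comparable.
            self._prev.pop(node, None)
        else:
            self._prev[node] = last
        self._last[node] = new

    def cpu_rate(self, node: str) -> Optional[float]:
        """Return cores used between the two samples, or None if only one is usable."""
        prev, last = self._prev.get(node), self._last.get(node)
        if prev is None or last is None:
            return None
        return (last.cpu_core_seconds - prev.cpu_core_seconds) / (last.timestamp - prev.timestamp)


store = TwoSampleStore()
for t in (0.0, 15.0, 30.0):
    # The worker reports a stable start time; the control plane node's start time trails each sample by 46 s.
    store.store("worker", Sample(start_time=-1000.0, timestamp=t, cpu_core_seconds=t * 0.5))
    store.store("control-plane", Sample(start_time=t - 46.0, timestamp=t, cpu_core_seconds=t * 0.5))

print(store.cpu_rate("worker"))         # 0.5
print(store.cpu_rate("control-plane"))  # None
```

Under these assumptions, the node with a stable start time gets a rate from its second sample onward, while the node whose start time keeps moving never accumulates two comparable samples, even though every scrape succeeded.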

yangjunmyfm192085 · 2021-08-19T07:42:24Z

@techstep, thanks for your feedback. I opened issue 104445 with SIG Node to track it.
If you have any other information, please add it there.

serathius · 2021-09-15T18:14:50Z

FYI we don't support bitnami images as we don't even know what MS version they use or if they do any code changes.

Please confirm whether I understood the problem: Kubelet reports an invalid node start time for control plane nodes, resulting in MS sometimes not reporting node metrics for those nodes?

yangjunmyfm192085 · 2021-09-16T00:51:59Z

I agree with @serathius. The cause is that Kubelet reports an invalid node start time for control plane nodes, which results in MS sometimes not reporting node metrics for those nodes.

serathius · 2021-09-16T06:16:24Z

This means that v0.5.0 should not use Kubelet start time. I think we should fix this and release v0.5.1. @yangjunmyfm192085 what do you think?

serathius · 2021-09-23T18:22:28Z

ping @yangjunmyfm192085

yangjunmyfm192085 · 2021-09-24T00:40:44Z

> ping @yangjunmyfm192085

OK, let me prepare it.

serathius · 2021-09-26T16:14:20Z

Fix was implemented and released in v0.5.1

serathius · 2021-09-26T16:14:48Z

@techstep Please confirm if that fixes the issue for you.

k8s-ci-robot added the kind/bug label Jul 27, 2021

yangjunmyfm192085 mentioned this issue Aug 19, 2021

Kubernetes have been running normally, but the control-plane nodes's metrics value of startTime is similar to the timestamp kubernetes/kubernetes#104445

Closed

yangjunmyfm192085 mentioned this issue Sep 24, 2021

Don‘t use Kubelet start time for metrics-server. #838

Merged

serathius closed this as completed Sep 26, 2021

serathius mentioned this issue Oct 2, 2021

error: metrics not available yet #828

Closed

uGiFarukh mentioned this issue Jan 5, 2022

Upgrade: metrics server version bump from v0.5.0 to v0.5.2 k3s-io/k3s#4867

Merged
