Metrics-server doesn't report stats for pods which have initContainers #792

Closed
oshoval opened this issue Jun 29, 2021 · 24 comments
Labels: kind/bug (Categorizes issue or PR as related to a bug.)

Comments

@oshoval

oshoval commented Jun 29, 2021

What happened:
Installed the latest metrics-server.
Ran kubectl top pod.
Saw that all pods show information, except pods that have an initContainer.

What you expected to happen:
metrics-server should show metrics for all pods

Anything else we need to know?:

Set

--v=5
--logtostderr

and got this for the missing pod:

I0628 10:12:48.675595       1 decode.go:105] "Skipped container CPU metric" containerName="ovs-cni-plugin" pod="cluster-network-addons/ovs-cni-amd64-5b266" err="Got UsageCoreNanoSeconds equal zero"
I0628 10:12:48.675625       1 decode.go:109] "Skipped container memory metric" containerName="ovs-cni-plugin" pod="cluster-network-addons/ovs-cni-amd64-5b266" err="Got WorkingSetBytes equal zero"

Environment:

  • Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.):
    Vanilla

  • Container Network Setup (flannel, calico, etc.):
    Calico

  • Kubernetes version (use kubectl version):
    v1.21.2

  • Metrics Server manifest
Just the latest manifest, with --kubelet-insecure-tls and the debug settings above

spoiler for Metrics Server manifest:

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        - --v=5
        - --logtostderr
        image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

  • Kubelet config:
spoiler for Kubelet config:
  • Metrics server logs:
spoiler for Metrics Server logs:

I0628 10:12:48.675595 1 decode.go:105] "Skipped container CPU metric" containerName="ovs-cni-plugin" pod="cluster-network-addons/ovs-cni-amd64-5b266" err="Got UsageCoreNanoSeconds equal zero"
I0628 10:12:48.675625 1 decode.go:109] "Skipped container memory metric" containerName="ovs-cni-plugin" pod="cluster-network-addons/ovs-cni-amd64-5b266" err="Got WorkingSetBytes equal zero"

  • Status of Metrics API:
spoiler for Status of Metrics API:
kubectl describe apiservice v1beta1.metrics.k8s.io

Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       k8s-app=metrics-server
Annotations:
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2021-06-28T14:14:26Z
  Managed Fields:
    API Version:  apiregistration.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .:
          k:{"type":"Available"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:reason:
            f:status:
            f:type:
    Manager:      kube-apiserver
    Operation:    Update
    Time:         2021-06-28T14:14:26Z
    API Version:  apiregistration.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:k8s-app:
      f:spec:
        f:group:
        f:groupPriorityMinimum:
        f:insecureSkipTLSVerify:
        f:service:
          .:
          f:name:
          f:namespace:
          f:port:
        f:version:
        f:versionPriority:
    Manager:         oc
    Operation:       Update
    Time:            2021-06-28T14:14:26Z
  Resource Version:  957
  UID:               85149324-a1ac-4a1a-8b6d-e3980770bcfa
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:       metrics-server
    Namespace:  kube-system
    Port:       443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2021-06-28T14:14:56Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:

Raw API result
We can see the real container does have values; just not the init one:

{
  "podRef": {
    "name": "macvtap-cni-h42rc",
    "namespace": "cluster-network-addons",
    "uid": "08627988-1f04-4289-be98-5153651771a3"
  },
  "startTime": "2021-06-28T14:19:06Z",
  "containers": [
    {
      "name": "install-cni",
      "startTime": "2021-06-28T14:19:09Z",
      "cpu": {
        "time": "2021-06-29T07:06:32Z",
        "usageNanoCores": 0,
        "usageCoreNanoSeconds": 0
      },
      "memory": {
        "time": "2021-06-29T07:06:32Z",
        "workingSetBytes": 0
      },
      "rootfs": {
        "time": "2021-06-29T07:06:32Z",
        "availableBytes": 24373460992,
        "capacityBytes": 37569409024,
        "usedBytes": 0,
        "inodesFree": 18120932,
        "inodes": 18349504,
        "inodesUsed": 5
      },
      "logs": {
        "time": "2021-06-29T07:06:32Z",
        "availableBytes": 24373460992,
        "capacityBytes": 37569409024,
        "usedBytes": 0,
        "inodesFree": 18120932,
        "inodes": 18349504,
        "inodesUsed": 228572
      }
    },
    {
      "name": "macvtap-cni",
      "startTime": "2021-06-28T14:19:10Z",
      "cpu": {
        "time": "2021-06-29T07:06:45Z",
        "usageNanoCores": 5306,
        "usageCoreNanoSeconds": 309467434
      },
      "memory": {
        "time": "2021-06-29T07:06:45Z",
        "usageBytes": 28999680,
        "workingSetBytes": 13725696,
        "rssBytes": 12660736,
        "pageFaults": 1419,
        "majorPageFaults": 0
      },
      "rootfs": {
        "time": "2021-06-29T07:06:45Z",
        "availableBytes": 24373460992,
        "capacityBytes": 37569409024,
        "usedBytes": 0,
        "inodesFree": 18120932,
        "inodes": 18349504,
        "inodesUsed": 5
      },
      "logs": {
        "time": "2021-06-29T07:06:45Z",
        "availableBytes": 24373460992,
        "capacityBytes": 37569409024,
        "usedBytes": 4096,
        "inodesFree": 18120932,
        "inodes": 18349504,
        "inodesUsed": 228572
      }
    }
  ],
  "cpu": {
    "time": "2021-06-29T07:06:29Z",
    "usageNanoCores": 4323,
    "usageCoreNanoSeconds": 608707823
  },
  "memory": {
    "time": "2021-06-29T07:06:29Z",
    "usageBytes": 37138432,
    "workingSetBytes": 14024704,
    "rssBytes": 12718080,
    "pageFaults": 0,
    "majorPageFaults": 0
  },
  "network": {
    "time": "2021-06-29T07:06:40Z",
    "name": "eth0",
    "rxBytes": 224138727,
    "rxErrors": 0,
    "txBytes": 930994449,
    "txErrors": 0,
    "interfaces": [
      {
        "name": "cali70c151e3d40",
        "rxBytes": 82189921,
        "rxErrors": 0,
        "txBytes": 163137981,
        "txErrors": 0
      },
      {
        "name": "cali088ca729a2c",
        "rxBytes": 20642154,
        "rxErrors": 0,
        "txBytes": 15894567,
        "txErrors": 0
      },
      {
        "name": "cali57d919d4291",
        "rxBytes": 350269,
        "rxErrors": 0,
        "txBytes": 385145,
        "txErrors": 0
      },
      {
        "name": "cali7c9448c22e8",
        "rxBytes": 760,
        "rxErrors": 0,
        "txBytes": 996,
        "txErrors": 0
      },
      {
        "name": "eth0",
        "rxBytes": 224138727,
        "rxErrors": 0,
        "txBytes": 930994449,
        "txErrors": 0
      },
      {
        "name": "cali9b375788d78",
        "rxBytes": 9961582,
        "rxErrors": 0,
        "txBytes": 6183357,
        "txErrors": 0
      },
      {
        "name": "cali2ac6346d336",
        "rxBytes": 22046050,
        "rxErrors": 0,
        "txBytes": 44852008,
        "txErrors": 0
      },
      {
        "name": "calif736c75102b",
        "rxBytes": 97040936,
        "rxErrors": 0,
        "txBytes": 169912953,
        "txErrors": 0
      },
      {
        "name": "tunl0",
        "rxBytes": 8100979,
        "rxErrors": 0,
        "txBytes": 4420633,
        "txErrors": 0
      },
      {
        "name": "cali1f73d8531a3",
        "rxBytes": 9970369,
        "rxErrors": 0,
        "txBytes": 6183316,
        "txErrors": 0
      },
      {
        "name": "calib616e9be311",
        "rxBytes": 2634417,
        "rxErrors": 0,
        "txBytes": 23530357,
        "txErrors": 0
      }
    ]
  },
  "volume": [
    {
      "time": "2021-06-28T14:19:25Z",
      "availableBytes": 4701962240,
      "capacityBytes": 4701974528,
      "usedBytes": 12288,
      "inodesFree": 1147934,
      "inodes": 1147943,
      "inodesUsed": 9,
      "name": "kube-api-access-kjmdp"
    }
  ],
  "ephemeral-storage": {
    "time": "2021-06-29T07:06:45Z",
    "availableBytes": 24373460992,
    "capacityBytes": 37569409024,
    "usedBytes": 12288,
    "inodesFree": 18120932,
    "inodes": 18349504,
    "inodesUsed": 12
  },
  "process_stats": {
    "process_count": 0
  }
}

Pod manifest (running instance)

hades05 kubevirt (master) $ oc get pod -n cluster-network-addons macvtap-cni-dfjmk -oyaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2021-06-28T14:19:06Z"
  generateName: macvtap-cni-
  labels:
    app.kubernetes.io/component: network
    app.kubernetes.io/managed-by: cnao-operator
    controller-revision-hash: 5596886b88
    name: macvtap-cni
    pod-template-generation: "1"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/component: {}
          f:app.kubernetes.io/managed-by: {}
          f:controller-revision-hash: {}
          f:name: {}
          f:pod-template-generation: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"74e5f40c-365d-4759-a1d1-689b6823f629"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:affinity:
          .: {}
          f:nodeAffinity:
            .: {}
            f:requiredDuringSchedulingIgnoredDuringExecution:
              .: {}
              f:nodeSelectorTerms: {}
        f:containers:
          k:{"name":"macvtap-cni"}:
            .: {}
            f:command: {}
            f:envFrom: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources:
              .: {}
              f:requests:
                .: {}
                f:cpu: {}
                f:memory: {}
            f:securityContext:
              .: {}
              f:privileged: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/var/lib/kubelet/device-plugins"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:hostNetwork: {}
        f:hostPID: {}
        f:initContainers:
          .: {}
          k:{"name":"install-cni"}:
            .: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:securityContext:
              .: {}
              f:privileged: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/host/opt/cni/bin"}:
                .: {}
                f:mountPath: {}
                f:mountPropagation: {}
                f:name: {}
        f:nodeSelector:
          .: {}
          f:beta.kubernetes.io/arch: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:terminationGracePeriodSeconds: {}
        f:tolerations: {}
        f:volumes:
          .: {}
          k:{"name":"cni"}:
            .: {}
            f:hostPath:
              .: {}
              f:path: {}
              f:type: {}
            f:name: {}
          k:{"name":"deviceplugin"}:
            .: {}
            f:hostPath:
              .: {}
              f:path: {}
              f:type: {}
            f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-06-28T14:19:06Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:initContainerStatuses: {}
        f:phase: {}
        f:podIP: {}
        f:podIPs:
          .: {}
          k:{"ip":"192.168.66.102"}:
            .: {}
            f:ip: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-06-28T14:19:11Z"
  name: macvtap-cni-dfjmk
  namespace: cluster-network-addons
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: macvtap-cni
    uid: 74e5f40c-365d-4759-a1d1-689b6823f629
  resourceVersion: "1691"
  uid: 26d70aed-409f-437d-841d-16f56b63bb65
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - node02
  containers:
  - command:
    - /macvtap-deviceplugin
    - -v
    - "3"
    - -logtostderr
    envFrom:
    - configMapRef:
        name: macvtap-deviceplugin-config
    image: quay.io/kubevirt/macvtap-cni@sha256:f20d5e56f8b8c1ab7e5a64e536b66f65aa688b2d1dc0b37e3c26c2af2b481266
    imagePullPolicy: IfNotPresent
    name: macvtap-cni
    resources:
      requests:
        cpu: 60m
        memory: 30Mi
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/kubelet/device-plugins
      name: deviceplugin
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-r42r2
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  hostPID: true
  initContainers:
  - command:
    - cp
    - /macvtap-cni
    - /host/opt/cni/bin/macvtap
    image: quay.io/kubevirt/macvtap-cni@sha256:f20d5e56f8b8c1ab7e5a64e536b66f65aa688b2d1dc0b37e3c26c2af2b481266
    imagePullPolicy: IfNotPresent
    name: install-cni
    resources: {}
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host/opt/cni/bin
      mountPropagation: Bidirectional
      name: cni
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-r42r2
      readOnly: true
  nodeName: node02
  nodeSelector:
    beta.kubernetes.io/arch: amd64
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - hostPath:
      path: /var/lib/kubelet/device-plugins
      type: ""
    name: deviceplugin
  - hostPath:
      path: /opt/cni/bin
      type: ""
    name: cni
  - name: kube-api-access-r42r2
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-06-28T14:19:09Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-06-28T14:19:10Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-06-28T14:19:10Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-06-28T14:19:06Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://b4ff79cbcedf92e1bcf91e58d4ee09c2c6163854e9662086bc431ac87a818a35
    image: quay.io/kubevirt/macvtap-cni@sha256:f20d5e56f8b8c1ab7e5a64e536b66f65aa688b2d1dc0b37e3c26c2af2b481266
    imageID: quay.io/kubevirt/macvtap-cni@sha256:f20d5e56f8b8c1ab7e5a64e536b66f65aa688b2d1dc0b37e3c26c2af2b481266
    lastState: {}
    name: macvtap-cni
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-06-28T14:19:09Z"
  hostIP: 192.168.66.102
  initContainerStatuses:
  - containerID: cri-o://9437ce2683226d9616aeffe9816981c275c76148b50d0f7d6d96803b06f5b23f
    image: quay.io/kubevirt/macvtap-cni@sha256:f20d5e56f8b8c1ab7e5a64e536b66f65aa688b2d1dc0b37e3c26c2af2b481266
    imageID: quay.io/kubevirt/macvtap-cni@sha256:f20d5e56f8b8c1ab7e5a64e536b66f65aa688b2d1dc0b37e3c26c2af2b481266
    lastState: {}
    name: install-cni
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: cri-o://9437ce2683226d9616aeffe9816981c275c76148b50d0f7d6d96803b06f5b23f
        exitCode: 0
        finishedAt: "2021-06-28T14:19:08Z"
        reason: Completed
        startedAt: "2021-06-28T14:19:08Z"
  phase: Running
  podIP: 192.168.66.102
  podIPs:
  - ip: 192.168.66.102
  qosClass: Burstable
  startTime: "2021-06-28T14:19:06Z"

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 29, 2021
@oshoval
Author

oshoval commented Jun 29, 2021

/cc @serathius
Thanks

@serathius
Contributor

cc @dgrisonnet @yangjunmyfm192085
Would you be interested in looking into it?

@yangjunmyfm192085
Contributor

/assgin
Let me take a look

@yangjunmyfm192085
Contributor

/assign

@yangjunmyfm192085
Contributor

I analyzed the data above. Summarized as follows:

  • There are two containers in pod macvtap-cni-h42rc: install-cni and macvtap-cni.
    The status of install-cni is terminated, and the status of macvtap-cni is running.

  • We can get macvtap-cni's metrics. As the raw output above shows, metrics are also collected for install-cni, but the values are 0. For this scenario, we discard the pod's metrics.

  • The strange thing is: why can we get metrics for a terminated container at all?

  • I haven't reproduced this locally.

  • Maybe it depends on the runtime?

@serathius
Contributor

serathius commented Jun 30, 2021

Can we reach out to SIG Node and ask whether metric information about terminated pods is expected? If so, could we also ask whether that has changed recently?

Whether kubelet endpoints and CRI metrics protos should report metrics for terminated pods should be standardized by SIG Node.

@yangjunmyfm192085
Contributor

Yeah, I opened issue #103368 for this question.

@serathius
Contributor

@yangjunmyfm192085 Can you test manually what happens when we run pod with init container on kind? adding an e2e test would be also great.

@yangjunmyfm192085
Contributor

@yangjunmyfm192085 Can you test manually what happens when we run pod with init container on kind? adding an e2e test would be also great.

Ok, let me try to reproduce it.
I used kind to run a pod with an init container some days ago, but it didn't reproduce.
I need to analyze the configuration in #792 and #796.

@oshoval
Author

oshoval commented Jul 4, 2021

As a workaround, I removed the initContainer (did what it did manually), and it worked.

If you want, you can use the cluster we are using in order to simulate it easily:

git clone https://github.com/kubevirt/cluster-network-addons-operator

make cluster-up
export KUBECONFIG=./_kubevirtci/_ci-configs/k8s-1.19/.kubeconfig
make cluster-sync
kubectl create -f _out/cluster-network-addons/99.0.0/network-addons-config-example.cr.yaml

Hopefully it works out of the box; at worst you may need to install some packages (and make sure Docker is installed, of course).
Let me know if you need anything; I am also available in your Slack channel.

Thanks

@yangjunmyfm192085
Contributor

Thank you for the information, this is very helpful.
Could you provide the Docker information? I will continue to analyze it.

@oshoval
Author

oshoval commented Jul 4, 2021

Do you mean the versions?
I use docker-ce:

Client: Docker Engine - Community
 Version:           20.10.1
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        831ebea
 Built:             Tue Dec 15 04:35:53 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       f001486
  Built:            Tue Dec 15 04:33:10 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

BTW, in order to SSH into the nodes once you have installed the provider, you can use:
./cluster/ssh.sh node01

@yangjunmyfm192085
Contributor

ok, thanks

@haircommander

Hey @oshoval, it looks like you're using CRI-O. Can I ask what your kubelet config is, specifically the runtime-endpoint? It is possible these metrics are coming from cAdvisor, as CRI-O uses the cAdvisor stats provider by default.

@oshoval
Author

oshoval commented Jul 15, 2021

Hi @haircommander,

Hope I did it right; I used
https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/#generate-the-configuration-file

Note that if I remove the initContainer it does work.
It also works out of the box for pods that don't have an initContainer, so I do bet it's the initContainer's "fault".

{
  "enableServer": true,
  "staticPodPath": "/etc/kubernetes/manifests",
  "syncFrequency": "1m0s",
  "fileCheckFrequency": "20s",
  "httpCheckFrequency": "20s",
  "address": "0.0.0.0",
  "port": 10250,
  "tlsCertFile": "/var/lib/kubelet/pki/kubelet.crt",
  "tlsPrivateKeyFile": "/var/lib/kubelet/pki/kubelet.key",
  "rotateCertificates": true,
  "authentication": {
    "x509": {
      "clientCAFile": "/etc/kubernetes/pki/ca.crt"
    },
    "webhook": {
      "enabled": true,
      "cacheTTL": "2m0s"
    },
    "anonymous": {
      "enabled": false
    }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": {
      "cacheAuthorizedTTL": "5m0s",
      "cacheUnauthorizedTTL": "30s"
    }
  },
  "registryPullQPS": 5,
  "registryBurst": 10,
  "eventRecordQPS": 5,
  "eventBurst": 10,
  "enableDebuggingHandlers": true,
  "healthzPort": 10248,
  "healthzBindAddress": "127.0.0.1",
  "oomScoreAdj": -999,
  "clusterDomain": "cluster.local",
  "clusterDNS": [
    "10.96.0.10"
  ],
  "streamingConnectionIdleTimeout": "4h0m0s",
  "nodeStatusUpdateFrequency": "10s",
  "nodeStatusReportFrequency": "5m0s",
  "nodeLeaseDurationSeconds": 40,
  "imageMinimumGCAge": "2m0s",
  "imageGCHighThresholdPercent": 85,
  "imageGCLowThresholdPercent": 80,
  "volumeStatsAggPeriod": "1m0s",
  "kubeletCgroups": "/systemd/system.slice",
  "cgroupsPerQOS": true,
  "cgroupDriver": "systemd",
  "cpuManagerPolicy": "none",
  "cpuManagerReconcilePeriod": "10s",
  "topologyManagerPolicy": "none",
  "runtimeRequestTimeout": "2m0s",
  "hairpinMode": "promiscuous-bridge",
  "maxPods": 110,
  "podPidsLimit": -1,
  "resolvConf": "/etc/resolv.conf",
  "cpuCFSQuota": true,
  "cpuCFSQuotaPeriod": "100ms",
  "nodeStatusMaxImages": 50,
  "maxOpenFiles": 1000000,
  "contentType": "application/vnd.kubernetes.protobuf",
  "kubeAPIQPS": 5,
  "kubeAPIBurst": 10,
  "serializeImagePulls": true,
  "evictionHard": {
    "imagefs.available": "15%",
    "memory.available": "100Mi",
    "nodefs.available": "10%",
    "nodefs.inodesFree": "5%"
  },
  "evictionPressureTransitionPeriod": "5m0s",
  "enableControllerAttachDetach": true,
  "makeIPTablesUtilChains": true,
  "iptablesMasqueradeBit": 14,
  "iptablesDropBit": 15,
  "featureGates": {
    "BlockVolume": true,
    "CSIBlockVolume": true,
    "IPv6DualStack": true,
    "VolumeSnapshotDataSource": true
  },
  "failSwapOn": true,
  "containerLogMaxSize": "10Mi",
  "containerLogMaxFiles": 5,
  "configMapAndSecretChangeDetectionStrategy": "Watch",
  "enforceNodeAllocatable": [
    "pods"
  ],
  "volumePluginDir": "/usr/libexec/kubernetes/kubelet-plugins/volume/exec/",
  "logging": {
    "format": "text"
  },
  "enableSystemLogHandler": true,
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1"
}

@nick-oconnor

nick-oconnor commented Jul 19, 2021

Hi guys. I'm running into the same issue (i.e. HPAs do not work on pods with init containers). My kubelets are returning metrics for the missing pods, and I'm getting "Skipped container CPU/memory metric" for containers belonging to those pods. I took a quick look at the code and I think I see why this is happening: this loop does not record the pod at all if decodePodStats returns false, and decodePodStats returns false if either decodeCPU or decodeMemory returns 0 for any container in the pod. This behavior appears to contradict this comment in the loop. I'm curious what you think.

@yangjunmyfm192085
Copy link
Contributor

As discussed in issue #103368, we only provide metrics for non-terminal pods and for running containers, so if either decodeCPU or decodeMemory returns 0 for any container in the pod, we want to discard pods with partial results.

@serathius
Contributor

The current diagnosis is that init containers being reported by the kubelet is a bug in the container runtime you are using. As a workaround, please downgrade to metrics-server v0.4.4.

@yangjunmyfm192085, what's the current status of this issue? If I remember correctly, we got confirmation from the node team, but you didn't close any bugs on our side? Are you planning to continue working on this issue?

@yangjunmyfm192085
Contributor

Yeah, I did not close the issue in time. We got confirmation from the node team.
Let me close the issue.

@yangjunmyfm192085
Contributor

Closing this issue; we will use issue #103368 to track it, because we have got confirmation from the node team.
/close

@k8s-ci-robot
Contributor

@yangjunmyfm192085: Closing this issue.

In response to this:

Closing this issue; we will use issue #103368 to track it, because we have got confirmation from the node team.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lbogdan

lbogdan commented Dec 19, 2021

Unfortunately, the linked issue was closed for inactivity without any resolution, so until SIG Node addresses the issue on their end (which might never happen?), can we fix it in metrics-server? I (think I) figured out the piece of code that needs changing, and I could come up with a PR if the maintainers agree with this approach.

@shashankn91

@lbogdan I don't think you will get a response on a closed ticket. You might want to create a new ticket and suggest your approach to the maintainers. If the change is not huge, why not create a PR and seek approval from the maintainers? It would be great for everybody.

@lbogdan

lbogdan commented May 1, 2022

Hey @shashankn91, thanks for the heads-up!

This has actually been fixed in Kubernetes 1.23. Doing a bit of digging, I tracked it back to this PR: kubernetes/kubernetes#103424, which replaced removeTerminatedContainerInfo() (which seemed not to work correctly, as per this comment) with filterTerminatedContainerInfoAndAssembleByPodCgroupKey(), which now filters terminated containers as expected.
