Metrics-server doesn't report stats for pods which have initContainers #792
/cc @serathius |
cc @dgrisonnet @yangjunmyfm192085 |
/assign |
I analyzed the data above. Summarized as follows: |
Can we reach out to SIG-Node and ask whether metric information about terminated pods is expected? If so, could we ask if that has changed recently? Whether Kubelet endpoints and CRI metrics protos should report metrics for terminated pods should be standardized by SIG Node. |
Yeah, I opened issue #103368 for this question. |
@yangjunmyfm192085 Can you test manually what happens when we run a pod with an init container on kind? Adding an e2e test would also be great. |
OK, let me try to reproduce it. |
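A minimal manifest along these lines should be enough to reproduce (the names and images here are illustrative, not from the original report): after the init container completes, check whether `kubectl top pod` reports the pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-metrics-repro   # illustrative name
spec:
  initContainers:
  - name: init
    image: busybox
    command: ["sh", "-c", "sleep 2"]   # terminates quickly, then reports zero usage
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
```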
As a workaround, I removed the initContainer (did what it did manually), and it worked. If you want, you can use the cluster we are using in order to simulate it easily: git clone https://github.com/kubevirt/cluster-network-addons-operator
Hopefully it works out of the box; in the worst case you may need to install some packages (and make sure Docker is installed, of course). Thanks |
Thank you for the information, this is very helpful. |
Do you mean the versions?
BTW, in order to SSH into the nodes once you install the provider, if needed you can use |
ok, thanks |
Hey @oshoval, it looks like you're using cri-o. Can I ask what your kubelet config is, specifically for the runtime-endpoint? It is possible these metrics are coming from cAdvisor, as cri-o uses the cAdvisor stats provider by default. |
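For context, on a typical cri-o node the kubelet is pointed at the cri-o socket like this (the exact socket path is an assumption here; check the node's actual kubelet flags or config file):

```
--container-runtime-endpoint=unix:///var/run/crio/crio.sock
```

As the comment above notes, with cri-o the kubelet of this era falls back to the cAdvisor stats provider by default, which affects where the per-container usage numbers come from.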
Hope I did it right. Note that if I remove the initContainer, it does work. |
Hi guys. I'm running into the same issue (i.e. that HPAs do not work on pods with init containers). My kubelets are returning metrics for the missing pods, and I'm getting "Skipped container CPU/memory metric" for containers belonging to the missing pods. I took a quick look at the code and I think I see why this is happening. This loop does not record the pod at all if decodePodStats returns false, and decodePodStats returns false if either decodeCPU or decodeMemory returns 0 for any container in the pod. This behavior appears to contradict this comment in the loop. I'm curious what you guys think. |
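To make the analysis above concrete, here is a simplified, hypothetical sketch of the described behavior (this is not the actual metrics-server code; the types and the decodePodStats name just mirror the discussion): a single zero-valued container is enough to drop the entire pod from the scrape.

```go
package main

import "fmt"

// ContainerStats is a simplified stand-in for a container's scraped usage.
type ContainerStats struct {
	Name                 string
	UsageCoreNanoSeconds uint64 // cumulative CPU usage
	WorkingSetBytes      uint64 // memory working set
}

// PodStats is a simplified stand-in for a pod's scraped stats.
type PodStats struct {
	Name       string
	Containers []ContainerStats
}

// decodePodStats mirrors the behavior described in the comment above: it
// reports failure as soon as any container in the pod decodes to zero CPU
// or zero memory, which makes the caller's loop skip the whole pod.
func decodePodStats(p PodStats) bool {
	for _, c := range p.Containers {
		if c.UsageCoreNanoSeconds == 0 {
			fmt.Printf("Skipped container CPU metric containerName=%q err=\"Got UsageCoreNanoSeconds equal zero\"\n", c.Name)
			return false
		}
		if c.WorkingSetBytes == 0 {
			fmt.Printf("Skipped container memory metric containerName=%q err=\"Got WorkingSetBytes equal zero\"\n", c.Name)
			return false
		}
	}
	return true
}

func main() {
	// A terminated init container legitimately reports zero usage, yet it
	// still takes the running container down with it.
	pod := PodStats{
		Name: "ovs-cni-amd64-5b266",
		Containers: []ContainerStats{
			{Name: "ovs-cni-plugin"}, // terminated init container: all zeros
			{Name: "ovs-cni-marker", UsageCoreNanoSeconds: 12345, WorkingSetBytes: 67890},
		},
	}
	fmt.Println("pod recorded:", decodePodStats(pod)) // prints "pod recorded: false"
}
```

This matches the symptom in the logs: the skip messages name only the init container, but the running container's metrics disappear along with it.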
As issue #103368 discussed, |
The current diagnosis is that the init containers being reported by Kubelet is a bug in the container runtime you are using. As a workaround, please downgrade to MS v0.4.4. @yangjunmyfm192085 what's the current status of the issue? If I remember correctly, we got confirmation from the node team, but you didn't close any bugs on our side? Are you planning to continue working on this issue? |
Yeah, I did not close the issue in time. We have got confirmation from the node team. |
/close |
@yangjunmyfm192085: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Unfortunately the linked issue was closed for inactivity without any resolution, so until |
@lbogdan I don't think you will get a response to a closed ticket. You might want to create a new ticket and suggest your approach to the maintainers. If the change is not huge, why don't you create a PR and seek approval from the maintainers? It would be great for everybody. |
Hey @shashankn91, thanks for the heads-up! This has actually been fixed in 1.23, and doing a bit of digging, I tracked it back to this PR: kubernetes/kubernetes#103424, refactoring
What happened:
Installed latest metrics-server
ran
kubectl top pod
Saw that all pods are showing information, besides pods that have an initContainer
What you expected to happen:
metrics-server should show info for all pods
Anything else we need to know?:
Set
Getting this for the missing pod:
Environment:
Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.):
Vanilla
Container Network Setup (flannel, calico, etc.):
Calico
Kubernetes version (use kubectl version): v1.21.2
Metrics Server manifest
Just latest with --kubelet-insecure-tls and debug settings as above
spoiler for Metrics Server manifest:
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
rbac.authorization.k8s.io/aggregate-to-admin: "true"
rbac.authorization.k8s.io/aggregate-to-edit: "true"
rbac.authorization.k8s.io/aggregate-to-view: "true"
name: system:aggregated-metrics-reader
rules:
resources:
verbs:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
rules:
resources:
verbs:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server-auth-reader
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: metrics-server:system:auth-delegator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
ports:
- port: 443
  protocol: TCP
  targetPort: https
selector:
k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
k8s-app: metrics-server
name: metrics-server
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: metrics-server
strategy:
rollingUpdate:
maxUnavailable: 0
template:
metadata:
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls
- --v=5
- --logtostderr
image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /livez
port: https
scheme: HTTPS
periodSeconds: 10
name: metrics-server
ports:
- containerPort: 443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /readyz
port: https
scheme: HTTPS
initialDelaySeconds: 20
periodSeconds: 10
resources:
requests:
cpu: 100m
memory: 200Mi
securityContext:
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /tmp
name: tmp-dir
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
serviceAccountName: metrics-server
volumes:
- emptyDir: {}
name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
labels:
k8s-app: metrics-server
name: v1beta1.metrics.k8s.io
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
version: v1beta1
versionPriority: 100
spoiler for Kubelet config:
spoiler for Metrics Server logs:
I0628 10:12:48.675595 1 decode.go:105] "Skipped container CPU metric" containerName="ovs-cni-plugin" pod="cluster-network-addons/ovs-cni-amd64-5b266" err="Got UsageCoreNanoSeconds equal zero"
I0628 10:12:48.675625 1 decode.go:109] "Skipped container memory metric" containerName="ovs-cni-plugin" pod="cluster-network-addons/ovs-cni-amd64-5b266" err="Got WorkingSetBytes equal zero"
spoiler for Status of Metrics API:
Name: v1beta1.metrics.k8s.io
Namespace:
Labels: k8s-app=metrics-server
Annotations:
API Version: apiregistration.k8s.io/v1
Kind: APIService
Metadata:
Creation Timestamp: 2021-06-28T14:14:26Z
Managed Fields:
API Version: apiregistration.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:conditions:
.:
k:{"type":"Available"}:
.:
f:lastTransitionTime:
f:message:
f:reason:
f:status:
f:type:
Manager: kube-apiserver
Operation: Update
Time: 2021-06-28T14:14:26Z
API Version: apiregistration.k8s.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:labels:
.:
f:k8s-app:
f:spec:
f:group:
f:groupPriorityMinimum:
f:insecureSkipTLSVerify:
f:service:
.:
f:name:
f:namespace:
f:port:
f:version:
f:versionPriority:
Manager: oc
Operation: Update
Time: 2021-06-28T14:14:26Z
Resource Version: 957
UID: 85149324-a1ac-4a1a-8b6d-e3980770bcfa
Spec:
Group: metrics.k8s.io
Group Priority Minimum: 100
Insecure Skip TLS Verify: true
Service:
Name: metrics-server
Namespace: kube-system
Port: 443
Version: v1beta1
Version Priority: 100
Status:
Conditions:
Last Transition Time: 2021-06-28T14:14:56Z
Message: all checks passed
Reason: Passed
Status: True
Type: Available
Events:
Raw API result
We can see the real container does have values, just not the init one.
Pod manifest (running instance)
/kind bug