cadvisor metrics in master are inconsistent - not all calls return the correct metrics #15974

Closed
smarterclayton opened this issue Aug 24, 2017 · 9 comments

Comments

@smarterclayton
Contributor

Running the metrics dump from master (in this case alpha.0 and latest) I get only a portion of metrics each time. The number of metrics should be consistent from run to run.

○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
     338
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
     338
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
       0
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
       0
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      50
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
     568
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
       0
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
       0
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
       0
○ oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 | grep container_name | wc -l
      82
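For anyone trying to reproduce this, the repeated calls above can be wrapped in a small loop that tallies how often each count appears. This is only a sketch: the helper names `count_container_lines` and `summarize_counts` are made up for illustration, and the `oc` invocation in the usage comment simply reuses the command and server address from the runs above.

```shell
#!/usr/bin/env bash

# Count the lines on stdin that mention the container_name label.
# grep -c exits non-zero when nothing matches, so "|| true" keeps a
# zero count from aborting scripts that run under "set -e".
count_container_lines() {
  grep -c 'container_name' || true
}

# Read one count per line on stdin and print how many times each
# distinct count was observed (occurrences first, then the count).
summarize_counts() {
  sort -n | uniq -c
}

# Hypothetical usage against a live node (not executed here):
#   for i in $(seq 1 20); do
#     oc get --raw /metrics/cadvisor --server https://10.1.2.2:10250 \
#       | count_container_lines
#   done | summarize_counts
```

If the endpoint behaved consistently, the summary would collapse to a single distinct count; several distinct values, as in the runs above, point to the inconsistency described here.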
@smarterclayton
Contributor Author

@sjenning this was reported by the CM ops team. I hadn't noticed it before, but it's happening in both alpha.0 and latest master for 3.7. It could be related to a patch we carry.

@smarterclayton
Contributor Author

I see

E0824 23:09:06.952339    9754 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /rootfs/var/lib/docker/overlay2/38b26f8b877f3d280460b5da0110b62ba954808b44ad5fa92da5c936262945d8 with output stdout: , stderr: du: cannot access '/rootfs/var/lib/docker/overlay2/38b26f8b877f3d280460b5da0110b62ba954808b44ad5fa92da5c936262945d8': No such file or directory
 - exit status 1, rootInodeErr: cmd [find /rootfs/var/lib/docker/overlay2/38b26f8b877f3d280460b5da0110b62ba954808b44ad5fa92da5c936262945d8 -xdev -printf .] failed. stderr: find: '/rootfs/var/lib/docker/overlay2/38b26f8b877f3d280460b5da0110b62ba954808b44ad5fa92da5c936262945d8': No such file or directory
; err: exit status 1, extraDiskErr: du command failed on /rootfs/var/lib/docker/containers/949106cc41597b695e2f9088adb22026efad3c2d9757c04a15b512c8de6ced04 with output stdout: , stderr: du: cannot access '/rootfs/var/lib/docker/containers/949106cc41597b695e2f9088adb22026efad3c2d9757c04a15b512c8de6ced04': No such file or directory
 - exit status 1

in the logs but may be unrelated.

@smarterclayton
Contributor Author

Also seeing

W0824 22:36:57.971384    9754 helpers.go:771] eviction manager: no observation found for eviction signal allocatableNodeFs.available

@smarterclayton
Contributor Author

W0824 21:37:15.871049    9754 helpers.go:771] eviction manager: no observation found for eviction signal allocatableNodeFs.available
E0824 21:37:25.167629    9754 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /rootfs/var/lib/docker/overlay2/640c4cfbd8bd91643972edeaa5bd97135e2edc29717cc7c3f42d9f70099cc1db with output stdout: 1280412	/rootfs/var/lib/docker/overlay2/640c4cfbd8bd91643972edeaa5bd97135e2edc29717cc7c3f42d9f70099cc1db
, stderr: du: cannot access '/rootfs/var/lib/docker/overlay2/640c4cfbd8bd91643972edeaa5bd97135e2edc29717cc7c3f42d9f70099cc1db/merged/proc/9754/fdinfo/317': No such file or directory
du: cannot access '/rootfs/var/lib/docker/overlay2/640c4cfbd8bd91643972edeaa5bd97135e2edc29717cc7c3f42d9f70099cc1db/merged/proc/32673/task/32673/fd/4': No such file or directory
du: cannot access '/rootfs/var/lib/docker/overlay2/640c4cfbd8bd91643972edeaa5bd97135e2edc29717cc7c3f42d9f70099cc1db/merged/proc/32673/task/32673/fdinfo/4': No such file or directory
du: cannot access '/rootfs/var/lib/docker/overlay2/640c4cfbd8bd91643972edeaa5bd97135e2edc29717cc7c3f42d9f70099cc1db/merged/proc/32673/fd/4': No such file or directory
du: cannot access '/rootfs/var/lib/docker/overlay2/640c4cfbd8bd91643972edeaa5bd97135e2edc29717cc7c3f42d9f70099cc1db/merged/proc/32673/fdinfo/4': No such file or directory
 - exit status 1, rootInodeErr: <nil>, extraDiskErr: <nil>

@smarterclayton
Contributor Author

I don't see those log lines correlating with repeated calls to cadvisor, so they may be unrelated.

@sjenning sjenning self-assigned this Aug 25, 2017
@derekwaynecarr
Member

we need google/cadvisor#1681

@smarterclayton
Contributor Author

Hrm, we have that already.

@derekwaynecarr
Member

looks like this: kubernetes/kubernetes#51473

@smarterclayton
Contributor Author

Cherry-picking

openshift-merge-robot added a commit that referenced this issue Sep 1, 2017
Automatic merge from submit-queue

UPSTREAM: 51473: Fix cAdvisor prometheus metrics

Fixes #15974
