High memory utilization on 1.13.4 release #3057
Comments
That issue also mentions that the "misc" controller was introduced in kernel 5.13, which explains why this affects variants with the 5.15 kernel (k8s-1.24, k8s-1.25) but not older variants with the 5.10 kernel (k8s-1.22, k8s-1.23). The runc change also only affects the v1 controller, so k8s-1.26 variants using cgroup v2 are not affected.
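For reference, a quick way to confirm which controllers a given kernel knows about is to parse /proc/cgroups; the Go sketch below (not part of the original report) just prints the controller names. On a 5.15 kernel the list should include misc, while on a 5.10 kernel it should not.

```go
// List the cgroup controllers the running kernel knows about by parsing
// /proc/cgroups. Illustrative sketch only.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/cgroups")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		// Skip the header line, which starts with "#subsys_name".
		if strings.HasPrefix(line, "#") {
			continue
		}
		// Remaining lines are: subsys_name  hierarchy  num_cgroups  enabled
		fields := strings.Fields(line)
		if len(fields) > 0 {
			fmt.Println(fields[0])
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```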
Which kubernetes version was used? I suspect that the runc 1.1.6 binary creates the misc cgroup, and then kubelet uses runc's libcontainer (of an older version) to remove it. That older libcontainer version doesn't know about misc (and systemd doesn't know about it either), so it is not removed. Thus, bumping the runc/libcontainer dependency to 1.1.6 in k8s should fix this.
Ah I see, it was k8s-1.24 and k8s-1.25. I will open PRs to backport kubernetes/kubernetes#117242 to those releases.
What about the excessive logging caused by the runc 1.1.6 "Path not found" issue? It seems to also be related to this issue.
@yeazelm Two questions:
as well as:
Not sure if those are the ones you were looking for, but I can grab the entries again and get more of them if you think that would help.
I was asking about systemd logs related to an error like this one:
What happens here is that kubelet asks systemd to remove the unit and then times out waiting for a reply from systemd. The timeout is long enough -- 30 seconds. I was wondering if there are any logs emitted by systemd related to the above-named slice.
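To make the failure mode concrete: the removal request is a stop of the transient unit over the systemd D-Bus API, after which the caller waits for the job to complete. The sketch below, using github.com/coreos/go-systemd/v22/dbus, illustrates that wait-with-timeout shape; it is not the actual kubelet/libcontainer code, and the slice name is a placeholder.

```go
// Sketch of the "ask systemd to remove a unit, then wait with a timeout"
// pattern described above. Illustration of the mechanism only, not the real
// libcontainer/kubelet implementation.
package main

import (
	"context"
	"fmt"
	"time"

	systemddbus "github.com/coreos/go-systemd/v22/dbus"
)

func main() {
	ctx := context.Background()

	conn, err := systemddbus.NewWithContext(ctx)
	if err != nil {
		fmt.Println("connect to systemd:", err)
		return
	}
	defer conn.Close()

	// systemd reports the job result ("done", "failed", "timeout", ...) on
	// this channel once the stop job finishes.
	result := make(chan string, 1)
	unit := "kubepods-besteffort-podexample.slice" // placeholder unit name

	if _, err := conn.StopUnitContext(ctx, unit, "replace", result); err != nil {
		fmt.Println("StopUnit:", err)
		return
	}

	select {
	case status := <-result:
		fmt.Println("systemd finished the job with status:", status)
	case <-time.After(30 * time.Second):
		// This is the situation the kubelet error describes: no reply from
		// systemd within the 30-second window.
		fmt.Println("timed out waiting for systemd to remove", unit)
	}
}
```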
This would be awesome, as this is currently the proposed fix, but no one has tested it yet.
@kolyshkin I took kubernetes/kubernetes#117682 manually and have that version running. As for the log entries when this is failing, it really just spams:
coupled with:
I don't see any other log entries (although it's a bit tough since we end up getting quite a few of these, so I could be missing other messages if they are infrequent).
Looks like this has been addressed with #3074. |
Image I'm using:
ami-035c9bfb9c905837c
but I believe this happens with any AWS K8s variant. I've reproduced this on x86_64.

What I expected to happen:
The cluster nodes to work without memory filling up and without errors in the kubelet logs about "Timed out while waiting for systemd to remove".

What actually happened:
kubelet memory usage grows to gigabytes of RAM and the kubelet logs fill with those "Timed out while waiting for systemd to remove" errors.
How to reproduce the problem:
I launch a cluster with 2 nodes, add in some load (in this case simulated by a webserver and some pods calling the webserver), and add a CronJob that repeatedly creates and terminates short-lived pods.
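The exact CronJob manifest from the report isn't shown here; as a rough stand-in (the schedule, image, namespace, and names below are all assumptions), the following client-go sketch creates a CronJob that launches a short-lived pod every minute, producing the kind of pod churn described.

```go
// Create a CronJob that runs a short-lived busybox pod every minute, so pod
// cgroups are created and removed frequently. Illustrative stand-in only.
package main

import (
	"context"
	"log"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	cron := &batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{Name: "pod-churn"},
		Spec: batchv1.CronJobSpec{
			Schedule: "*/1 * * * *", // every minute, to cycle pods quickly
			JobTemplate: batchv1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyNever,
							Containers: []corev1.Container{{
								Name:    "churn",
								Image:   "busybox:1.36",
								Command: []string{"sh", "-c", "sleep 5"},
							}},
						},
					},
				},
			},
		},
	}

	_, err = client.BatchV1().CronJobs("default").Create(context.Background(), cron, metav1.CreateOptions{})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("created CronJob default/pod-churn")
}
```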
After some time, the memory on the node fills up, with kubelet taking up gigabytes of RAM, and the logs contain lots of errors about failing to clean up the cgroups.

This seems to be a problem when lots of pods are cycled through. It may also have to do with the CronJob terminating the container, since just simulating load and deleting the running containers manually doesn't seem to trigger it.
This does not happen on 1.13.3.