Socket leak in Docker service #1078

Closed
kanor1306 opened this issue Nov 1, 2022 · 2 comments
Labels
duplicate This issue or pull request already exists

Comments

@kanor1306

What happened: We updated our AMI to a new one, and now nodes start to flap between Ready and NotReady after a few hours. We also see network issues such as failed liveness probes and DNS resolution problems.

How to reproduce it (as minimally and precisely as possible): I don't have a reproduction recipe. The cluster had no issues before; the only change was the AMI version, after which the issues appeared.

Anything else we need to know?:
Our investigation shows something that looks like a connection leak in the system.

  • Socket count constantly increases until it breaks the node. See screenshot:

[Image: Screenshot 2022-10-31 at 14 33 36]

  • Socket count in the pods is regular, no change from the previous AMI

Restarting the Docker service on the host drops the socket count to a normal value, but it immediately starts to grow again.
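
For anyone who wants to track the same numbers on their own nodes, here is a rough sketch of how the host-wide socket count and a per-process socket count could be read from `/proc`. This is not the tooling used for the screenshot above; the helper names `parse_sockstat` and `count_socket_fds` are made up for illustration, and it assumes a Linux host with the usual `/proc/net/sockstat` layout.

```python
import os
from pathlib import Path


def parse_sockstat(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/sockstat-style text into {protocol: {field: value}}."""
    stats: dict[str, dict[str, int]] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name:" prefix
        tokens = rest.split()
        # Fields are alternating "key value" pairs, e.g. "inuse 8 orphan 0".
        stats[name.strip()] = {k: int(v) for k, v in zip(tokens[::2], tokens[1::2])}
    return stats


def count_socket_fds(pid: int, proc_root: Path = Path("/proc")) -> int:
    """Count the file descriptors of one process that point at sockets.

    Reading another user's process usually requires root.
    """
    count = 0
    for fd in (proc_root / str(pid) / "fd").iterdir():
        try:
            if os.readlink(fd).startswith("socket:"):
                count += 1
        except OSError:
            continue  # the descriptor was closed while we were iterating
    return count


if __name__ == "__main__":
    sockstat = Path("/proc/net/sockstat")
    if sockstat.exists():
        stats = parse_sockstat(sockstat.read_text())
        print("sockets used:", stats["sockets"]["used"])
```

Running this periodically on the host and again with `count_socket_fds(<dockerd PID>)` (the PID could come from `pidof dockerd`) should show whether the growth is attributable to the Docker daemon or to something else on the node.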

Environment:

  • AWS Region: us-east-1
  • Instance Type(s):
  • EKS Platform version: eks.11
  • Kubernetes version: 1.21
  • AMI Version:
  • Kernel: 5.4.217-126.408.amzn2.x86_64 #1 SMP Fri Oct 14 17:08:46 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Container runtime version: Docker 20.10.17
  • Release information
BASE_AMI_ID="ami-02dd04850de58599e"
BUILD_TIME="Mon Sep 26 21:55:27 UTC 2022"
BUILD_KERNEL="5.4.209-116.367.amzn2.x86_64"
ARCH="x86_64"

AMI without the issue:

  • Kernel: 5.4.209-116.363.amzn2.x86_64 #1 SMP Wed Aug 10 21:19:18 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Release information
BASE_AMI_ID="ami-0f51d4d93d5bee36b"
BUILD_TIME="Wed Aug 24 01:01:12 UTC 2022"
BUILD_KERNEL="5.4.209-116.363.amzn2.x86_64"
ARCH="x86_64"
@cartermckinnon
Member

There's some more info in #1071, but we've recalled the AMI that included this kernel version. We have a growing number of reports of instability. We're working with Amazon Linux to address the kernel issue, and we've temporarily pinned the kernel in this AMI to the previous, known-stable version (#1072). An AMI release including the pinned kernel is working through our release pipeline now. I'm going to close this issue as a duplicate; we'll add more information to #1071 when we have it. Sorry for the hassle!
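
Until the pinned AMI is available, a quick way to tell whether a given node is running the kernel reported in this issue might look like the sketch below. It only compares against the single release string quoted in the report above; other affected versions, if any, would be tracked in #1071. The names `AFFECTED_KERNEL` and `is_affected_kernel` are illustrative only.

```python
import platform

# Kernel release reported on the affected AMI in this issue (from `uname -r`).
AFFECTED_KERNEL = "5.4.217-126.408.amzn2.x86_64"


def is_affected_kernel(release: str, affected: str = AFFECTED_KERNEL) -> bool:
    """Return True when the kernel release string matches the affected one exactly."""
    return release.strip() == affected


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r` on Linux
    status = "matches the affected kernel" if is_affected_kernel(release) else "does not match"
    print(f"{release}: {status}")
```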

@cartermckinnon closed this as not planned on Nov 2, 2022
@cartermckinnon added the duplicate label on Nov 2, 2022
@kanor1306
Author

Thanks for the answer @cartermckinnon, and sorry for the duplicate! I was aware of #1071, but as I wasn't sure whether it was the same issue, I decided to create a new one.
