CNI Plugin Not Initialized on some Bottlerocket nodes #2365
Comments
@shay-ul When you describe the node, do you see memory consumption at 100%?
Thanks for the feedback. Unfortunately (or not?) I'm still waiting for this to happen again; for some reason all nodes have worked perfectly for the past couple of days.
Gotcha, so I would check in
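For example, something along these lines (a sketch; <node-name> is a placeholder for the stuck node, and kubectl top requires metrics-server to be installed):

```sh
# Live memory usage on the node (needs metrics-server)
kubectl top node <node-name>

# What the scheduler sees: allocatable vs. already-requested resources
kubectl describe node <node-name> | grep -A 8 "Allocated resources"
```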
@jdn5126 We're speaking about nodes which are provisioned with Karpenter and never reach a "Ready" state. Most of the nodes do not have this issue, but every once in a while we have a newly provisioned node that is stuck NotReady. Output of kubectl describe node:
This is the output of kubectl describe pod aws-node:
As you can see, this situation is very strange since the node doesn't report the requests, so why is the default-scheduler declaring insufficient memory?
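One way to sanity-check this (a sketch; <node-name> is a placeholder for the stuck node) is to list the pods already bound to the node together with their memory requests, to see what the scheduler thinks is consuming allocatable memory:

```sh
# Every pod assigned to the node, with its per-container memory requests
kubectl get pods --all-namespaces \
  --field-selector spec.nodeName=<node-name> \
  -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,MEM_REQ:.spec.containers[*].resources.requests.memory'
```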
running
There is no memory request for the aws-node daemonset, only a CPU request. I manually added memory requests for the pod (see the sketch below), but the pod is still stuck Pending. Update: Thanks!
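For reference, the request was added roughly like this (a sketch; the 64Mi value is purely illustrative, not a recommendation from the CNI project):

```sh
# Add a memory request to the aws-node container of the daemonset
kubectl -n kube-system set resources daemonset aws-node \
  --containers=aws-node --requests=memory=64Mi
```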
@shay-ul Got it, I subscribed to bottlerocket-os/bottlerocket#3076 so that I can follow along. One note: it looks like your
What happened:
We are running Bottlerocket nodes on EKS 1.25 with Karpenter and some very basic user data:
The cluster is configured for Custom Networking and prefix-delegation.
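For context, these two features boil down to environment variables on the aws-node daemonset, roughly like the following (a sketch; the zone-based ENI_CONFIG_LABEL_DEF value is an assumption and may differ from our actual add-on configuration):

```sh
# Enable custom networking and prefix delegation on the VPC CNI
kubectl -n kube-system set env daemonset aws-node \
  AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true \
  ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone \
  ENABLE_PREFIX_DELEGATION=true
```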
Occasionally (not too often) a node will be stuck NotReady. While exploring the journal on the node, we see the usual error which indicates something is not working with the VPC CNI:
"Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
We do not have an IP address shortage on our subnets.
The interesting thing is that the /var/log/aws-routed-eni directory is missing, and so is the support script which should be located at /opt/cni/bin/aws-cni-support.sh.
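For reference, these are the standard locations the VPC CNI populates on a healthy node (a sketch of the checks run from a root shell on the node):

```sh
ls /var/log/aws-routed-eni/            # ipamd.log and plugin.log live here
ls /opt/cni/bin/aws-cni-support.sh     # log-collection helper shipped with the CNI
ls /etc/cni/net.d/                     # 10-aws.conflist is written once IPAMD is ready
```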
I have read that for the CNI plugin (IPAMD) to initialize, the aws-node daemonset should spin up on the node first. The aws-node pod that should be scheduled to the node is stuck Pending and has the following events:
I cannot wrap my head around why "Insufficient memory" is even a thing, since we're speaking of a c5a.2xlarge node.
Attach logs
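Since aws-cni-support.sh is missing on the affected nodes, the closest substitute gathered by hand would be something like this (a sketch; <aws-node-pod-name> is a placeholder):

```sh
kubectl -n kube-system get pods -l k8s-app=aws-node -o wide    # find the pod on the stuck node
kubectl -n kube-system describe pod <aws-node-pod-name>        # scheduler events shown above
journalctl -u kubelet --no-pager > kubelet.log                 # from a root shell on the node
journalctl -u containerd --no-pager > containerd.log
```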
What you expected to happen:
A node should spin up Ready, the aws-routed-eni directory with ipamd logs should exist, and aws-cni-support.sh should also exist.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
EKS version: 1.25
Kubernetes version (kubectl version): Major:"1", Minor:"25+", GitVersion:"v1.25.6-eks-48e63af"
CNI version: v1.12.6-eksbuild.1
OS (cat /etc/os-release): Bottlerocket OS 1.13.4 (aws-k8s-1.25)
Kernel (uname -a): 5.15.102