Node is "NotReady" and waiting at "Terminating" for hours #1573

Open
ibalat opened this issue Aug 14, 2024 · 24 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@ibalat

ibalat commented Aug 14, 2024

Description

Observed Behavior:

  • The node is in "NotReady" status, but the EC2 instance still exists in the AWS EC2 instance list with status "Running" and status checks "Passed"
  • The node status reason is "NodeStatusUnknown", with the message "Kubelet stopped posting node status"
  • Pods are stuck in "Terminating"
  • Karpenter has logs related to the pods stuck in "Terminating", like:

{"level":"INFO","time":"2024-08-14T12:13:23.794Z","logger":"controller","message":"pod xxxx has a preferred Anti-Affinity which can prevent consolidation","commit":"490ef94","controller":"provisioner"}

  • the related EC2 instance's last logged messages were:
[  423.353932] [  21815]  1001 21815  1314351    45882   770048        0          1000 java
[  423.361183] [  22145] 65532 22145   475493    12653   364544        0          1000 controller
[  423.368709] [  22199]  1001 22199   914462    84514   987136        0          1000 java
[  423.376073] [  33276]     0 33276   295992      601   188416        0          -998 runc
[  423.383344] [  33288]     0 33288     3094       12    45056        0          -998 exe
[  423.390531] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1344160e_dca0_4e9d_be15_ea0b63efb5b2.slice/cri-containerd-496edffa072b6d7835989a0dfbce3c30711a32903c757baf4fcd460c9479f3a8.scope,task=java,pid=22199,uid=1001
[  423.412634] Out of memory: Killed process 22199 (java) total-vm:3657848kB, anon-rss:338056kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:964kB oom_score_adj:1000
[  425.563371] oom_reaper: reaped process 22199 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        2024-08-14T13:38:15+00:00

[screenshots]

Expected Behavior:

  • Karpenter should remove the node if it is NotReady and provision a new node

Reproduction Steps (Please include YAML):
I don't have any idea; it occurs periodically.

Versions:

  • Chart Version: 0.37.0
  • Kubernetes Version (kubectl version): 1.30
@ibalat ibalat added the kind/bug Categorizes issue or PR as related to a bug. label Aug 14, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 14, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ibalat
Author

ibalat commented Aug 14, 2024

OMG, 6 hours later the pods are still in "Terminating" status and the node is still "NotReady".

[screenshot]

BTW, the instance is an m5.large. And I got new instance console (stdout) logs:

[ 8080.945657] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/config/mysql/1 supports timestamps until 2038 (0x7fffffff)
[ 8080.956982] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/template-sql/mysql/2 supports timestamps until 2038 (0x7fffffff)
[ 8080.970168] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/template-sql/mysql/3 supports timestamps until 2038 (0x7fffffff)
[ 8080.981712] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/etl-sql/mysql/4 supports timestamps until 2038 (0x7fffffff)
[ 8080.993163] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/prefera-sql/mysql/5 supports timestamps until 2038 (0x7fffffff)
[ 8112.949302] pci 0000:00:1d.0: [1d0f:8061] type 00 class 0x010802
[ 8112.952794] pci 0000:00:1d.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 8112.956559] pci 0000:00:1d.0: enabling Extended Tags
[ 8112.960301] pci 0000:00:1d.0: BAR 0: assigned [mem 0xc0114000-0xc0117fff]
[ 8112.964132] nvme nvme3: pci function 0000:00:1d.0
[ 8112.967238] nvme 0000:00:1d.0: enabling device (0000 -> 0002)
[ 8112.972352] PCI Interrupt Link [LNKA] enabled at IRQ 11
[ 8112.980317] nvme nvme3: 2/0/0 default/read/poll queues
[ 8113.229053] pci 0000:00:1c.0: [1d0f:8061] type 00 class 0x010802
[ 8113.232693] pci 0000:00:1c.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 8113.236424] pci 0000:00:1c.0: enabling Extended Tags
[ 8113.240326] pci 0000:00:1c.0: BAR 0: assigned [mem 0xc0118000-0xc011bfff]
[ 8113.244141] nvme nvme4: pci function 0000:00:1c.0
[ 8113.247190] nvme 0000:00:1c.0: enabling device (0000 -> 0002)
[ 8113.256918] nvme nvme4: 2/0/0 default/read/poll queues
[ 8113.573770] EXT4-fs (nvme3n1): mounted filesystem with ordered data mode. Opts: (null)
[ 8114.159309] IPv6: ADDRCONF(NETDEV_CHANGE): enia89b8c83c9a: link becomes ready
[ 8114.163261] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 8114.319775] xfs filesystem being remounted at /var/lib/kubelet/pods/177f12fc-a42d-464f-bdf0-ad1f53080f8b/volume-subpaths/scripts/kafka/2 supports timestamps until 2038 (0x7fffffff)
[ 8114.734723] EXT4-fs (nvme4n1): mounted filesystem with ordered data mode. Opts: (null)
[ 8114.972359] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 8115.074504] xfs filesystem being remounted at /var/lib/kubelet/pods/ad839521-93a0-4010-8b3f-0980d2375063/volume-subpaths/scripts/kafka/2 supports timestamps until 2038 (0x7fffffff)
[ 8119.176023] pci 0000:00:1b.0: [1d0f:8061] type 00 class 0x010802
[ 8119.179548] pci 0000:00:1b.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 8119.183285] pci 0000:00:1b.0: enabling Extended Tags
[ 8119.187010] pci 0000:00:1b.0: BAR 0: assigned [mem 0xc011c000-0xc011ffff]
[ 8119.190838] nvme nvme5: pci function 0000:00:1b.0
[ 8119.193879] nvme 0000:00:1b.0: enabling device (0000 -> 0002)
[ 8119.203356] nvme nvme5: 2/0/0 default/read/poll queues
[ 8120.146390] EXT4-fs (nvme5n1): mounted filesystem with ordered data mode. Opts: (null)
[ 8120.658980] IPv6: ADDRCONF(NETDEV_CHANGE): eni61bfec53e4d: link becomes ready
[ 8120.662926] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 8121.038972] xfs filesystem being remounted at /var/lib/kubelet/pods/9d3cd637-f153-4200-a302-04b9e60a273c/volume-subpaths/scripts/kafka/2 supports timestamps until 2038 (0x7fffffff)
[ 8299.855030] systemd-journald[537510]: File /var/log/journal/ec23eae178c2480d1224169d16678fc2/system.journal corrupted or uncleanly shut down, renaming and replacing.

@sftim

sftim commented Aug 15, 2024

If you're willing to try Karpenter 1.0 (newly released), you might see better behavior or diagnostics. I'd give it a go, honestly.

@ibalat
Author

ibalat commented Aug 15, 2024

@sftim thanks for the suggestion, I'll try it. But why doesn't Karpenter or K8s intervene in this situation? 18h have passed and they are still stuck at NotReady and Terminating. Is there any parameter to force-terminate NotReady nodes? The ttlAfterNotRegistered parameter is deprecated, and my consolidateAfter: 5m config does not cover this situation :/

[screenshot]

@jigisha620
Contributor

Hi @ibalat,
From the information that you have shared, it seems like the node registered but never got initialized. Karpenter handles registration failures by waiting 15 minutes to check whether the node registers; if it doesn't, we go ahead and delete the nodeClaim. But we still have an open issue for nodes that Karpenter never initializes at all, which should be captured by #750, where we are hoping to start by introducing a static TTL for initialization to kill off nodes that never go Ready on startup. Can you describe the nodeClaim for this node and share it? Can you also share the logs from the time this happened so that we can confirm that's the issue?

@ibalat
Author

ibalat commented Aug 15, 2024

Hi @jigisha620, actually, the nodes had initialized: they become "Ready", pods get scheduled onto them, and only after a while (~30-60 mins later) does the node go "NotReady". So they work properly for a while. I tried upgrading to v1.0.0, but the same problem still occurs. I am sharing my nodeclass, nodepool and nodeclaim configs. BTW, do you know why the pods are still stuck in "Terminating" status? Can K8s or Karpenter force delete them after a while? Is there any config for that?

Also, I found some new events that may be related to this issue; their repeat count is very high:

[screenshot]
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: main
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: "KarpenterNodeRole"
  subnetSelectorTerms:
    %{~ for subnet in eks_dev_v1_subnet_ids ~}
    - id: "${subnet}"
    %{~ endfor ~}
  securityGroupSelectorTerms:
    - name: "*dev-v1-node*"
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: main-green
spec:
  template:
    metadata:
      labels:
        node-group-name: main-green
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: main
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: [ "r5", "m5", "c6i" ]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      terminationGracePeriod: 5m
      expireAfter: 720h # 30 * 24h = 720h | periodically recycle nodes due to security concerns
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "17843341971500854913"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
  creationTimestamp: "2024-08-15T10:59:10Z"
  finalizers:
  - karpenter.k8s.aws/termination
  generation: 1
  name: main
  resourceVersion: "525655958"
  uid: 742b9052-735a-4078-b2d3-bbfe0cf883e3
spec:
  amiSelectorTerms:
  - alias: al2023@latest
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  role: KarpenterNodeRole
  securityGroupSelectorTerms:
  - name: '*dev-v1-node*'
  subnetSelectorTerms:
  - id: subnet-xx
  - id: subnet-xx
  - id: subnet-xx
status:
  amis:
  - id: ami-0d43f736643876936
    name: amazon-eks-node-al2023-arm64-standard-1.30-v20240807
    requirements:
    - key: kubernetes.io/arch
      operator: In
      values:
      - arm64
    - key: karpenter.k8s.aws/instance-gpu-count
      operator: DoesNotExist
    - key: karpenter.k8s.aws/instance-accelerator-count
      operator: DoesNotExist
  - id: ami-0d694ee9037e1f937
    name: amazon-eks-node-al2023-x86_64-standard-1.30-v20240807
    requirements:
    - key: kubernetes.io/arch
      operator: In
      values:
      - amd64
    - key: karpenter.k8s.aws/instance-gpu-count
      operator: DoesNotExist
    - key: karpenter.k8s.aws/instance-accelerator-count
      operator: DoesNotExist
  conditions:
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: AMIsReady
    status: "True"
    type: AMIsReady
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: InstanceProfileReady
    status: "True"
    type: InstanceProfileReady
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: SecurityGroupsReady
    status: "True"
    type: SecurityGroupsReady
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: SubnetsReady
    status: "True"
    type: SubnetsReady
  instanceProfile: dev-v1_xx
  securityGroups:
  - id: sg-xx
    name: dev-v1-xx
  - id: sg-xx
    name: dev-v1-xx
  subnets:
  - id: subnet-xx
    zone: eu-west-1c
    zoneID: euw1-az2
  - id: subnet-xx
    zone: eu-west-1a
    zoneID: euw1-az3
  - id: subnet-xx
    zone: eu-west-1b
    zoneID: euw1-az1
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "14203437024067510703"
    karpenter.sh/nodepool-hash-version: v3
  creationTimestamp: "2024-08-15T10:55:03Z"
  generation: 1
  name: main-green
  resourceVersion: "525888522"
  uid: 5866c52d-bb13-479f-b034-822128ebc8f1
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidateAfter: 5m
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: 1000
  template:
    metadata:
      labels:
        node-group-name: main-green
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: main
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - c
        - m
        - r
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values:
        - r5
        - m5
        - c6i
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "2"
      terminationGracePeriod: 5m
status:
  conditions:
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: NodeClassReady
    status: "True"
    type: NodeClassReady
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-08-15T10:55:03Z"
    message: ""
    reason: ValidationSucceeded
    status: "True"
    type: ValidationSucceeded
  resources:
    cpu: "294"
    ephemeral-storage: 417873520Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 695806732Ki
    nodes: "20"
    pods: "2425"
---
apiVersion: karpenter.sh/v1
kind: NodeClaim
metadata:
  annotations:
    compatibility.karpenter.k8s.aws/cluster-name-tagged: "true"
    compatibility.karpenter.k8s.aws/kubelet-drift-hash: "15379597991425564585"
    karpenter.k8s.aws/ec2nodeclass-hash: "17843341971500854913"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
    karpenter.k8s.aws/tagged: "true"
    karpenter.sh/nodepool-hash: "14203437024067510703"
    karpenter.sh/nodepool-hash-version: v3
  creationTimestamp: "2024-08-15T12:05:33Z"
  finalizers:
  - karpenter.sh/termination
  generateName: main-green-
  generation: 1
  labels:
    karpenter.k8s.aws/instance-category: c
    karpenter.k8s.aws/instance-cpu: "32"
    karpenter.k8s.aws/instance-cpu-manufacturer: intel
    karpenter.k8s.aws/instance-ebs-bandwidth: "10000"
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "true"
    karpenter.k8s.aws/instance-family: c6i
    karpenter.k8s.aws/instance-generation: "6"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "65536"
    karpenter.k8s.aws/instance-network-bandwidth: "12500"
    karpenter.k8s.aws/instance-size: 8xlarge
    karpenter.sh/capacity-type: spot
    karpenter.sh/nodepool: main-green
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
    node-group-name: main-green
    node.kubernetes.io/instance-type: c6i.8xlarge
    topology.k8s.aws/zone-id: euw1-az1
    topology.kubernetes.io/region: eu-west-1
    topology.kubernetes.io/zone: eu-west-1b
  name: main-green-7rncx
  ownerReferences:
  - apiVersion: karpenter.sh/v1
    blockOwnerDeletion: true
    kind: NodePool
    name: main-green
    uid: 5866c52d-bb13-479f-b034-822128ebc8f1
  resourceVersion: "525859504"
  uid: bd1aea84-18be-4d42-9c17-3936137c89a5
spec:
  expireAfter: 720h
  nodeClassRef:
    group: karpenter.k8s.aws
    kind: EC2NodeClass
    name: main
  requirements:
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  - key: kubernetes.io/os
    operator: In
    values:
    - linux
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - spot
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - c6i.12xlarge
    - c6i.16xlarge
    - c6i.24xlarge
    - c6i.32xlarge
    - c6i.8xlarge
    - c6i.metal
    - m5.12xlarge
    - m5.16xlarge
    - m5.24xlarge
    - m5.4xlarge
    - m5.8xlarge
    - m5.metal
    - r5.12xlarge
    - r5.16xlarge
    - r5.24xlarge
    - r5.4xlarge
    - r5.8xlarge
    - r5.metal
  - key: node-group-name
    operator: In
    values:
    - main-green
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values:
    - "2"
  - key: karpenter.sh/nodepool
    operator: In
    values:
    - main-green
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values:
    - c
    - m
    - r
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values:
    - c6i
    - m5
    - r5
  resources:
    requests:
      cpu: 4280m
      memory: 36152Mi
      pods: "67"
  terminationGracePeriod: 5m0s
status:
  allocatable:
    cpu: 31850m
    ephemeral-storage: 17Gi
    memory: 57691Mi
    pods: "234"
    vpc.amazonaws.com/pod-eni: "84"
  capacity:
    cpu: "32"
    ephemeral-storage: 20Gi
    memory: 60620Mi
    pods: "234"
    vpc.amazonaws.com/pod-eni: "84"
  conditions:
  - lastTransitionTime: "2024-08-15T12:15:35Z"
    message: ""
    reason: ConsistentStateFound
    status: "True"
    type: ConsistentStateFound
  - lastTransitionTime: "2024-08-15T15:46:53Z"
    message: ""
    reason: Consolidatable
    status: "True"
    type: Consolidatable
  - lastTransitionTime: "2024-08-15T12:06:14Z"
    message: ""
    reason: Initialized
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-08-15T12:05:35Z"
    message: ""
    reason: Launched
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-08-15T12:06:14Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-08-15T12:06:04Z"
    message: ""
    reason: Registered
    status: "True"
    type: Registered
  imageID: ami-0d694ee9037e1f937
  lastPodEventTime: "2024-08-15T15:41:53Z"
  nodeName: ip-10-xx-xx-xx.eu-west-1.compute.internal
  providerID: aws:///eu-west-1b/i-xxxxxx

@jigisha620
Contributor

I think that the snippet you have shared with "No allowed disruptions for disruption reason" is not the problem here. The nodes that you have were already in the NotReady state, so they will not be considered for allowed disruptions. Can you share the Karpenter controller logs from the same time?

@ibalat
Author

ibalat commented Aug 16, 2024

Sure. Between 05:58:24 and 06:09:12, 3 nodes became NotReady and I watched them live, but there is no related log :( You can see all the logs between these times:

{"level":"INFO","time":"2024-08-16T05:58:24.287Z","logger":"controller","message":"created nodeclaim",
{"level":"INFO","time":"2024-08-16T05:58:26.268Z","logger":"controller","message":"launched nodeclaim",
{"level":"INFO","time":"2024-08-16T05:58:54.219Z","logger":"controller","message":"pod(s) have a preferred Anti-Affinity which can prevent consolidation",
{"level":"INFO","time":"2024-08-16T05:58:54.360Z","logger":"controller","message":"found provisionable pod(s)",
{"level":"INFO","time":"2024-08-16T05:58:54.360Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)",
{"level":"INFO","time":"2024-08-16T05:58:54.360Z","logger":"controller","message":"computed 1 unready node(s) will fit 1 pod(s)",
{"level":"INFO","time":"2024-08-16T05:58:54.376Z","logger":"controller","message":"created nodeclaim",
{"level":"INFO","time":"2024-08-16T05:58:56.599Z","logger":"controller","message":"deleted node",
{"level":"INFO","time":"2024-08-16T05:58:56.870Z","logger":"controller","message":"launched nodeclaim",
{"level":"INFO","time":"2024-08-16T05:58:56.902Z","logger":"controller","message":"deleted nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:19.838Z","logger":"controller","message":"registered nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:20.169Z","logger":"controller","message":"registered nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:24.803Z","logger":"controller","message":"pod(s) have a preferred Anti-Affinity which can prevent consolidation",
{"level":"INFO","time":"2024-08-16T05:59:37.493Z","logger":"controller","message":"initialized nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:38.378Z","logger":"controller","message":"initialized nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:49.497Z","logger":"controller","message":"deleted node",
{"level":"INFO","time":"2024-08-16T05:59:49.706Z","logger":"controller","message":"deleted nodeclaim",
{"level":"INFO","time":"2024-08-16T06:08:45.766Z","logger":"controller","message":"found provisionable pod(s)",
{"level":"INFO","time":"2024-08-16T06:08:45.766Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)",
{"level":"INFO","time":"2024-08-16T06:08:45.777Z","logger":"controller","message":"created nodeclaim",
{"level":"INFO","time":"2024-08-16T06:08:48.176Z","logger":"controller","message":"launched nodeclaim",
{"level":"INFO","time":"2024-08-16T06:09:12.703Z","logger":"controller","message":"registered nodeclaim",

@ibalat
Author

ibalat commented Aug 16, 2024

New update: the node that cannot be deleted (even though terminationGracePeriod: 5m has long since passed) shows some events; maybe they can help:

[screenshot]

The node's nodeclaim has the events below:

[screenshot]

The pods on the node are stuck in the "Terminating" state and don't show any events or logs in describe.

After I deleted the nodeclaim manually, the node was deleted (but well past the grace period).

@jigisha620
Contributor

TerminationGracePeriod would not work if delete has not been called against the nodeClaim. In your case the node went to the NotReady state but nothing initiated its deletion. I was able to reproduce something similar on my end, where my node becomes NotReady because the kubelet stopped posting node status. However, the pods got rescheduled onto a different node. That makes me wonder whether the pods you are running have some pre-stop hook that's preventing them from terminating?
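
For anyone checking their own workloads, this is the kind of pod-level setting worth ruling out. A minimal, hypothetical sketch (name and image are placeholders, not taken from this issue) of a preStop hook plus a long terminationGracePeriodSeconds, either of which keeps a pod in "Terminating" far longer than expected; and when the kubelet is unreachable, the deletion cannot be confirmed at all until the Node object is removed or the pod is force-deleted:

apiVersion: v1
kind: Pod
metadata:
  name: example-app                      # hypothetical name, for illustration only
spec:
  terminationGracePeriodSeconds: 3600    # an unusually long grace period delays a forced kill
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:1.36   # placeholder image
      command: ["sh", "-c", "sleep 1000000"]
      lifecycle:
        preStop:
          exec:
            # a slow preStop hook keeps the pod in "Terminating" until it finishes
            # or the grace period expires
            command: ["sh", "-c", "sleep 3600"]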

@ibalat
Author

ibalat commented Aug 19, 2024

No prestop hook, finalizer or anything else. They are just stuck, as in the screenshots.

@jigisha620
Contributor

This is not necessarily an issue with Karpenter. To investigate further, we will have to take a look at the kubelet logs to know why the pods remained stuck at Terminating. Since you are using an EKS AMI, you can run a script on your worker node at /etc/eks called log-collector-script, which would help us get the kubelet logs. If you have AWS premium support you can open a ticket to investigate those logs, or you can send them over and I can try looking into them.

@ibalat
Author

ibalat commented Aug 28, 2024

When it happens, I can't log in to the EC2 instance; it doesn't respond. But I could get the stdout (console output); it's below.

[  423.390531] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1344160e_dca0_4e9d_be15_ea0b63efb5b2.slice/cri-containerd-496edffa072b6d7835989a0dfbce3c30711a32903c757baf4fcd460c9479f3a8.scope,task=java,pid=22199,uid=1001
[  423.412634] Out of memory: Killed process 22199 (java) total-vm:3657848kB, anon-rss:338056kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:964kB oom_score_adj:1000
[  425.563371] oom_reaper: reaped process 22199 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        2024-08-14T13:38:15+00:00

@suraj2410

We see this far too often as well.

@JacobHenner

@ibalat @suraj2410

What do the disk IOPS, disk idle time, and memory metrics look like for the affected hosts? Could this be the problem described in bottlerocket-os/bottlerocket#4075 (comment)? (applicable to Bottlerocket, but also observed with AL2).

@ibalat
Author

ibalat commented Sep 1, 2024

I had removed Karpenter and reinstalled Cluster Autoscaler, but I can test it again this week. After the test, I will share the results with you.

@dcherniv

This is a common thing with most Kubernetes providers/autoscalers. For some reason the general position of k8s (as a whole) is to not touch nodes that are stuck like this. It is essentially a philosophical dilemma:
"Do we want to keep the stuck nodes for troubleshooting, or do we want to force-terminate them?"
And there are good arguments for both points. In my humble opinion, nodes that stop posting status for longer than a certain threshold should be force-terminated.
If you are running Kubernetes at scale and your apps and nodes are properly HA, you don't really care what happens to any given node. That was the k8s promise after all: cattle, not pets.
I, personally, have no interest in troubleshooting solar flares, flipped memory bits, and the reasons why an OOMKill or kernel.pid_max exhaustion sends the kubelet into a weird state, provided my other nodes are healthy.
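
For context, stock Kubernetes already applies a threshold like this at the pod level: the DefaultTolerationSeconds admission plugin injects tolerations similar to the snippet below into pods, so taint-based eviction marks pods on a NotReady/unreachable node for deletion after roughly 5 minutes. The catch is that, with the kubelet gone, nothing can confirm that deletion, which is why the pods here sit in "Terminating" until the Node (or NodeClaim) object itself is removed. The values shown are the well-known defaults, not taken from this cluster:

tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300   # evict ~5 minutes after the node goes NotReady
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300   # evict ~5 minutes after the node becomes unreachable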

@paalkr

paalkr commented Dec 2, 2024

We also experience this quite frequently, but only with nodes that Karpenter has scheduled for disruption. If I totally disallow Karpenter from replacing nodes by setting the node budget to 0, the issue does not occur at all.

apiVersion: karpenter.sh/v1
kind: NodePool
...
spec:
  disruption:
    budgets:
    - nodes: "0"
...

If I let Karpenter disrupt nodes, then we see the issue reappearing very frequently.

Karpenter version 1.0.2
EKS version 1.29

@GnatorX

GnatorX commented Jan 21, 2025

@ibalat Have you attempted cutting a ticket with AWS to investigate how the node got partitioned from the control plane?

@engedaam
Contributor

/assign @garvinp-stripe

@k8s-ci-robot
Contributor

@engedaam: GitHub didn't allow me to assign the following users: garvinp-stripe.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @garvinp-stripe

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@engedaam
Contributor

/assign @GnatorX

@ibalat
Author

ibalat commented Jan 23, 2025

Have you attempted cutting a ticket with AWS to investigate how the node got partitioned from the control plane?

No, I had removed Karpenter because of these issues and couldn't try it again.

@GnatorX

GnatorX commented Jan 24, 2025

Is this issue only showing up with Karpenter?

What is weird to me is that this is the normal way autoscalers handle partitioned nodes, as mentioned by @dcherniv in #1573 (comment).

Official docs:
https://kubernetes.io/docs/concepts/architecture/nodes/#node-controller.

However, I wonder if you have something configured differently on your nodes between Karpenter and cluster-autoscaler for node termination:
https://karpenter.sh/docs/concepts/disruption/#termination-controller
Are you running anything important to the node's connectivity that isn't tolerating Karpenter's taints?
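
As a concrete thing to check: node-critical agents (CNI, kube-proxy, monitoring/log daemons) are usually deployed as DaemonSets that tolerate every taint, so they keep running while Karpenter taints and drains a node. A generic sketch under that assumption follows; the name and image are placeholders, and the blanket toleration is the common pattern rather than anything specific to this cluster:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-critical-agent            # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-critical-agent
  template:
    metadata:
      labels:
        app: node-critical-agent
    spec:
      tolerations:
        - operator: Exists             # tolerate all taints, including Karpenter's disruption taint
      containers:
        - name: agent
          image: public.ecr.aws/docker/library/busybox:1.36   # placeholder image
          command: ["sh", "-c", "sleep 1000000"]

If an agent like this only tolerates a narrow set of taints, it can be evicted or left unscheduled during disruption, which would line up with the node losing connectivity right around the time Karpenter acts on it.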
