
help request: apisix-etcd-2 just says completed and not running. What does it mean and what should I do? #10774

Closed
techcheckri opened this issue Jan 7, 2024 · 12 comments


@techcheckri

Description

My APISIX installation seems to work normally: when I apply resources, the pods get created.

But for the past two days, the apisix-etcd-2 pod has shown:

NAME                                         READY   STATUS      RESTARTS
apisix-etcd-2                                0/1     Completed   0

kubectl -n ingress-apisix describe pod apisix-etcd-2
Name:             apisix-etcd-2
Namespace:        ingress-apisix
Priority:         0
Service Account:  default
Node:             aks-userpool-xxxxxxx-vmss000000/xx.xxx.x.xxxx
Start Time:       Sat, 06 Jan 2024 15:40:14 +0100
Labels:           app.kubernetes.io/instance=apisix
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=etcd
                  controller-revision-hash=apisix-etcd-xxxx
                  helm.sh/chart=etcd-8.7.7
                  statefulset.kubernetes.io/pod-name=apisix-etcd-2
Annotations:      checksum/token-secret: xxxxxxxx
Status:           Succeeded
IP:               xx.xxx.x.xxx
IPs:
  IP:           xx.xx.x.xx
Controlled By:  StatefulSet/apisix-etcd
Containers:
  etcd:
    Container ID:   containerd://xxxxxx
    Image:          docker.io/bitnami/etcd:3.5.7-debian-11-r14
    Image ID:       docker.io/bitnami/etcd@sha256:xxxxx
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 06 Jan 2024 15:40:19 +0100
      Finished:     Sun, 07 Jan 2024 07:18:05 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-etcd-2 (v1:metadata.name)
      MY_STS_NAME:                       apisix-etcd
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_AUTH_TOKEN:                   jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-etcd-headless.ingress-apisix.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xxx (ro)
Conditions:
  Type               Status
  DisruptionTarget   True 
  Initialized        True 
  Ready              False 
  ContainersReady    False 
  PodScheduled       True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-etcd-2
    ReadOnly:   false
  etcd-jwt-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apisix-etcd-jwt-token
    Optional:    false
  kube-api-access-5mkqf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

What happened?

What can I do to run apisix-etcd-2 again?

How can I prevent this in the future?

Please help!

Environment

Running on Azure (AKS), so most of the items below do not apply.

  • APISIX version (run apisix version): -bash: apisix: command not found
  • Operating system (run uname -a): Linux kubctl 4.19.0 #1 SMP Wed Jul 12 12:00:44 MSK 2023 x86_64 GNU/Linux
  • OpenResty / Nginx version (run openresty -V or nginx -V):
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
@hanqingwu
Contributor

@techcheckri, can you show the etcd pod logs, e.g. by running kubectl -n ingress-apisix logs apisix-etcd-2?

@techcheckri
Author

kubectl -n ingress-apisix logs  apisix-etcd-2
unable to retrieve container logs for containerd://xxxxx

By now apisix-etcd-1 is in the same Completed state:

NAME                                         READY   STATUS                   RESTARTS   AGE
apisix-etcd-0                                1/1     Running                  0          42h
apisix-etcd-1                                0/1     Completed                0          42h
apisix-etcd-2                                0/1     Completed                0          42h
kubectl -n ingress-apisix logs  apisix-etcd-1
unable to retrieve container logs for containerd://xxxxx
kubectl -n ingress-apisix describe pod apisix-etcd-1
Name:             apisix-etcd-1
Namespace:        ingress-apisix
Priority:         0
Service Account:  default
Node:             aks-userpool-33171771-vmss000000/xx.xxx.x.xxx
Start Time:       Sat, 06 Jan 2024 15:41:20 +0100
Labels:           app.kubernetes.io/instance=apisix
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=etcd
                  controller-revision-hash=apisix-etcd-xxx
                  helm.sh/chart=etcd-8.7.7
                  statefulset.kubernetes.io/pod-name=apisix-etcd-1
Annotations:      checksum/token-secret: xxx
Status:           Succeeded
IP:               xx.xxx.x.xx
IPs:
  IP:           x.xxx.x.xx
Controlled By:  StatefulSet/apisix-etcd
Containers:
  etcd:
    Container ID:   containerd://xxxd
    Image:          docker.io/bitnami/etcd:3.5.7-debian-11-r14
    Image ID:       docker.io/bitnami/etcd@sha256:xxx
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 06 Jan 2024 15:41:43 +0100
      Finished:     Mon, 08 Jan 2024 08:27:47 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-etcd-1 (v1:metadata.name)
      MY_STS_NAME:                       apisix-etcd
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_AUTH_TOKEN:                   jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-etcd-headless.ingress-apisix.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xl4mx (ro)
Conditions:
  Type               Status
  DisruptionTarget   True 
  Initialized        True 
  Ready              False 
  ContainersReady    False 
  PodScheduled       True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-etcd-1
    ReadOnly:   false
  etcd-jwt-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apisix-etcd-jwt-token
    Optional:    false
  kube-api-access-xl4mx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

@techcheckri
Author

So, now all three apisix-etcd pods are down:

apisix-etcd-0                                0/1     Completed                0          2d16h
apisix-etcd-1                                0/1     Completed                0          2d16h
apisix-etcd-2                                0/1     Completed                0          2d16h

What can I do to restart them?

Does nobody have any idea what I could do?

Do I have to reinstall APISIX, or what else can I do to get the system going again?

Please help!
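
A minimal sketch of what a restart attempt could look like, assuming the pods are still owned by StatefulSet/apisix-etcd as shown in the describe output above (whether etcd rejoins cleanly depends on the data in its PersistentVolumeClaims):

# Delete the completed pods; the StatefulSet controller recreates them.
kubectl -n ingress-apisix delete pod apisix-etcd-0 apisix-etcd-1 apisix-etcd-2

# Or trigger a rolling restart of the whole StatefulSet:
kubectl -n ingress-apisix rollout restart statefulset apisix-etcd

# Watch the pods come back up:
kubectl -n ingress-apisix get pods -w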

@hanqingwu
Contributor

How did you install this APISIX cluster?
Maybe you should check the installation steps.

@techcheckri
Author

I installed it following this guide:

https://apisix.apache.org/docs/ingress-controller/deployments/azure/

I used the Helm method:

helm install apisix apisix/apisix \
  --set gateway.type=LoadBalancer \
  --set ingress-controller.enabled=true \
  --set gateway.tls.enabled=true \
  --create-namespace \
  --namespace ingress-apisix \
  --set ingress-controller.config.apisix.serviceNamespace=ingress-apisix \
  --set ingress-controller.config.apisix.adminAPIVersion=$ADMIN_API_VERSION \
  --version 1.8.0

So there is nothing special. All vanilla.
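
To double-check what was actually deployed, the release and the values it was installed with can be inspected with standard Helm commands (release name and namespace as in the install command above):

# List the release and its chart version in the namespace.
helm list -n ingress-apisix

# Show the user-supplied values for the release.
helm get values apisix -n ingress-apisix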

Could you please explain what it means when apisix-etcd goes into the Completed state, or point me to some documentation about it? I did not find anything on the topic.

I would like to understand what is going on.

@hanqingwu
Contributor

I have never seen an etcd pod in the Completed state before, so my suggestion is to check the etcd logs to see what happened.

@techcheckri
Author

The problem is, as shown above, that one cannot get logs from the terminated pods:

kubectl -n ingress-apisix logs -f apisix-etcd-0 -p
Error from server (BadRequest): previous terminated container "etcd" in pod "apisix-etcd-0" not found
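
While the container logs are unavailable, the namespace events can sometimes still show why pods were stopped (note they expire after roughly an hour by default, so this only helps shortly after the termination); a sketch:

kubectl -n ingress-apisix get events --sort-by=.lastTimestamp | grep etcd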

So what I take from your statement is that a Completed etcd pod is abnormal behavior, rather than a job that simply finished normally as the exit code 0 would suggest. As seen, the exit code was 0:

Name:             apisix-etcd-1
Namespace:        ingress-apisix
Priority:         0
Service Account:  default
    State:          Terminated
      Reason:       Completed
      Exit Code:    0

Abnormal termination should produce some kind of error code, or at least not exit with 0, right?

So shouldn't the observed behavior of my etcd pods be seen as a bug, and this report be treated as a bug report rather than a help request?
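
For completeness, the recorded termination details (reason, exit code, timestamps) can be pulled straight from the pod status with standard kubectl JSONPath:

kubectl -n ingress-apisix get pod apisix-etcd-1 \
  -o jsonpath='{.status.containerStatuses[0].state.terminated}'

Note also the DisruptionTarget=True condition in the describe output above; on recent Kubernetes versions this condition is set when a pod is terminated because of a disruption such as a node drain or an eviction, which would be consistent with a graceful exit code 0.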

@hanqingwu
Contributor

hanqingwu commented Jan 10, 2024

You can try docker ps -a on the node, then docker logs for the etcd container.
We should find out the cause first, then determine whether it is a bug.

@techcheckri
Author

My pods run on Azure, not locally. Or do I misunderstand you?

@techcheckri
Author

So how do I obtain logs from the apisix-etcd pods so I can post them here?

@hanqingwu
Contributor

If you use containerd, you can try crictl ps -a | grep etcd on the node to get the etcd container ID, then crictl logs <container-id>.
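
On AKS there is no direct SSH to the nodes by default; one way to get a node shell for running crictl is kubectl debug (available in reasonably recent kubectl versions). A sketch, reusing the node name from the describe output above:

# Start a debug pod on the node; the node filesystem is mounted at /host.
kubectl debug node/aks-userpool-xxxxxxx-vmss000000 -it --image=busybox

# Inside the debug pod, switch into the node's root filesystem:
chroot /host

# List all containers, including exited ones, and fetch the etcd logs:
crictl ps -a | grep etcd
crictl logs <container-id>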

@shreemaan-abhishek
Contributor

shreemaan-abhishek commented Jan 23, 2024

I will close this since it is a problem with etcd/Kubernetes.
