
help request: apisix-etcd-2 just says completed and not running. What does it mean and what should I do? #10774

Closed
techcheckri opened this issue Jan 7, 2024 · 12 comments


@techcheckri

Description

My APISIX installation seems to work normally: when I apply resources, the pods get created.

But for the past two days, the apisix-etcd-2 pod has shown:

NAME                                         READY   STATUS      RESTARTS
apisix-etcd-2                                0/1     Completed   0

kubectl -n ingress-apisix describe pod apisix-etcd-2
Name:             apisix-etcd-2
Namespace:        ingress-apisix
Priority:         0
Service Account:  default
Node:             aks-userpool-xxxxxxx-vmss000000/xx.xxx.x.xxxx
Start Time:       Sat, 06 Jan 2024 15:40:14 +0100
Labels:           app.kubernetes.io/instance=apisix
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=etcd
                  controller-revision-hash=apisix-etcd-xxxx
                  helm.sh/chart=etcd-8.7.7
                  statefulset.kubernetes.io/pod-name=apisix-etcd-2
Annotations:      checksum/token-secret: xxxxxxxx
Status:           Succeeded
IP:               xx.xxx.x.xxx
IPs:
  IP:           xx.xx.x.xx
Controlled By:  StatefulSet/apisix-etcd
Containers:
  etcd:
    Container ID:   containerd://xxxxxx
    Image:          docker.io/bitnami/etcd:3.5.7-debian-11-r14
    Image ID:       docker.io/bitnami/etcd@sha256:xxxxx
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 06 Jan 2024 15:40:19 +0100
      Finished:     Sun, 07 Jan 2024 07:18:05 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-etcd-2 (v1:metadata.name)
      MY_STS_NAME:                       apisix-etcd
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_AUTH_TOKEN:                   jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-etcd-headless.ingress-apisix.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xxx (ro)
Conditions:
  Type               Status
  DisruptionTarget   True 
  Initialized        True 
  Ready              False 
  ContainersReady    False 
  PodScheduled       True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-etcd-2
    ReadOnly:   false
  etcd-jwt-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apisix-etcd-jwt-token
    Optional:    false
  kube-api-access-5mkqf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

What happened?

What can I do to run apisix-etcd-2 again?

How can I prevent this in the future?

Please help!

Environment

Running on Azure (AKS), so most of the items below do not apply.

  • APISIX version (run apisix version): -bash: apisix: command not found
  • Operating system (run uname -a): Linux kubctl 4.19.0 #1 SMP Wed Jul 12 12:00:44 MSK 2023 x86_64 GNU/Linux
  • OpenResty / Nginx version (run openresty -V or nginx -V):
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
  • APISIX Dashboard version, if relevant:
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version):
@hanqingwu
Contributor

@techcheckri, can you show the etcd pod logs, e.g. by running kubectl -n ingress-apisix logs apisix-etcd-2?

@techcheckri
Author

kubectl -n ingress-apisix logs  apisix-etcd-2
unable to retrieve container logs for containerd://xxxxx

By now apisix-etcd-1 is in the same Completed state:

NAME                                         READY   STATUS                   RESTARTS   AGE
apisix-etcd-0                                1/1     Running                  0          42h
apisix-etcd-1                                0/1     Completed                0          42h
apisix-etcd-2                                0/1     Completed                0          42h
kubectl -n ingress-apisix logs  apisix-etcd-1
unable to retrieve container logs for containerd://xxxxx
kubectl -n ingress-apisix describe pod apisix-etcd-1
Name:             apisix-etcd-1
Namespace:        ingress-apisix
Priority:         0
Service Account:  default
Node:             aks-userpool-33171771-vmss000000/xx.xxx.x.xxx
Start Time:       Sat, 06 Jan 2024 15:41:20 +0100
Labels:           app.kubernetes.io/instance=apisix
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=etcd
                  controller-revision-hash=apisix-etcd-xxx
                  helm.sh/chart=etcd-8.7.7
                  statefulset.kubernetes.io/pod-name=apisix-etcd-1
Annotations:      checksum/token-secret: xxx
Status:           Succeeded
IP:               xx.xxx.x.xx
IPs:
  IP:           x.xxx.x.xx
Controlled By:  StatefulSet/apisix-etcd
Containers:
  etcd:
    Container ID:   containerd://xxxd
    Image:          docker.io/bitnami/etcd:3.5.7-debian-11-r14
    Image ID:       docker.io/bitnami/etcd@sha256:xxx
    Ports:          2379/TCP, 2380/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 06 Jan 2024 15:41:43 +0100
      Finished:     Mon, 08 Jan 2024 08:27:47 +0100
    Ready:          False
    Restart Count:  0
    Liveness:       exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=30s #success=1 #failure=5
    Readiness:      exec [/opt/bitnami/scripts/etcd/healthcheck.sh] delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:
      BITNAMI_DEBUG:                     false
      MY_POD_IP:                          (v1:status.podIP)
      MY_POD_NAME:                       apisix-etcd-1 (v1:metadata.name)
      MY_STS_NAME:                       apisix-etcd
      ETCDCTL_API:                       3
      ETCD_ON_K8S:                       yes
      ETCD_START_FROM_SNAPSHOT:          no
      ETCD_DISASTER_RECOVERY:            no
      ETCD_NAME:                         $(MY_POD_NAME)
      ETCD_DATA_DIR:                     /bitnami/etcd/data
      ETCD_LOG_LEVEL:                    info
      ALLOW_NONE_AUTHENTICATION:         yes
      ETCD_AUTH_TOKEN:                   jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
      ETCD_ADVERTISE_CLIENT_URLS:        http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2379,http://apisix-etcd.ingress-apisix.svc.cluster.local:2379
      ETCD_LISTEN_CLIENT_URLS:           http://0.0.0.0:2379
      ETCD_INITIAL_ADVERTISE_PEER_URLS:  http://$(MY_POD_NAME).apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_LISTEN_PEER_URLS:             http://0.0.0.0:2380
      ETCD_INITIAL_CLUSTER_TOKEN:        etcd-cluster-k8s
      ETCD_INITIAL_CLUSTER_STATE:        existing
      ETCD_INITIAL_CLUSTER:              apisix-etcd-0=http://apisix-etcd-0.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-1=http://apisix-etcd-1.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380,apisix-etcd-2=http://apisix-etcd-2.apisix-etcd-headless.ingress-apisix.svc.cluster.local:2380
      ETCD_CLUSTER_DOMAIN:               apisix-etcd-headless.ingress-apisix.svc.cluster.local
    Mounts:
      /bitnami/etcd from data (rw)
      /opt/bitnami/etcd/certs/token/ from etcd-jwt-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xl4mx (ro)
Conditions:
  Type               Status
  DisruptionTarget   True 
  Initialized        True 
  Ready              False 
  ContainersReady    False 
  PodScheduled       True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-apisix-etcd-1
    ReadOnly:   false
  etcd-jwt-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  apisix-etcd-jwt-token
    Optional:    false
  kube-api-access-xl4mx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

@techcheckri
Author

So, now all three apisix-etcd pods are down:

apisix-etcd-0                                0/1     Completed                0          2d16h
apisix-etcd-1                                0/1     Completed                0          2d16h
apisix-etcd-2                                0/1     Completed                0          2d16h

What can I do to restart them?

Does nobody have any idea what I could do?

Do I have to reinstall APISIX, or what else can I do to get the system going again?

Please help!
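
A minimal sketch of what a restart attempt could look like, assuming the pods are still owned by StatefulSet/apisix-etcd as shown in the describe output above (whether etcd rejoins cleanly depends on the data in its PersistentVolumeClaims):

# Delete the completed pods; the StatefulSet controller recreates them.
kubectl -n ingress-apisix delete pod apisix-etcd-0 apisix-etcd-1 apisix-etcd-2

# Or trigger a rolling restart of the whole StatefulSet:
kubectl -n ingress-apisix rollout restart statefulset apisix-etcd

# Watch the pods come back up:
kubectl -n ingress-apisix get pods -w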

@hanqingwu
Contributor

How did you install this APISIX cluster?
Maybe you should check the installation steps.

@techcheckri
Author

I installed it following this guide:

https://apisix.apache.org/docs/ingress-controller/deployments/azure/

I used the Helm method:

helm install apisix apisix/apisix \
  --set gateway.type=LoadBalancer \
  --set ingress-controller.enabled=true \
  --set gateway.tls.enabled=true \
  --create-namespace \
  --namespace ingress-apisix \
  --set ingress-controller.config.apisix.serviceNamespace=ingress-apisix \
  --set ingress-controller.config.apisix.adminAPIVersion=$ADMIN_API_VERSION \
  --version 1.8.0

So there is nothing special. All vanilla.
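
To double-check what was actually deployed, the release and the values it was installed with can be inspected with standard Helm commands (release name and namespace as in the install command above):

# List the release and its chart version in the namespace.
helm list -n ingress-apisix

# Show the user-supplied values for the release.
helm get values apisix -n ingress-apisix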

Could you please explain what it means when apisix-etcd goes into the Completed state, or point me to some documentation about it? I did not find anything on the topic.

I would like to understand what is going on.

@hanqingwu
Contributor

I have never seen an etcd pod in the Completed state before, so my suggestion is to check the etcd logs to see what happened.

@techcheckri
Author

The problem is, as shown above, that one cannot get logs from the terminated pods:

kubectl -n ingress-apisix logs -f apisix-etcd-0 -p
Error from server (BadRequest): previous terminated container "etcd" in pod "apisix-etcd-0" not found
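
While the container logs are unavailable, the namespace events can sometimes still show why pods were stopped (note they expire after roughly an hour by default, so this only helps shortly after the termination); a sketch:

kubectl -n ingress-apisix get events --sort-by=.lastTimestamp | grep etcd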

So what I take from your statement is that a Completed etcd pod is abnormal behavior, rather than a job that simply finished normally as the exit code 0 would suggest. As seen, the exit code was 0:

Name:             apisix-etcd-1
Namespace:        ingress-apisix
Priority:         0
Service Account:  default
    State:          Terminated
      Reason:       Completed
      Exit Code:    0

Abnormal termination should produce some kind of error code, or at least not exit with 0, right?

So shouldn't the observed behavior of my etcd pods be seen as a bug, and this report be treated as a bug report rather than a help request?
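
For completeness, the recorded termination details (reason, exit code, timestamps) can be pulled straight from the pod status with standard kubectl JSONPath:

kubectl -n ingress-apisix get pod apisix-etcd-1 \
  -o jsonpath='{.status.containerStatuses[0].state.terminated}'

Note also the DisruptionTarget=True condition in the describe output above; on recent Kubernetes versions this condition is set when a pod is terminated because of a disruption such as a node drain or an eviction, which would be consistent with a graceful exit code 0.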

@hanqingwu
Contributor

hanqingwu commented Jan 10, 2024

You can try docker ps -a on the node, then docker logs for the etcd container.
We should find out the cause first, then determine whether it is a bug.

@techcheckri
Author

My pods run on Azure, not locally. Or do I misunderstand you?

@techcheckri
Author

So how do I obtain logs from the apisix-etcd pods so I can post them here?

@hanqingwu
Contributor

If you use containerd, you can try crictl ps -a | grep etcd on the node to get the etcd container ID, then crictl logs <container-id>.
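
On AKS there is no direct SSH to the nodes by default; one way to get a node shell for running crictl is kubectl debug (available in reasonably recent kubectl versions). A sketch, reusing the node name from the describe output above:

# Start a debug pod on the node; the node filesystem is mounted at /host.
kubectl debug node/aks-userpool-xxxxxxx-vmss000000 -it --image=busybox

# Inside the debug pod, switch into the node's root filesystem:
chroot /host

# List all containers, including exited ones, and fetch the etcd logs:
crictl ps -a | grep etcd
crictl logs <container-id>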

@shreemaan-abhishek
Contributor

shreemaan-abhishek commented Jan 23, 2024

I will close this since it is a problem with etcd/Kubernetes.
