
Repeatedly found and lost a service endpoint. #3060

Closed
JerryChaox opened this issue Sep 9, 2018 · 18 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@JerryChaox

JerryChaox commented Sep 9, 2018

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.):


Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

NGINX Ingress controller version: 0.19.0

Kubernetes version (use kubectl version): 1.11.0

Environment:

  • Cloud provider or hardware configuration: Alibaba Public Cloud
  • OS (e.g. from /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a): 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
  • Others:

What happened:
The nginx controller repeatedly finds and then loses a service endpoint, which makes nginx reload the configuration frequently.

What you expected to happen:
The service endpoint should stay stable, so nginx does not need to reload constantly.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:

I0909 04:29:16.138981       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033203", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:29:42.754942       6 controller.go:169] Changes handled by the dynamic configuration, skipping backend reload.
I0909 04:29:42.755821       6 nginx.go:837] Posting to http://localhost:18080/configuration/backends: [{"name":"dev-spring-boot-terminal-manager-8761","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":8761,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"10.244.1.155","port":"8761","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}},{"name":"upstream-default-backend","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":0,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"10.244.1.155","port":"8761","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}}]
I0909 04:29:42.757864       6 socket.go:330] removing ingresses [] from metrics
I0909 04:29:42.760178       6 controller.go:204] Dynamic reconfiguration succeeded.
I0909 04:29:42.760311       6 controller.go:188] removing SSL certificate metrics for [] hosts
I0909 04:29:49.030152       6 status.go:362] updating Ingress dev/ingress-nginx status to [{172.18.114.108 }]
I0909 04:29:49.033845       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033299", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:30:16.138744       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033349", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:30:49.030156       6 status.go:362] updating Ingress dev/ingress-nginx status to [{172.18.114.108 }]
I0909 04:30:49.033415       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033404", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:31:16.138939       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033453", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
W0909 04:31:19.578826       6 controller.go:366] Service "dev/spring-boot-terminal-manager" does not have any active Endpoint
W0909 04:31:19.578871       6 controller.go:806] Service "dev/spring-boot-terminal-manager" does not have any active Endpoint.
I0909 04:31:19.578934       6 controller.go:171] Configuration changes detected, backend reload required.
I0909 04:31:19.579842       6 util.go:68] rlimit.max=65536
I0909 04:31:19.579854       6 nginx.go:525] Maximum number of open file descriptors: 31744
I0909 04:31:19.662178       6 nginx.go:635] NGINX configuration diff:
--- /etc/nginx/nginx.conf       2018-09-09 04:26:42.562192346 +0000
+++ /tmp/new-nginx-cfg697540580 2018-09-09 04:31:19.660050936 +0000
@@ -1,5 +1,5 @@

-# Configuration checksum: 17603421293401709395
+# Configuration checksum: 10535667221524200029

 # setup custom paths that do not require root access
 pid /tmp/nginx.pid;
@@ -414,7 +414,7 @@

                        port_in_redirect off;

-                       set $proxy_upstream_name "dev-spring-boot-terminal-manager-8761";
+                       set $proxy_upstream_name "";

                        client_max_body_size                    "1m";

@@ -467,9 +467,8 @@
                        proxy_next_upstream                     error timeout;
                        proxy_next_upstream_tries               3;

-                       proxy_pass http://upstream_balancer;
-
-                       proxy_redirect                          off;
+                       # No endpoints available for the request
+                       return 503;

                }

I0909 04:31:19.698606       6 controller.go:187] Backend successfully reloaded.
I0909 04:31:19.698973       6 nginx.go:837] Posting to http://localhost:18080/configuration/backends: [{"name":"upstream-default-backend","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":0,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"127.0.0.1","port":"8181","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}}]
I0909 04:31:19.700595       6 controller.go:204] Dynamic reconfiguration succeeded.
I0909 04:31:19.700748       6 socket.go:330] removing ingresses [] from metrics
I0909 04:31:19.703846       6 controller.go:188] removing SSL certificate metrics for [] hosts
I0909 04:31:22.912367       6 controller.go:171] Configuration changes detected, backend reload required.
I0909 04:31:22.913326       6 util.go:68] rlimit.max=65536
I0909 04:31:22.913344       6 nginx.go:525] Maximum number of open file descriptors: 31744
I0909 04:31:22.985809       6 nginx.go:635] NGINX configuration diff:
--- /etc/nginx/nginx.conf       2018-09-09 04:31:19.661050744 +0000
+++ /tmp/new-nginx-cfg127948470 2018-09-09 04:31:22.984413393 +0000
@@ -1,5 +1,5 @@

-# Configuration checksum: 10535667221524200029
+# Configuration checksum: 13790196543230584327

 # setup custom paths that do not require root access
 pid /tmp/nginx.pid;
@@ -414,7 +414,7 @@

                        port_in_redirect off;

-                       set $proxy_upstream_name "";
+                       set $proxy_upstream_name "dev-spring-boot-terminal-manager-8761";

                        client_max_body_size                    "1m";

@@ -467,8 +467,9 @@
                        proxy_next_upstream                     error timeout;
                        proxy_next_upstream_tries               3;

-                       # No endpoints available for the request
-                       return 503;
+                       proxy_pass http://upstream_balancer;
+
+                       proxy_redirect                          off;

                }

I0909 04:31:23.021703       6 controller.go:187] Backend successfully reloaded.
I0909 04:31:23.022740       6 nginx.go:837] Posting to http://localhost:18080/configuration/backends: [{"name":"dev-spring-boot-terminal-manager-8761","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":8761,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"10.244.1.155","port":"8761","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}},{"name":"upstream-default-backend","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":0,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"10.244.1.155","port":"8761","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}}]
I0909 04:31:23.025174       6 socket.go:330] removing ingresses [] from metrics
I0909 04:31:23.027011       6 controller.go:204] Dynamic reconfiguration succeeded.
I0909 04:31:23.027235       6 controller.go:188] removing SSL certificate metrics for [] hosts
I0909 04:31:49.030350       6 status.go:362] updating Ingress dev/ingress-nginx status to [{172.18.114.108 }]
I0909 04:31:49.033461       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033548", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:32:16.138406       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033598", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx

My resources

NAME                                                READY     STATUS    RESTARTS   AGE       IP             NODE
pod/nginx-ingress-controller-75b8cf8fbc-zz8gl       1/1       Running   0          16m       10.244.2.205   izwz98qmbokd3aji6kfguoz
pod/redis-master-55cc5f7b96-pzwtp                   1/1       Running   0          17d       10.244.1.131   izwz90c5nb7ulrekxzpmdwz
pod/spring-boot-terminal-manager-7574d575bf-wsj74   1/1       Running   0          1d        10.244.1.155   izwz90c5nb7ulrekxzpmdwz

NAME                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE       SELECTOR
service/ingress-nginx                  ClusterIP   10.107.150.21    <none>        80/TCP,443/TCP   2h        app=ingress-nginx
service/redis-master                   ClusterIP   10.99.70.196     <none>        6379/TCP         17d       name=redis-master
service/spring-boot-terminal-manager   ClusterIP   10.108.120.239   <none>        8761/TCP         15m       app=spring-boot-terminal-manager

NAME                               HOSTS                  ADDRESS          PORTS     AGE
ingress.extensions/ingress-nginx   foo.terminal-manager   172.18.114.108   80        1h

YAML config:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: dev
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
      annotations:
        prometheus.io/port: '10254'
        prometheus.io/scrape: 'true'
    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      nodeSelector:
        kubernetes.io/hostname: izwz98qmbokd3aji6kfguoz
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.19.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/spring-boot-terminal-manager
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
            - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
            - --report-node-internal-ip-address=true
            - --annotations-prefix=nginx.ingress.kubernetes.io
            - --v=2
          securityContext:
            capabilities:
                drop:
                - ALL
                add:
                - NET_BIND_SERVICE
            # www-data -> 33
            runAsUser: 33
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
          - name: http
            containerPort: 80
          - name: https
            containerPort: 443
          - name: status
            containerPort: 18080
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 80
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 80
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1

apiVersion: v1
kind: Service
metadata:
  name: spring-boot-terminal-manager
  namespace: dev
  labels:
    app: spring-boot-terminal-manager
spec:
  ports:
  - port: 8761
    protocol: TCP
    targetPort: 8761
    name: http
  selector:
    app: spring-boot-terminal-manager

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-nginx
  namespace: dev
  labels:
    app: ingress-nginx
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: foo.terminal-manager
    http:
      paths:
      - backend:
          serviceName: spring-boot-terminal-manager
          servicePort: 8761
        path: /
@aledbf
Member

aledbf commented Sep 10, 2018

@JerryChaox does the spring-boot-terminal-manager deployment have probes? Please check the pod logs for failures.
If you see this behavior, it means there is something wrong with the health checks.
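For reference, a minimal probe sketch for the spring-boot-terminal-manager container; the /actuator/health path, the placeholder image, and all timing values are assumptions for illustration, not taken from this issue:

      containers:
        - name: spring-boot-terminal-manager
          image: spring-boot-terminal-manager:latest   # placeholder image, not from this issue
          ports:
            - containerPort: 8761
          readinessProbe:
            httpGet:
              path: /actuator/health   # assumed Spring Boot Actuator health endpoint
              port: 8761
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8761
            initialDelaySeconds: 60
            periodSeconds: 10
            failureThreshold: 3

If the readiness probe flaps (for example because the health endpoint is slow right after startup), the pod is removed from the service's Endpoints object and the controller logs exactly the warning seen above.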

@JerryChaox
Author

@aledbf I have not set any probes. Is that the root cause?

@zhuleiandy888

I have the same problem; my web server returns an HTTP 503 error code.
ENV:
NGINX Ingress controller version: 0.15.0
Kubernetes version (use kubectl version): 1.9.8
OS: CentOS 7.2

Is this a bug?

@JerryChaox
Author

JerryChaox commented Sep 21, 2018

@aledbf I have added probes to my service and the pods do not log any failures, but the nginx controller still logs "Service **** does not have any active Endpoint.".
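For context: the controller logs "does not have any active Endpoint" when the service's Endpoints object has no ready addresses, which happens while readiness probes fail or while the pod's node is marked NotReady. A hypothetical Endpoints object in that state (illustrative, not captured from this cluster) would look like:

apiVersion: v1
kind: Endpoints
metadata:
  name: spring-boot-terminal-manager
  namespace: dev
subsets:
  - notReadyAddresses:            # the pod exists but is currently not considered ready
      - ip: 10.244.1.155
        targetRef:
          kind: Pod
          name: spring-boot-terminal-manager-7574d575bf-wsj74
          namespace: dev
    ports:
      - name: http
        port: 8761
        protocol: TCP

Watching kubectl get endpoints spring-boot-terminal-manager -n dev -o yaml over time and comparing it with the controller log timestamps would show whether the address really moves between addresses and notReadyAddresses.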

@JerryChaox
Author

183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:01:51 +0000] "GET /toplevel_data HTTP/1.1" 200 113 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.008 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 113 0.008 200 75a043dc9ba9dfa31d56b4ba93bbbb27
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:01:51 +0000] "GET /health_data HTTP/1.1" 200 5108 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.009 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 28530 0.009 200 5fa6c6c947aea6e95f8f170bee4cfe26
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:01:56 +0000] "GET /toplevel_data HTTP/1.1" 200 113 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.009 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 113 0.009 200 d303f5531b8e40ec71fea06bfe9b6851
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:01:56 +0000] "GET /health_data HTTP/1.1" 200 5105 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.010 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 28530 0.009 200 d52806286531c79d805db07eb29a13c2
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:01 +0000] "GET /toplevel_data HTTP/1.1" 200 113 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.008 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 113 0.008 200 d9ff10fd028812e1fefd9df6c93f3452
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:01 +0000] "GET /health_data HTTP/1.1" 200 5107 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.010 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 28530 0.011 200 f15ad7098212ffd767dfd90d5face853
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:06 +0000] "GET /toplevel_data HTTP/1.1" 200 113 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.009 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 113 0.008 200 5403797e8081ca3eb12d2c2e5e420dd7
W0921 07:02:06.406278       6 controller.go:806] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.
I0921 07:02:06.406372       6 controller.go:171] Configuration changes detected, backend reload required.
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:06 +0000] "GET /health_data HTTP/1.1" 200 5106 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.011 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 28530 0.011 200 e3a5c429440eda7d7521d03287012785
I0921 07:02:06.512918       6 controller.go:187] Backend successfully reloaded.
I0921 07:02:06.519054       6 controller.go:204] Dynamic reconfiguration succeeded.
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:11 +0000] "GET /toplevel_data HTTP/1.1" 503 615 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.000 [] - - - - 6cc5041fbd17a24eff911de8fb6caaa5
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:11 +0000] "GET /health_data HTTP/1.1" 503 615 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.000 [] - - - - 61c9da5a76083f4a507f731f2dea01fe
W0921 07:02:12.177108       6 controller.go:806] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.
I0921 07:02:15.510630       6 controller.go:171] Configuration changes detected, backend reload required.
I0921 07:02:15.615510       6 controller.go:187] Backend successfully reloaded.
I0921 07:02:15.622120       6 controller.go:204] Dynamic reconfiguration succeeded.
W0921 07:04:16.626457       6 controller.go:806] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.
I0921 07:04:16.626564       6 controller.go:171] Configuration changes detected, backend reload required.
I0921 07:04:16.731334       6 controller.go:187] Backend successfully reloaded.
I0921 07:04:16.737416       6 controller.go:204] Dynamic reconfiguration succeeded.
W0921 07:04:19.959767       6 controller.go:806] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.
I0921 07:04:23.293183       6 controller.go:171] Configuration changes detected, backend reload required.
I0921 07:04:23.406815       6 controller.go:187] Backend successfully reloaded.
I0921 07:04:23.412610       6 controller.go:204] Dynamic reconfiguration succeeded.

The behaviour is still present in the rook-ceph-mgr-dashboard service.

@zhuleiandy888

zhuleiandy888 commented Sep 25, 2018

@JerryChaox @aledbf

W0925 01:07:25.478161 7 controller.go:359] Service "ingress-nginx/default-http-backend" does not have any active Endpoint
I0925 01:07:25.478408 7 controller.go:169] Configuration changes detected, backend reload required.
I0925 01:07:25.593527 7 controller.go:185] Backend successfully reloaded.
172.30.10.1 - [172.30.10.1] - - [25/Sep/2018:01:07:27 +0000] "POST /api/common/message/get-captchaid HTTP/1.1" 502 668 "http://test-zhike.vhall.com/setAccount" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0" 815 3.011 [default-test-marketing-fe-80] 172.30.10.11:80 668 3.011 502 427e4784a830ca5c180767f84e0a36de
I0925 01:07:28.811708 7 controller.go:169] Configuration changes detected, backend reload required.
I0925 01:07:28.969324 7 controller.go:185] Backend successfully reloaded.

=============================================================

W0925 00:45:53.408222 7 controller.go:359] Service "ingress-nginx/default-http-backend" does not have any active Endpoint
W0925 00:45:53.408300 7 controller.go:797] Service "default/test-marketing-web" does not have any active Endpoint.
I0925 00:45:53.408469 7 controller.go:169] Configuration changes detected, backend reload required.
I0925 00:45:53.518670 7 controller.go:185] Backend successfully reloaded.
172.30.10.1 - [172.30.10.1] - - [25/Sep/2018:00:45:53 +0000] "POST /api/common/message/get-captchaid HTTP/1.1" 502 668 "http://test-zhike.vhall.com/setAccount" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0" 815 3.007 [default-test-marketing-fe-80] 172.30.20.3:80 668 3.007 502 a7d5583d007813e574a7a9a4e6843487

Ingress version: 0.17.1
K8S version: 1.9.8
In 0.15.0 I got a 503 error code; in 0.17.1 the 503 is gone, but I get a 502 error on reload.
The endpoints of the service are always running; I have verified that the endpoints exist.
I think this is a BUG.

@lomocc

lomocc commented Sep 27, 2018

Same problem:

# upstream generated by the controller when dynamic configuration is enabled
upstream upstream_balancer {
  server 0.0.0.1; # placeholder, never contacted directly
  balancer_by_lua_block {
    balancer.balance()  # Lua balancer picks the real endpoint per request
  }
  keepalive 32;
}
# ...and the location block proxies to it:
proxy_pass http://upstream_balancer;

My YAML:

apiVersion: v1
kind: Namespace
metadata:
  name: seafile
---
apiVersion: v1
kind: Service
metadata:
  name: seafile
  namespace: seafile
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - port: 80
    targetPort: 8000
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: seafile
  namespace: seafile
subsets:
  - addresses:
    - ip: 119.27.169.191
    ports:
    - port: 8000
      protocol: TCP
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: seafile
  namespace: seafile
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: seafile.lomo.cc
    http:
      paths:
      - path: /
        backend:
          serviceName: seafile
          servicePort: 80

How can I fix this?

@aledbf
Member

aledbf commented Sep 27, 2018

@zhuleiandy888 in your case, please update to 0.19.0. There was an error in 0.15 that triggered reloads (#2636).

@zhuleiandy888

@aledbf Thanks! Now it looks like nothing is wrong with version 0.19.0. I'll keep testing it for a while.

@JerryChaox
Author

This issue still occurs in my case with 0.19.0

@zhuleiandy888

@JerryChaox check the status of your k8s cluster nodes and the most recent node events. Make sure the kube-controller-manager service has no errors in its log and that the nodes are reporting their status to the master normally.
If a node's status becomes "NodeNotReady", every pod on that node is marked not ready and its endpoints are removed; the ingress controller then logs a warning like “Service "xxxxx/xxxxxxxxx" does not have any active Endpoint”.

To check whether the node's heartbeat is timing out, verify that the "--node-monitor-grace-period" and "--node-monitor-period" settings are reasonable (see the sketch below).

I hope that helps you.
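On a kubeadm-installed cluster those flags go in the kube-controller-manager static pod manifest; a minimal sketch, assuming the usual /etc/kubernetes/manifests path and showing the upstream default values only to illustrate where the settings belong:

# /etc/kubernetes/manifests/kube-controller-manager.yaml (hypothetical excerpt)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
    - name: kube-controller-manager
      image: k8s.gcr.io/kube-controller-manager-amd64:v1.11.0   # assumed image for this cluster version
      command:
        - kube-controller-manager
        - --node-monitor-period=5s          # how often node status is checked (default 5s)
        - --node-monitor-grace-period=40s   # how long a node may be unresponsive before being marked unhealthy (default 40s)
        # ... keep the rest of the existing flags unchanged ...

The grace period should be several multiples of the kubelet's --node-status-update-frequency (default 10s); if it is too tight, nodes flap between Ready and NotReady and the endpoints flap with them.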

@JerryChaox
Author

@zhuleiandy888 @aledbf I found that the nginx controller loses the endpoint info when kube-controller-manager prints errors like the following:

E1003 17:53:12.421259       1 node_lifecycle_controller.go:889] Error updating node izwz90c5nb7ulrekxzpmdwz: Operation cannot be fulfilled on nodes "izwz90c5nb7ulrekxzpmdwz": the object has been modified; please apply your changes to the latest version and try again
E1003 17:54:12.522140       1 node_lifecycle_controller.go:889] Error updating node izwz90c5nb7ulrekxzpmdwz: Operation cannot be fulfilled on nodes "izwz90c5nb7ulrekxzpmdwz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:01:37.712320       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:02:07.758132       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:02:37.791448       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:03:47.896906       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:04:17.933976       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:04:47.974186       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:05:18.011322       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:07:38.241021       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:08:08.278313       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:08:38.318800       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:10:48.526181       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again

I have no idea how to debug this from the information above. Could anyone give me some help?

@rainbowBPF2

(Quoting @zhuleiandy888's advice above about checking node status and events, the kube-controller-manager logs, and the --node-monitor-grace-period / --node-monitor-period settings.)

That helps a lot in understanding the internal mechanism.

Recently our cluster has also encountered this problem.
The backend pods are running OK and the Service itself is OK, but the Endpoints linking the service and the pods go missing, and the ingress controller log shows "controller.go:869] service **** does not have any active endpoints".
This lasts for about one minute, during which the ingress frequently returns 503 to users.
After that, the cluster restores the missing endpoints and everything goes back to normal.

During the 503 window there is no deployment update and the health checks are OK.
We still don't know why.

@rainbowBPF2

Our cluster version: 1.6.6
Ingress controller version: Release 0.9.0-beta.17

Thanks for the explanation; I will try a newer version of the ingress controller. @aledbf

One more guess: could the problem be related to an internal endpoints cache, if such a cache exists? Just a guess.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 12, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
