
Repeatedly found and lost a service endpoint. #3060

Closed
JerryChaox opened this issue Sep 9, 2018 · 18 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@JerryChaox

JerryChaox commented Sep 9, 2018

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.):


Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

NGINX Ingress controller version: 0.19.0

Kubernetes version (use kubectl version): 1.11.0

Environment:

  • Cloud provider or hardware configuration: Alibaba Public Cloud
  • OS (e.g. from /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a): 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
  • Others:

What happened:
The nginx controller repeatedly finds and then loses a service endpoint, which makes nginx reload the configuration frequently.

What you expected to happen:
The service endpoint should stay stable, so nginx does not need to reload constantly.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:

I0909 04:29:16.138981       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033203", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:29:42.754942       6 controller.go:169] Changes handled by the dynamic configuration, skipping backend reload.
I0909 04:29:42.755821       6 nginx.go:837] Posting to http://localhost:18080/configuration/backends: [{"name":"dev-spring-boot-terminal-manager-8761","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":8761,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"10.244.1.155","port":"8761","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}},{"name":"upstream-default-backend","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":0,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"10.244.1.155","port":"8761","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}}]
I0909 04:29:42.757864       6 socket.go:330] removing ingresses [] from metrics
I0909 04:29:42.760178       6 controller.go:204] Dynamic reconfiguration succeeded.
I0909 04:29:42.760311       6 controller.go:188] removing SSL certificate metrics for [] hosts
I0909 04:29:49.030152       6 status.go:362] updating Ingress dev/ingress-nginx status to [{172.18.114.108 }]
I0909 04:29:49.033845       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033299", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:30:16.138744       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033349", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:30:49.030156       6 status.go:362] updating Ingress dev/ingress-nginx status to [{172.18.114.108 }]
I0909 04:30:49.033415       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033404", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:31:16.138939       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033453", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
W0909 04:31:19.578826       6 controller.go:366] Service "dev/spring-boot-terminal-manager" does not have any active Endpoint
W0909 04:31:19.578871       6 controller.go:806] Service "dev/spring-boot-terminal-manager" does not have any active Endpoint.
I0909 04:31:19.578934       6 controller.go:171] Configuration changes detected, backend reload required.
I0909 04:31:19.579842       6 util.go:68] rlimit.max=65536
I0909 04:31:19.579854       6 nginx.go:525] Maximum number of open file descriptors: 31744
I0909 04:31:19.662178       6 nginx.go:635] NGINX configuration diff:
--- /etc/nginx/nginx.conf       2018-09-09 04:26:42.562192346 +0000
+++ /tmp/new-nginx-cfg697540580 2018-09-09 04:31:19.660050936 +0000
@@ -1,5 +1,5 @@

-# Configuration checksum: 17603421293401709395
+# Configuration checksum: 10535667221524200029

 # setup custom paths that do not require root access
 pid /tmp/nginx.pid;
@@ -414,7 +414,7 @@

                        port_in_redirect off;

-                       set $proxy_upstream_name "dev-spring-boot-terminal-manager-8761";
+                       set $proxy_upstream_name "";

                        client_max_body_size                    "1m";

@@ -467,9 +467,8 @@
                        proxy_next_upstream                     error timeout;
                        proxy_next_upstream_tries               3;

-                       proxy_pass http://upstream_balancer;
-
-                       proxy_redirect                          off;
+                       # No endpoints available for the request
+                       return 503;

                }

I0909 04:31:19.698606       6 controller.go:187] Backend successfully reloaded.
I0909 04:31:19.698973       6 nginx.go:837] Posting to http://localhost:18080/configuration/backends: [{"name":"upstream-default-backend","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":0,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"127.0.0.1","port":"8181","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}}]
I0909 04:31:19.700595       6 controller.go:204] Dynamic reconfiguration succeeded.
I0909 04:31:19.700748       6 socket.go:330] removing ingresses [] from metrics
I0909 04:31:19.703846       6 controller.go:188] removing SSL certificate metrics for [] hosts
I0909 04:31:22.912367       6 controller.go:171] Configuration changes detected, backend reload required.
I0909 04:31:22.913326       6 util.go:68] rlimit.max=65536
I0909 04:31:22.913344       6 nginx.go:525] Maximum number of open file descriptors: 31744
I0909 04:31:22.985809       6 nginx.go:635] NGINX configuration diff:
--- /etc/nginx/nginx.conf       2018-09-09 04:31:19.661050744 +0000
+++ /tmp/new-nginx-cfg127948470 2018-09-09 04:31:22.984413393 +0000
@@ -1,5 +1,5 @@

-# Configuration checksum: 10535667221524200029
+# Configuration checksum: 13790196543230584327

 # setup custom paths that do not require root access
 pid /tmp/nginx.pid;
@@ -414,7 +414,7 @@

                        port_in_redirect off;

-                       set $proxy_upstream_name "";
+                       set $proxy_upstream_name "dev-spring-boot-terminal-manager-8761";

                        client_max_body_size                    "1m";

@@ -467,8 +467,9 @@
                        proxy_next_upstream                     error timeout;
                        proxy_next_upstream_tries               3;

-                       # No endpoints available for the request
-                       return 503;
+                       proxy_pass http://upstream_balancer;
+
+                       proxy_redirect                          off;

                }

I0909 04:31:23.021703       6 controller.go:187] Backend successfully reloaded.
I0909 04:31:23.022740       6 nginx.go:837] Posting to http://localhost:18080/configuration/backends: [{"name":"dev-spring-boot-terminal-manager-8761","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":8761,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"10.244.1.155","port":"8761","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}},{"name":"upstream-default-backend","service":{"metadata":{"creationTimestamp":null},"spec":{"ports":[{"name":"http","protocol":"TCP","port":8761,"targetPort":8761}],"selector":{"app":"spring-boot-terminal-manager"},"clusterIP":"10.108.120.239","type":"ClusterIP","sessionAffinity":"None"},"status":{"loadBalancer":{}}},"port":0,"secure":false,"secureCACert":{"secret":"","caFilename":"","pemSha":""},"sslPassthrough":false,"endpoints":[{"address":"10.244.1.155","port":"8761","maxFails":0,"failTimeout":0}],"sessionAffinityConfig":{"name":"","cookieSessionAffinity":{"name":"","hash":""}}}]
I0909 04:31:23.025174       6 socket.go:330] removing ingresses [] from metrics
I0909 04:31:23.027011       6 controller.go:204] Dynamic reconfiguration succeeded.
I0909 04:31:23.027235       6 controller.go:188] removing SSL certificate metrics for [] hosts
I0909 04:31:49.030350       6 status.go:362] updating Ingress dev/ingress-nginx status to [{172.18.114.108 }]
I0909 04:31:49.033461       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033548", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx
I0909 04:32:16.138406       6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"dev", Name:"ingress-nginx", UID:"e2b49702-b3df-11e8-ab68-00163e024753", APIVersion:"extensions/v1beta1", ResourceVersion:"3033598", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress dev/ingress-nginx

My resources

NAME                                                READY     STATUS    RESTARTS   AGE       IP             NODE
pod/nginx-ingress-controller-75b8cf8fbc-zz8gl       1/1       Running   0          16m       10.244.2.205   izwz98qmbokd3aji6kfguoz
pod/redis-master-55cc5f7b96-pzwtp                   1/1       Running   0          17d       10.244.1.131   izwz90c5nb7ulrekxzpmdwz
pod/spring-boot-terminal-manager-7574d575bf-wsj74   1/1       Running   0          1d        10.244.1.155   izwz90c5nb7ulrekxzpmdwz

NAME                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE       SELECTOR
service/ingress-nginx                  ClusterIP   10.107.150.21    <none>        80/TCP,443/TCP   2h        app=ingress-nginx
service/redis-master                   ClusterIP   10.99.70.196     <none>        6379/TCP         17d       name=redis-master
service/spring-boot-terminal-manager   ClusterIP   10.108.120.239   <none>        8761/TCP         15m       app=spring-boot-terminal-manager

NAME                               HOSTS                  ADDRESS          PORTS     AGE
ingress.extensions/ingress-nginx   foo.terminal-manager   172.18.114.108   80        1h

YAML config:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: dev
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
      annotations:
        prometheus.io/port: '10254'
        prometheus.io/scrape: 'true'
    spec:
      serviceAccountName: nginx-ingress-serviceaccount
      nodeSelector:
        kubernetes.io/hostname: izwz98qmbokd3aji6kfguoz
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.19.0
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/spring-boot-terminal-manager
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
            - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
            - --report-node-internal-ip-address=true
            - --annotations-prefix=nginx.ingress.kubernetes.io
            - --v=2
          securityContext:
            capabilities:
                drop:
                - ALL
                add:
                - NET_BIND_SERVICE
            # www-data -> 33
            runAsUser: 33
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
          - name: http
            containerPort: 80
          - name: https
            containerPort: 443
          - name: status
            containerPort: 18080
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 80
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /healthz
              port: 80
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1

apiVersion: v1
kind: Service
metadata:
  name: spring-boot-terminal-manager
  namespace: dev
  labels:
    app: spring-boot-terminal-manager
spec:
  ports:
  - port: 8761
    protocol: TCP
    targetPort: 8761
    name: http
  selector:
    app: spring-boot-terminal-manager

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-nginx
  namespace: dev
  labels:
    app: ingress-nginx
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: foo.terminal-manager
    http:
      paths:
      - backend:
          serviceName: spring-boot-terminal-manager
          servicePort: 8761
        path: /
@aledbf
Member

aledbf commented Sep 10, 2018

@JerryChaox does the spring-boot-terminal-manager deployment have probes? Please check the pod logs for failures.
If you see this behavior, it means there is something wrong with the health checks.
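For reference, a minimal probe sketch for the spring-boot-terminal-manager container; the /actuator/health path, the placeholder image, and all timing values are assumptions for illustration, not taken from this issue:

      containers:
        - name: spring-boot-terminal-manager
          image: spring-boot-terminal-manager:latest   # placeholder image, not from this issue
          ports:
            - containerPort: 8761
          readinessProbe:
            httpGet:
              path: /actuator/health   # assumed Spring Boot Actuator health endpoint
              port: 8761
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 8761
            initialDelaySeconds: 60
            periodSeconds: 10
            failureThreshold: 3

If the readiness probe flaps (for example because the health endpoint is slow right after startup), the pod is removed from the service's Endpoints object and the controller logs exactly the warning seen above.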

@JerryChaox
Author

@aledbf I have not set any probes. Is that the root cause?

@zhuleiandy888

I have the same problem; my web server returns an HTTP 503 error code.
ENV:
NGINX Ingress controller version: 0.15.0
Kubernetes version (use kubectl version): 1.9.8
OS: CentOS 7.2

Is this a bug?

@JerryChaox
Author

JerryChaox commented Sep 21, 2018

@aledbf I have added probes to my service and the pods do not log any failures, but the nginx controller still logs "Service **** does not have any active Endpoint.".
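For context: the controller logs "does not have any active Endpoint" when the service's Endpoints object has no ready addresses, which happens while readiness probes fail or while the pod's node is marked NotReady. A hypothetical Endpoints object in that state (illustrative, not captured from this cluster) would look like:

apiVersion: v1
kind: Endpoints
metadata:
  name: spring-boot-terminal-manager
  namespace: dev
subsets:
  - notReadyAddresses:            # the pod exists but is currently not considered ready
      - ip: 10.244.1.155
        targetRef:
          kind: Pod
          name: spring-boot-terminal-manager-7574d575bf-wsj74
          namespace: dev
    ports:
      - name: http
        port: 8761
        protocol: TCP

Watching kubectl get endpoints spring-boot-terminal-manager -n dev -o yaml over time and comparing it with the controller log timestamps would show whether the address really moves between addresses and notReadyAddresses.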

@JerryChaox
Author

183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:01:51 +0000] "GET /toplevel_data HTTP/1.1" 200 113 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.008 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 113 0.008 200 75a043dc9ba9dfa31d56b4ba93bbbb27
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:01:51 +0000] "GET /health_data HTTP/1.1" 200 5108 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.009 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 28530 0.009 200 5fa6c6c947aea6e95f8f170bee4cfe26
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:01:56 +0000] "GET /toplevel_data HTTP/1.1" 200 113 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.009 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 113 0.009 200 d303f5531b8e40ec71fea06bfe9b6851
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:01:56 +0000] "GET /health_data HTTP/1.1" 200 5105 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.010 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 28530 0.009 200 d52806286531c79d805db07eb29a13c2
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:01 +0000] "GET /toplevel_data HTTP/1.1" 200 113 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.008 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 113 0.008 200 d9ff10fd028812e1fefd9df6c93f3452
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:01 +0000] "GET /health_data HTTP/1.1" 200 5107 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.010 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 28530 0.011 200 f15ad7098212ffd767dfd90d5face853
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:06 +0000] "GET /toplevel_data HTTP/1.1" 200 113 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.009 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 113 0.008 200 5403797e8081ca3eb12d2c2e5e420dd7
W0921 07:02:06.406278       6 controller.go:806] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.
I0921 07:02:06.406372       6 controller.go:171] Configuration changes detected, backend reload required.
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:06 +0000] "GET /health_data HTTP/1.1" 200 5106 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.011 [rook-ceph-rook-ceph-mgr-dashboard-7000] 10.244.1.189:7000 28530 0.011 200 e3a5c429440eda7d7521d03287012785
I0921 07:02:06.512918       6 controller.go:187] Backend successfully reloaded.
I0921 07:02:06.519054       6 controller.go:204] Dynamic reconfiguration succeeded.
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:11 +0000] "GET /toplevel_data HTTP/1.1" 503 615 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 389 0.000 [] - - - - 6cc5041fbd17a24eff911de8fb6caaa5
183.36.80.191 - [183.36.80.191] - - [21/Sep/2018:07:02:11 +0000] "GET /health_data HTTP/1.1" 503 615 "http://rook.jyskm.com/health" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 387 0.000 [] - - - - 61c9da5a76083f4a507f731f2dea01fe
W0921 07:02:12.177108       6 controller.go:806] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.
I0921 07:02:15.510630       6 controller.go:171] Configuration changes detected, backend reload required.
I0921 07:02:15.615510       6 controller.go:187] Backend successfully reloaded.
I0921 07:02:15.622120       6 controller.go:204] Dynamic reconfiguration succeeded.
W0921 07:04:16.626457       6 controller.go:806] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.
I0921 07:04:16.626564       6 controller.go:171] Configuration changes detected, backend reload required.
I0921 07:04:16.731334       6 controller.go:187] Backend successfully reloaded.
I0921 07:04:16.737416       6 controller.go:204] Dynamic reconfiguration succeeded.
W0921 07:04:19.959767       6 controller.go:806] Service "rook-ceph/rook-ceph-mgr-dashboard" does not have any active Endpoint.
I0921 07:04:23.293183       6 controller.go:171] Configuration changes detected, backend reload required.
I0921 07:04:23.406815       6 controller.go:187] Backend successfully reloaded.
I0921 07:04:23.412610       6 controller.go:204] Dynamic reconfiguration succeeded.

The behaviour is still present in the rook-ceph-mgr-dashboard service.

@zhuleiandy888

zhuleiandy888 commented Sep 25, 2018

@JerryChaox @aledbf

W0925 01:07:25.478161 7 controller.go:359] Service "ingress-nginx/default-http-backend" does not have any active Endpoint
I0925 01:07:25.478408 7 controller.go:169] Configuration changes detected, backend reload required.
I0925 01:07:25.593527 7 controller.go:185] Backend successfully reloaded.
172.30.10.1 - [172.30.10.1] - - [25/Sep/2018:01:07:27 +0000] "POST /api/common/message/get-captchaid HTTP/1.1" 502 668 "http://test-zhike.vhall.com/setAccount" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0" 815 3.011 [default-test-marketing-fe-80] 172.30.10.11:80 668 3.011 502 427e4784a830ca5c180767f84e0a36de
I0925 01:07:28.811708 7 controller.go:169] Configuration changes detected, backend reload required.
I0925 01:07:28.969324 7 controller.go:185] Backend successfully reloaded.

=============================================================

W0925 00:45:53.408222 7 controller.go:359] Service "ingress-nginx/default-http-backend" does not have any active Endpoint
W0925 00:45:53.408300 7 controller.go:797] Service "default/test-marketing-web" does not have any active Endpoint.
I0925 00:45:53.408469 7 controller.go:169] Configuration changes detected, backend reload required.
I0925 00:45:53.518670 7 controller.go:185] Backend successfully reloaded.
172.30.10.1 - [172.30.10.1] - - [25/Sep/2018:00:45:53 +0000] "POST /api/common/message/get-captchaid HTTP/1.1" 502 668 "http://test-zhike.vhall.com/setAccount" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0" 815 3.007 [default-test-marketing-fe-80] 172.30.20.3:80 668 3.007 502 a7d5583d007813e574a7a9a4e6843487

Ingress version: 0.17.1
K8S version: 1.9.8
In 0.15.0 I got a 503 error code; in 0.17.1 the 503 is gone, but I get a 502 error on reload.
The endpoints of the service are always running; I have verified that the endpoints exist.
I think this is a BUG.

@lomocc

lomocc commented Sep 27, 2018

Same problem:

# upstream generated by the controller when dynamic configuration is enabled
upstream upstream_balancer {
  server 0.0.0.1; # placeholder, never contacted directly
  balancer_by_lua_block {
    balancer.balance()  # Lua balancer picks the real endpoint per request
  }
  keepalive 32;
}
# ...and the location block proxies to it:
proxy_pass http://upstream_balancer;

My YAML:

apiVersion: v1
kind: Namespace
metadata:
  name: seafile
---
apiVersion: v1
kind: Service
metadata:
  name: seafile
  namespace: seafile
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - port: 80
    targetPort: 8000
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: seafile
  namespace: seafile
subsets:
  - addresses:
    - ip: 119.27.169.191
    ports:
    - port: 8000
      protocol: TCP
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: seafile
  namespace: seafile
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: seafile.lomo.cc
    http:
      paths:
      - path: /
        backend:
          serviceName: seafile
          servicePort: 80

How can I fix this?

@aledbf
Member

aledbf commented Sep 27, 2018

@zhuleiandy888 in your case, please update to 0.19.0. There was an error in 0.15 that triggered reloads (#2636).

@zhuleiandy888

@aledbf Thanks! Now it looks like nothing is wrong with version 0.19.0. I'll keep testing it for a while.

@JerryChaox
Author

This issue still occurs in my case with 0.19.0

@zhuleiandy888

@JerryChaox check the status of your k8s cluster nodes and the most recent node events. Make sure the kube-controller-manager service has no errors in its log and that the nodes are reporting their status to the master normally.
If a node's status becomes "NodeNotReady", every pod on that node is marked not ready and its endpoints are removed; the ingress controller then logs a warning like “Service "xxxxx/xxxxxxxxx" does not have any active Endpoint”.

To check whether the node's heartbeat is timing out, verify that the "--node-monitor-grace-period" and "--node-monitor-period" settings are reasonable (see the sketch below).

I hope that helps you.
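On a kubeadm-installed cluster those flags go in the kube-controller-manager static pod manifest; a minimal sketch, assuming the usual /etc/kubernetes/manifests path and showing the upstream default values only to illustrate where the settings belong:

# /etc/kubernetes/manifests/kube-controller-manager.yaml (hypothetical excerpt)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
    - name: kube-controller-manager
      image: k8s.gcr.io/kube-controller-manager-amd64:v1.11.0   # assumed image for this cluster version
      command:
        - kube-controller-manager
        - --node-monitor-period=5s          # how often node status is checked (default 5s)
        - --node-monitor-grace-period=40s   # how long a node may be unresponsive before being marked unhealthy (default 40s)
        # ... keep the rest of the existing flags unchanged ...

The grace period should be several multiples of the kubelet's --node-status-update-frequency (default 10s); if it is too tight, nodes flap between Ready and NotReady and the endpoints flap with them.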

@JerryChaox
Author

@zhuleiandy888 @aledbf I found that the nginx controller loses the endpoint info when kube-controller-manager prints errors like the following:

E1003 17:53:12.421259       1 node_lifecycle_controller.go:889] Error updating node izwz90c5nb7ulrekxzpmdwz: Operation cannot be fulfilled on nodes "izwz90c5nb7ulrekxzpmdwz": the object has been modified; please apply your changes to the latest version and try again
E1003 17:54:12.522140       1 node_lifecycle_controller.go:889] Error updating node izwz90c5nb7ulrekxzpmdwz: Operation cannot be fulfilled on nodes "izwz90c5nb7ulrekxzpmdwz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:01:37.712320       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:02:07.758132       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:02:37.791448       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:03:47.896906       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:04:17.933976       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:04:47.974186       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:05:18.011322       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:07:38.241021       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:08:08.278313       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:08:38.318800       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again
E1003 18:10:48.526181       1 node_lifecycle_controller.go:889] Error updating node izwz95wx4ufpnolro1iekmz: Operation cannot be fulfilled on nodes "izwz95wx4ufpnolro1iekmz": the object has been modified; please apply your changes to the latest version and try again

I have no idea how to debug this from the information above. Could anyone give me some help?

@rainbowBPF2

(Quoting @zhuleiandy888's advice above about checking node status and events, the kube-controller-manager logs, and the --node-monitor-grace-period / --node-monitor-period settings.)

That helps a lot in understanding the internal mechanism.

Recently our cluster has also encountered this problem.
The backend pods are running OK and the Service itself is OK, but the Endpoints linking the service and the pods go missing, and the ingress controller log shows "controller.go:869] service **** does not have any active endpoints".
This lasts for about one minute, during which the ingress frequently returns 503 to users.
After that, the cluster restores the missing endpoints and everything goes back to normal.

During the 503 window there is no deployment update and the health checks are OK.
We still don't know why.

@rainbowBPF2

Our cluster version: 1.6.6
Ingress controller version: Release 0.9.0-beta.17

Thanks for the explanation; I will try a newer version of the ingress controller. @aledbf

One more guess: could the problem be related to an internal endpoints cache, if such a cache exists? Just a guess.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 12, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
