Controller first boot (or two) fails when combined with istio-proxy #1968

Stono · 2018-01-24T08:35:32Z

Hey,
I've observed this behaviour, which is the container failing to boot one or two times with every update we do.

❯ kn logs -p nginx-ingress-controller-58688765db-hxlzq nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.0
  Build:      git-0836cb30
  Repository: https://github.com/aledbf/ingress
-------------------------------------------------------------------------------

I0124 08:33:25.301191       6 main.go:181] Creating API client for https://10.178.0.1:443
F0124 08:33:25.303541       6 main.go:236] Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration). Reason: Get https://10.178.0.1:443/version: dial tcp 10.178.0.1:443: getsockopt: connection refused
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md

Then it'll crash, and come back to life:

❯ kn logs nginx-ingress-controller-58688765db-hxlzq nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.0
  Build:      git-0836cb30
  Repository: https://github.com/aledbf/ingress
-------------------------------------------------------------------------------

I0124 08:33:27.712235       7 main.go:181] Creating API client for https://10.178.0.1:443
I0124 08:33:27.723704       7 main.go:193] Running in Kubernetes Cluster version v1.8+ (v1.8.6-gke.0) - git (clean) commit ee9a97661f14ee0b1ca31d6edd30480c89347c79 - platform linux/amd64

Any ideas?

The text was updated successfully, but these errors were encountered:

aledbf · 2018-01-24T10:59:24Z

@Stono please add the flag --v=10 to increase the verbosity and be able to see the reason

aledbf · 2018-01-24T10:59:49Z

@Stono usually this means there's some networking issue

Stono · 2018-01-24T14:15:15Z

Hey @aledbf - I added that -v=10, it does look like connectivity, i just don't understand why after a crash it then works?

It's like that initial call to the k8 master to get the version fails?

❯ kn logs -p nginx-ingress-controller-5d978f6589-d6jn6 nginx-ingress-controller --follow
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.0
  Build:      git-0836cb30
  Repository: https://github.com/aledbf/ingress
-------------------------------------------------------------------------------

I0124 14:12:54.035114      11 main.go:181] Creating API client for https://10.192.0.1:443
I0124 14:12:54.035922      11 round_trippers.go:417] curl -k -v -XGET  -H "Accept: application/vnd.kubernetes.protobuf, */*" -H "User-Agent: nginx-ingress-controller/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer **omitted**" https://10.192.0.1:443/version
I0124 14:12:54.036319      11 round_trippers.go:436] GET https://10.192.0.1:443/version  in 0 milliseconds
I0124 14:12:54.036707      11 round_trippers.go:442] Response Headers:
F0124 14:12:54.036739      11 main.go:236] Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration). Reason: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md

aledbf · 2018-01-24T15:12:34Z

Please check if the node where it works it is the same where it fails

Stono · 2018-01-24T15:20:26Z

@aledbf I've confirmed the node is the same

Stono · 2018-02-14T14:08:46Z

@aledbf I've finally got to the bottom of this.
I'm using istio on our cluster, which wraps nginx-ingress-controller in envoy, what basically happens is this:

proxy_init container starts and adds iptables rules to intercept outbound traffic
nginx-ingress-controller and envoy start at the same time
nginx-ingress-controller attempts to read kubernetes.default/version before istio-proxy has become available, resulting in the connection refused message.

I've actually worked around this by having a custom entrypoint on the nginx-ingress-controller, but it's a bit hacky and I'm thinking it'd be nice if nginx-ingress-controller itself implemented a retry when checking the version of kubernetes, what do you think?

#!/bin/bash
echo "Starting..."
sleep 5

MAX_TRIES=10
CURRENT_ATTEMPT=0
echo "Waiting for Kubernetes API to be available (istio-proxy bollocks)...."
until $(curl -k --output /dev/null --silent --fail https://kubernetes.default/version); do
  CURRENT_ATTEMPT=$((CURRENT_ATTEMPT+1))
  printf '.'
  sleep 2
    if [ "$CURRENT_ATTEMPT" = "$MAX_TRIES" ]; then
      echo "Failed to get kubernetes version after $MAX_TRIES"
      exit 1
    fi
done

echo "Booting nginx...($*)"
exec $*

aledbf · 2018-02-14T19:07:22Z

@Stono please try quay.io/aledbf/nginx-ingress-controller:0.327

Stono · 2018-02-14T19:15:21Z

No joy I'm afriad @aledbf:

As you can see, 1 restart on both instances of my controller:

ingress-nginx-external-controller-9c4d5dbf6-t5nv8    3/3       Running   1          4m
ingress-nginx-external-controller-9c4d5dbf6-vkksc    3/3       Running   1          1m

Looking at the first try logs:

❯ kn logs ingress-nginx-external-controller-9c4d5dbf6-t5nv8 -c nginx-ingress-controller -p
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.2
  Build:      git-c8f6af39
  Repository: https://github.com/aledbf/ingress-nginx
-------------------------------------------------------------------------------

I0214 19:09:31.897256       7 flags.go:159] Watching for ingress class: nginx-external
W0214 19:09:31.897299       7 flags.go:162] only Ingress with class "nginx-external" will be processed by this ingress controller
I0214 19:09:31.897919       7 main.go:183] Creating API client for https://10.192.0.1:443
F0214 19:09:31.898967       7 main.go:257] Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration). Reason: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md

Then the second:

❯ kn logs ingress-nginx-external-controller-9c4d5dbf6-vkksc nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.2
  Build:      git-c8f6af39
  Repository: https://github.com/aledbf/ingress-nginx
-------------------------------------------------------------------------------

I0214 19:13:16.080976       7 flags.go:159] Watching for ingress class: nginx-external
W0214 19:13:16.081024       7 flags.go:162] only Ingress with class "nginx-external" will be processed by this ingress controller
I0214 19:13:16.081594       7 main.go:183] Creating API client for https://10.192.0.1:443
I0214 19:13:16.088368       7 main.go:214] Running in Kubernetes Cluster version v1.9+ (v1.9.2-gke.1) - git (clean) commit 4ce7af72d8d343ea2f7680348852db641ff573af - platform linux/amd64

This is marginally better than before (typically I had a couple of pod crashes)! From looking at your code though if i'm reading it right you're only doing 50ms backoffs? Probably needs to be a bit longer to give the other istio pod time to boot + pull its configuration from pilot

aledbf · 2018-02-14T19:23:36Z

@Stono try quay.io/aledbf/nginx-ingress-controller:0.328

Stono · 2018-02-14T19:28:50Z

Still the same :'( @aledbf

ingress-nginx-external-controller-865d96c779-8gkv5   3/3       Running   1          1m
ingress-nginx-external-controller-865d96c779-8rstv   3/3       Running   1          1m

❯ kn logs -p ingress-nginx-external-controller-865d96c779-8rstv nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.2
  Build:      git-5cbf17b0
  Repository: https://github.com/aledbf/ingress-nginx
-------------------------------------------------------------------------------

I0214 19:25:20.052413       8 flags.go:159] Watching for ingress class: nginx-external
W0214 19:25:20.052467       8 flags.go:162] only Ingress with class "nginx-external" will be processed by this ingress controller
I0214 19:25:20.053087       8 main.go:183] Creating API client for https://10.192.0.1:443
F0214 19:25:20.054335       8 main.go:257] Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration). Reason: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md

Looking at the time between the logs though Creating API client for was at 19:25:20.053087 and then the error was at 19:25:20.054335, i'm not sure that backoff is actually working? Looks like it's failing off the first try?

Perhaps some more logs showing "Trying..." or something?

aledbf · 2018-02-14T20:07:57Z

@Stono please use quay.io/aledbf/nginx-ingress-controller:0.329 (log added)

Stono · 2018-02-14T20:38:15Z

Will check it out in the morning, thanks!

…

On 14 Feb 2018 8:08 pm, "Manuel Alejandro de Brito Fontes" < ***@***.***> wrote: @Stono <https://github.com/stono> please use quay.io/aledbf/nginx-ingress- controller:0.329 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1968 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABaviUIO0fJzd1ZwXXqr14JV9T15VwHaks5tUz0hgaJpZM4Rq4_m> .

Stono · 2018-02-15T08:23:50Z

Yup, only seeing the error once @aledbf:

❯ kn logs ingress-nginx-external-controller-8469d889b6-bmpsp nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.2
  Build:      git-0050b556
  Repository: https://github.com/aledbf/ingress-nginx
-------------------------------------------------------------------------------

I0215 08:22:24.551884       9 flags.go:159] Watching for ingress class: nginx-external
W0215 08:22:24.551941       9 flags.go:162] only Ingress with class "nginx-external" will be processed by this ingress controller
I0215 08:22:24.552528       9 main.go:183] Creating API client for https://10.192.0.1:443
I0215 08:22:24.553173       9 main.go:202] trying to discover Kubernetes version
E0215 08:22:24.553547       9 main.go:208] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
F0215 08:22:24.553576       9 main.go:259] Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration). Reason: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused

I'd expect to be seeing unexpected error discovering Kubernetes version multiple times

aledbf · 2018-02-15T17:20:07Z

@Stono please use quay.io/aledbf/nginx-ingress-controller:0.330

aledbf · 2018-02-15T17:26:14Z

and also quay.io/aledbf/nginx-ingress-controller:0.331

aledbf · 2018-02-15T17:27:29Z

@Stono I think 0.331 is the one. I am sorry to bother with the tests but I cannot reproduce this issue

Edit: the log about the retries only appears if you use --v=2

Stono · 2018-02-15T17:44:40Z

It's OK @aledbf - no joy with 331 either, this is running with --v=2 too:

❯ kn logs ingress-nginx-external-controller-78c899b4fd-9sq7w -p nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.2
  Build:      git-7b1ce6b8
  Repository: https://github.com/aledbf/ingress-nginx
-------------------------------------------------------------------------------

I0215 17:43:32.061905       8 flags.go:159] Watching for ingress class: nginx-external
W0215 17:43:32.061990       8 flags.go:162] only Ingress with class "nginx-external" will be processed by this ingress controller
I0215 17:43:32.062623       8 main.go:183] Creating API client for https://10.192.0.1:443
I0215 17:43:32.063314       8 main.go:195] trying to discover Kubernetes version
I0215 17:43:32.063717       8 main.go:201] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
F0215 17:43:32.063826       8 main.go:252] Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration). Reason: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md

aledbf · 2018-02-15T17:56:25Z

@Stono last test, I promise :)
quay.io/aledbf/nginx-ingress-controller:0.332

Stono · 2018-02-15T18:02:49Z

Your retry works now, it's just still a bit too fast haha.

I0215 18:01:28.731695      12 main.go:183] Creating API client for https://10.192.0.1:443
I0215 18:01:28.732774      12 main.go:196] trying to discover Kubernetes version
I0215 18:01:28.733456      12 main.go:204] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 18:01:29.233693      12 main.go:196] trying to discover Kubernetes version
I0215 18:01:29.234104      12 main.go:204] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 18:01:29.733624      12 main.go:196] trying to discover Kubernetes version
I0215 18:01:29.734010      12 main.go:204] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 18:01:30.233654      12 main.go:196] trying to discover Kubernetes version
I0215 18:01:30.233955      12 main.go:204] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 18:01:30.733637      12 main.go:196] trying to discover Kubernetes version
I0215 18:01:30.733953      12 main.go:204] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 18:01:30.733973      12 main.go:196] trying to discover Kubernetes version
I0215 18:01:30.734077      12 main.go:204] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
F0215 18:01:30.734091      12 main.go:255] Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration). Reason: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md

aledbf · 2018-02-15T18:08:26Z

quay.io/aledbf/nginx-ingress-controller:0.333

Stono · 2018-02-15T18:08:50Z

I've gotta nip out for a few hours but will test that when i'm back - sorry!

Stono · 2018-02-15T21:56:31Z

It worked, yay! Nice work @aledbf

❯ kn logs ingress-nginx-external-controller-75cb9967b4-7zl82 -c nginx-ingress-controller
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.2
  Build:      git-08392422
  Repository: https://github.com/aledbf/ingress-nginx
-------------------------------------------------------------------------------

I0215 21:55:19.010824      11 flags.go:159] Watching for ingress class: nginx-external
W0215 21:55:19.010919      11 flags.go:162] only Ingress with class "nginx-external" will be processed by this ingress controller
I0215 21:55:19.011609      11 main.go:183] Creating API client for https://10.192.0.1:443
I0215 21:55:19.012289      11 main.go:203] trying to discover Kubernetes version
I0215 21:55:19.012642      11 main.go:211] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 21:55:20.099269      11 main.go:203] trying to discover Kubernetes version
I0215 21:55:20.099665      11 main.go:211] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 21:55:21.652031      11 main.go:203] trying to discover Kubernetes version
I0215 21:55:21.652529      11 main.go:211] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 21:55:24.006623      11 main.go:203] trying to discover Kubernetes version
I0215 21:55:24.006988      11 main.go:211] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 21:55:27.660130      11 main.go:203] trying to discover Kubernetes version
I0215 21:55:27.660612      11 main.go:211] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 21:55:33.170121      11 main.go:203] trying to discover Kubernetes version
I0215 21:55:33.170525      11 main.go:211] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 21:55:41.219573      11 main.go:203] trying to discover Kubernetes version
I0215 21:55:41.219952      11 main.go:211] unexpected error discovering Kubernetes version: Get https://10.192.0.1:443/version: dial tcp 10.192.0.1:443: getsockopt: connection refused
I0215 21:55:53.461305      11 main.go:203] trying to discover Kubernetes version
I0215 21:55:53.469122      11 main.go:218] Running in Kubernetes Cluster version v1.9+ (v1.9.2-gke.1) - git (clean) commit 4ce7af72d8d343ea2f7680348852db641ff573af - platform linux/amd6
4
I0215 21:55:53.471588      11 main.go:86] validated ingress-nginx/default-http-backend as the default backend
I0215 21:55:53.473840      11 main.go:107] service ingress-nginx/ingress-nginx-external validated as source of Ingress status
I0215 21:55:53.644723      11 stat_collector.go:77] starting new nginx stats collector for Ingress controller running in namespace  (class nginx-external)
I0215 21:55:53.644758      11 stat_collector.go:78] collector extracting information from port 18080
I0215 21:55:53.657144      11 nginx.go:245] starting Ingress controller

aledbf · 2018-02-15T22:05:00Z

@Stono this takes too long. Is this "normal"?

aledbf · 2018-02-15T22:05:28Z

@Stono if you remove the istio proxy this there is no problem right?

Stono · 2018-02-15T22:15:33Z

@aledbf actually that is a really good point, I didn't realise just how long that was (30 seconds). Maybe someone from the Istio team can offer us support (I've raised a cross issue)

Init container implements IPTables rules which route all traffic to istio-proxy:15001
nginx-ingress-controller boots and isn't able to speak to the kubernetes api server until istio-proxy is available to handle the traffic

But yes, to answer your question without istio-proxy it boots instantly, first time (we have two clusters, one is istio, one is not)

aledbf · 2018-02-15T22:19:24Z

But yes, to answer your question without istio-proxy it boots instantly, first time (we have two clusters, one is istio, one is not)

👍

aledbf mentioned this issue Feb 14, 2018

Retry initial connection to the Kubernetes cluster #2089

Merged

Stono mentioned this issue Feb 15, 2018

(I believe) The startup time of istio-proxy causes some communications issues for up to 30s with the kubernetes API istio/istio#3533

Closed

Stono changed the title ~~Controller first boot (or two) usually fails~~ Controller first boot (or two) fails when combined with istio-proxy Feb 15, 2018

aledbf closed this as completed in #2089 Feb 17, 2018

aledbf mentioned this issue Feb 21, 2018

Proposal: Make nginx ingress ready for use with service meshes #2126

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Controller first boot (or two) fails when combined with istio-proxy #1968

Controller first boot (or two) fails when combined with istio-proxy #1968

Stono commented Jan 24, 2018

aledbf commented Jan 24, 2018

aledbf commented Jan 24, 2018

Stono commented Jan 24, 2018 •

edited

Loading

aledbf commented Jan 24, 2018

Stono commented Jan 24, 2018

Stono commented Feb 14, 2018 •

edited

Loading

aledbf commented Feb 14, 2018

Stono commented Feb 14, 2018 •

edited

Loading

aledbf commented Feb 14, 2018

Stono commented Feb 14, 2018 •

edited

Loading

aledbf commented Feb 14, 2018 •

edited

Loading

Stono commented Feb 14, 2018 via email

Stono commented Feb 15, 2018

aledbf commented Feb 15, 2018

aledbf commented Feb 15, 2018

aledbf commented Feb 15, 2018 •

edited

Loading

Stono commented Feb 15, 2018

aledbf commented Feb 15, 2018

Stono commented Feb 15, 2018 •

edited

Loading

aledbf commented Feb 15, 2018

Stono commented Feb 15, 2018

Stono commented Feb 15, 2018 •

edited

Loading

aledbf commented Feb 15, 2018

aledbf commented Feb 15, 2018

Stono commented Feb 15, 2018 •

edited

Loading

aledbf commented Feb 15, 2018

Controller first boot (or two) fails when combined with istio-proxy #1968

Controller first boot (or two) fails when combined with istio-proxy #1968

Comments

Stono commented Jan 24, 2018

aledbf commented Jan 24, 2018

aledbf commented Jan 24, 2018

Stono commented Jan 24, 2018 • edited Loading

aledbf commented Jan 24, 2018

Stono commented Jan 24, 2018

Stono commented Feb 14, 2018 • edited Loading

aledbf commented Feb 14, 2018

Stono commented Feb 14, 2018 • edited Loading

aledbf commented Feb 14, 2018

Stono commented Feb 14, 2018 • edited Loading

aledbf commented Feb 14, 2018 • edited Loading

Stono commented Feb 14, 2018 via email

Stono commented Feb 15, 2018

aledbf commented Feb 15, 2018

aledbf commented Feb 15, 2018

aledbf commented Feb 15, 2018 • edited Loading

Stono commented Feb 15, 2018

aledbf commented Feb 15, 2018

Stono commented Feb 15, 2018 • edited Loading

aledbf commented Feb 15, 2018

Stono commented Feb 15, 2018

Stono commented Feb 15, 2018 • edited Loading

aledbf commented Feb 15, 2018

aledbf commented Feb 15, 2018

Stono commented Feb 15, 2018 • edited Loading

aledbf commented Feb 15, 2018

Stono commented Jan 24, 2018 •

edited

Loading

Stono commented Feb 14, 2018 •

edited

Loading

Stono commented Feb 14, 2018 •

edited

Loading

Stono commented Feb 14, 2018 •

edited

Loading

aledbf commented Feb 14, 2018 •

edited

Loading

aledbf commented Feb 15, 2018 •

edited

Loading

Stono commented Feb 15, 2018 •

edited

Loading

Stono commented Feb 15, 2018 •

edited

Loading

Stono commented Feb 15, 2018 •

edited

Loading