
NGINX 502 Bad Gateway when using a single replication #4375

Closed
BuddhiWathsala opened this issue Jul 29, 2019 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@BuddhiWathsala

BuddhiWathsala commented Jul 29, 2019

Is this a request for help? No

What keywords did you search in NGINX Ingress controller issues before filing this one? nginx-ingress, zero-downtime, single replica


Is this a BUG REPORT or FEATURE REQUEST?: BUG REPORT

NGINX Ingress controller version: 0.23.0

Kubernetes version (use kubectl version): v1.15.0

Environment: minikube version: v1.2.0

What happened:
I need to deploy an HTTP app with zero downtime, and I have the restriction of using a single pod only. The problem is that some HTTP requests get a 502 Bad Gateway when I use the NGINX ingress.

I followed the answers given in two related issues (#489 and #322). Those answers work fine when I use more than a single replica, but with a single replica NGINX still has a slight downtime, which is less than 1 millisecond.

The lifecycle spec and rolling update spec of my deployment are set as below, according to the answers given in the above issues.

spec:
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  ...
  template:
    ...
    spec:
      containers:
      - ...
        lifecycle:
          preStop:
            exec:
              command:
              - sleep
              - "30"

Note that I have ConfigMaps mounted into this deployment. I'm not sure whether that affects this downtime or not.

Also, I referred to the following blogs, but their approaches didn't work for this single-pod scenario.
[1]: https://blog.sebastian-daschner.com/entries/zero-downtime-updates-kubernetes
[2]: http://rahmonov.me/posts/zero-downtime-deployment-with-kubernetes/

What you expected to happen:
The pod receives HTTP requests without any downtime.

How to reproduce it:

  1. Create a deployment with one pod that runs an HTTP server.
  2. Create a service with a ClusterIP.
  3. Set up an NGINX ingress that routes to the above service (see the sketch after this list).
  4. Send continuous requests.
  5. Change the deployment spec and apply the changes using kubectl apply. You will then see failed HTTP requests.
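
For steps 2 and 3, a minimal sketch of the Service and Ingress (the names, labels, and ports are placeholders; networking.k8s.io/v1beta1 is the Ingress API version available on Kubernetes v1.15):

apiVersion: v1
kind: Service
metadata:
  name: http-app           # placeholder name
spec:
  type: ClusterIP
  selector:
    app: http-app          # must match the deployment's pod labels
  ports:
  - port: 80
    targetPort: 8080       # placeholder container port
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: http-app
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: http-app
          servicePort: 80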

Anything else we need to know:

According to blog [2], we can achieve zero downtime even with a single replica of the pod, without an ingress. So why can't it be achieved when I use the NGINX ingress?

@dcherniv

@BuddhiWathsala can you change the service to the LoadBalancer type and run two tests simultaneously, one hitting the ingress and the other hitting the load balancer endpoint?
It would be interesting to see whether the issue is the pod being unavailable or NGINX itself.
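
A sketch of the service I mean, reusing the placeholder names from above (adjust selector and ports to your app):

apiVersion: v1
kind: Service
metadata:
  name: http-app-lb        # placeholder; selects the same pods as the ClusterIP service
spec:
  type: LoadBalancer       # on minikube, `minikube tunnel` can expose this externally
  selector:
    app: http-app
  ports:
  - port: 80
    targetPort: 8080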

@BuddhiWathsala
Author

@dcherniv I ran the two tests. When using a single replica, the LoadBalancer configuration also has downtime. As far as I understand, the problem resides at the pod level. When we make some change to the deployment, the existing pod starts to terminate. The HTTP connections currently established with that pod get 502 Bad Gateway because the pod can no longer process requests.

But I am confused about why this problem does not arise when I have multiple pods. When I have multiple pods, does the NGINX controller intelligently redirect the failing requests to other available pods without returning an error to the user?

@dcherniv

dcherniv commented Aug 1, 2019

@BuddhiWathsala
Is this a deployment or a statefulset?
The behavior you are describing applies to statefulsets, where a new pod will not be brought up until the old one is terminated. Looking at your spec, you do have the update strategy set up properly. This strategy will bring up a new pod and ONLY then terminate the old one. If the current pod terminates before the new one is brought up, then that's the problem. That being said, I cannot reproduce this with the spec below. In any case, this is probably not an issue with the NGINX ingress controller, which we ruled out by hitting the LoadBalancer directly:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: echoserver
spec:
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate   
  replicas: 1
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      containers:
      - image: gcr.io/google_containers/echoserver:1.0
        imagePullPolicy: Always
        name: echoserver
        ports:
        - containerPort: 8080


@BuddhiWathsala
Author

@dcherniv I have a deployment. Also, I agree with your argument. As far as I understand now, the problem is at the pod level: at termination time, the pod still has HTTP connections that are already established. When the pod receives SIGTERM, it terminates immediately and cannot send responses on those already-established connections. Therefore we get 502 responses.

Since this is not an issue in ingress-nginx, I can close the issue.

But I have one thing to clarify, as I asked previously: I don't understand why I did not receive this 502 response when I had 2 pods.

When I have 2 pods, does the NGINX controller intelligently redirect the failing requests to another available pod without returning an error to the user?

@nic-6443
Contributor

nic-6443 commented Aug 11, 2019

@BuddhiWathsala When an idempotent request (for example, one using the GET method) fails, NGINX can be configured to retry it on another upstream. In that scenario, the upstream field in the access log entry for a retried request will have multiple values. You can check it.
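
In ingress-nginx this can be tuned per ingress via annotations; a minimal sketch, reusing the placeholder http-app service from above (verify the annotation names and values against your controller version):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: http-app
  annotations:
    # Conditions under which NGINX tries the next upstream instead of
    # returning the error to the client.
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502"
    # Cap how many upstreams are tried for a single request.
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "3"
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: http-app
          servicePort: 80

Note that with a single replica there is no second upstream to retry against, which would explain why the 502s only surface in the one-pod case.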

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 9, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 9, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
