
Delay needed before calling asyncServerStop()? #31

Closed
alanchristensen opened this issue Jan 31, 2018 · 7 comments

alanchristensen commented Jan 31, 2018

I tried to use terminus for some projects at work, but during an application update some requests were getting dropped.

I was able to fix this by adding a delay before calling asyncServerStop().

Have you guys seen this as well?

I've created https://github.com/alanchristensen/kubeplayground to help you reproduce what I'm seeing.

Once you've got the app running, throw some load at it using wrk:

wrk -d 2m http://localhost:8080/ -H 'Connection: Close'

and then run the update script to rebuild the image and deploy it.

In wrk I then see:

Running 2m test @ http://localhost:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   206.54ms    6.36ms 270.10ms   91.07%
    Req/Sec    23.85      9.68    50.00     64.11%
  1378 requests in 2.00m, 199.16KB read
  Socket errors: connect 0, read 14, write 0, timeout 0
Requests/sec:     11.47
Transfer/sec:      1.66KB

The 14 read errors are the dropped requests. I think what's going on is that Kubernetes continues to send traffic to a container for a short while even after it sends the container SIGTERM.

gergelyke (Collaborator) commented Feb 1, 2018

Thanks a lot for the report, Alan!

alanchristensen (Author) commented

@gergelyke I've pushed a change to my kubeplayground repo to fix a logging statement.

I can eliminate those read errors by changing the process.on(signal, cleanup); to process.on(signal, () => { setTimeout(cleanup, 10000); });
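
For reference, a minimal sketch of that workaround (a plain Node http server is assumed here; the 10-second delay is just the value that worked in my testing, not a recommended constant):

const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));
server.listen(8080);

function cleanup() {
  // Stop accepting new connections; in-flight requests can finish.
  server.close(() => process.exit(0));
}

['SIGTERM', 'SIGINT'].forEach((signal) => {
  process.on(signal, () => {
    // Delay shutdown so kube-proxy has time to stop routing
    // traffic to this pod before the server closes.
    setTimeout(cleanup, 10000);
  });
});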

gergelyke added the bug label Feb 9, 2018
JacopoDaeli (Contributor) commented

I am looking into this @alanchristensen @gergelyke.

gergelyke (Collaborator) commented

@JacopoDaeli do you have any update on this one?

JacopoDaeli (Contributor) commented

@gergelyke, unfortunately I haven't had time to look into this yet, but I will as soon as possible.

peterkuiper (Contributor) commented Jun 8, 2018

There is a long discussion here. This "lag" is caused by kube-proxy: your pod is shutting down but is still receiving traffic.

There is no synchronization between the SIGTERM handler and the health check. If I understand correctly, your health check should start failing once you receive SIGTERM, and the process should wait a little before shutting down. That way, traffic that reaches your pod after the SIGTERM is still handled; see this comment.

Some people use preStop handlers in their k8s config, which, I think, is basically what you are doing with the setTimeout. It's a bit "hacky", but it seems to work.
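
A preStop hook along those lines might look like this in the pod spec (a sketch only; the image name is a placeholder, and the sleep length is arbitrary and assumes a sleep binary exists in the container image):

containers:
  - name: app
    image: my-app:latest  # placeholder image name
    lifecycle:
      preStop:
        exec:
          # Kubernetes runs this before sending SIGTERM, so the
          # container keeps serving while endpoints are updated.
          command: ["sleep", "10"]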

A solution could be to add a flag, isShutdown or similar, and do something like this in your health check:

const terminus = require('@godaddy/terminus');
...

// inside the health check handler (isShutdown is a proposed flag,
// not an existing terminus export):
if (terminus.isShutdown) {
  // fail the health check, e.g. by throwing/rejecting
  throw new Error('server is shutting down');
}
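
Putting those pieces together, a rough end-to-end sketch (here isShutdown is a plain local flag rather than a terminus export, and the 10-second grace period is illustrative):

const http = require('http');

let isShutdown = false;

const server = http.createServer((req, res) => {
  if (req.url === '/healthcheck') {
    // Start failing the check as soon as shutdown begins so
    // Kubernetes stops routing new traffic to this pod.
    res.statusCode = isShutdown ? 503 : 200;
    return res.end();
  }
  res.end('ok');
});

server.listen(8080);

process.on('SIGTERM', () => {
  isShutdown = true;
  // Keep serving during a grace period while kube-proxy notices
  // the failing health check, then drain and close.
  setTimeout(() => server.close(() => process.exit(0)), 10000);
});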

gergelyke (Collaborator) commented

ah, thanks a lot @peterkuiper for this!

I'll put together a PR addressing this.
