Improve probe setup #183

Open

@totycro totycro commented Feb 12, 2025

The liveness probe is now a TCP probe. If the server is overloaded and does not respond to HTTP requests in time, it is no longer killed as long as it still listens on the TCP port (i.e. the server is actually alive). It can therefore finish its in-flight requests, which shouldn't take forever due to gunicorn request timeouts.

The readiness probe is now an HTTP probe, so if the server is overloaded, it won't receive any new requests.

Some HTTP health endpoints also check related services such as the DB. If the DB is down, you usually don't want the service to be killed. This setup achieves that, since in that case only the readiness probe fails.

Instead of initialDelaySeconds, an HTTP startup probe is used that queries the service every second, so the service is marked as available as soon as it actually is. This is vital for one of our use cases, where we scale the service to zero and only activate it on the first request, i.e. users wait for the service to become available.
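
For illustration, the probe setup described above could look roughly like the following container spec fragment. This is only a sketch, not the actual diff of this PR: the port 8000, the /health path, and all timing values except the one-second startup period are assumptions.

```yaml
# Sketch of the three probes on a container spec.
# Port, path, and thresholds are assumed values, not taken from this PR.
livenessProbe:
  # TCP only: the container is restarted only if the port stops accepting connections.
  tcpSocket:
    port: 8000
  periodSeconds: 10
readinessProbe:
  # HTTP: an overloaded or degraded server is taken out of rotation, not killed.
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  timeoutSeconds: 5
startupProbe:
  # Replaces initialDelaySeconds: polled every second until the first success.
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 1
  # Allows up to 60 * 1s for startup before the container is restarted.
  failureThreshold: 60
```

Kubernetes holds back the liveness and readiness probes until the startup probe has succeeded once, so failureThreshold times periodSeconds bounds the allowed startup time.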

We have used a setup like this successfully. Other configurations could be useful for other use cases, so this could also be made configurable via values. This setup is probably a reasonable default, though.
