Ensure grpc keepalive on all clients #3431

Merged

Conversation

@pracucci (Contributor)

What this PR does:
We're not enabling gRPC keepalive for inter-service gRPC connections 😱 I guess it's time to do it. A sane hardcoded default should be fine for all cases. The gRPC storage plugin also had a timeout of 10m, which looks like a mistake.

The new gRPC clients introduced for the query-scheduler (#3374) will benefit from this too.
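
For reference, a minimal sketch of what a hardcoded client-side keepalive could look like with grpc-go. The target address and the 10s timeout are illustrative assumptions, not necessarily this PR's values; the 20s ping interval matches the value discussed in the comments below:

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Keepalive on an inter-service client connection. The target address
	// is hypothetical; the 20s ping interval is the value discussed in the
	// comments below, the 10s timeout is an illustrative assumption.
	conn, err := grpc.Dial("ingester:9095",
		grpc.WithInsecure(),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                20 * time.Second, // ping after 20s with no activity
			Timeout:             10 * time.Second, // close if the ping ack takes longer
			PermitWithoutStream: true,             // keep pinging even with no active RPCs
		}),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
}
```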

Which issue(s) this PR fixes:
N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@jtlisi (Contributor) left a comment:

LGTM

@pstibrany merged commit 7fe8ef6 into cortexproject:master on Nov 2, 2020
@pracucci deleted the ensure-grpc-keepalive-on-all-clients branch on Nov 2, 2020 07:48
@pstibrany (Contributor) commented Nov 18, 2020

These settings are too low, and will cause clients to be disconnected early on idle connections.

The Cortex gRPC server uses 5 minutes as the minimum time between pings. If a client sends pings more often than that, it gets a "ping strike". Once a client accumulates more than 2 "ping strikes", it is disconnected. With a 20s keepalive ping on an idle connection, this happens after 1 minute.

weaveworks/common#205 will allow configuring lower limits on the gRPC server.
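
For context, a minimal sketch of the server-side enforcement described above, using grpc-go's keepalive.EnforcementPolicy. The 5-minute MinTime mirrors the Cortex server minimum mentioned in this comment; the listen address and the rest of the setup are illustrative assumptions:

```go
package main

import (
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// A client pinging more often than MinTime earns a "ping strike";
	// past the hard-coded maximum, grpc-go closes the connection with
	// a GOAWAY ("too_many_pings").
	srv := grpc.NewServer(
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime:             5 * time.Minute, // the Cortex server minimum mentioned above
			PermitWithoutStream: false,           // pings with no active RPCs also earn strikes
		}),
	)

	lis, err := net.Listen("tcp", ":9095")
	if err != nil {
		panic(err)
	}
	_ = srv.Serve(lis)
}
```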

@pstibrany (Contributor)

Ping strikes are implemented in grpc-go's HTTP/2 server transport, with a hard-coded max value of 2.
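
For illustration, a simplified sketch of the strike-counting behaviour described in these comments. This mimics the mechanism, it is not the actual grpc-go code:

```go
package main

import (
	"fmt"
	"time"
)

const maxPingStrikes = 2 // the hard-coded maximum referenced above

// pingPolicer mimics the server-side check: a ping arriving sooner than
// minTime after the previous one earns a strike, and exceeding
// maxPingStrikes closes the connection.
type pingPolicer struct {
	minTime  time.Duration
	lastPing time.Time
	strikes  int
}

func (p *pingPolicer) onPing(now time.Time) (closeConn bool) {
	if now.Sub(p.lastPing) < p.minTime {
		p.strikes++
	}
	p.lastPing = now
	return p.strikes > maxPingStrikes
}

func main() {
	// Replay the scenario from the Nov 18 comment: MinTime=5m on the
	// server, client keepalive pings every 20s on an idle connection.
	start := time.Now()
	p := &pingPolicer{minTime: 5 * time.Minute, lastPing: start}
	for i := 1; ; i++ {
		t := start.Add(time.Duration(i) * 20 * time.Second)
		if p.onPing(t) {
			fmt.Printf("disconnected after %v\n", t.Sub(start)) // prints: disconnected after 1m0s
			return
		}
	}
}
```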
