ingester: reduce default -server.grpc-max-concurrent-streams to 500 #5666
Conversation
I recommend following the usual workflow, which is:
- First, test the change at Grafana Labs via a CLI flag override
- If everything is fine in production and we're happy with it, then upstream the default config change
The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase.
```
benchstat 100-max-concurrent-streams.txt 10000-max-concurrent-streams.txt
100-max-concurrent-streams.txt:4: missing iteration count
10000-max-concurrent-streams.txt:4: missing iteration count
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/ingester/client
                                     │ 100-max-concurrent-streams.txt │ 10000-max-concurrent-streams.txt │
                                     │             sec/op             │    sec/op       vs base          │
IngesterClient_ConcurrentStreams-10                      5.909 ± 14%      6.066 ± 13%   ~ (p=0.529 n=10)

                                     │ 100-max-concurrent-streams.txt │ 10000-max-concurrent-streams.txt │
                                     │            max-heap            │    max-heap     vs base          │
IngesterClient_ConcurrentStreams-10                      1.942G ±  4%     2.072G ±  1%  +6.69% (p=0.001 n=10)
```

Signed-off-by: Dimitar Dimitrov <[email protected]>
Force-pushed from fc665cd to a9e395d
This is again ready for a review. I changed the value from 100 to 500, since that's what we estimated as an upper bound on the number of gRPC streams that an ingester might have to handle from distributors. We've been running this for a few weeks at Grafana Labs and don't see any problems related to latency, so it should be safe to make this change for everyone.
Thanks!
```diff
@@ -28,6 +28,7 @@ Entries should include a reference to the Pull Request that introduced the chang

 ## main / unreleased

+* [CHANGE] Reduce `-server.grpc-max-concurrent-streams` from 1000 to 500 for ingester and to 100 for all components. #5666
```
> and to 100 for all components

Is this changed in this PR? I can't see it.
The change is because I removed the parameter from the YAML config and set it in the ingester StatefulSet. The YAML config used to say 1000 and applied to all components. The CLI args in the ingester StatefulSet only apply to the ingester.
### Context

`-server.grpc-max-concurrent-streams` controls the number of concurrent streams that a single client connection can serve, but doesn't affect the total number of connections that the ingester can serve. Currently each distributor holds only a single open client connection to each ingester.
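For illustration, here is a minimal standalone sketch of what that cap means at the gRPC layer, using gRPC-Go's `grpc.MaxConcurrentStreams` server option. This is my sketch, not Mimir's actual server wiring, and the health service is just a stand-in:

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	lis, err := net.Listen("tcp", "127.0.0.1:9095")
	if err != nil {
		log.Fatal(err)
	}
	// Cap the number of concurrent HTTP/2 streams (in-flight RPCs) that each
	// client connection may open; further RPCs queue until a stream frees up.
	// Note this is per connection, not a cap on the number of connections.
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(500))
	healthpb.RegisterHealthServer(srv, health.NewServer())
	log.Fatal(srv.Serve(lis))
}
```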
### What this PR does

This PR sets the value of `-server.grpc-max-concurrent-streams` to 500. We have been running with this configuration at GL for a few weeks and haven't seen negative effects. You can see the positive effects in the benchmark below.

### Why?
The reason for reducing the value is that a high value for max-concurrent-streams may overwhelm the ingester with too many inflight streams, which in extreme cases can lead to an out-of-memory error.
For example, with 200 distributors and 10,000 streams per distributor client connection, one ingester could be serving 2,000,000 gRPC requests concurrently. Each gRPC request comes with overhead, and this much concurrency may be enough to run the ingester out of memory.
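A back-of-the-envelope sketch of that arithmetic; the 2 KiB figure is just Go's initial goroutine stack size, a deliberate lower bound that ignores gRPC buffers and request payloads:

```go
package main

import "fmt"

func main() {
	distributors := 200
	streamsPerClient := 10_000 // a high -server.grpc-max-concurrent-streams value

	concurrent := int64(distributors) * int64(streamsPerClient) // 2,000,000 in-flight RPCs

	// Assume at least one goroutine per in-flight stream. Goroutines start
	// with a 2 KiB stack, so this is a floor, not an estimate of real usage.
	const minStackBytes = 2 << 10
	fmt.Printf("%d streams -> at least %d GiB of goroutine stacks\n",
		concurrent, concurrent*minStackBytes/(1<<30))
}
```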
This effectively pushes the impact of an influx of requests back to the distributor. That should be safer than handling it in the ingester, since scaling distributors is faster and cheaper, and a crashing distributor recovers much more quickly than a crashing ingester.
### Benchmark
I ran a benchmark simulating a single distributor client and a single ingester. The distributor tries to send 100,000 requests concurrently. The ingester implementation immediately rejects each request and logs a line to stdout (using go-kit/log). I ran the benchmark with two settings for the ingester max-concurrent-streams: 100 and 10,000. Instead of recording the heap of the process, I recorded the RSS, which also includes the memory overhead of goroutines.
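A self-contained sketch of that kind of harness, with gRPC-Go's health service standing in for the ingester's push API; the real benchmark in `pkg/ingester/client` may be shaped differently, and the reject-and-log behavior is elided here:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sync"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// "Ingester": a gRPC server with a capped number of concurrent streams
	// per connection. Try 100 vs 10000 to compare latency and RSS.
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	srv := grpc.NewServer(grpc.MaxConcurrentStreams(100))
	healthpb.RegisterHealthServer(srv, health.NewServer())
	go srv.Serve(lis)
	defer srv.Stop()

	// "Distributor": one client connection issuing 100,000 concurrent RPCs.
	conn, err := grpc.Dial(lis.Addr().String(),
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	client := healthpb.NewHealthClient(conn)

	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < 100_000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each call occupies one stream; calls beyond the server's cap
			// wait on the client side until a stream frees up.
			_, _ = client.Check(context.Background(), &healthpb.HealthCheckRequest{})
		}()
	}
	wg.Wait()
	fmt.Println("100k RPCs took", time.Since(start))
}
```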
The RSS of the ingester process was 3x higher with the higher concurrency. Surprisingly, the latency to process 100K requests was also more than 3x higher with the increased concurrency. I suspect the lower latency at reduced concurrency is due to less contention and context switching, although I wasn't able to verify that.
### Checklist

- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`