jsonnet: set GOMAXPROCS on ingesters #9273
Conversation
We've been experimenting with GOMAXPROCS on ingesters at GL for the past few months. The test included all production cells and compared ingester-zone-a and zone-b (with GOMAXPROCS) against zone-c (without GOMAXPROCS). This is the summary of results:

> * Push & Query outlier pods are significantly better with GOMAXPROCS. This makes the cell more resilient to bad nodes and to inefficient NUMA node assignment; this was the goal of the change.
> * QueryStream performance under sudden load is worse.
> * CPU usage is generally slightly better. Reduced available cores via GOMAXPROCS explains this.
> * Heap might appear slightly increased. However, reduced CPU time spent on garbage collection (due to fewer cores, see the point above) can explain why peak heap is allowed to rise before GC kicks in. This is generally safe and should not lead to OOMs, for example.
>
> To me, GOMAXPROCS is a tradeoff between lower stable-state resource usage and resilience to outlier infrastructure **and** being able to handle sudden usage peaks without increasing latency. With the help of query-scheduler scheduling improvements (prioritizing queries that wouldn't touch slowed-down components) and autoscaling (more frequent ingester vertical autoscaling, querier autoscaling), I think we can accept the increased QueryStream latency.

Signed-off-by: Dimitar Dimitrov <[email protected]>
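For readers unfamiliar with the mechanism, here is a minimal illustrative sketch (not the PR's actual code; the helper and field names are made up) of how a GOMAXPROCS value could be derived from the ingester's CPU request in jsonnet and passed to the container as the standard GOMAXPROCS environment variable, which the Go runtime reads at startup:

```jsonnet
// Illustrative only: derive GOMAXPROCS from the CPU request using the headroom
// formula discussed in this PR and emit it as a plain env var entry.
// Names below (gomaxprocs, ingester_gomaxprocs_env) are hypothetical.
local gomaxprocs(cpuRequests) =
  cpuRequests + std.max(3, std.min(6, cpuRequests));

{
  ingester_gomaxprocs_env:: [
    // e.g. a 4-CPU request yields GOMAXPROCS=8
    { name: 'GOMAXPROCS', value: std.toString(gomaxprocs(4)) },
  ],
}
```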
operations/mimir/ingester.libsonnet
Outdated
// double the requests, but with headroom of at least 3 and at most 6 CPU
// too little headroom can lead to throttling
// too much headroom can lead to setting the limit above the available CPU cores
requests + std.max(3, std.min(6, requests)),
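To make the clamping behaviour of this expression concrete, here is a small worked evaluation (illustrative only, not part of the change; the input values are arbitrary):

```jsonnet
// Illustrative only: headroom is the requests value clamped between 3 and 6 CPUs.
local withHeadroom(requests) = requests + std.max(3, std.min(6, requests));

{
  small: withHeadroom(2),    // 2 + 3 = 5   (minimum headroom of 3 applies)
  medium: withHeadroom(4),   // 4 + 4 = 8   (headroom equals requests, i.e. doubled)
  large: withHeadroom(10),   // 10 + 6 = 16 (maximum headroom of 6 applies)
}
```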
Wouldn't this potentially set it to more than what is available for the host/container? E.g. if a VM has only 4 cores and there is only one container running on the host (kinda like in AWS Fargate), we'd set maxprocs to 7?
very good point 🙂 And Peter already thought about that
Oh, I didn't know that 👍
Maybe it's worth linking to the clamping PR from the comment, to make sure this doesn't trip anyone up later.
Suggested change:

  // double the requests, but with headroom of at least 3 and at most 6 CPU
  // too little headroom can lead to throttling
  // too much headroom can lead to setting the limit above the available CPU cores
+ // if the value exceeds the number of available CPU cores on the host, mimir caps the maxprocs on application start (ref grafana/mimir#8201)
  requests + std.max(3, std.min(6, requests)),
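To illustrate what the referenced clamping achieves (in Mimir it happens at application start, per the comment above, not in jsonnet), here is a rough sketch of the equivalent arithmetic; node_cpu_cores is a hypothetical input used only for this example:

```jsonnet
// Illustrative only: the effective value never exceeds the cores actually
// available on the node. node_cpu_cores is a hypothetical input here; the real
// clamping is done by Mimir itself when the process starts (grafana/mimir#8201).
local effectiveGomaxprocs(requests, node_cpu_cores) =
  std.min(node_cpu_cores, requests + std.max(3, std.min(6, requests)));

{
  on_large_node: effectiveGomaxprocs(4, 32),  // 8  (formula value fits)
  on_small_node: effectiveGomaxprocs(4, 4),   // 4  (capped at the node's cores)
}
```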
🔥 works for me, although I'd expand the "headroom" comment.
Also, CI seems to be unhappy.
I blindly copied from the internal jsonnet, but realized it's wrong. Corrected now.
What this PR does
We've been experimenting with GOMAXPROCS on ingesters at GL for the past few months. The test included all production cells and compared ingester-zone-a and zone-b (with GOMAXPROCS) against zone-c (without GOMAXPROCS). The summary of results is quoted at the top of this PR.
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
- about-versioning.md updated with experimental features.