Added playbook for CortexAllocatingTooMuchMemory #345
Signed-off-by: Marco Pracucci <[email protected]>
LGTM, nice improvements.
cortex-mixin/alerts/alerts.libsonnet
Outdated
@@ -479,7 +479,7 @@
},
annotations: {
message: |||
High QPS for ingesters, add more ingesters.
Ingesters in {{ $labels.namespace }} have an high samples/sec rate.
Alternative: Ingesters in {{ $labels.namespace }} ingest too many samples per second.
Better!
cortex-mixin/docs/playbooks.md
Outdated
- Cortex ingesters are a stateful service
- Having 2+ ingesters `OOMKilled` may cause a cluster outage
- Ingester memory baseline usage is primarily influenced by memory allocated by the process (mostly go heap) and mmap-ed files (used by TSDB)
- Ingester memory short spikes are primarily influenced by queries
Also when cutting new blocks.
Right!
cortex-mixin/docs/playbooks.md
Outdated
- Having 2+ ingesters `OOMKilled` may cause a cluster outage
- Ingester memory baseline usage is primarily influenced by memory allocated by the process (mostly go heap) and mmap-ed files (used by TSDB)
- Ingester memory short spikes are primarily influenced by queries
- A pod gets `OOMKilled` once it's working set memory reaches the configured limit, so it's important to prevent ingesters memory utilization (working set memory) from getting close to the limit (we need to keep at least 30% room for spikes due to queries)
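The 30% headroom rule in the excerpt above can be illustrated with a small check. This is only a sketch: the function name `has_enough_headroom` and its parameters are hypothetical and not part of Cortex or the mixin, and in practice the working set and limit values would come from your container metrics rather than being passed in by hand.

```python
GIB = 1024 ** 3


def has_enough_headroom(working_set_bytes: int, limit_bytes: int,
                        min_headroom: float = 0.30) -> bool:
    """Return True if the free share of the memory limit is at least min_headroom.

    The 0.30 default mirrors the "at least 30% room for spikes" guidance
    quoted from the playbook; everything else here is an illustrative assumption.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    # Fraction of the configured limit that is still unused by the working set.
    headroom = (limit_bytes - working_set_bytes) / limit_bytes
    return headroom >= min_headroom


if __name__ == "__main__":
    # 6 GiB used out of a 10 GiB limit leaves 40% free: within the guidance.
    print(has_enough_headroom(6 * GIB, 10 * GIB))
    # 8 GiB used out of a 10 GiB limit leaves 20% free: too close to the limit.
    print(has_enough_headroom(8 * GIB, 10 * GIB))
```

An ingester for which this check fails is a candidate for the mitigation steps in the playbook, since a query spike could push its working set to the limit.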
"it's working set" -> "its working set"
cortex-mixin/docs/playbooks.md
Outdated
```
kubectl -n <namespace> delete pod ingester-XXX
```
- Restarting an ingester typically reduces the memory allocated by mmap-ed files. Such memory could be reallocated again, but may let you gain more time while working on a longer term solution
- Restarting an ingester typically reduces the memory allocated by mmap-ed files. Such memory could be reallocated again, but may let you gain more time while working on a longer term solution
- Restarting an ingester typically reduces the memory allocated by mmap-ed files. After the restart, ingester may allocate this memory again over time, but it may give more time while working on a longer term solution
Signed-off-by: Marco Pracucci <[email protected]>
Thanks @pstibrany for your valuable feedback! Applied all changes.
…or-CortexAllocatingTooMuchMemory Added playbook for CortexAllocatingTooMuchMemory
What this PR does:
Added a playbook for CortexAllocatingTooMuchMemory. I've also slightly changed the CortexAllocatingTooMuchMemory and CortexProvisioningTooManyWrites alert messages.
Which issue(s) this PR fixes:
N/A
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]