
Merge pull request #321 from grafana/lower-ingester-restarts-severity
Lower CortexIngesterRestarts severity
pracucci authored Jun 8, 2021
2 parents bf9729e + 2624c08 commit e7cbfe4
Showing 2 changed files with 6 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@
 * [CHANGE] `namespace` template variable in dashboards now only selects namespaces for selected clusters. #311
 * [CHANGE] Alertmanager: mounted overrides configmap to alertmanager too. #315
 * [CHANGE] Memcached: upgraded memcached from `1.5.17` to `1.6.9`. #316
+* [CHANGE] `CortexIngesterRestarts` alert severity changed from `critical` to `warning`. #321
 * [CHANGE] Store-gateway: increased memory request and limit respectively from 6GB / 6GB to 12GB / 18GB. #322
 * [CHANGE] Store-gateway: increased `-blocks-storage.bucket-store.max-chunk-pool-bytes` from 2GB (default) to 12GB. #322
 * [ENHANCEMENT] cortex-mixin: Make `cluster_namespace_deployment:kube_pod_container_resource_requests_{cpu_cores,memory_bytes}:sum` backwards compatible with `kube-state-metrics` v2.0.0. #317
7 changes: 5 additions & 2 deletions cortex-mixin/alerts/alerts.libsonnet
@@ -198,10 +198,13 @@
 {
 alert: 'CortexIngesterRestarts',
 expr: |||
-changes(process_start_time_seconds{job=~".+(cortex|ingester.*)"}[30m]) > 1
+changes(process_start_time_seconds{job=~".+(cortex|ingester.*)"}[30m]) >= 2
 |||,
 labels: {
-severity: 'critical',
+// This alert is on a cause not symptom. A couple of ingesters restarts may be suspicious but
+// not necessarily an issue (eg. may happen because of the K8S node autoscaler), so we're
+// keeping the alert as warning as a signal in case of an outage.
+severity: 'warning',
 },
 annotations: {
 message: '{{ $labels.job }}/{{ $labels.instance }} has restarted {{ printf "%.2f" $value }} times in the last 30 mins.',
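Besides lowering the severity, the diff rewrites the threshold from `> 1` to `>= 2`. Because `changes()` returns a whole-number count, the two expressions should select the same series: the alert fires once a process has restarted at least twice within 30 minutes. The Python sketch below is a simplified model of that reasoning, not Prometheus's implementation. It assumes each restart appears as a new value of `process_start_time_seconds` within a single series, and the sample windows are hypothetical.

```python
from typing import Sequence


def changes(samples: Sequence[float]) -> int:
    """Simplified model of PromQL changes(): count how many times
    consecutive sample values in the range differ."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


def old_rule(samples: Sequence[float]) -> bool:
    # Threshold before this commit: changes(...) > 1
    return changes(samples) > 1


def new_rule(samples: Sequence[float]) -> bool:
    # Threshold after this commit: changes(...) >= 2
    return changes(samples) >= 2


# Hypothetical 30m windows of process_start_time_seconds samples
# (Unix time of process start). Each value change models one restart.
windows = {
    "no restart": [1000.0, 1000.0, 1000.0, 1000.0],
    "one restart": [1000.0, 1000.0, 1900.0, 1900.0],
    "two restarts": [1000.0, 1600.0, 1600.0, 2200.0],
}

for name, samples in windows.items():
    # With integer counts, both thresholds agree on every window.
    assert old_rule(samples) == new_rule(samples)
    print(f"{name}: changes={changes(samples)} fires={new_rule(samples)}")
```

In this model, a single restart (for example one caused by the node autoscaler, as the code comment suggests) does not trigger the alert; only repeated restarts within the window do.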
