Relaxed the hash ring heartbeat period and timeout for distributor, ingester, store-gateway and compactor.

These values reduce the pressure on a KV store or reduce the CPU that memberlist spends passing messages.
The tradeoff is that peers will take longer to detect abrupt shutdowns or crashes of components.
We've been running with these values at Grafana Labs for some time and haven't seen problems.
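
The tradeoff can be made concrete with a little arithmetic. The following is only a rough sketch: it assumes each instance writes one heartbeat per period and that peers treat an instance as unhealthy once its last heartbeat is older than the timeout. The period and timeout values are the ones set by this commit; the fleet size of 100 instances and the helper name `heartbeats_per_minute` are hypothetical.

```python
# Relaxed values from this commit, as (heartbeat_period_s, heartbeat_timeout_s).
RELAXED = {
    "distributor": (60, 240),
    "ingester": (120, 600),
    "store-gateway": (60, 240),
    "compactor": (60, 240),
}


def heartbeats_per_minute(instances: int, period_seconds: float) -> float:
    """Ring updates per minute, assuming one heartbeat per instance per period."""
    return instances * 60 / period_seconds


if __name__ == "__main__":
    instances = 100  # hypothetical fleet size, for illustration only
    for component, (period, timeout) in RELAXED.items():
        rate = heartbeats_per_minute(instances, period)
        # The timeout bounds how long a crashed instance can still look healthy.
        print(f"{component}: {rate:.0f} heartbeats/min, "
              f"crash detected within about {timeout // 60}m")
```

Doubling the period halves the heartbeat traffic, while the timeout sets roughly how long a crashed instance can still appear healthy to its peers.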

Signed-off-by: Dimitar Dimitrov <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
dimitarvdimitrov authored and pracucci committed Jan 17, 2024
1 parent ac3cb58 commit a677f2f
Showing 79 changed files with 766 additions and 5 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -63,6 +63,14 @@
* [CHANGE] rollout-operator: remove default CPU limit. #7066
* [CHANGE] Store-gateway: Increase `JAEGER_REPORTER_MAX_QUEUE_SIZE` from the default (100) to 1000, to avoid dropping tracing spans. #7068
* [CHANGE] Query-frontend, ingester, ruler, backend and write instances: Increase `JAEGER_REPORTER_MAX_QUEUE_SIZE` from the default (100), to avoid dropping tracing spans. #7086
* [CHANGE] Ring: relaxed the hash ring heartbeat period and timeout for distributor, ingester, store-gateway and compactor: #6860
  * `-distributor.ring.heartbeat-period` set to `1m`
  * `-distributor.ring.heartbeat-timeout` set to `4m`
  * `-ingester.ring.heartbeat-period` set to `2m`
  * `-store-gateway.sharding-ring.heartbeat-period` set to `1m`
  * `-store-gateway.sharding-ring.heartbeat-timeout` set to `4m`
  * `-compactor.ring.heartbeat-period` set to `1m`
  * `-compactor.ring.heartbeat-timeout` set to `4m`
* [FEATURE] Added support for the following root-level settings to configure the list of matchers to apply to node affinity: #6782 #6829
  * `alertmanager_node_affinity_matchers`
  * `compactor_node_affinity_matchers`
@@ -7,7 +7,6 @@ config:
ring:
instance_availability_zone:
num_tokens:
heartbeat_timeout:
unregister_on_shutdown:
distributor:
ha_tracker:
9 changes: 9 additions & 0 deletions operations/helm/charts/mimir-distributed/CHANGELOG.md
@@ -29,6 +29,15 @@ Entries should include a reference to the Pull Request that introduced the chang
## main / unreleased

* [CHANGE] Rollout-operator: remove default CPU limit. #7125
* [CHANGE] Ring: relaxed the hash ring heartbeat period and timeout for distributor, ingester, store-gateway and compactor: #6860
  * `-distributor.ring.heartbeat-period` set to `1m`
  * `-distributor.ring.heartbeat-timeout` set to `4m`
  * `-ingester.ring.heartbeat-period` set to `2m`
  * `-ingester.ring.heartbeat-timeout` set to `10m`
  * `-store-gateway.sharding-ring.heartbeat-period` set to `1m`
  * `-store-gateway.sharding-ring.heartbeat-timeout` set to `4m`
  * `-compactor.ring.heartbeat-period` set to `1m`
  * `-compactor.ring.heartbeat-timeout` set to `4m`
* [ENHANCEMENT] Add `jaegerReporterMaxQueueSize` Helm value for all components where configuring `JAEGER_REPORTER_MAX_QUEUE_SIZE` makes sense, and override the Jaeger client's default value of 100 for components expected to generate many trace spans. #7068 #7086
* [ENHANCEMENT] Rollout-operator: upgraded to v0.10.1. #7125
* [ENHANCEMENT] Query-frontend: configured `-shutdown-delay`, `-server.grpc.keepalive.max-connection-age` and termination grace period to reduce the likelihood of queries hitting terminated query-frontends. #7129
11 changes: 11 additions & 0 deletions operations/helm/charts/mimir-distributed/values.yaml
@@ -231,6 +231,13 @@ mimir:
data_dir: "/data"
sharding_ring:
wait_stability_min_duration: 1m
heartbeat_period: 1m
heartbeat_timeout: 4m
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
parallelize_shardable_queries: true
@@ -292,6 +299,8 @@ mimir:
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false
heartbeat_period: 2m
heartbeat_timeout: 10m
{{- if .Values.ingester.zoneAwareReplication.enabled }}
zone_awareness_enabled: true
{{- end }}
@@ -376,6 +385,8 @@ mimir:
{{- if .Values.store_gateway.zoneAwareReplication.enabled }}
kvstore:
prefix: multi-zone/
heartbeat_period: 1m
heartbeat_timeout: 4m
{{- end }}
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -91,8 +91,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
cache_results: true
grpc_client_config:
@@ -192,6 +198,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
kvstore:
store: memberlist
num_tokens: 512
@@ -292,6 +300,8 @@ data:
store_gateway:
sharding_ring:
kvstore:
heartbeat_period: 1m
heartbeat_timeout: 4m
prefix: multi-zone/
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -67,8 +67,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
parallelize_shardable_queries: true
scheduler_address: gateway-enterprise-values-mimir-query-scheduler-headless.citestns.svc:9095
@@ -99,6 +105,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -147,6 +155,8 @@ data:
store_gateway:
sharding_ring:
kvstore:
heartbeat_period: 1m
heartbeat_timeout: 4m
prefix: multi-zone/
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -49,8 +49,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
parallelize_shardable_queries: true
scheduler_address: gateway-nginx-values-mimir-query-scheduler-headless.citestns.svc:9095
@@ -61,6 +67,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -103,6 +111,8 @@ data:
store_gateway:
sharding_ring:
kvstore:
heartbeat_period: 1m
heartbeat_timeout: 4m
prefix: multi-zone/
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -67,8 +67,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
parallelize_shardable_queries: true
scheduler_address: graphite-enabled-values-mimir-query-scheduler-headless.citestns.svc:9095
@@ -123,6 +129,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -54,8 +54,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
cache_results: true
parallelize_shardable_queries: true
@@ -74,6 +80,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -114,6 +122,8 @@ data:
store_gateway:
sharding_ring:
kvstore:
heartbeat_period: 1m
heartbeat_timeout: 4m
prefix: multi-zone/
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -49,8 +49,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
parallelize_shardable_queries: true
scheduler_address: metamonitoring-values-mimir-query-scheduler-headless.citestns.svc:9095
@@ -61,6 +67,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -45,8 +45,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
parallelize_shardable_queries: true
scheduler_address: openshift-values-mimir-query-scheduler-headless.citestns.svc:9095
@@ -77,6 +83,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -117,6 +125,8 @@ data:
store_gateway:
sharding_ring:
kvstore:
heartbeat_period: 1m
heartbeat_timeout: 4m
prefix: multi-zone/
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -34,8 +34,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
parallelize_shardable_queries: true
scheduler_address: scheduler-name-values-mimir-query-scheduler-headless.citestns.svc:9095
@@ -46,6 +52,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -80,6 +88,8 @@ data:
store_gateway:
sharding_ring:
kvstore:
heartbeat_period: 1m
heartbeat_timeout: 4m
prefix: multi-zone/
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -54,8 +54,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
cache_results: true
parallelize_shardable_queries: true
@@ -74,6 +80,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -114,6 +122,8 @@ data:
store_gateway:
sharding_ring:
kvstore:
heartbeat_period: 1m
heartbeat_timeout: 4m
prefix: multi-zone/
tokens_file_path: /data/tokens
unregister_on_shutdown: false
@@ -67,8 +67,14 @@ data:
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
wait_stability_min_duration: 1m
symbols_flushers_concurrency: 4
distributor:
sharding_ring:
heartbeat_period: 1m
heartbeat_timeout: 4m
frontend:
parallelize_shardable_queries: true
scheduler_address: test-enterprise-configmap-values-mimir-query-scheduler-headless.citestns.svc:9095
@@ -99,6 +105,8 @@ data:
ingester:
ring:
final_sleep: 0s
heartbeat_period: 2m
heartbeat_timeout: 10m
num_tokens: 512
tokens_file_path: /data/tokens
unregister_on_shutdown: false