Krajo/merge from main to sparsehistograms (#4390)
* Helm: nginx HPA and tests kubeversion fixes (#4299)

* Helm: fix Kubernetes override for nginx HPA

The template did not take into account the override "kubeVersionOverride".
Fix by using the mimir template implemented for this reason.

Signed-off-by: György Krajcsovits <[email protected]>

* Helm: fix missing Kubernetes version overrides in tests

The golden record tests need a fixed version because helm uses the version
of the default context and can produce different results between
contributor's machine and the CI environment.

Add logic to test build to inject the minimal version if not found in the
values file. Mainly because we cannot have a version override in the
small and large values files.

Signed-off-by: György Krajcsovits <[email protected]>
Co-authored-by: Jon Kartago Lamida <[email protected]>

* Ruler: load more tenants in parallel during startup (#4258)

* Ruler: load more tenants in parallel during startup

* add more tests

* fix lint

* Apply suggestions from code review

Co-authored-by: Marco Pracucci <[email protected]>

* Ingester: fix OOO blocks labelling (#4297)

* Ingester: fix OOO blocks labelling

This fixes a bug where the OutOfOrderExternalLabel
was being added to all blocks instead of the ones coming
from OOO data, when the feature flag was enabled.

* Changelog

* PR number to changelog

* Update previous changelog entry instead

* Ruler: load more tenants in parallel during startup

* fix context

* improve unittest

---------

Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Nicolás Pazos <[email protected]>

* Change language to match the math. (#4356)

* Upgrade mimir-prometheus to get a fast regexp path optimization (#4357)

* Upgrade mimir-prometheus to get a fast regexp path optimization

Signed-off-by: Marco Pracucci <[email protected]>

* Added CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

---------

Signed-off-by: Marco Pracucci <[email protected]>

* Fix typo in the docs URL for migrating from Cortex (#4358)

* Remove forced paragraph break. (#4359)

* Bump actions/setup-go to v3 to resolve Node.js 12 deprecation warning. (#4361)

* Improve flaky `TestIngesterWithShippingDisabledDeletesBlocksOnlyAfterRetentionExpires` (#4362)

* Use more specific assertion to include more information in test failures.

See #4198.

* Reduce flakiness of test by extending retention period.

This gives the rest of the test more time to retrieve `oldBlocks`
before any of the blocks is removed.

* Add asynchronous validation scaffolding for block upload (#3411)

* Add asynchronous validation scaffolding for block upload

* addressed lint errors

* Update pkg/compactor/block_upload.go

Co-authored-by: Arve Knudsen <[email protected]>

* Update pkg/compactor/block_upload.go

Co-authored-by: Arve Knudsen <[email protected]>

* Update pkg/compactor/block_upload.go

Co-authored-by: Arve Knudsen <[email protected]>

* Update pkg/compactor/block_upload.go

Co-authored-by: Arve Knudsen <[email protected]>

* Update pkg/compactor/block_upload.go

Co-authored-by: Arve Knudsen <[email protected]>

* Update pkg/compactor/block_upload.go

Co-authored-by: Arve Knudsen <[email protected]>

* enable block upload for dev testing

* fixed validation errors, added debug log messages

* fixed cancelled context issue

* changed name of flag to disable complete block upload

* addressed reviewer feedback

* addressed reviewer feedback

* Address some review comments, WIP

* Small spacing cleanup

* Transition to in-memory bucket for block finish test

* Async validation test coordination, adding configuration flags

* Small comment and flag fix

* Swap config strategy, test still needs separation

* Docs + lint

* Review comments, begin separating tests

* Finish validateAndComplete test

* Update docs

* Remove docker compose arguments

* Regenerate rather than modify

* Review, add test for periodicValidationUpdater

* Make validateAndComplete test clearer, add upload meta check

* Set missing cancelContext

* Add sleep as suggested

* Configure data directory

* fixed compactor data dir in e2e test

* Add changelog entry

* Split into two entries

* Missing entry number

* Update CHANGELOG.md

---------

Co-authored-by: Arve Knudsen <[email protected]>
Co-authored-by: Andy Asp <[email protected]>

* Jsonnet: honor the minimum shard size configured (#4363)

Signed-off-by: Marco Pracucci <[email protected]>

* [CHANGE] Ruler: set default `evaluation-delay-duration` to 1m (#4250)

* change the default evaluation delay of ruler to 1m

* revert the changes in ruler test

* change integration test to set ruler default value

* fix integration tests

* Update integration/configs.go

Co-authored-by: Marco Pracucci <[email protected]>

* Update CHANGELOG.md

Co-authored-by: Peter Štibraný <[email protected]>

---------

Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Peter Štibraný <[email protected]>

* [Chore] Update jsonnet manifest create query frontend discovery only when it is necessary (#4353)

* [Chore] update jsonnet manifest, avoid setting querier.frontend-address or create query-frontend-discovery when deployment mode is microservices or query-scheduler is enabled

* linter and changelog

* Helm: fix parity with jsonnet on query frontend headless service

Do not generate query-frontend-headless service if query scheduler
is enabled

Signed-off-by: György Krajcsovits <[email protected]>

* Apply suggestions from code review

Co-authored-by: Marco Pracucci <[email protected]>

* correct changelog

* regenerate helm golden files

* Update CHANGELOG.md

---------

Signed-off-by: György Krajcsovits <[email protected]>
Co-authored-by: György Krajcsovits <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>

* Remove block validation mimirtool changelog entry (#4369)

* Spread TSDB head compaction over the configured interval (#4364)

* Spread TSDB head compaction over the configured interval

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed unit test

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestion from code review

Signed-off-by: Marco Pracucci <[email protected]>

* Fix typo in CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

* Fix typo in CHANGELOG entry

Signed-off-by: Marco Pracucci <[email protected]>

---------

Signed-off-by: Marco Pracucci <[email protected]>

* Fix port number values. (#4368)

* Ruler: change deployment max surge and max unavailable to reduce ownership spillover (#4381)

* Ruler: change deployment max surge and max unavailable to reduce ownership spillover

Signed-off-by: Marco Pracucci <[email protected]>

* Apply suggestions from code review

Co-authored-by: Dimitar Dimitrov <[email protected]>

---------

Signed-off-by: Marco Pracucci <[email protected]>
Co-authored-by: Dimitar Dimitrov <[email protected]>

* Move "Note:" about cross-zone costs to "Costs" (#4370)

This note was in an unrelated section.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Change default -blocks-storage.tsdb.retention-period from 24h to 13h (#4382)

Signed-off-by: Marco Pracucci <[email protected]>

* Support histograms in pkg/storage and update other breakages (#4354)

* Support histograms in pkg/storage and update other breakages

Signed-off-by: Ganesh Vernekar <[email protected]>

---------

Signed-off-by: György Krajcsovits <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Oleg Zaytsev <[email protected]>
Signed-off-by: Ganesh Vernekar <[email protected]>
Co-authored-by: Jon Kartago Lamida <[email protected]>
Co-authored-by: ying-jeanne <[email protected]>
Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Nicolás Pazos <[email protected]>
Co-authored-by: Ursula Kallio <[email protected]>
Co-authored-by: l3ioo <[email protected]>
Co-authored-by: Charles Korn <[email protected]>
Co-authored-by: Vernon Miller <[email protected]>
Co-authored-by: Arve Knudsen <[email protected]>
Co-authored-by: Andy Asp <[email protected]>
Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Dimitar Dimitrov <[email protected]>
Co-authored-by: Oleg Zaytsev <[email protected]>
Co-authored-by: Ganesh Vernekar <[email protected]>
16 people authored Mar 6, 2023
1 parent 4b43206 commit e4a37f9
Showing 118 changed files with 1,189 additions and 1,494 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test-build-deploy.yml
@@ -196,7 +196,7 @@ jobs:
test_group_total: [4]
steps:
- name: Upgrade golang
uses: actions/setup-go@v2
uses: actions/setup-go@v3
with:
go-version: 1.20.1
- name: Check out repository
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,8 @@

### Grafana Mimir

* [CHANGE] Ingester: changed default value of `-blocks-storage.tsdb.retention-period` from `24h` to `13h`. If you're running Mimir with a custom configuration and you're overriding `-querier.query-store-after` to a value greater than the default `12h` then you should increase `-blocks-storage.tsdb.retention-period` accordingly. #4382
* [CHANGE] Ruler: changed default value of `-ruler.evaluation-delay-duration` option from 0 to 1m. #4250
* [CHANGE] Querier: Errors with status code `422` coming from the store-gateway are propagated and not converted to the consistency check error anymore. #4100
* [CHANGE] Store-gateway: When a query hits `max_fetched_chunks_per_query` and `max_fetched_series_per_query` limits, an error with the status code `422` is created and returned. #4056
* [CHANGE] Packaging: Migrate FPM packaging solution to NFPM. Rationalize packages dependencies and add package for all binaries. #3911
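The `-blocks-storage.tsdb.retention-period` entry in this hunk asks operators who override `-querier.query-store-after` to keep the retention period larger than it. A minimal sketch of that check follows. `checkRetention` and the two constants are hypothetical names introduced for illustration and are not part of Mimir; the values `13h` and `12h` are the defaults quoted in the entry.

```go
package main

import (
	"fmt"
	"time"
)

// Defaults quoted in the CHANGELOG entry (names are illustrative only).
const (
	defaultRetention       = 13 * time.Hour
	defaultQueryStoreAfter = 12 * time.Hour
)

// checkRetention is a hypothetical helper, not a Mimir function.
// It reports an error when the ingester retention period is not
// strictly larger than the querier's query-store-after setting.
func checkRetention(retention, queryStoreAfter time.Duration) error {
	if retention <= queryStoreAfter {
		return fmt.Errorf("retention period %s must be larger than query-store-after %s",
			retention, queryStoreAfter)
	}
	return nil
}

func main() {
	// New defaults: 13h retention against 12h query-store-after.
	fmt.Println(checkRetention(defaultRetention, defaultQueryStoreAfter))
	// A custom query-store-after of 24h would need a larger retention period.
	fmt.Println(checkRetention(defaultRetention, 24*time.Hour))
}
```

With the old `24h` default, an override of `-querier.query-store-after` up to just under 24h still passed this kind of check; with `13h` the margin is much smaller, which is why the entry calls it out.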
@@ -34,6 +36,7 @@ Querying with using `{__mimir_storage__="ephemeral"}` selector no longer works.
* [FEATURE] Query-frontend: Introduce experimental `-query-frontend.query-sharding-target-series-per-shard` to allow query sharding to take into account cardinality of similar requests executed previously. This feature uses the same cache that's used for results caching. #4121 #4177 #4188 #4254
* [ENHANCEMENT] Go: update go to 1.20.1. #4266
* [ENHANCEMENT] Ingester: added `out_of_order_blocks_external_label_enabled` shipper option to label out-of-order blocks before shipping them to cloud storage. #4182 #4297
* [ENHANCEMENT] Ruler: introduced concurrency when loading per-tenant rules configuration. This improvement is expected to speed up the ruler start up time in a Mimir cluster with a large number of tenants. #4258
* [ENHANCEMENT] Compactor: Add `reason` label to `cortex_compactor_runs_failed_total`. The value can be `shutdown` or `error`. #4012
* [ENHANCEMENT] Store-gateway: enforce `max_fetched_series_per_query`. #4056
* [ENHANCEMENT] Docs: use long flag names in runbook commands. #4088
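The Ruler entry in this hunk describes loading per-tenant rules configuration concurrently. The sketch below shows one common way to bound that kind of parallelism in Go, using a buffered channel as a semaphore. It is an illustration only: `loadAll`, its parameters, and the stand-in loader are hypothetical, and the commit page does not show how Mimir actually implements it.

```go
package main

import (
	"fmt"
	"sync"
)

// loadAll calls load once per tenant, with at most `concurrency`
// calls in flight at any time, and collects the results by tenant.
// This is a hypothetical helper, not Mimir's implementation.
func loadAll(tenants []string, concurrency int, load func(tenant string) int) map[string]int {
	if concurrency < 1 {
		concurrency = 1
	}
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]int, len(tenants))
		sem     = make(chan struct{}, concurrency)
	)
	for _, tenant := range tenants {
		wg.Add(1)
		sem <- struct{}{} // blocks while `concurrency` loads are running
		go func(tenant string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot for the next tenant
			n := load(tenant)
			mu.Lock()
			results[tenant] = n
			mu.Unlock()
		}(tenant)
	}
	wg.Wait()
	return results
}

func main() {
	tenants := []string{"tenant-a", "tenant-b", "tenant-c"}
	// Stand-in loader: pretend each tenant has len(name) rule groups.
	got := loadAll(tenants, 2, func(t string) int { return len(t) })
	fmt.Println(len(got), got["tenant-a"])
}
```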
@@ -51,6 +54,10 @@ Querying with using `{__mimir_storage__="ephemeral"}` selector no longer works.
* [ENHANCEMENT] Store-gateway: add a `stage` label to the metrics `cortex_bucket_store_series_data_fetched`, `cortex_bucket_store_series_data_size_fetched_bytes`, `cortex_bucket_store_series_data_touched`, `cortex_bucket_store_series_data_size_touched_bytes`. This label only applies to `data_type="chunks"`. For `fetched` metrics with `data_type="chunks"` the `stage` label has 2 values: `fetched` - the chunks or bytes that were fetched from the cache or the object store, `refetched` - the chunks or bytes that had to be refetched from the cache or the object store because their size was underestimated during the first fetch. For `touched` metrics with `data_type="chunks"` the `stage` label has 2 values: `processed` - the chunks or bytes that were read from the fetched chunks or bytes and were processed in memory, `returned` - the chunks or bytes that were selected from the processed bytes to satisfy the query. #4227 #4316
* [ENHANCEMENT] Compactor: improve the partial block check related to `compactor.partial-block-deletion-delay` to potentially issue less requests to object storage. #4246
* [ENHANCEMENT] Memcached: added `-*.memcached.min-idle-connections-headroom-percentage` support to configure the minimum number of idle connections to keep open as a percentage (0-100) of the number of recently used idle connections. This feature is disabled when set to a negative value (default), which means idle connections are kept open indefinitely. #4249
* [ENHANCEMENT] Querier and store-gateway: optimized regular expression label matchers with case insensitive alternate operator. #4340 #4357
* [ENHANCEMENT] Compactor: added the experimental flag `-compactor.block-upload.block-validation-enabled` with the default `true` to configure whether block validation occurs on backfilled blocks. #3411
* [ENHANCEMENT] Ingester: apply a jitter to the first TSDB head compaction interval configured via `-blocks-storage.tsdb.head-compaction-interval`. Subsequent checks will happen at the configured interval. This should help to spread the TSDB head compaction among different ingesters over the configured interval. #4364
* [ENHANCEMENT] Ingester: the maximum accepted value for `-blocks-storage.tsdb.head-compaction-interval` has been increased from 5m to 15m. #4364
* [BUGFIX] Ingester: remove series from ephemeral storage even if there are no persistent series. #4052
* [BUGFIX] Store-gateway: return `Canceled` rather than `Aborted` or `Internal` error when the calling querier cancels a label names or values request, and return `Internal` if processing the request fails for another reason. #4061
* [BUGFIX] Ingester: reuse memory when ingesting ephemeral series. #4072
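The Ingester entry for `-blocks-storage.tsdb.head-compaction-interval` in this hunk says a jitter is applied to the first check and later checks run at the configured interval. The sketch below illustrates the idea. The jitter range `[0, interval)` is an assumption, since this page does not state the range Mimir uses, and `firstCheckDelay` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// defaultInterval mirrors the documented default of 1m (illustrative constant).
const defaultInterval = time.Minute

// firstCheckDelay returns a pseudo-random delay in [0, interval) for the
// first head compaction check. The range is an assumption made for this
// sketch; subsequent checks would simply use `interval`.
func firstCheckDelay(interval time.Duration, seed int64) time.Duration {
	if interval <= 0 {
		return 0
	}
	rng := rand.New(rand.NewSource(seed))
	return time.Duration(rng.Int63n(int64(interval)))
}

func main() {
	// Different seeds stand in for different ingesters starting up.
	for i := int64(0); i < 3; i++ {
		fmt.Printf("ingester-%d: first check after %s, then every %s\n",
			i, firstCheckDelay(defaultInterval, i), defaultInterval)
	}
}
```

Because each ingester draws a different first delay, their periodic checks stay offset from each other afterwards, which is the spreading effect the entry describes.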
@@ -77,10 +84,13 @@ Querying with using `{__mimir_storage__="ephemeral"}` selector no longer works.

### Jsonnet

* [CHANGE] Create the `query-frontend-discovery` service only when Mimir is deployed in microservice mode without query-scheduler. #4353
* [CHANGE] Add results cache backend config to `ruler-query-frontend` configuration to allow cache reuse for cardinality-estimation based sharding. #4257
* [CHANGE] Ruler: changed ruler deployment max surge from `0` to `50%`, and max unavailable from `1` to `0`. #4381
* [ENHANCEMENT] Add support for ruler auto-scaling. #4046
* [ENHANCEMENT] Add optional `weight` param to `newQuerierScaledObject` and `newRulerQuerierScaledObject` to allow running multiple querier deployments on different node types. #4141
* [ENHANCEMENT] Add support for query-frontend and ruler-query-frontend auto-scaling. #4199
* [BUGFIX] Shuffle sharding: when applying user class limits, honor the minimum shard size configured in `$._config.shuffle_sharding.*`. #4363

### Mimirtool

2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@ Grafana Mimir is an open source software project that provides a scalable long-t
If you're migrating to Grafana Mimir, refer to the following documents:

- [Migrating from Thanos or Prometheus to Grafana Mimir](https://grafana.com/docs/mimir/latest/migration-guide/migrating-from-thanos-or-prometheus/).
- [Migrating from Cortex to Grafana Mimir](https://grafana.com/docs/mimir/latest/migration-guide/migrating-from-cortex/)
- [Migrating from Cortex to Grafana Mimir](https://grafana.com/docs/mimir/latest/migration-guide/migrate-from-cortex/)

## Deploying Grafana Mimir

27 changes: 24 additions & 3 deletions cmd/mimir/config-descriptor.json
@@ -3275,7 +3275,7 @@
"required": false,
"desc": "Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldDefaultValue": 60000000000,
"fieldFlag": "ruler.evaluation-delay-duration",
"fieldType": "duration"
},
@@ -5829,7 +5829,7 @@
"required": false,
"desc": "TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks.",
"fieldValue": null,
"fieldDefaultValue": 86400000000000,
"fieldDefaultValue": 46800000000000,
"fieldFlag": "blocks-storage.tsdb.retention-period",
"fieldType": "duration"
},
@@ -5859,7 +5859,7 @@
"kind": "field",
"name": "head_compaction_interval",
"required": false,
"desc": "How frequently ingesters try to compact TSDB head. Block is only created if data covers smallest block range. Must be greater than 0 and max 5 minutes.",
"desc": "How frequently the ingester checks whether the TSDB head should be compacted and, if so, triggers the compaction. Mimir applies a jitter to the first check, while subsequent checks will happen at the configured interval. Block is only created if data covers smallest block range. The configured interval must be between 0 and 15 minutes.",
"fieldValue": null,
"fieldDefaultValue": 60000000000,
"fieldFlag": "blocks-storage.tsdb.head-compaction-interval",
@@ -6703,6 +6703,27 @@
"fieldFlag": "compactor.compaction-jobs-order",
"fieldType": "string",
"fieldCategory": "advanced"
},
{
"kind": "block",
"name": "block_upload",
"required": false,
"desc": "",
"blockEntries": [
{
"kind": "field",
"name": "block_validation_enabled",
"required": false,
"desc": "Validate blocks before finalizing a block upload",
"fieldValue": null,
"fieldDefaultValue": true,
"fieldFlag": "compactor.block-upload.block-validation-enabled",
"fieldType": "boolean",
"fieldCategory": "experimental"
}
],
"fieldValue": null,
"fieldDefaultValue": null
}
],
"fieldValue": null,
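The `fieldDefaultValue` numbers for `duration` fields in this file are nanosecond counts, which makes the changed defaults hard to read at a glance. As a small sanity check, the sketch below converts the three values that appear in the hunks above with Go's `time.Duration`. `describe` is a helper name chosen for this example.

```go
package main

import (
	"fmt"
	"time"
)

// describe converts a raw nanosecond count, as stored in
// config-descriptor.json, into Go's duration string form.
func describe(nanos int64) string {
	return time.Duration(nanos).String()
}

func main() {
	fmt.Println(describe(60000000000))    // ruler.evaluation-delay-duration, new default
	fmt.Println(describe(86400000000000)) // blocks-storage.tsdb.retention-period, old default
	fmt.Println(describe(46800000000000)) // blocks-storage.tsdb.retention-period, new default
}
```

The three values come out as one minute, 24 hours, and 13 hours, matching the CHANGELOG entries for #4250 and #4382.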
8 changes: 5 additions & 3 deletions cmd/mimir/help-all.txt.tmpl
@@ -504,7 +504,7 @@ Usage of ./cmd/mimir/mimir:
-blocks-storage.tsdb.head-compaction-idle-timeout duration
If TSDB head is idle for this duration, it is compacted. Note that up to 25% jitter is added to the value to avoid ingesters compacting concurrently. 0 means disabled. (default 1h0m0s)
-blocks-storage.tsdb.head-compaction-interval duration
How frequently ingesters try to compact TSDB head. Block is only created if data covers smallest block range. Must be greater than 0 and max 5 minutes. (default 1m0s)
How frequently the ingester checks whether the TSDB head should be compacted and, if so, triggers the compaction. Mimir applies a jitter to the first check, while subsequent checks will happen at the configured interval. Block is only created if data covers smallest block range. The configured interval must be between 0 and 15 minutes. (default 1m0s)
-blocks-storage.tsdb.head-postings-for-matchers-cache-force
[experimental] Force the cache to be used for postings for matchers in the Head and OOOHead, even if it's not a concurrent (query-sharding) call.
-blocks-storage.tsdb.head-postings-for-matchers-cache-size int
@@ -518,7 +518,7 @@ Usage of ./cmd/mimir/mimir:
-blocks-storage.tsdb.out-of-order-capacity-max int
[experimental] Maximum capacity for out of order chunks, in samples between 1 and 255. (default 32)
-blocks-storage.tsdb.retention-period duration
TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks. (default 24h0m0s)
TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks. (default 13h0m0s)
-blocks-storage.tsdb.series-hash-cache-max-size-bytes uint
Max size - in bytes - of the in-memory series hash cache. The cache is shared across all tenants and it's used only when query sharding is enabled. (default 1073741824)
-blocks-storage.tsdb.ship-concurrency int
@@ -629,6 +629,8 @@ Usage of ./cmd/mimir/mimir:
Number of Go routines to use when downloading blocks for compaction and uploading resulting blocks. (default 8)
-compactor.block-upload-enabled
Enable block upload API for the tenant.
-compactor.block-upload.block-validation-enabled
[experimental] Validate blocks before finalizing a block upload (default true)
-compactor.blocks-retention-period duration
Delete blocks containing samples older than the specified retention period. Also used by query-frontend to avoid querying beyond the retention period. 0 to disable.
-compactor.cleanup-concurrency int
@@ -1754,7 +1756,7 @@ Usage of ./cmd/mimir/mimir:
-ruler.enabled-tenants comma-separated-list-of-strings
Comma separated list of tenants whose rules this ruler can evaluate. If specified, only these tenants will be handled by ruler, otherwise this ruler can process rules from all tenants. Subject to sharding.
-ruler.evaluation-delay-duration duration
Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed.
Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed. (default 1m)
-ruler.evaluation-interval duration
How frequently to evaluate rules (default 1m0s)
-ruler.external.url string
4 changes: 2 additions & 2 deletions cmd/mimir/help.txt.tmpl
@@ -194,7 +194,7 @@ Usage of ./cmd/mimir/mimir:
-blocks-storage.tsdb.dir string
Directory to store TSDBs (including WAL) in the ingesters. This directory is required to be persisted between restarts. (default "./tsdb/")
-blocks-storage.tsdb.retention-period duration
TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks. (default 24h0m0s)
TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks. (default 13h0m0s)
-common.storage.azure.account-key string
Azure storage account key
-common.storage.azure.account-name string
@@ -528,7 +528,7 @@ Usage of ./cmd/mimir/mimir:
-ruler.enable-api
Enable the ruler config API. (default true)
-ruler.evaluation-delay-duration duration
Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed.
Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed. (default 1m)
-ruler.external.url string
URL of alerts return path.
-ruler.max-rule-groups-per-tenant int
@@ -55,10 +55,6 @@ Zone-aware replication in the ingester ensures that Grafana Mimir replicates eac
2. Roll out ingesters so that each ingester replica runs with a configured zone.
3. Set the `-ingester.ring.zone-awareness-enabled=true` CLI flag or its respective YAML configuration parameter for distributors, ingesters, and queriers.

> **Note:** The requests that the distributors receive are usually compressed, and the requests that the distributors send to the ingesters are uncompressed by default.
> This can result in increased cross-zone bandwidth costs (because at least two ingesters will be in different availability zones).
> If this cost is a concern, you can compress those requests by setting the `-ingester.client.grpc-compression` CLI flag, or its respective YAML configuration parameter, to `snappy` or `gzip` in the distributors.
## Configuring store-gateway blocks replication

To enable zone-aware replication for the store-gateways, refer to [Zone awareness]({{< relref "../architecture/components/store-gateway.md#zone-awareness" >}}).
@@ -70,7 +66,7 @@ With a replication factor of 3, which is the default, deploy the Grafana Mimir c
Deploying Grafana Mimir clusters to more zones than the configured replication factor does not have a negative impact.
Deploying Grafana Mimir clusters to fewer zones than the configured replication factor can cause writes to the replica to be missed, or can cause writes to fail completely.

If there are no more than `floor(replication factor / 2)` zones with failing replicas, reads and writes can withstand zone failures.
If there are fewer than `floor(replication factor / 2)` zones with failing replicas, reads and writes can withstand zone failures.
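For reference, the `floor(replication factor / 2)` term in the sentence above is plain integer division. The sketch below only evaluates that term for a few replication factors. It makes no claim about whether the bound is inclusive, which is exactly the wording this hunk changes. `zoneThreshold` is a hypothetical name.

```go
package main

import "fmt"

// zoneThreshold returns floor(replicationFactor / 2). For non-negative
// integers, Go's integer division already rounds down.
func zoneThreshold(replicationFactor int) int {
	return replicationFactor / 2
}

func main() {
	for _, rf := range []int{3, 4, 5} {
		fmt.Printf("replication factor %d: floor(rf/2) = %d\n", rf, zoneThreshold(rf))
	}
}
```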

## Unbalanced zones

@@ -82,6 +78,10 @@ When replica counts are unbalanced, zones with fewer replicas have higher resour
Most cloud providers charge for inter-availability zone networking.
Deploying Grafana Mimir with zone-aware replication across multiple cloud provider availability zones likely results in additional networking costs.

> **Note:** The requests that the distributors receive are usually compressed, and the requests that the distributors send to the ingesters are uncompressed by default.
> This can result in increased cross-zone bandwidth costs (because at least two ingesters will be in different availability zones).
> If this cost is a concern, you can compress those requests by setting the `-ingester.client.grpc-compression` CLI flag, or its respective YAML configuration parameter, to `snappy` or `gzip` in the distributors.
## Kubernetes operator for simplifying rollouts of zone-aware components

The [Kubernetes Rollout Operator](https://github.com/grafana/rollout-operator) is a Kubernetes operator that makes it easier for you to manage multi-availability-zone rollouts. Consider using the Kubernetes Rollout Operator when you run Grafana Mimir on Kubernetes with zone awareness enabled.
4 changes: 2 additions & 2 deletions docs/sources/mimir/operators-guide/get-started/_index.md
@@ -11,7 +11,7 @@ weight: 10

You can get started with Grafana Mimir _imperatively_ or _declaratively_:

- **Imperatively**: The written instructions that follow contain commands to help you start a single Mimir process. You would need to perform the commands again to start another Mimir process.<p>
- **Imperatively**: The written instructions that follow contain commands to help you start a single Mimir process. You would need to perform the commands again to start another Mimir process.
- **Declaratively**: The following video tutorial uses `docker-compose` to deploy multiple Mimir processes. Therefore, if you want to deploy multiple Mimir processes later, the majority of the configuration work will have already been done.

{{< vimeo 691947043 >}}
@@ -178,7 +178,7 @@ metrics:
In a new terminal, run a local Grafana server using Docker:

```bash
docker run --rm --name=grafana --network=host grafana/grafana
docker run --rm --name=grafana -p 3000:3000 grafana/grafana
```

### Add Grafana Mimir as a Prometheus data source