Krajo/merge from main to sparsehistograms #4390

Merged
merged 20 commits into from
Mar 6, 2023
48c9386
Helm: nginx HPA and tests kubeversion fixes (#4299)
krajorama Mar 2, 2023
2aeb8fb
Ruler: load more tenants in parallel during startup (#4258)
ying-jeanne Mar 2, 2023
68be264
Change language to match the math. (#4356)
osg-grafana Mar 2, 2023
b5b7c3b
Upgrade mimir-prometheus to get a fast regexp path optimization (#4357)
pracucci Mar 2, 2023
6b3e5b8
Fix typo in the docs URL for migrating from Cortex (#4358)
l3ioo Mar 2, 2023
caba95c
Remove forced paragraph break. (#4359)
osg-grafana Mar 3, 2023
ffac574
Bump actions/setup-go to v3 to resolve Node.js 12 deprecation warning…
charleskorn Mar 3, 2023
52b1776
Improve flaky `TestIngesterWithShippingDisabledDeletesBlocksOnlyAfter…
charleskorn Mar 3, 2023
921eb30
Add asynchronous validation scaffolding for block upload (#3411)
aldernero Mar 3, 2023
a98e77c
Jsonnet: honor the minimum shard size configured (#4363)
pracucci Mar 3, 2023
0100cc9
[CHANGE] Ruler: set default `evaluation-delay-duration` to 1m (#4250)
ying-jeanne Mar 3, 2023
ef7c1b3
[Chore] Update jsonnet manifest create query frontend discovery only …
ying-jeanne Mar 3, 2023
d98bc78
Remove block validation mimirtool changelog entry (#4369)
andyasp Mar 3, 2023
6030ca6
Spread TSDB head compaction over the configured interval (#4364)
pracucci Mar 3, 2023
651a8ed
Fix port number values. (#4368)
osg-grafana Mar 3, 2023
fa098e9
Ruler: change deployment max surge and max unavailable to reduce owne…
pracucci Mar 6, 2023
aadf312
Move "Note:" about cross-zone costs to "Costs" (#4370)
colega Mar 6, 2023
35b6661
Change default -blocks-storage.tsdb.retention-period from 24h to 13h …
pracucci Mar 6, 2023
a033254
Support histograms in pkg/storage and update other breakages (#4354)
codesome Mar 6, 2023
5ed7275
Merge branch 'main' into sparsehistogram
krajorama Mar 6, 2023
2 changes: 1 addition & 1 deletion .github/workflows/test-build-deploy.yml
@@ -196,7 +196,7 @@ jobs:
test_group_total: [4]
steps:
- name: Upgrade golang
uses: actions/setup-go@v2
uses: actions/setup-go@v3
with:
go-version: 1.20.1
- name: Check out repository
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,8 @@

### Grafana Mimir

* [CHANGE] Ingester: changed default value of `-blocks-storage.tsdb.retention-period` from `24h` to `13h`. If you're running Mimir with a custom configuration and you're overriding `-querier.query-store-after` to a value greater than the default `12h` then you should increase `-blocks-storage.tsdb.retention-period` accordingly. #4382
* [CHANGE] Ruler: changed default value of `-ruler.evaluation-delay-duration` option from 0 to 1m. #4250
* [CHANGE] Querier: Errors with status code `422` coming from the store-gateway are propagated and not converted to the consistency check error anymore. #4100
* [CHANGE] Store-gateway: When a query hits `max_fetched_chunks_per_query` and `max_fetched_series_per_query` limits, an error with the status code `422` is created and returned. #4056
* [CHANGE] Packaging: Migrate FPM packaging solution to NFPM. Rationalize packages dependencies and add package for all binaries. #3911
@@ -34,6 +36,7 @@ Querying with using `{__mimir_storage__="ephemeral"}` selector no longer works.
* [FEATURE] Query-frontend: Introduce experimental `-query-frontend.query-sharding-target-series-per-shard` to allow query sharding to take into account cardinality of similar requests executed previously. This feature uses the same cache that's used for results caching. #4121 #4177 #4188 #4254
* [ENHANCEMENT] Go: update go to 1.20.1. #4266
* [ENHANCEMENT] Ingester: added `out_of_order_blocks_external_label_enabled` shipper option to label out-of-order blocks before shipping them to cloud storage. #4182 #4297
* [ENHANCEMENT] Ruler: introduced concurrency when loading per-tenant rules configuration. This improvement is expected to speed up the ruler start up time in a Mimir cluster with a large number of tenants. #4258
* [ENHANCEMENT] Compactor: Add `reason` label to `cortex_compactor_runs_failed_total`. The value can be `shutdown` or `error`. #4012
* [ENHANCEMENT] Store-gateway: enforce `max_fetched_series_per_query`. #4056
* [ENHANCEMENT] Docs: use long flag names in runbook commands. #4088
@@ -51,6 +54,10 @@ Querying with using `{__mimir_storage__="ephemeral"}` selector no longer works.
* [ENHANCEMENT] Store-gateway: add a `stage` label to the metrics `cortex_bucket_store_series_data_fetched`, `cortex_bucket_store_series_data_size_fetched_bytes`, `cortex_bucket_store_series_data_touched`, `cortex_bucket_store_series_data_size_touched_bytes`. This label only applies to `data_type="chunks"`. For `fetched` metrics with `data_type="chunks"` the `stage` label has 2 values: `fetched` - the chunks or bytes that were fetched from the cache or the object store, `refetched` - the chunks or bytes that had to be refetched from the cache or the object store because their size was underestimated during the first fetch. For `touched` metrics with `data_type="chunks"` the `stage` label has 2 values: `processed` - the chunks or bytes that were read from the fetched chunks or bytes and were processed in memory, `returned` - the chunks or bytes that were selected from the processed bytes to satisfy the query. #4227 #4316
* [ENHANCEMENT] Compactor: improve the partial block check related to `compactor.partial-block-deletion-delay` to potentially issue less requests to object storage. #4246
* [ENHANCEMENT] Memcached: added `-*.memcached.min-idle-connections-headroom-percentage` support to configure the minimum number of idle connections to keep open as a percentage (0-100) of the number of recently used idle connections. This feature is disabled when set to a negative value (default), which means idle connections are kept open indefinitely. #4249
* [ENHANCEMENT] Querier and store-gateway: optimized regular expression label matchers with case insensitive alternate operator. #4340 #4357
* [ENHANCEMENT] Compactor: added the experimental flag `-compactor.block-upload.block-validation-enabled` with the default `true` to configure whether block validation occurs on backfilled blocks. #3411
* [ENHANCEMENT] Ingester: apply a jitter to the first TSDB head compaction interval configured via `-blocks-storage.tsdb.head-compaction-interval`. Subsequent checks will happen at the configured interval. This should help to spread the TSDB head compaction among different ingesters over the configured interval. #4364
* [ENHANCEMENT] Ingester: the maximum accepted value for `-blocks-storage.tsdb.head-compaction-interval` has been increased from 5m to 15m. #4364
* [BUGFIX] Ingester: remove series from ephemeral storage even if there are no persistent series. #4052
* [BUGFIX] Store-gateway: return `Canceled` rather than `Aborted` or `Internal` error when the calling querier cancels a label names or values request, and return `Internal` if processing the request fails for another reason. #4061
* [BUGFIX] Ingester: reuse memory when ingesting ephemeral series. #4072
@@ -77,10 +84,13 @@ Querying with using `{__mimir_storage__="ephemeral"}` selector no longer works.

### Jsonnet

* [CHANGE] Create the `query-frontend-discovery` service only when Mimir is deployed in microservice mode without query-scheduler. #4353
* [CHANGE] Add results cache backend config to `ruler-query-frontend` configuration to allow cache reuse for cardinality-estimation based sharding. #4257
* [CHANGE] Ruler: changed ruler deployment max surge from `0` to `50%`, and max unavailable from `1` to `0`. #4381
* [ENHANCEMENT] Add support for ruler auto-scaling. #4046
* [ENHANCEMENT] Add optional `weight` param to `newQuerierScaledObject` and `newRulerQuerierScaledObject` to allow running multiple querier deployments on different node types. #4141
* [ENHANCEMENT] Add support for query-frontend and ruler-query-frontend auto-scaling. #4199
* [BUGFIX] Shuffle sharding: when applying user class limits, honor the minimum shard size configured in `$._config.shuffle_sharding.*`. #4363

### Mimirtool

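The head-compaction jitter described in the #4364 changelog entries can be pictured with a small simulation. This is a minimal sketch, not Mimir's implementation: the function name `head_compaction_check_times` is hypothetical, and drawing the jitter uniformly between 0 and one full interval is an assumption, since the changelog does not state the jitter range.

```python
import random


def head_compaction_check_times(interval_s, n_checks, rng=None):
    """Return the offsets (seconds since start) of the first n_checks checks.

    Assumption of this sketch: the first check is delayed by a random jitter
    drawn uniformly between 0 and interval_s. Every later check happens
    exactly interval_s seconds after the previous one.
    """
    rng = rng or random.Random()
    first = rng.uniform(0, interval_s)
    return [first + i * interval_s for i in range(n_checks)]


# Three hypothetical ingesters started at the same moment with a 60s interval.
# Each gets a different first-check offset, so their checks no longer line up.
for ingester in range(3):
    times = head_compaction_check_times(60, 3, random.Random(ingester))
    print(f"ingester-{ingester}:", [round(t, 1) for t in times])
```

Because only the first check is jittered, each ingester keeps its own fixed phase afterwards, which is what spreads the compactions across the interval.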
2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@ Grafana Mimir is an open source software project that provides a scalable long-t
If you're migrating to Grafana Mimir, refer to the following documents:

- [Migrating from Thanos or Prometheus to Grafana Mimir](https://grafana.com/docs/mimir/latest/migration-guide/migrating-from-thanos-or-prometheus/).
- [Migrating from Cortex to Grafana Mimir](https://grafana.com/docs/mimir/latest/migration-guide/migrating-from-cortex/)
- [Migrating from Cortex to Grafana Mimir](https://grafana.com/docs/mimir/latest/migration-guide/migrate-from-cortex/)

## Deploying Grafana Mimir

27 changes: 24 additions & 3 deletions cmd/mimir/config-descriptor.json
@@ -3275,7 +3275,7 @@
"required": false,
"desc": "Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed.",
"fieldValue": null,
-"fieldDefaultValue": 0,
+"fieldDefaultValue": 60000000000,
"fieldFlag": "ruler.evaluation-delay-duration",
"fieldType": "duration"
},
@@ -5829,7 +5829,7 @@
"required": false,
"desc": "TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks.",
"fieldValue": null,
-"fieldDefaultValue": 86400000000000,
+"fieldDefaultValue": 46800000000000,
"fieldFlag": "blocks-storage.tsdb.retention-period",
"fieldType": "duration"
},
@@ -5859,7 +5859,7 @@
"kind": "field",
"name": "head_compaction_interval",
"required": false,
-"desc": "How frequently ingesters try to compact TSDB head. Block is only created if data covers smallest block range. Must be greater than 0 and max 5 minutes.",
+"desc": "How frequently the ingester checks whether the TSDB head should be compacted and, if so, triggers the compaction. Mimir applies a jitter to the first check, while subsequent checks will happen at the configured interval. Block is only created if data covers smallest block range. The configured interval must be between 0 and 15 minutes.",
"fieldValue": null,
"fieldDefaultValue": 60000000000,
"fieldFlag": "blocks-storage.tsdb.head-compaction-interval",
@@ -6703,6 +6703,27 @@
"fieldFlag": "compactor.compaction-jobs-order",
"fieldType": "string",
"fieldCategory": "advanced"
},
{
"kind": "block",
"name": "block_upload",
"required": false,
"desc": "",
"blockEntries": [
{
"kind": "field",
"name": "block_validation_enabled",
"required": false,
"desc": "Validate blocks before finalizing a block upload",
"fieldValue": null,
"fieldDefaultValue": true,
"fieldFlag": "compactor.block-upload.block-validation-enabled",
"fieldType": "boolean",
"fieldCategory": "experimental"
}
],
"fieldValue": null,
"fieldDefaultValue": null
}
],
"fieldValue": null,
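The `fieldDefaultValue` entries for `duration` fields in this file are large integers. Read as nanoseconds, they line up with the defaults printed in the help text elsewhere in this diff (`1m`, `24h`, `13h`). A minimal sketch of that conversion; the helper name `ns_to_timedelta` is hypothetical, and the nanosecond unit is inferred from that correspondence rather than stated in the file.

```python
from datetime import timedelta


def ns_to_timedelta(ns):
    """Convert an integer nanosecond count to a timedelta.

    timedelta only has microsecond resolution, so sub-microsecond
    remainders are dropped; the defaults below are whole seconds.
    """
    return timedelta(microseconds=ns // 1000)


# Values copied from the hunks above.
defaults = {
    "ruler.evaluation-delay-duration (new)": 60000000000,
    "blocks-storage.tsdb.retention-period (old)": 86400000000000,
    "blocks-storage.tsdb.retention-period (new)": 46800000000000,
}

for flag, ns in defaults.items():
    print(flag, "=", ns_to_timedelta(ns))
```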
8 changes: 5 additions & 3 deletions cmd/mimir/help-all.txt.tmpl
@@ -504,7 +504,7 @@ Usage of ./cmd/mimir/mimir:
-blocks-storage.tsdb.head-compaction-idle-timeout duration
If TSDB head is idle for this duration, it is compacted. Note that up to 25% jitter is added to the value to avoid ingesters compacting concurrently. 0 means disabled. (default 1h0m0s)
-blocks-storage.tsdb.head-compaction-interval duration
How frequently ingesters try to compact TSDB head. Block is only created if data covers smallest block range. Must be greater than 0 and max 5 minutes. (default 1m0s)
How frequently the ingester checks whether the TSDB head should be compacted and, if so, triggers the compaction. Mimir applies a jitter to the first check, while subsequent checks will happen at the configured interval. Block is only created if data covers smallest block range. The configured interval must be between 0 and 15 minutes. (default 1m0s)
-blocks-storage.tsdb.head-postings-for-matchers-cache-force
[experimental] Force the cache to be used for postings for matchers in the Head and OOOHead, even if it's not a concurrent (query-sharding) call.
-blocks-storage.tsdb.head-postings-for-matchers-cache-size int
@@ -518,7 +518,7 @@ Usage of ./cmd/mimir/mimir:
-blocks-storage.tsdb.out-of-order-capacity-max int
[experimental] Maximum capacity for out of order chunks, in samples between 1 and 255. (default 32)
-blocks-storage.tsdb.retention-period duration
TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks. (default 24h0m0s)
TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks. (default 13h0m0s)
-blocks-storage.tsdb.series-hash-cache-max-size-bytes uint
Max size - in bytes - of the in-memory series hash cache. The cache is shared across all tenants and it's used only when query sharding is enabled. (default 1073741824)
-blocks-storage.tsdb.ship-concurrency int
@@ -629,6 +629,8 @@ Usage of ./cmd/mimir/mimir:
Number of Go routines to use when downloading blocks for compaction and uploading resulting blocks. (default 8)
-compactor.block-upload-enabled
Enable block upload API for the tenant.
-compactor.block-upload.block-validation-enabled
[experimental] Validate blocks before finalizing a block upload (default true)
-compactor.blocks-retention-period duration
Delete blocks containing samples older than the specified retention period. Also used by query-frontend to avoid querying beyond the retention period. 0 to disable.
-compactor.cleanup-concurrency int
@@ -1754,7 +1756,7 @@ Usage of ./cmd/mimir/mimir:
-ruler.enabled-tenants comma-separated-list-of-strings
Comma separated list of tenants whose rules this ruler can evaluate. If specified, only these tenants will be handled by ruler, otherwise this ruler can process rules from all tenants. Subject to sharding.
-ruler.evaluation-delay-duration duration
Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed.
Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed. (default 1m)
-ruler.evaluation-interval duration
How frequently to evaluate rules (default 1m0s)
-ruler.external.url string
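The help text for `-blocks-storage.tsdb.retention-period` says the value should be larger than both `-blocks-storage.tsdb.block-ranges-period` and `-querier.query-store-after`. A minimal sketch of that check follows. The helper name is hypothetical and the `2h` block range in the example is an assumption; `13h` and `12h` are the defaults named in the changelog entry for #4382.

```python
from datetime import timedelta


def retention_period_is_large_enough(retention, query_store_after, block_range):
    """Return True if retention exceeds both values named in the help text.

    Only the two explicit comparisons are modeled. The extra time that
    store-gateways and queriers need to discover newly uploaded blocks,
    which the help text also mentions, is not modeled here.
    """
    return retention > query_store_after and retention > block_range


new_default = timedelta(hours=13)
block_range = timedelta(hours=2)  # assumed value, for illustration only

# Default -querier.query-store-after of 12h: the new 13h retention still covers it.
print(retention_period_is_large_enough(new_default, timedelta(hours=12), block_range))

# A custom 24h -querier.query-store-after: retention must be raised accordingly.
print(retention_period_is_large_enough(new_default, timedelta(hours=24), block_range))
```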
4 changes: 2 additions & 2 deletions cmd/mimir/help.txt.tmpl
@@ -194,7 +194,7 @@ Usage of ./cmd/mimir/mimir:
-blocks-storage.tsdb.dir string
Directory to store TSDBs (including WAL) in the ingesters. This directory is required to be persisted between restarts. (default "./tsdb/")
-blocks-storage.tsdb.retention-period duration
TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks. (default 24h0m0s)
TSDB blocks retention in the ingester before a block is removed. If shipping is enabled, the retention will be relative to the time when the block was uploaded to storage. If shipping is disabled then its relative to the creation time of the block. This should be larger than the -blocks-storage.tsdb.block-ranges-period, -querier.query-store-after and large enough to give store-gateways and queriers enough time to discover newly uploaded blocks. (default 13h0m0s)
-common.storage.azure.account-key string
Azure storage account key
-common.storage.azure.account-name string
@@ -528,7 +528,7 @@ Usage of ./cmd/mimir/mimir:
-ruler.enable-api
Enable the ruler config API. (default true)
-ruler.evaluation-delay-duration duration
Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed.
Duration to delay the evaluation of rules to ensure the underlying metrics have been pushed. (default 1m)
-ruler.external.url string
URL of alerts return path.
-ruler.max-rule-groups-per-tenant int
@@ -55,10 +55,6 @@ Zone-aware replication in the ingester ensures that Grafana Mimir replicates eac
2. Roll out ingesters so that each ingester replica runs with a configured zone.
3. Set the `-ingester.ring.zone-awareness-enabled=true` CLI flag or its respective YAML configuration parameter for distributors, ingesters, and queriers.

-> **Note:** The requests that the distributors receive are usually compressed, and the requests that the distributors send to the ingesters are uncompressed by default.
-> This can result in increased cross-zone bandwidth costs (because at least two ingesters will be in different availability zones).
-> If this cost is a concern, you can compress those requests by setting the `-ingester.client.grpc-compression` CLI flag, or its respective YAML configuration parameter, to `snappy` or `gzip` in the distributors.
-
## Configuring store-gateway blocks replication

To enable zone-aware replication for the store-gateways, refer to [Zone awareness]({{< relref "../architecture/components/store-gateway.md#zone-awareness" >}}).
@@ -70,7 +66,7 @@ With a replication factor of 3, which is the default, deploy the Grafana Mimir c
Deploying Grafana Mimir clusters to more zones than the configured replication factor does not have a negative impact.
Deploying Grafana Mimir clusters to fewer zones than the configured replication factor can cause writes to the replica to be missed, or can cause writes to fail completely.

-If there are no more than `floor(replication factor / 2)` zones with failing replicas, reads and writes can withstand zone failures.
+If there are fewer than `floor(replication factor / 2)` zones with failing replicas, reads and writes can withstand zone failures.

## Unbalanced zones

@@ -82,6 +78,10 @@ When replica counts are unbalanced, zones with fewer replicas have higher resour
Most cloud providers charge for inter-availability zone networking.
Deploying Grafana Mimir with zone-aware replication across multiple cloud provider availability zones likely results in additional networking costs.

+> **Note:** The requests that the distributors receive are usually compressed, and the requests that the distributors send to the ingesters are uncompressed by default.
+> This can result in increased cross-zone bandwidth costs (because at least two ingesters will be in different availability zones).
+> If this cost is a concern, you can compress those requests by setting the `-ingester.client.grpc-compression` CLI flag, or its respective YAML configuration parameter, to `snappy` or `gzip` in the distributors.
+
## Kubernetes operator for simplifying rollouts of zone-aware components

The [Kubernetes Rollout Operator](https://github.com/grafana/rollout-operator) is a Kubernetes operator that makes it easier for you to manage multi-availability-zone rollouts. Consider using the Kubernetes Rollout Operator when you run Grafana Mimir on Kubernetes with zone awareness enabled.
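The wording change in this file around `floor(replication factor / 2)` swaps a non-strict comparison ("no more than") for a strict one ("fewer than"). The sketch below spells out both readings so the difference is visible for a concrete replication factor. The function names are hypothetical, and the sketch only restates the two sentences; it makes no claim about which one matches Mimir's actual quorum behavior.

```python
def zone_failure_threshold(replication_factor):
    """floor(replication factor / 2), as written in the documentation."""
    return replication_factor // 2


def withstands_no_more_than(failing_zones, replication_factor):
    """Previous wording: 'no more than floor(RF / 2) zones with failing replicas'."""
    return failing_zones <= zone_failure_threshold(replication_factor)


def withstands_fewer_than(failing_zones, replication_factor):
    """New wording: 'fewer than floor(RF / 2) zones with failing replicas'."""
    return failing_zones < zone_failure_threshold(replication_factor)


# With the default replication factor of 3 the threshold is 1, so the two
# wordings disagree exactly when one zone has failing replicas.
rf = 3
for failing in range(3):
    print(
        failing,
        withstands_no_more_than(failing, rf),
        withstands_fewer_than(failing, rf),
    )
```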
4 changes: 2 additions & 2 deletions docs/sources/mimir/operators-guide/get-started/_index.md
@@ -11,7 +11,7 @@ weight: 10

You can get started with Grafana Mimir _imperatively_ or _declaratively_:

- **Imperatively**: The written instructions that follow contain commands to help you start a single Mimir process. You would need to perform the commands again to start another Mimir process.<p>
- **Imperatively**: The written instructions that follow contain commands to help you start a single Mimir process. You would need to perform the commands again to start another Mimir process.
- **Declaratively**: The following video tutorial uses `docker-compose` to deploy multiple Mimir processes. Therefore, if you want to deploy multiple Mimir processes later, the majority of the configuration work will have already been done.

{{< vimeo 691947043 >}}
@@ -178,7 +178,7 @@ metrics:
In a new terminal, run a local Grafana server using Docker:

```bash
-docker run --rm --name=grafana --network=host grafana/grafana
+docker run --rm --name=grafana -p 3000:3000 grafana/grafana
```

### Add Grafana Mimir as a Prometheus data source