From 1e5a7331a2aa23394c5eae0cab55586edf9c5ae2 Mon Sep 17 00:00:00 2001
From: "loki-gh-app[bot]" <160051081+loki-gh-app[bot]@users.noreply.github.com>
Date: Thu, 23 Jan 2025 09:35:10 -0500
Subject: [PATCH] docs: Replace SSD sizing tool with cluster tiers for distributed/microservices mode (backport release-3.2.x) (#15913)

Co-authored-by: Poyzan <31743851+poyzannur@users.noreply.github.com>
---
 docs/sources/setup/size/_index.md | 143 +++++++++++-------------------
 1 file changed, 54 insertions(+), 89 deletions(-)

diff --git a/docs/sources/setup/size/_index.md b/docs/sources/setup/size/_index.md
index 162748eb9e3b8..ed052ab517ee6 100644
--- a/docs/sources/setup/size/_index.md
+++ b/docs/sources/setup/size/_index.md
@@ -1,7 +1,4 @@
 ---
-_build:
-  list: false
-noindex: true
 title: Size the cluster
 menuTitle: Size the cluster
 description: Provides a tool that generates a Helm Chart values.yaml file based on expected ingestion, retention rate, and node type, to help size your Grafana deployment.
@@ -17,73 +14,63 @@ weight: 100
-This tool helps to generate a Helm Charts `values.yaml` file based on specified
- expected ingestion, retention rate and node type. It will always configure a
- [scalable]({{< relref "../../get-started/deployment-modes#simple-scalable" >}}) deployment. The storage needs to be configured after generation.
+This section is a guide to size base resource needs of a Loki cluster.
+Based on the expected ingestion volume, Loki clusters can be categorised into three tiers. Recommendations below are based on p90 resource utilisations of the relevant components. Each tab represents a different tier.
+Please use this document as a rough guide to specify CPU and Memory requests in your deployment. This is only documented for [microservices/distributed](https://grafana.com/docs/loki//get-started/deployment-modes/#microservices-mode) mode at this time.
+
+Query resource needs can greatly vary with usage patterns and correct configurations.
+General notes on Query Performance:
+- The rule of thumb is to run as small and as many queriers as possible. Unoptimised queries can easily require 10x of the suggested querier resources below in all tiers. Running horizontal autoscaling will be most cost effective solution to meet the demand.
+- Use this [blog post](https://grafana.com/blog/2023/12/28/the-concise-guide-to-loki-how-to-get-the-most-out-of-your-query-performance/) to adopt best practices for optimised query performance.
+- Parallel-querier and related components can be sized the same along with queriers to start, depending on how much Loki rules are used.
+- Large Loki clusters benefit from a disk based caching solution, memcached-extstore. Please see the detailed [blog post](https://grafana.com/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/) and read more about [memcached/nvm-caching here](https://memcached.org/blog/nvm-caching/).
+- If you’re running a cluster that handles less than 30TB/day (~1PB/month) ingestion, we do not recommend configuring memcached-extstore. The additional operational complexity does not justify the savings.
+
+These are the node types we suggest from various cloud providers. Please see the relevant specifications in your provider documentation.
-[sizing-tool form: node type, ingest in GB/day, retention in days, query performance]
-| Read Replicas | Write Replicas | Nodes | Cores | Memory |
-|---------------|----------------|-------|-------|--------|
-| {{ clusterSize.TotalReadReplicas }} | {{ clusterSize.TotalWriteReplicas }} | {{ clusterSize.TotalNodes}} | {{ clusterSize.TotalCoresRequest}} | {{ clusterSize.TotalMemoryRequest}} GB |
-Generate and download values file
-Defines the log volume in gigabytes, ie 1e+9 bytes, expected to be ingested each day.
-Defines the node type of the Kubernetes cluster. Is a vendor or type missing? If so, add it to pkg/sizing/node.go.
-Defines how long the ingested logs should be kept.
-Defines the expected query performance. Basic is sized for a max query throughput of around 3GB/s. Super aims for 25% more throughput.
+
+
+{{< tabs >}}
+{{< tab-content name="Less than 100TB/month (3TB/day)" >}}
+| Component | CPU Request | Memory Request (Gi)| Base Replicas | Total CPU Req |Total Mem Req (Gi)|
+|------------------|-------------|-------------------|----------------|----------------|-----------------|
+| Ingester | 2 | 4 | 6 | 12 | 36 |
+| Distributor | 2 | 0.5 | 4 | 8 | 2 |
+| Index gateway | 0.5 | 2 | 4 | 2 | 8 |
+| Querier | 1 | 1 | 10 | 10 | 10 |
+| Query-frontend | 1 | 2 | 2 | 2 | 4 |
+| Query-scheduler | 1 | 0.5 | 2 | 2 | 1 |
+| Compactor | 2 | 10 | 1 (Singleton) | 2 | 10 |
+{{< /tab-content >}}
+{{< tab-content name="100TB to 1PB /month (3-30TB/day)" >}}
+| Component | CPU Request | Memory Request (Gi)| Base Replicas | Total CPU Req |Total Mem Req (Gi)|
+|------------------|-------------|-------------------|----------------|----------------|-----------------|
+| Ingester | 2 | 6 | 90 | 180 | 540 |
+| Distributor | 2 | 1 | 40 | 80 | 40 |
+| Index gateway | 0.5 | 4 | 10 | 5 | 40 |
+| Querier | 1.5 | 2 | 100 | 150 | 200 |
+| Query-frontend | 1 | 2 | 8 | 8 | 16 |
+| Query-scheduler | 1 | 0.5 | 2 | 2 | 1 |
+| Compactor | 6 | 20 | 1 (Singleton) | 6 | 20 |
+{{< /tab-content >}}
+{{< tab-content name="~1PB/month (30TB/day)" >}}
+| Component | CPU Request | Memory Request (Gi)| Base Replicas | Total CPU Req |Total Mem Req (Gi)|
+|------------------|-------------|-------------------|----------------|----------------|-----------------|
+| Ingester | 4 | 8 | 150 | 600 | 1200 |
+| Distributor | 2 | 1 | 100 | 200 | 100 |
+| Index gateway | 1 | 4 | 20 | 20 | 80 |
+| Querier | 1.5 | 3 | 250 | 375 | 750 |
+| Query-frontend | 1 | 4 | 16 | 16 | 64 |
+| Query-scheduler | 2 | 0.5 | 2 | 4 | 1 |
+| Compactor | 6 | 40 | 1 (Singleton) | 6 | 40 |
+{{< /tab-content >}}
+{{< /tabs >}}
+
+
@@ -118,11 +97,7 @@ createApp({
     return {
       nodes: ["Loading..."],
       node: "Loading...",
-      bytesDayIngest: null,
-      retention: null,
-      queryperf: 'Basic',
       help: null,
-      clusterSize: null
     }
   },
 
@@ -159,20 +134,10 @@ createApp({
       const url = `${API_URL}/nodes`
       this.nodes = await (await fetch(url,{mode: 'cors'})).json()
     },
-    async calculateClusterSize() {
-      if (this.node == 'Loading...' || this.bytesDayIngest== null || this.retention == null) {
-        return
-      }
-      const url = `${API_URL}/cluster?${this.queryString}`
-      this.clusterSize = await (await fetch(url,{mode: 'cors'})).json()
-    }
   },
 
   watch: {
     node: 'calculateClusterSize',
-    bytesDayIngest: 'calculateClusterSize',
-    retention: 'calculateClusterSize',
-    queryperf: 'calculateClusterSize'
   }
 }).mount('#app')
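
The tier tables added by this patch appear to derive each total as the per-replica request multiplied by the base replica count. The following is a minimal JavaScript sketch of that arithmetic for the "Less than 100TB/month" tab, useful as a sanity check when adapting the numbers. The names `smallTier` and `tierTotals` are hypothetical and not part of Loki or its documentation; only the per-replica figures are copied from the table.

```javascript
// Hypothetical helper for checking the tier tables; not part of Loki.
// Per-replica requests copied from the "Less than 100TB/month (3TB/day)" tab.
const smallTier = [
  { component: "Ingester", cpu: 2, memGi: 4, replicas: 6 },
  { component: "Distributor", cpu: 2, memGi: 0.5, replicas: 4 },
  { component: "Index gateway", cpu: 0.5, memGi: 2, replicas: 4 },
  { component: "Querier", cpu: 1, memGi: 1, replicas: 10 },
  { component: "Query-frontend", cpu: 1, memGi: 2, replicas: 2 },
  { component: "Query-scheduler", cpu: 1, memGi: 0.5, replicas: 2 },
  { component: "Compactor", cpu: 2, memGi: 10, replicas: 1 },
];

// Multiply each per-replica request by its replica count,
// then sum the results into cluster-wide base requests.
function tierTotals(rows) {
  const perComponent = rows.map((r) => ({
    component: r.component,
    totalCpu: r.cpu * r.replicas,
    totalMemGi: r.memGi * r.replicas,
  }));
  const cluster = perComponent.reduce(
    (acc, r) => ({ cpu: acc.cpu + r.totalCpu, memGi: acc.memGi + r.totalMemGi }),
    { cpu: 0, memGi: 0 }
  );
  return { perComponent, cluster };
}

const { perComponent, cluster } = tierTotals(smallTier);
console.table(perComponent);
console.log(cluster);
```

Under this assumption the smallest tier comes to 24 Gi for the Ingester (4 Gi × 6 replicas), whereas the table lists 36 Gi, so one of those two figures may be worth checking against the upstream documentation. All other rows in the three tabs agree with the per-replica × replicas product.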