
Commit b4d44f8

feat: Update Loki monitoring docs to new meta monitoring helm (#13176)
Co-authored-by: J Stickler <[email protected]>
1 parent a08ee68 commit b4d44f8

2 files changed: +382 -257 lines changed

docs/sources/setup/install/helm/monitor-and-alert/with-grafana-cloud.md

+232 -66
@@ -1,7 +1,7 @@
 ---
-title: Configure monitoring and alerting of Loki using Grafana Cloud
+title: Monitor Loki with Grafana Cloud
 menuTitle: Monitor Loki with Grafana Cloud
-description: Configuring monitoring and alerts for Loki using Grafana Cloud.
+description: Configuring monitoring for Loki using Grafana Cloud.
 aliases:
 - ../../../../installation/helm/monitor-and-alert/with-grafana-cloud
 weight: 200
@@ -12,89 +12,255 @@ keywords:
 - grafana cloud
 ---

-# Configure monitoring and alerting of Loki using Grafana Cloud
+# Monitor Loki with Grafana Cloud

-This topic will walk you through using Grafana Cloud to monitor a Loki installation that is installed with the Helm chart. This approach leverages many of the chart's _self monitoring_ features, but instead of sending logs back to Loki itself, it sends them to a Grafana Cloud Logs instance. This approach also does not require the installation of the Prometheus Operator and instead sends metrics to a Grafana Cloud Metrics instance. Using Grafana Cloud to monitor Loki has the added benefit of being able to troubleshoot problems with Loki when the Helm installed Loki is down, as the logs will still be available in the Grafana Cloud Logs instance.
+This guide walks you through using Grafana Cloud to monitor a Loki installation set up with the `meta-monitoring` Helm chart. This method takes advantage of many of the chart's self-monitoring features, sending metrics, logs, and traces from the Loki deployment to Grafana Cloud. Monitoring Loki with Grafana Cloud offers the added benefit of troubleshooting Loki issues even when the Helm-installed Loki is down, as the telemetry data remains available in the Grafana Cloud instance.

-**Before you begin:**
+These instructions are based on the [meta-monitoring-chart repository](https://github.com/grafana/meta-monitoring-chart/tree/main).
+
+## Before you begin

 - Helm 3 or above. See [Installing Helm](https://helm.sh/docs/intro/install/).
 - A Grafana Cloud account and stack (including Cloud Grafana, Cloud Metrics, and Cloud Logs).
-- [Grafana Kubernetes Monitoring using Agent Flow](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/config-k8s-agent-flow/) configured for the Kubernetes cluster.
 - A running Loki deployment installed in that Kubernetes cluster via the Helm chart.

-**Prequisites for Monitoring Loki:**
+## Configure the meta namespace
+
+The meta-monitoring stack will be installed in a separate namespace called `meta`. To create this namespace, run the following command:
+
+```bash
+kubectl create namespace meta
+```
+
+## Grafana Cloud Connection Credentials

-You must setup the Grafana Kubernetes Integration following the instructions in [Grafana Kubernetes Monitoring using Agent Flow](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/config-k8s-agent-flow/) as this will install necessary components for collecting metrics about your Kubernetes cluster and sending them to Grafana Cloud. Many of the dashboards installed as a part of the Loki integration rely on these metrics.
+The meta-monitoring stack sends metrics, logs, and traces to Grafana Cloud, which requires your Grafana Cloud connection credentials. To obtain these credentials, follow the steps below:

-Walking through this installation will create two Grafana Agent configurations, one for metrics and one for logs, that will add the external label `cluster: cloud`. In order for the Dashboards in the self-hosted Grafana Loki integration to work, the cluster name needs to match your Helm installation name. If you installed Loki using the command `helm install best-loki-cluster grafana/loki`, you would need to change the `cluster` value in both Grafana Agent configurations from `cloud` to `best-loki-cluster` when setting up the Grafana Kubernetes integration.
+1. Create a new Cloud Access Policy in Grafana Cloud.
+   1. Sign into [Grafana Cloud](https://grafana.com/auth/sign-in/).
+   1. In the main menu, select **Security > Access Policies**.
+   1. Click **Create access policy**.
+   1. Give the policy a **Name** and select the following permissions:
+      - Metrics: Write
+      - Logs: Write
+      - Traces: Write
+   1. Click **Create**.

-**To set up the Loki integration in Grafana Cloud:**

-1. Get valid Push credentials for your Cloud Metrics and Cloud Logs instances.
-1. Create a secret in the same namespace as Loki to store your Cloud Logs credentials.
+1. Once the policy is created, select the policy and click **Add token**.
+1. Name the token, select an expiration date, then click **Create**.
+1. Copy the token to a secure location as it will not be displayed again.

+1. Navigate to the Grafana Cloud Portal **Overview** page.
+1. Click the **Details** button for your Prometheus or Mimir instance.
+1. From the **Using a self-hosted Grafana instance with Grafana Cloud Metrics** section, collect the instance **Name** and **URL**.
+1. Navigate back to the **Overview** page.
+1. Click the **Details** button for your Loki instance.
+1. From the **Using Grafana with Logs** section, collect the instance **Name** and **URL**.
+1. Navigate back to the **Overview** page.
+1. Click the **Details** button for your Tempo instance.
+1. From the **Using Grafana with Tempo** section, collect the instance **Name** and **URL**.
+
+1. Finally, generate the secrets to store your credentials for each telemetry type within your Kubernetes cluster:
    ```bash
-   cat <<'EOF' | NAMESPACE=loki /bin/sh -c 'kubectl apply -n $NAMESPACE -f -'
-   apiVersion: v1
-   data:
-     password: <BASE64_ENCODED_CLOUD_LOGS_PASSWORD>
-     username: <BASE64_ENCODED_CLOUD_LOGS_USERNAME>
-   kind: Secret
-   metadata:
-     name: grafana-cloud-logs-credentials
-   type: Opaque
-   EOF
+   kubectl create secret generic logs -n meta \
+     --from-literal=username=<USERNAME LOGS> \
+     --from-literal=password=<ACCESS POLICY TOKEN> \
+     --from-literal=endpoint='https://<LOG URL>/loki/api/v1/push'
+
+   kubectl create secret generic metrics -n meta \
+     --from-literal=username=<USERNAME METRICS> \
+     --from-literal=password=<ACCESS POLICY TOKEN> \
+     --from-literal=endpoint='https://<METRICS URL>/api/prom/push'
+
+   kubectl create secret generic traces -n meta \
+     --from-literal=username=<OTLP INSTANCE ID> \
+     --from-literal=password=<ACCESS POLICY TOKEN> \
+     --from-literal=endpoint='https://<OTLP URL>/otlp'
    ```

-1. Create a secret to store your Cloud Metrics credentials.
+## Configuration and Installation
+
+To install the `meta-monitoring` Helm chart, you must create a `values.yaml` file. At a minimum, this file should contain the following:
+- The namespace to monitor
+- Enablement of cloud monitoring
+
+This example `values.yaml` file provides the minimum configuration to monitor the `default` namespace, where the Loki deployment in this guide is installed:
+
+```yaml
+namespacesToMonitor:
+- default
+
+cloud:
+  logs:
+    enabled: true
+    secret: "logs"
+  metrics:
+    enabled: true
+    secret: "metrics"
+  traces:
+    enabled: true
+    secret: "traces"
+```
+For further configuration options, refer to the [sample values.yaml file](https://github.com/grafana/meta-monitoring-chart/blob/main/charts/meta-monitoring/values.yaml).
+
+To install the `meta-monitoring` Helm chart, run the following commands:
+
+```bash
+helm repo add grafana https://grafana.github.io/helm-charts
+helm repo update
+helm install meta-monitoring grafana/meta-monitoring -n meta -f values.yaml
+```
+or, when upgrading the configuration:
+```bash
+helm upgrade meta-monitoring grafana/meta-monitoring -n meta -f values.yaml
+```
+
+To verify the installation, run the following command:
+
+```bash
+kubectl get pods -n meta
+```
+It should return output similar to the following:
+```bash
+NAME           READY   STATUS    RESTARTS   AGE
+meta-alloy-0   2/2     Running   0          23h
+meta-alloy-1   2/2     Running   0          23h
+meta-alloy-2   2/2     Running   0          23h
+```
+
+
+## Enable Loki Tracing
+
+By default, Loki does not have tracing enabled. To enable tracing, edit the Loki `values.yaml` file and add the following configuration:
+
+Set the `tracing.enabled` configuration to `true`:
+```yaml
+loki:
+  tracing:
+    enabled: true
+```

+Next, instrument each of the Loki components to send traces to the meta-monitoring stack. Add the `extraEnv` configuration to each of the Loki components:
+
+```yaml
+ingester:
+  replicas: 3
+  extraEnv:
+    - name: JAEGER_ENDPOINT
+      value: "http://mmc-alloy-external.default.svc.cluster.local:14268/api/traces"
+      # This sets the Jaeger endpoint where traces will be sent.
+      # The endpoint points to the mmc-alloy service in the default namespace at port 14268.
+
+    - name: JAEGER_AGENT_TAGS
+      value: 'cluster="prod",namespace="default"'
+      # This specifies additional tags to attach to each span.
+      # Here, the cluster is labeled as "prod" and the namespace as "default".
+
+    - name: JAEGER_SAMPLER_TYPE
+      value: "ratelimiting"
+      # This sets the sampling strategy for traces.
+      # "ratelimiting" means that traces will be sampled at a fixed rate.
+
+    - name: JAEGER_SAMPLER_PARAM
+      value: "1.0"
+      # This sets the parameter for the sampler.
+      # For ratelimiting, "1.0" typically means one trace per second.
+```
+
+Since the meta-monitoring stack is installed in the `meta` namespace, the Loki components need to be able to communicate with it. To do this, create a new ExternalName service in the `default` namespace that points to the Alloy service in the `meta` namespace by running the following command:
+
+```bash
+kubectl create service externalname mmc-alloy-external --external-name meta-alloy.meta.svc.cluster.local -n default
+```
+
+Finally, upgrade the Loki installation with the new configuration:
+
+```bash
+helm upgrade --values values.yaml loki grafana/loki
+```
+
+## Import the Loki Dashboards to Grafana Cloud
+
+The meta-monitoring stack includes a set of dashboards that can be imported into Grafana Cloud. These can be found in the [meta-monitoring repository](https://github.com/grafana/meta-monitoring-chart/tree/main/charts/meta-monitoring/src/dashboards).
+
+
+## Installing Rules
+
+The meta-monitoring stack includes a set of rules that can be installed to monitor the Loki installation. These rules can be found in the [meta-monitoring repository](https://github.com/grafana/meta-monitoring-chart/). To install the rules:
+
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/grafana/meta-monitoring-chart/
+   ```
+1. Install `mimirtool` by following the instructions in the [Mimirtool documentation](https://grafana.com/docs/mimir/latest/manage/tools/mimirtool/).
+1. Create a new access policy token in Grafana Cloud with the following permissions:
+   - Rules: Write
+   - Rules: Read
+1. Create a token for the access policy and copy it to a secure location.
+1. Install the rules:
    ```bash
-   cat <<'EOF' | NAMESPACE=loki /bin/sh -c 'kubectl apply -n $NAMESPACE -f -'
-   apiVersion: v1
-   data:
-     password: <BASE64_ENCODED_CLOUD_METRICS_PASSWORD>
-     username: <BASE64_ENCODED_CLOUD_METRICS_USERNAME>
-   kind: Secret
-   metadata:
-     name: grafana-cloud-metrics-credentials
-   type: Opaque
-   EOF
+   mimirtool rules load --address=<your_cloud_prometheus_endpoint> --id=<your_instance_id> --key=<your_cloud_access_policy_token> *.yaml
    ```
+1. Verify that the rules have been installed:
+   ```bash
+   mimirtool rules list --address=<your_cloud_prometheus_endpoint> --id=<your_instance_id> --key=<your_cloud_access_policy_token>
+   ```
+   It should return a list of rules that have been installed.
+   ```bash

-1. Enable monitoring metrics and logs for the Loki installation to be sent your cloud database instances by adding the following to your Helm `values.yaml` file:
-
-   ```yaml
-   ---
-   monitoring:
-     dashboards:
-       enabled: false
-     rules:
-       enabled: false
-     selfMonitoring:
-       logsInstance:
-         clients:
-           - url: <CLOUD_LOGS_URL>
-             basicAuth:
-               username:
-                 name: grafana-cloud-logs-credentials
-                 key: username
-               password:
-                 name: grafana-cloud-logs-credentials
-                 key: password
-     serviceMonitor:
-       metricsInstance:
-         remoteWrite:
-           - url: <CLOUD_METRICS_URL>
-             basicAuth:
-               username:
-                 name: grafana-cloud-metrics-credentials
-                 key: username
-               password:
-                 name: grafana-cloud-metrics-credentials
-                 key: password
+   loki-rules:
+   - name: loki_rules
+     rules:
+     - record: cluster_job:loki_request_duration_seconds:99quantile
+       expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job))
+     - record: cluster_job:loki_request_duration_seconds:50quantile
+       expr: histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job))
+     - record: cluster_job:loki_request_duration_seconds:avg
+       expr: sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job) / sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job)
+     - record: cluster_job:loki_request_duration_seconds_bucket:sum_rate
+       expr: sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job)
+     - record: cluster_job:loki_request_duration_seconds_sum:sum_rate
+       expr: sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job)
+     - record: cluster_job:loki_request_duration_seconds_count:sum_rate
+       expr: sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job)
+     - record: cluster_job_route:loki_request_duration_seconds:99quantile
+       expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job, route))
+     - record: cluster_job_route:loki_request_duration_seconds:50quantile
+       expr: histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job, route))
+     - record: cluster_job_route:loki_request_duration_seconds:avg
+       expr: sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job, route) / sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job, route)
+     - record: cluster_job_route:loki_request_duration_seconds_bucket:sum_rate
+       expr: sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job, route)
+     - record: cluster_job_route:loki_request_duration_seconds_sum:sum_rate
+       expr: sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job, route)
+     - record: cluster_job_route:loki_request_duration_seconds_count:sum_rate
+       expr: sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job, route)
+     - record: cluster_namespace_job_route:loki_request_duration_seconds:99quantile
+       expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, namespace, job, route))
+     - record: cluster_namespace_job_route:loki_request_duration_seconds:50quantile
+       expr: histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, namespace, job, route))
+     - record: cluster_namespace_job_route:loki_request_duration_seconds:avg
+       expr: sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, namespace, job, route) / sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, namespace, job, route)
+     - record: cluster_namespace_job_route:loki_request_duration_seconds_bucket:sum_rate
+       expr: sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, namespace, job, route)
+     - record: cluster_namespace_job_route:loki_request_duration_seconds_sum:sum_rate
+       expr: sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, namespace, job, route)
+     - record: cluster_namespace_job_route:loki_request_duration_seconds_count:sum_rate
+       expr: sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, namespace, job, route)
    ```
+## Install kube-state-metrics
+
+Metrics about Kubernetes objects are scraped from [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics), which needs to be installed in the cluster. The `kubeStateMetrics.endpoint` entry in the meta-monitoring `values.yaml` should be set to its address (without the `/metrics` part in the URL):
+
+```yaml
+kubeStateMetrics:
+  # Scrape https://github.com/kubernetes/kube-state-metrics by default
+  enabled: true
+  # This endpoint is created when the helm chart from
+  # https://artifacthub.io/packages/helm/prometheus-community/kube-state-metrics/
+  # is used. Change this if kube-state-metrics is installed somewhere else.
+  endpoint: kube-state-metrics.kube-state-metrics.svc.cluster.local:8080
+```

-1. Install the self-hosted Grafana Loki integration by going to your hosted Grafana instance, selecting **Connections** from the Home menu, then search for and install the **Self-hosted Grafana Loki** integration.

-1. Once the self-hosted Grafana Loki integration is installed, click the **View Dashboards** button to see the installed dashboards.
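For reference, the `kubectl create service externalname` command in the tracing section of the new guide is equivalent to applying a manifest like the following. This is a sketch that reuses the names from that command, for anyone who prefers to manage the Service declaratively:

```yaml
# Equivalent of:
#   kubectl create service externalname mmc-alloy-external \
#     --external-name meta-alloy.meta.svc.cluster.local -n default
apiVersion: v1
kind: Service
metadata:
  name: mmc-alloy-external
  namespace: default
spec:
  type: ExternalName
  # Requests to mmc-alloy-external.default.svc.cluster.local resolve to the
  # Alloy service in the meta namespace.
  externalName: meta-alloy.meta.svc.cluster.local
```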
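The dashboard JSON files referenced in the new "Import the Loki Dashboards to Grafana Cloud" section can be imported through the Grafana UI (Dashboards > New > Import) or pushed with the Grafana dashboard HTTP API. The following is a minimal, hypothetical sketch of the API approach; the stack URL, service account token, and dashboard file name are placeholders you must supply, and it assumes `jq` is installed:

```bash
# Wrap one dashboard JSON file from charts/meta-monitoring/src/dashboards in
# the API payload and POST it to the Grafana Cloud stack.
curl -s -X POST "https://<GRAFANA CLOUD STACK URL>/api/dashboards/db" \
  -H "Authorization: Bearer <SERVICE ACCOUNT TOKEN>" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --slurpfile d <DASHBOARD JSON FILE> '{dashboard: $d[0], overwrite: true}')"
```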
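The `kubeStateMetrics` example in the new "Install kube-state-metrics" section references the prometheus-community chart. A minimal install sketch follows; the release name and namespace are assumptions, and the resulting service address only matches the default `endpoint` shown in the diff if you keep these names:

```bash
# Add the chart repository referenced in the values.yaml comment and install.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Installing with this release name and namespace yields the service address
# kube-state-metrics.kube-state-metrics.svc.cluster.local:8080 used above.
helm install kube-state-metrics prometheus-community/kube-state-metrics \
  --namespace kube-state-metrics --create-namespace
```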
