Loki gateway metrics (Nginx) #9522

Closed
DanielCastronovo opened this issue May 25, 2023 · 16 comments

@DanielCastronovo

Is your feature request related to a problem? Please describe.
I'm not able to tell whether the Loki Gateway (Nginx) is fully operational. There are only logs, no metrics.

Describe the solution you'd like
Enable an nginx exporter + service monitor, and create a dashboard + alert.
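
For the alerting piece, a minimal sketch of what a PrometheusRule could look like once an nginx exporter is in place (the names here are illustrative, not something the chart ships today; nginx_up is the up/down metric exposed by nginx-prometheus-exporter):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: loki-gateway-nginx   # illustrative name
spec:
  groups:
    - name: loki-gateway-nginx
      rules:
        - alert: LokiGatewayNginxDown
          expr: nginx_up == 0   # exporter could not reach nginx's stub_status endpoint
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Loki gateway nginx is down or its stub_status endpoint is unreachable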

@paltaa

paltaa commented May 6, 2024

Hey, I enabled monitoring in the Helm chart but I'm getting a TargetDown alert for the loki-gateway scrape target:

monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  dashboards:
    enabled: true
  rules:
    enabled: true
  serviceMonitor:
    enabled: true
  lokiCanary:
    enabled: false

Alerts:

[FIRING:1] ⚠️ TargetDown
• 100% of the monitoring/loki-gateway/loki-gateway targets in monitoring namespace are down.

This is using Alertmanager with Prometheus. Any ideas on which values I need to set to configure an nginx exporter for the loki-gateway pod in Kubernetes?

Cheers

@paltaa

paltaa commented May 8, 2024

Took a look at the rendered CRs:

Name:         loki
Namespace:    monitoring
Labels:       app.kubernetes.io/instance=loki
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=loki
              app.kubernetes.io/version=3.0.0
              argocd.argoproj.io/instance=loki
              helm.sh/chart=loki-6.5.0
Annotations:  <none>
API Version:  monitoring.coreos.com/v1
Kind:         ServiceMonitor
Metadata:
  Creation Timestamp:  2024-02-28T13:15:15Z
  Generation:          1
  Resource Version:    40402766
  UID:                 7d63382c-2cf4-45ab-9200-f3239a2dda76
Spec:
  Endpoints:
    Interval:  15s
    Path:      /metrics
    Port:      http-metrics
    Relabelings:
      Action:       replace
      Replacement:  monitoring/$1
      Source Labels:
        job
      Target Label:  job
      Action:        replace
      Replacement:   loki
      Target Label:  cluster
    Scheme:          http
  Selector:
    Match Expressions:
      Key:       prometheus.io/service-monitor
      Operator:  NotIn
      Values:
        false
    Match Labels:
      app.kubernetes.io/instance:  loki
      app.kubernetes.io/name:      loki
Events:                            <none>

It's just a ServiceMonitor pointing to a broken service endpoint, so we can safely disable it for the moment:

monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  dashboards:
    enabled: false
  rules:
    enabled: false
  serviceMonitor:
    enabled: false
  lokiCanary:
    enabled: false

@Eyeless77

Eyeless77 commented May 26, 2024

Seems like the /metrics path is not defined in nginx.conf for loki-gateway:
https://github.com/grafana/loki/blob/main/production/helm/loki/templates/_helpers.tpl#L750-L1014

But this endpoint is defined in the loki-gateway deployment template:
https://github.com/grafana/loki/blob/main/production/helm/loki/templates/gateway/deployment-gateway-nginx.yaml#L63-L66

The ServiceMonitor is created for Prometheus to scrape all http-metrics endpoints, so it gets a 404 when it tries to scrape /metrics on the gateway:

10.244.4.42 - - [26/May/2024:10:01:37 +0000]  404 "GET /metrics HTTP/1.1" 153 "-" "Prometheus/2.51.1" "-"
10.244.4.42 - - [26/May/2024:10:01:52 +0000]  404 "GET /metrics HTTP/1.1" 153 "-" "Prometheus/2.51.1" "-"

IMO the dirty workaround is to set serviceMonitor.enabled: false as @paltaa suggested, but that disables monitoring for the whole Loki deployment.

@Eyeless77

Looks like in the 2.x Helm charts the port name was previously just http:
https://github.com/grafana/loki/blob/v2.9.8/production/helm/loki/templates/gateway/deployment-gateway.yaml#L62

And now it has been changed to http-metrics, which is also used by the readinessProbe in the gateway deployment:
https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml#L1019-L1022

@Pionerd
Contributor

Pionerd commented May 26, 2024

Suffering from the same issue.

A slightly nicer workaround: the ServiceMonitor's selector excludes any service that carries the label prometheus.io/service-monitor: "false". By adding that label to your gateway service, it is excluded from scraping until the above is fixed in the Helm chart itself.

values.yaml

gateway:
  service:
    labels:
      prometheus.io/service-monitor: "false"

@akorp

akorp commented May 29, 2024

In our case, before the upgrade to v3 (chart v5.20.0), we didn't have Prometheus scraping the gateway pods, likely because the port names didn't match:

kind: ServiceMonitor
spec:
  endpoints:
    - port: http-metrics
      path: /metrics
---
kind: Deployment
metadata:
  name: loki-gateway
# ...
          ports:
            - name: http

After upgrading to v3 (chart v6.6.1) the gateway pods do get scraped (they now expose an http-metrics port), but since we enabled auth on the gateway (basicAuth.enabled: true), the Prometheus scrapes get a 401 response:

server returned HTTP status 401 Unauthorized
http://10.1.5.228:8080/metrics

What is the best practice here? Is it possible to add an option in the Helm chart to disable authentication only for the metrics endpoint in the gateway nginx? Or is adding auth credentials to the Prometheus scrape the preferred option?
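
If you go the credentials route and manage your own ServiceMonitor (or patch the generated one), prometheus-operator supports per-endpoint basic auth referencing a Secret. A minimal sketch, assuming a Secret named loki-gateway-basic-auth with username/password keys (names and selector labels are illustrative and may need adjusting for your release; note the gateway still has to actually serve /metrics for this to help, see the next comment):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: loki-gateway-metrics        # illustrative, managed outside the chart
  namespace: monitoring
spec:
  endpoints:
    - port: http-metrics
      path: /metrics
      basicAuth:
        username:
          name: loki-gateway-basic-auth   # illustrative Secret holding the gateway credentials
          key: username
        password:
          name: loki-gateway-basic-auth
          key: password
  selector:
    matchLabels:
      app.kubernetes.io/instance: loki
      app.kubernetes.io/component: gateway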

@pschichtel

@akorp the issue is not auth; the issue is that /metrics is not handled at all. Having auth enabled just fails the request with a 401 instead of a 404.

This commit introduced the change seemingly as a drive-by: 79b876b#diff-d79225d50b6c12d41bceaed705a35fd5b5fff56f829fbbe5744ce6be632a0038

I think the port rename should be reverted. Until then @Pionerd's workaround is probably the best.

tyriis added a commit to tyriis/home-ops that referenced this issue Jun 4, 2024
@Pionerd
Contributor

Pionerd commented Jun 13, 2024

@DanielCastronovo How is this completed?

@ThePooN

ThePooN commented Jun 21, 2024

Still seems to be an issue here as well.

Worked around it using:

gateway:
  service:
    labels:
      prometheus.io/service-monitor: "false"

@ohdearaugustin

Not completed, still an issue. Please reopen.

They probably closed it because they moved their monitoring to the new, even less complete meta-monitoring chart...

@konglingning

Same issue.

@KA-ROM

KA-ROM commented Aug 22, 2024

Same. Please reopen.

@vrivellino

I recently upgraded to v6.10.0 of the Helm chart and experienced this same issue. I worked around it by deploying nginx-prometheus-exporter alongside nginx in the loki-gateway deployment. This is how I did it:

loki chart values snippet

gateway:
  nginxConfig:
    serverSnippet: |
      location = /stub_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
      }
      location = /metrics {
        proxy_pass       http://127.0.0.1:9113/metrics;
      }
  extraContainers:
    - name: nginx-exporter
      securityContext:
        allowPrivilegeEscalation: false
      image: nginx/nginx-prometheus-exporter:1.3.0
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 9113
          name: http-exporter
      resources:
        limits:
          memory: 128Mi
          cpu: 500m
        requests:
          memory: 64Mi
          cpu: 100m
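
For reference, nginx-prometheus-exporter defaults to scraping http://127.0.0.1:8080/stub_status, which lines up with the gateway's default listen port. If your gateway listens on a different port, the target can be set explicitly on the exporter container via its --nginx.scrape-uri flag (flag name per the exporter's docs), e.g.:

      args:
        - --nginx.scrape-uri=http://127.0.0.1:8080/stub_status   # adjust the port if your gateway listens elsewhere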

@hollanbm

hollanbm commented Aug 29, 2024

(quoting @vrivellino's workaround above)

Thanks for this, I too just ran into this with the chart upgrade.

TPXP added a commit to TPXP/terraform-kubernetes-addons that referenced this issue Oct 16, 2024, later pushed by ArchiFleKs to particuleio/terraform-kubernetes-addons on Oct 18, 2024:

We wouldn't get much details from nginx anyway as the pod is nginx OSS, so let's forget about metrics for this component

Ref: grafana/loki#9522 (comment)
Signed-off-by: Thomas P. <[email protected]>
@trallnag
Contributor

Is there an open issue for this? Maybe the title of this one is not sufficient.

@JeffreyVdb

To add to @vrivellino's answer: it's also possible to do this with a native sidecar container by using the post-rendering feature in Helm:

patches:
  - target:
      kind: Deployment
      labelSelector: app.kubernetes.io/name=loki,app.kubernetes.io/component=gateway
    patch: |-
      - op: add
        path: /spec/template/spec/initContainers
        value:
          - name: nginx-exporter
            image: public.ecr.aws/nginx/nginx-prometheus-exporter:1.4
            imagePullPolicy: IfNotPresent
            securityContext:
              allowPrivilegeEscalation: false

            # Makes this a native sidecar container
            restartPolicy: Always

            ports:
              - containerPort: 9113
                name: http-exporter

            resources:
              requests:
                memory: 100Mi
                cpu: 50m
              limits:
                memory: 100Mi
                cpu: 50m
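
Note that a native sidecar (an init container with restartPolicy: Always) needs a cluster with the SidecarContainers feature enabled (on by default since Kubernetes 1.29), and this patch presumably still relies on the nginxConfig.serverSnippet from @vrivellino's comment above so that nginx exposes /stub_status and proxies /metrics to the exporter.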
