Loki gateway metrics (Nginx) #9522

Closed
DanielCastronovo opened this issue May 25, 2023 · 16 comments

@DanielCastronovo

Is your feature request related to a problem? Please describe.
I'm not able to tell whether the Loki Gateway (Nginx) is fully operational. There are only logs, no metrics.

Describe the solution you'd like
Enable an nginx exporter + service monitor, and create a dashboard + alert.
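
For the alerting piece, a minimal sketch of what a PrometheusRule could look like once an nginx exporter is in place (the names here are illustrative, not something the chart ships today; nginx_up is the up/down metric exposed by nginx-prometheus-exporter):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: loki-gateway-nginx   # illustrative name
spec:
  groups:
    - name: loki-gateway-nginx
      rules:
        - alert: LokiGatewayNginxDown
          expr: nginx_up == 0   # exporter could not reach nginx's stub_status endpoint
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: Loki gateway nginx is down or its stub_status endpoint is unreachable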

@paltaa

paltaa commented May 6, 2024

Hey, I enabled monitoring in the Helm chart but I'm getting a TargetDown alert for the loki-gateway scrape target:

monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  dashboards:
    enabled: true
  rules:
    enabled: true
  serviceMonitor:
    enabled: true
  lokiCanary:
    enabled: false

Alerts:

[FIRING:1] ⚠️ TargetDown
• 100% of the monitoring/loki-gateway/loki-gateway targets in monitoring namespace are down.

This is using Alertmanager with Prometheus. Any ideas on which values I need to set to configure an nginx exporter for the loki-gateway pod in Kubernetes?

Cheers

@paltaa

paltaa commented May 8, 2024

Took a look at the rendered CRs:

Name:         loki
Namespace:    monitoring
Labels:       app.kubernetes.io/instance=loki
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=loki
              app.kubernetes.io/version=3.0.0
              argocd.argoproj.io/instance=loki
              helm.sh/chart=loki-6.5.0
Annotations:  <none>
API Version:  monitoring.coreos.com/v1
Kind:         ServiceMonitor
Metadata:
  Creation Timestamp:  2024-02-28T13:15:15Z
  Generation:          1
  Resource Version:    40402766
  UID:                 7d63382c-2cf4-45ab-9200-f3239a2dda76
Spec:
  Endpoints:
    Interval:  15s
    Path:      /metrics
    Port:      http-metrics
    Relabelings:
      Action:       replace
      Replacement:  monitoring/$1
      Source Labels:
        job
      Target Label:  job
      Action:        replace
      Replacement:   loki
      Target Label:  cluster
    Scheme:          http
  Selector:
    Match Expressions:
      Key:       prometheus.io/service-monitor
      Operator:  NotIn
      Values:
        false
    Match Labels:
      app.kubernetes.io/instance:  loki
      app.kubernetes.io/name:      loki
Events:                            <none>

It's just a ServiceMonitor pointing to a broken service endpoint, so we can safely disable it for the moment:

monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  dashboards:
    enabled: false
  rules:
    enabled: false
  serviceMonitor:
    enabled: false
  lokiCanary:
    enabled: false

@Eyeless77

Eyeless77 commented May 26, 2024

Seems like the /metrics path is not defined in nginx.conf for loki-gateway:
https://github.com/grafana/loki/blob/main/production/helm/loki/templates/_helpers.tpl#L750-L1014

But this endpoint is defined in the loki-gateway deployment template:
https://github.com/grafana/loki/blob/main/production/helm/loki/templates/gateway/deployment-gateway-nginx.yaml#L63-L66

The ServiceMonitor is created for Prometheus to scrape all http-metrics endpoints, so it gets a 404 when it tries to scrape /metrics on the gateway:

10.244.4.42 - - [26/May/2024:10:01:37 +0000]  404 "GET /metrics HTTP/1.1" 153 "-" "Prometheus/2.51.1" "-"
10.244.4.42 - - [26/May/2024:10:01:52 +0000]  404 "GET /metrics HTTP/1.1" 153 "-" "Prometheus/2.51.1" "-"

IMO the dirty workaround is to set serviceMonitor.enabled: false as @paltaa suggested, but that disables monitoring for the whole Loki deployment.

@Eyeless77

Looks like in the 2.x Helm charts the port name was previously just http:
https://github.com/grafana/loki/blob/v2.9.8/production/helm/loki/templates/gateway/deployment-gateway.yaml#L62

And now it has been changed to http-metrics, which is also used by the readinessProbe in the gateway deployment:
https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml#L1019-L1022

@Pionerd
Contributor

Pionerd commented May 26, 2024

Suffering from the same issue.

A slightly nicer workaround: the ServiceMonitor's selector excludes any service that carries the label prometheus.io/service-monitor: "false". By adding that label to your gateway service, it is excluded from scraping until the above is fixed in the Helm chart itself.

values.yaml

gateway:
  service:
    labels:
      prometheus.io/service-monitor: "false"

@akorp

akorp commented May 29, 2024

In our case, before the upgrade to v3 (chart v5.20.0), we didn't have Prometheus scraping the gateway pods, likely because the port names didn't match:

kind: ServiceMonitor
spec:
  endpoints:
    - port: http-metrics
      path: /metrics
---
kind: Deployment
metadata:
  name: loki-gateway
# ...
          ports:
            - name: http

After upgrading to v3 (chart v6.6.1) the gateway pods do get scraped (they now expose an http-metrics port), but since we enabled auth on the gateway (basicAuth.enabled: true), the Prometheus scrapes get a 401 response:

server returned HTTP status 401 Unauthorized
http://10.1.5.228:8080/metrics

What is the best practice here? Is it possible to add an option in the Helm chart to disable authentication only for the metrics endpoint in the gateway nginx? Or is adding auth credentials to the Prometheus scrape the preferred option?
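
If you go the credentials route and manage your own ServiceMonitor (or patch the generated one), prometheus-operator supports per-endpoint basic auth referencing a Secret. A minimal sketch, assuming a Secret named loki-gateway-basic-auth with username/password keys (names and selector labels are illustrative and may need adjusting for your release; note the gateway still has to actually serve /metrics for this to help, see the next comment):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: loki-gateway-metrics        # illustrative, managed outside the chart
  namespace: monitoring
spec:
  endpoints:
    - port: http-metrics
      path: /metrics
      basicAuth:
        username:
          name: loki-gateway-basic-auth   # illustrative Secret holding the gateway credentials
          key: username
        password:
          name: loki-gateway-basic-auth
          key: password
  selector:
    matchLabels:
      app.kubernetes.io/instance: loki
      app.kubernetes.io/component: gateway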

@pschichtel

@akorp the issue is not auth; the issue is that /metrics is not handled at all. Having auth enabled just fails the request with a 401 instead of a 404.

This commit introduced the change seemingly as a drive-by: 79b876b#diff-d79225d50b6c12d41bceaed705a35fd5b5fff56f829fbbe5744ce6be632a0038

I think the port rename should be reverted. Until then @Pionerd's workaround is probably the best.

tyriis added a commit to tyriis/home-ops that referenced this issue Jun 4, 2024
@Pionerd
Contributor

Pionerd commented Jun 13, 2024

@DanielCastronovo How is this completed?

@ThePooN

ThePooN commented Jun 21, 2024

Still seems to be an issue here as well.

Worked around it using:

gateway:
  service:
    labels:
      prometheus.io/service-monitor: "false"

@ohdearaugustin

Not completed, still an issue. Please reopen.

They probably closed it because they moved their monitoring to the new, even less complete meta-monitoring chart...

@konglingning

Same issue.

@KA-ROM

KA-ROM commented Aug 22, 2024

Same. Please reopen.

@vrivellino

I recently upgraded to v6.10.0 of the Helm chart and experienced this same issue. I worked around it by deploying nginx-prometheus-exporter alongside nginx in the loki-gateway deployment. This is how I did it:

loki chart values snippet

gateway:
  nginxConfig:
    serverSnippet: |
      location = /stub_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
      }
      location = /metrics {
        proxy_pass       http://127.0.0.1:9113/metrics;
      }
  extraContainers:
    - name: nginx-exporter
      securityContext:
        allowPrivilegeEscalation: false
      image: nginx/nginx-prometheus-exporter:1.3.0
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 9113
          name: http-exporter
      resources:
        limits:
          memory: 128Mi
          cpu: 500m
        requests:
          memory: 64Mi
          cpu: 100m
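
For reference, nginx-prometheus-exporter defaults to scraping http://127.0.0.1:8080/stub_status, which lines up with the gateway's default listen port. If your gateway listens on a different port, the target can be set explicitly on the exporter container via its --nginx.scrape-uri flag (flag name per the exporter's docs), e.g.:

      args:
        - --nginx.scrape-uri=http://127.0.0.1:8080/stub_status   # adjust the port if your gateway listens elsewhere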

@hollanbm

hollanbm commented Aug 29, 2024

(quoting @vrivellino's workaround above)

Thanks for this, I too just ran into this with the chart upgrade.

TPXP added a commit to TPXP/terraform-kubernetes-addons that referenced this issue Oct 16, 2024, later pushed by ArchiFleKs to particuleio/terraform-kubernetes-addons on Oct 18, 2024:

We wouldn't get much details from nginx anyway as the pod is nginx OSS, so let's forget about metrics for this component

Ref: grafana/loki#9522 (comment)
Signed-off-by: Thomas P. <[email protected]>
@trallnag
Contributor

Is there an open issue for this? Maybe the title of this one is not sufficient.

@JeffreyVdb

To add to @vrivellino's answer: it's also possible to do this with a native sidecar container by using the post-rendering feature in Helm:

patches:
  - target:
      kind: Deployment
      labelSelector: app.kubernetes.io/name=loki,app.kubernetes.io/component=gateway
    patch: |-
      - op: add
        path: /spec/template/spec/initContainers
        value:
          - name: nginx-exporter
            image: public.ecr.aws/nginx/nginx-prometheus-exporter:1.4
            imagePullPolicy: IfNotPresent
            securityContext:
              allowPrivilegeEscalation: false

            # Makes this a native sidecar container
            restartPolicy: Always

            ports:
              - containerPort: 9113
                name: http-exporter

            resources:
              requests:
                memory: 100Mi
                cpu: 50m
              limits:
                memory: 100Mi
                cpu: 50m
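
Note that a native sidecar (an init container with restartPolicy: Always) needs a cluster with the SidecarContainers feature enabled (on by default since Kubernetes 1.29), and this patch presumably still relies on the nginxConfig.serverSnippet from @vrivellino's comment above so that nginx exposes /stub_status and proxies /metrics to the exporter.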
