[DOC] Add deprecation for metrics summary #4193

knylander-grafana · 2024-10-15T19:17:33Z

What this PR does:

Adds a deprecation note for metrics summary API. Aggregate by is already marked as deprecated in Grafana Cloud.

Part of https://github.com/grafana/website/pull/22148 and https://github.com/grafana/tempo-squad/issues/438

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

docs/sources/tempo/api_docs/metrics-summary.md

09jvilla

LGTM!

Co-authored-by: Jennifer Villa <[email protected]>

* Add deprecation for metrics summary
* Update docs/sources/tempo/api_docs/metrics-summary.md
* Update docs/sources/tempo/api_docs/metrics-summary.md
* Apply suggestions from code review

Co-authored-by: Jennifer Villa <[email protected]>

---------

Co-authored-by: Jennifer Villa <[email protected]>
(cherry picked from commit 760ce80)

MrMegaMango · 2024-11-06T20:34:04Z

Sorry for hijacking, but is there a problem with the metrics summary feature? Should I stop using it, or is there a replacement?

09jvilla · 2024-11-07T14:54:57Z

There is no problem with it. It's simply that this API is now redundant given that we have TraceQL metrics queries, and we'd like to deprecate it to reduce our maintenance burden. TraceQL metrics queries are in fact significantly more powerful than the metrics summary API: they can look at arbitrary time windows (not just the last hour), return time series (rather than a single instant value over the past hour), and can look at all spans (not just kind=server).

For example, if you were aggregating by resource.cloud.region with the metrics summary API, you could get the same results with a few TraceQL queries:

{ } | rate() by (resource.cloud.region)

Rate of requests by resource.cloud.region

{ status=error} | rate() by (resource.cloud.region)

Error rate by resource.cloud.region

{ } | quantile_over_time(duration, .99, .9, .5) by (resource.cloud.region)

p99, p90, and p50 latency by resource.cloud.region

Or, for an even easier approach than typing out these queries by hand, we suggest using Explore Traces, a queryless experience for navigating your trace data stored in Tempo that is powered by TraceQL metrics queries under the hood.

MrMegaMango · 2024-11-07T15:22:57Z

Thanks for the reply. I have been stuck for some time now trying to get metrics queries to work: I get no data for {} | rate().

This is my YAML file for the latest tempo-distributed Helm chart. Any pointers are much appreciated!

{{ if .Values.tempo.enabled }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tempo-gcs-ksa
  namespace: monitoring
  annotations:
    iam.gke.io/gcp-service-account: tempo-gcs-serviceaccount@samaya-prod-403612.iam.gserviceaccount.com
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tempo
  namespace: argocd 
spec:
  project: default
  source:
    repoURL: https://grafana.github.io/helm-charts
    chart: tempo-distributed
    targetRevision: {{ .Values.tempo.targetRevision }}
    helm:
      releaseName: tempo-distributed
      values: |
        service:
          type: ClusterIP
        serviceAccount:
          create: false
          name: tempo-gcs-ksa
        config: |
          storage:
            trace:
              backend: gcs 
              gcs:
                bucket_name: {{ .Values.tempo.bucketName }}
          querier:
            frontend_worker:
                frontend_address: tempo-distributed-query-frontend.monitoring.svc.cluster.local:9095
          server:
            http_listen_port: 3100
          distributor:
            ring:
              kvstore:
                store: memberlist
            receivers:
              otlp:
                protocols:
                  grpc: 
                  http:
          ingester:
            lifecycler:
              ring:
                replication_factor: 1
                kvstore:
                  store: memberlist
          metrics_generator:
            storage:
              path: /var/tempo/wal
            processor:
              span_metrics:
                dimensions: ["service", "span_name", "span_kind", "status_code", "status_message"]
                histogram_buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5]
              local_blocks:
                filter_server_spans: false
                flush_to_storage: true
          memberlist:
            join_members:
              - dns+tempo-distributed-gossip-ring:7946
          overrides:
            defaults:
              metrics_generator:
                processors:
                  - span-metrics
                  - service-graphs
                  - local-blocks
        traces:
          otlp:
            grpc: 
              enabled: true
            http: 
              enabled: true
        ingester:
          replicas: 8
          config:
            replication_factor: 1
          resources:
            requests:
              cpu: "0.1"
              memory: "1.5Gi"
            limits:
              cpu: "0.1"
              memory: "1.5Gi"
          # extraVolumeMounts:
          #   - name: tempo-wal
          #     mountPath: /var/tempo/wal
          # extraVolumes:
          #   - name: tempo-wal
          #     persistentVolumeClaim:
          #       claimName: tempo-wal
        distributor:
          resources:
            requests:
              cpu: "0.1"
              memory: "0.3Gi"
            limits:
              cpu: "0.2"
              memory: "0.4Gi"
        compactor:
          resources:
            requests:
              cpu: "0.3"
              memory: "1Gi"
            limits:
              cpu: "0.4"
              memory: "1Gi"
          config:
            compaction:
              compaction_window: 30m
              max_block_bytes: 573741824
        querier:
          resources:
            requests:
              cpu: "0.2"
              memory: "0.4Gi"
            limits:
              cpu: "0.3"
              memory: "0.5Gi"
        metricsGenerator:
          enabled: true
          podLabels:
            scrape: "true"
          # extraVolumeMounts:
          #   - name: tempo-wal
          #     mountPath: /var/tempo/wal
          # extraVolumes:
          #   - name: tempo-wal
          #     persistentVolumeClaim:
          #       claimName: tempo-wal
        global_overrides:
          defaults:
            metrics_generator:
              processors:
                - span-metrics
                - service-graphs
                - local-blocks
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: monitoring  
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
# ---
# kind: PersistentVolumeClaim
# apiVersion: v1
# metadata:
#   name: tempo-wal
#   namespace: monitoring
# spec:
#   storageClassName: standard-rwx
#   accessModes:
#     - ReadWriteMany
#   resources:
#     requests:
#       storage: 10Gi
{{ end }}

09jvilla · 2024-11-07T22:14:19Z

Unfortunately, that's something I don't know how to help debug, and I think this probably isn't the right place for that. My suggestion is you try some of our community forums for help.

MrMegaMango · 2024-11-07T22:16:46Z

Thanks for the response!
I got help: I needed to set the trace storage path in the config for it to work.
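For anyone who hits the same empty result, here is a minimal sketch of what that change might look like in the metrics_generator section of the config posted above. It assumes the "trace storage path" refers to Tempo's metrics_generator.traces_storage.path setting, which the local-blocks processor uses to store the blocks that TraceQL metrics queries read. The /var/tempo/generator/traces directory is an illustrative choice, not a required value.

```yaml
# Sketch only: metrics_generator section from the config above, with a
# traces_storage path added (assumed key: metrics_generator.traces_storage.path).
metrics_generator:
  storage:
    path: /var/tempo/wal                # unchanged from the original config
  traces_storage:
    path: /var/tempo/generator/traces   # illustrative path; must be writable by the metrics-generator
  processor:
    span_metrics:
      dimensions: ["service", "span_name", "span_kind", "status_code", "status_message"]
      histogram_buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5]
    local_blocks:
      filter_server_spans: false        # keep all spans, not only kind=server
      flush_to_storage: true            # unchanged from the original config
```

The local-blocks processor still has to be enabled through the overrides, as the original config already does with the local-blocks entry under overrides.defaults.metrics_generator.processors.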

knylander-grafana · 2024-11-12T18:54:39Z

@MrMegaMango Thank you for asking the question!
@09jvilla Thanks for the great answer!

I've updated the docs here: #4316

Add deprecation for metrics summary

f33b927

knylander-grafana added the type/docs Improvements or additions to documentation label Oct 15, 2024

knylander-grafana self-assigned this Oct 15, 2024

knylander-grafana requested review from joe-elliott, annanay25, mdisibio, mapno, yvrhdn, zalegrala, electron0zero, ie-pham and stoewer as code owners October 15, 2024 19:17

joe-elliott reviewed Oct 15, 2024
