[DOC] Add deprecation for metrics summary #4193

Merged

Conversation

knylander-grafana
Contributor

@knylander-grafana knylander-grafana commented Oct 15, 2024

What this PR does:

Adds a deprecation note for the metrics summary API. The Aggregate by feature is already marked as deprecated in Grafana Cloud.

Part of https://github.com/grafana/website/pull/22148 and https://github.com/grafana/tempo-squad/issues/438

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@knylander-grafana knylander-grafana added the type/docs Improvements or additions to documentation label Oct 15, 2024
@knylander-grafana knylander-grafana self-assigned this Oct 15, 2024
Contributor

@09jvilla 09jvilla left a comment

LGTM!

@knylander-grafana knylander-grafana enabled auto-merge (squash) October 19, 2024 00:09
@knylander-grafana knylander-grafana enabled auto-merge (squash) October 21, 2024 18:20
@knylander-grafana knylander-grafana merged commit 760ce80 into grafana:main Oct 22, 2024
17 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 22, 2024
* Add deprecation for metrics summary

* Update docs/sources/tempo/api_docs/metrics-summary.md

* Update docs/sources/tempo/api_docs/metrics-summary.md

* Apply suggestions from code review

Co-authored-by: Jennifer Villa <[email protected]>

---------

Co-authored-by: Jennifer Villa <[email protected]>
(cherry picked from commit 760ce80)
knylander-grafana added a commit that referenced this pull request Oct 29, 2024
@MrMegaMango

Sorry for hijacking. Is there a problem with the metrics summary feature? Should I not use it, or is there a replacement?

@09jvilla
Contributor

09jvilla commented Nov 7, 2024

Sorry for hijacking. Is there a problem with the metrics summary feature? Should I not use it, or is there a replacement?

There is no problem with it. It's simply that this API is now redundant given that we have TraceQL metrics queries, and we'd like to deprecate it in order to reduce our maintenance burden. TraceQL metrics queries are in fact significantly more powerful than what you get with the metrics summary API, since they can look at arbitrary time windows (not just the last hour), return time series information (rather than just a single instant value over the past hour), and can look at all spans (not just kind=server).

To provide an example, if you were to aggregate by resource.cloud.region with the metrics summary API, you could get the same results with a couple of TraceQL queries:

{ } | rate() by (resource.cloud.region)

Rate of requests by resource.cloud.region

{ status=error} | rate() by (resource.cloud.region)

Error rate by resource.cloud.region

{ } | quantile_over_time(duration, .99, .9, .5) by (resource.cloud.region)

p99, p90, and p50 latency by resource.cloud.region

Or, if you want an easier approach than typing out these queries by hand, we suggest using Explore Traces, a queryless experience for navigating the trace data stored in Tempo, powered by TraceQL metrics queries under the hood.

@MrMegaMango

MrMegaMango commented Nov 7, 2024

Thanks for the reply. I have been stuck for some time now trying to get metrics queries to work.
No data is returned for {} | rate()
[Two screenshots attached: Screenshot 2024-11-07 at 15 20 44 and Screenshot 2024-11-07 at 15 20 38]

This is my YAML file for the latest tempo-distributed Helm chart. Any pointers are much appreciated!

{{ if .Values.tempo.enabled }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tempo-gcs-ksa
  namespace: monitoring
  annotations:
    iam.gke.io/gcp-service-account: tempo-gcs-serviceaccount@samaya-prod-403612.iam.gserviceaccount.com
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tempo
  namespace: argocd 
spec:
  project: default
  source:
    repoURL: https://grafana.github.io/helm-charts
    chart: tempo-distributed
    targetRevision: {{ .Values.tempo.targetRevision }}
    helm:
      releaseName: tempo-distributed
      values: |
        service:
          type: ClusterIP
        serviceAccount:
          create: false
          name: tempo-gcs-ksa
        config: |
          storage:
            trace:
              backend: gcs 
              gcs:
                bucket_name: {{ .Values.tempo.bucketName }}
          querier:
            frontend_worker:
                frontend_address: tempo-distributed-query-frontend.monitoring.svc.cluster.local:9095
          server:
            http_listen_port: 3100
          distributor:
            ring:
              kvstore:
                store: memberlist
            receivers:
              otlp:
                protocols:
                  grpc: 
                  http:
          ingester:
            lifecycler:
              ring:
                replication_factor: 1
                kvstore:
                  store: memberlist
          metrics_generator:
            storage:
              path: /var/tempo/wal
            processor:
              span_metrics:
                dimensions: ["service", "span_name", "span_kind", "status_code", "status_message"]
                histogram_buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5]
              local_blocks:
                filter_server_spans: false
                flush_to_storage: true
          memberlist:
            join_members:
              - dns+tempo-distributed-gossip-ring:7946
          overrides:
            defaults:
              metrics_generator:
                processors:
                  - span-metrics
                  - service-graphs
                  - local-blocks
        traces:
          otlp:
            grpc: 
              enabled: true
            http: 
              enabled: true
        ingester:
          replicas: 8
          config:
            replication_factor: 1
          resources:
            requests:
              cpu: "0.1"
              memory: "1.5Gi"
            limits:
              cpu: "0.1"
              memory: "1.5Gi"
          # extraVolumeMounts:
          #   - name: tempo-wal
          #     mountPath: /var/tempo/wal
          # extraVolumes:
          #   - name: tempo-wal
          #     persistentVolumeClaim:
          #       claimName: tempo-wal
        distributor:
          resources:
            requests:
              cpu: "0.1"
              memory: "0.3Gi"
            limits:
              cpu: "0.2"
              memory: "0.4Gi"
        compactor:
          resources:
            requests:
              cpu: "0.3"
              memory: "1Gi"
            limits:
              cpu: "0.4"
              memory: "1Gi"
          config:
            compaction:
              compaction_window: 30m
              max_block_bytes: 573741824
        querier:
          resources:
            requests:
              cpu: "0.2"
              memory: "0.4Gi"
            limits:
              cpu: "0.3"
              memory: "0.5Gi"
        metricsGenerator:
          enabled: true
          podLabels:
            scrape: "true"
          # extraVolumeMounts:
          #   - name: tempo-wal
          #     mountPath: /var/tempo/wal
          # extraVolumes:
          #   - name: tempo-wal
          #     persistentVolumeClaim:
          #       claimName: tempo-wal
        global_overrides:
          defaults:
            metrics_generator:
              processors:
                - span-metrics
                - service-graphs
                - local-blocks
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: monitoring  
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
# ---
# kind: PersistentVolumeClaim
# apiVersion: v1
# metadata:
#   name: tempo-wal
#   namespace: monitoring
# spec:
#   storageClassName: standard-rwx
#   accessModes:
#     - ReadWriteMany
#   resources:
#     requests:
#       storage: 10Gi
{{ end }}

@09jvilla
Contributor

09jvilla commented Nov 7, 2024

Unfortunately, that's something I don't know how to help debug, and I think this probably isn't the right place for that. My suggestion is you try some of our community forums for help.

@MrMegaMango

MrMegaMango commented Nov 7, 2024

Unfortunately, that's something I don't know how to help debug, and I think this probably isn't the right place for that. My suggestion is you try some of our community forums for help.

Thanks for the response!
I got help: I needed to set the trace storage path in the config for it to work.
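
For readers who hit the same "no data" symptom, the fix mentioned above most likely corresponds to the `traces_storage` block of the metrics-generator, which the `local-blocks` processor uses to hold trace data. The fragment below is a minimal sketch under that assumption. The thread does not show the final working config, and the `/var/tempo/traces` path is a hypothetical value chosen for illustration.

```yaml
# Sketch only: assumes the "trace storage path" from the reply above is
# metrics_generator.traces_storage.path; the thread does not confirm this.
metrics_generator:
  storage:
    path: /var/tempo/wal          # metrics WAL, as in the config posted earlier
  traces_storage:
    path: /var/tempo/traces       # hypothetical path; pick one that fits your volumes
  processor:
    local_blocks:
      filter_server_spans: false  # as in the config posted earlier
      flush_to_storage: true      # as in the config posted earlier
```

In the tempo-distributed Helm values posted earlier, a block like this would sit inside the existing `config: |` section, next to `metrics_generator.storage`.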

@knylander-grafana
Contributor Author

@MrMegaMango Thank you for asking the question!
@09jvilla Thanks for the great answer!

I've updated the docs here: #4316

Labels
backport release-v2.6 type/docs Improvements or additions to documentation

5 participants