
Add doc for max_span_attr_byte and restructure troubleshoot doc #4551

Merged

Changes from 6 commits
26 changes: 20 additions & 6 deletions docs/sources/tempo/configuration/_index.md
@@ -19,13 +19,14 @@ The Tempo configuration options include:
- [Use environment variables in the configuration](#use-environment-variables-in-the-configuration)
- [Server](#server)
- [Distributor](#distributor)
- [Set max attribute size to help control out of memory errors](#set-max-attribute-size-to-help-control-out-of-memory-errors)
- [Ingester](#ingester)
- [Metrics-generator](#metrics-generator)
- [Query-frontend](#query-frontend)
- [Limit query size to improve performance and stability](#limit-query-size-to-improve-performance-and-stability)
- [Limit the spans per spanset](#limit-the-spans-per-spanset)
- [Cap the maximum query length](#cap-the-maximum-query-length)
- [Querier](#querier)
- [Cap the maximum query length](#cap-the-maximum-query-length)
- [Querier](#querier)
- [Compactor](#compactor)
- [Storage](#storage)
- [Local storage recommendations](#local-storage-recommendations)
@@ -251,6 +252,19 @@ distributor:
[stale_duration: <duration> | default = 15m0s]
```

### Set max attribute size to help control out of memory errors

Tempo queriers can run out of memory when fetching traces that have spans with very large attributes.
This issue has been observed when trying to fetch a single trace using the [`tracebyID` endpoint](https://grafana.com/docs/tempo/<TEMPO_VERSION>/api_docs/#query).
For example, a trace might have relatively few spans (roughly 500) but a large overall size (approximately 250 KB) because some of those spans carry attributes with very large values.

To avoid these out-of-memory crashes, use `max_span_attr_byte` to limit the maximum allowable size of any individual attribute.
Any key or value that exceeds the configured limit is truncated before it's stored.
The default value is `2048`.

Use the `tempo_distributor_attributes_truncated_total` metric to track how many attributes are truncated.
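As a minimal sketch (assuming the setting sits in the distributor block, as described in this section), the configuration looks like this:

```yaml
distributor:
  # Maximum allowable size, in bytes, of any individual attribute key or value.
  # Keys and values larger than this are truncated before the span is stored.
  # Setting this to 0 disables the check.
  max_span_attr_byte: 2048
```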

## Ingester

For more information on configuration options, refer to [this file](https://github.com/grafana/tempo/blob/main/modules/ingester/config.go).
@@ -315,7 +329,7 @@ If you want to enable metrics-generator for your Grafana Cloud account, refer to
You can use `metrics_ingestion_time_range_slack` to limit metrics generation to spans whose end times fall within the configured duration.
In Grafana Cloud, this value defaults to 30 seconds, so any span sent to the metrics-generator with an end time more than 30 seconds in the past is discarded or rejected.
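As a rough sketch (assuming the setting sits directly under the metrics-generator block; verify against the full configuration block that follows), it looks like this:

```yaml
metrics_generator:
  # Spans with an end time older than this duration are discarded or rejected
  # before metrics generation.
  metrics_ingestion_time_range_slack: 30s
```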

For more information about the `local-blocks` configuration option, refer to [TraceQL metrics](https://grafana.com/docs/tempo/latest/operations/traceql-metrics/#configure-the-local-blocks-processor).
For more information about the `local-blocks` configuration option, refer to [TraceQL metrics](https://grafana.com/docs/tempo/<TEMPO_VERSION>/operations/traceql-metrics/#configure-the-local-blocks-processor).

```yaml
# Metrics-generator configuration block
@@ -724,14 +738,14 @@ In a similar manner, excessive queries result size can also negatively impact qu
#### Limit the spans per spanset

You can set the maximum spans per spanset by setting `max_spans_per_span_set` for the query-frontend.
The default value is 100.

In Grafana or Grafana Cloud, you can use the **Span Limit** field in the [TraceQL query editor](https://grafana.com/docs/grafana-cloud/connect-externally-hosted/data-sources/tempo/query-editor/) in Grafana Explore.
This field sets the maximum number of spans to return for each span set.
The maximum value that you can set for the **Span Limit** value (or the `spss` query hint) is controlled by `max_spans_per_span_set`.
To disable the maximum spans per span set limit, set `max_spans_per_span_set` to `0`.
When set to `0`, there is no maximum and users can put any value in **Span Limit**.
However, this can only be set by a Tempo administrator, not by the user.
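A hedged sketch of the query-frontend setting follows; the nesting under `search` is an assumption here, so verify it against the query-frontend configuration reference for your version:

```yaml
query_frontend:
  search:
    # Maximum number of spans returned for each span set; 0 disables the limit.
    max_spans_per_span_set: 100
```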

#### Cap the maximum query length

20 changes: 12 additions & 8 deletions docs/sources/tempo/troubleshooting/_index.md
@@ -16,18 +16,22 @@ In addition, the [Tempo runbook](https://github.com/grafana/tempo/blob/main/oper

## Sending traces

- [Spans are being refused with "pusher failed to consume trace data"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/max-trace-limit-reached/)
- [Is Grafana Alloy sending to the backend?](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/alloy/)
- [Spans are being refused with "pusher failed to consume trace data"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/send-traces/max-trace-limit-reached/)
- [Is Grafana Alloy sending to the backend?](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/send-traces/alloy/)

## Querying

- [Unable to find my traces in Tempo](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/unable-to-see-trace/)
- [Error message "Too many jobs in the queue"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/too-many-jobs-in-queue/)
- [Queries fail with 500 and "error using pageFinder"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/bad-blocks/)
- [I can search traces, but there are no service name or span name values available](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/search-tag)
- [Error message `response larger than the max (<number> vs <limit>)`](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/response-too-large/)
- [Search results don't match trace lookup results with long-running traces](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/long-running-traces/)
- [Unable to find my traces in Tempo](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/unable-to-see-trace/)
- [Error message "Too many jobs in the queue"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/too-many-jobs-in-queue/)
- [Queries fail with 500 and "error using pageFinder"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/bad-blocks/)
- [I can search traces, but there are no service name or span name values available](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/search-tag)
- [Error message `response larger than the max (<number> vs <limit>)`](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/response-too-large/)
- [Search results don't match trace lookup results with long-running traces](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/long-running-traces/)

## Metrics-generator

- [Metrics or service graphs seem incomplete](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/metrics-generator/)

## Out-of-memory errors

- [Set the max attribute size to help control out of memory errors](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/out-of-memory-errors/)
10 changes: 5 additions & 5 deletions docs/sources/tempo/troubleshooting/metrics-generator.md
@@ -11,17 +11,17 @@ aliases:

If you are concerned with data quality issues in the metrics-generator, we'd first recommend:

- Reviewing your telemetry pipeline to determine the number of dropped spans. We are only looking for major issues here.
- Reviewing the [service graph documentation]({{< relref "../metrics-generator/service_graphs" >}}) to understand how they are built.
- Reviewing your telemetry pipeline to determine the number of dropped spans. You are only looking for major issues here.
- Reviewing the [service graph documentation](https://grafana.com/docs/tempo/<TEMPO_VERSION>/metrics-generator/service_graphs/) to understand how they are built.

If everything seems ok from these two perspectives, consider the following topics to help resolve general issues with all metrics and span metrics specifically.
If everything seems acceptable from these two perspectives, consider the following topics to help resolve general issues with all metrics and span metrics specifically.

## All metrics

### Dropped spans in the distributor

The distributor has a queue of outgoing spans to the metrics-generators. If that queue is full then the distributor
will drop spans before they reach the generator. Use the following metric to determine if that is happening:
The distributor has a queue of outgoing spans to the metrics-generators.
If the queue is full, then the distributor drops spans before they reach the generator. Use the following metric to determine if that's happening:

```
sum(rate(tempo_distributor_queue_pushes_failures_total{}[1m]))
29 changes: 29 additions & 0 deletions docs/sources/tempo/troubleshooting/out-of-memory-errors.md
@@ -0,0 +1,29 @@
---
title: Troubleshoot out-of-memory errors
menuTitle: Out-of-memory errors
description: Gain an understanding of how to debug out-of-memory (OOM) errors.
weight: 600
---

# Troubleshoot out-of-memory errors

Learn about out-of-memory (OOM) errors and how to troubleshoot them.

## Set the max attribute size to help control out of memory errors

Tempo queriers can run out of memory when fetching traces that have spans with very large attributes.
This issue has been observed when trying to fetch a single trace using the [`tracebyID` endpoint](https://grafana.com/docs/tempo/<TEMPO_VERSION>/api_docs/#query).

To avoid these out-of-memory crashes, use `max_span_attr_byte` to limit the maximum allowable size of any individual attribute.
Any key or value that exceeds the configured limit is truncated before it's stored.

Use the `tempo_distributor_attributes_truncated_total` metric to track how many attributes are truncated.

```yaml
# Optional
# Configures the max size an attribute can be. Any key or value that exceeds this limit will be truncated before storing
# Setting this parameter to '0' would disable this check against attribute size
[max_span_attr_byte: <int> | default = '2048']
```
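To track truncation over time, a query along these lines can help (a hedged example; adjust the rate window and labels for your environment):

```
sum(rate(tempo_distributor_attributes_truncated_total{}[5m]))
```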

Refer to the [configuration for distributors](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#distributor) documentation for more information.
12 changes: 12 additions & 0 deletions docs/sources/tempo/troubleshooting/querying/_index.md
@@ -0,0 +1,12 @@
---
title: Issues with querying
menuTitle: Querying
description: Troubleshoot issues related to querying.
weight: 300
---

# Issues with querying

Learn about issues related to querying.

{{< section withDescriptions="true">}}
@@ -3,7 +3,8 @@ title: Bad blocks
description: Troubleshoot queries failing with an error message indicating bad blocks.
weight: 475
aliases:
- ../operations/troubleshooting/bad-blocks/
- ../../operations/troubleshooting/bad-blocks/
- ../bad-blocks/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/bad-blocks/
---

# Bad blocks
@@ -26,7 +27,7 @@ To fix such a block, first download it onto a machine where you can run the `tem

Next, run the `tempo-cli` `gen index` or `gen bloom` command, depending on which file is corrupt or deleted.
The command creates a fresh index or bloom filter from the data file at the required location (in the block folder).
To view all of the options for this command, see the [cli docs]({{< relref "../operations/tempo_cli" >}}).
To view all of the options for this command, see the [CLI docs](https://grafana.com/docs/tempo/<TEMPO_VERSION>/operations/tempo_cli/).

Finally, upload the generated index or bloom-filter onto the object store backend under the folder for the block.

@@ -3,7 +3,8 @@ title: Long-running traces
description: Troubleshoot search results when using long-running traces
weight: 479
aliases:
- ../operations/troubleshooting/long-running-traces/
- ../../operations/troubleshooting/long-running-traces/
- ../long-running-traces/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/long-running-traces/
---

# Long-running traces
@@ -4,11 +4,12 @@ description: Troubleshoot response larger than the max error message
weight: 477
aliases:
- ../operations/troubleshooting/response-too-large/
- ../response-too-large/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/response-too-large/
---

# Response larger than the max

The error message will take a similar form to the following:
The error message is similar to the following:

```
500 Internal Server Error Body: response larger than the max (<size> vs <limit>)
@@ -3,7 +3,8 @@ title: Tag search
description: Troubleshoot No options found in Grafana tag search
weight: 476
aliases:
- ../operations/troubleshooting/search-tag/
- ../../operations/troubleshooting/search-tag/
- ../search-tag/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/search-tag/
---

# Tag search
@@ -25,4 +26,4 @@ when a query exceeds the configured value.
There are two main solutions to this issue:

* Reduce the cardinality of tags pushed to Tempo. Reducing the number of unique tag values will reduce the size returned by a tag search query.
* Increase the `max_bytes_per_tag_values_query` parameter in the [overrides]({{< relref "../configuration#overrides" >}}) block of your Tempo configuration to a value as high as 50MB.
* Increase the `max_bytes_per_tag_values_query` parameter in the [overrides](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#overrides) block of your Tempo configuration to a value as high as 50MB.
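As a hedged sketch using the flat overrides form (newer Tempo versions may group this key under default or per-tenant overrides, so verify the placement for your version):

```yaml
overrides:
  # Raise the cap on the response size of tag-values queries (roughly 50MB here).
  max_bytes_per_tag_values_query: 50000000
```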
@@ -4,6 +4,7 @@ description: Troubleshoot too many jobs in the queue
weight: 474
aliases:
- ../operations/troubleshooting/too-many-jobs-in-queue/
- ../too-many-jobs-in-queue/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/too-many-jobs-in-queue/
---

# Too many jobs in the queue
@@ -18,28 +19,32 @@ Possible reasons why the compactor may not be running are:
- Insufficient permissions.
- Compactor sitting idle because no block is hashing to it.
- Incorrect configuration settings.
## Diagnosing the issue

## Diagnose the issue

- Check the metric `tempodb_compaction_bytes_written_total`.
If it's greater than zero (0), the compactor is running and writing to the backend.
- Check the metric `tempodb_compaction_errors_total`.
If it's greater than zero (0), check the compactor logs for an error message. An example query for both checks follows this list.
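For example, rate queries along these lines surface both signals (a hedged example; adjust the window for your environment):

```
sum(rate(tempodb_compaction_bytes_written_total{}[5m]))
sum(rate(tempodb_compaction_errors_total{}[5m]))
```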

## Solutions

- Verify that the Compactor has the LIST, GET, PUT, and DELETE permissions on the bucket objects.
- If these permissions are missing, assign them to the compactor container.
- For detailed information, check - https://grafana.com/docs/tempo/latest/configuration/s3/#permissions
- For detailed information, refer to the [Amazon S3 permissions](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/hosted-storage/s3/#permissions) documentation.
- If there’s a compactor sitting idle while others are running, port-forward to the compactor’s HTTP endpoint. Then go to `/compactor/ring` and click **Forget** on the inactive compactor.
- Check the following configuration parameters to ensure that there are correct settings:
- `max_block_bytes` to determine when the ingester cuts blocks. A good number is anywhere from 100MB to 2GB depending on the workload.
- `max_compaction_objects` to determine the max number of objects in a compacted block. This should be relatively high, generally in the millions.
- `retention_duration` for how long traces should be retained in the backend.
- Check the storage section of the config and increase `queue_depth`. Do bear in mind that a deeper queue could mean longer
- Check the storage section of the configuration and increase `queue_depth`. Do bear in mind that a deeper queue could mean longer
waiting times for query responses. Adjust `max_workers` accordingly, which configures the number of parallel workers
that query backend blocks.
```

```yaml
storage:
trace:
pool:
max_workers: 100 # worker pool determines the number of parallel requests to the object store backend
queue_depth: 10000
```
@@ -3,15 +3,16 @@ title: Unable to find traces
description: Troubleshoot missing traces
weight: 473
aliases:
- ../operations/troubleshooting/missing-trace/
- ../operations/troubleshooting/unable-to-see-trace/
- ../../operations/troubleshooting/missing-trace/
- ../../operations/troubleshooting/unable-to-see-trace/
- ../unable-to-see-trace/ # htt/docs/tempo/<TEMPO_VERSION>/troubleshooting/unable-to-see-trace/
---

# Unable to find traces

The two main causes of missing traces are:

- Issues in ingestion of the data into Tempo. Spans are either not being sent correctly to Tempo or they are not getting sampled.
- Issues in ingestion of the data into Tempo. Spans are either not sent correctly to Tempo or they aren't getting sampled.
- Issues querying for traces that have been received by Tempo.

## Section 1: Diagnose and fix ingestion issues
@@ -106,8 +107,8 @@ If the pipeline isn't reporting any dropped spans, check whether application spa
- If you require a higher ingest volume, increase the configuration for the rate limiting by adjusting the `max_traces_per_user` property in the [configured override limits](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#standard-overrides).
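As a hedged sketch using the flat overrides form (an example value; newer Tempo versions group ingestion limits under default or per-tenant overrides, so check the standard overrides reference linked above):

```yaml
overrides:
  # Example: allow up to 20,000 active traces per tenant before spans are refused.
  max_traces_per_user: 20000
```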

{{< admonition type="note" >}}
Check the [ingestion limits page]({{< relref "../configuration#ingestion-limits" >}}) for further information on limits.
{{% /admonition %}}
Check the [ingestion limits page](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#overrides) for further information on limits.
{{< /admonition >}}

## Section 3: Diagnose and fix issues with querying traces

12 changes: 12 additions & 0 deletions docs/sources/tempo/troubleshooting/send-traces/_index.md
@@ -0,0 +1,12 @@
---
title: Issues with sending traces
menuTitle: Sending traces
description: Troubleshoot issues related to sending traces.
weight: 200
---

# Issues with sending traces

Learn about issues related to sending traces.

{{< section withDescriptions="true">}}
@@ -5,7 +5,8 @@ description: Gain visibility on how many traces are being pushed to Grafana Allo
weight: 472
aliases:
- ../operations/troubleshooting/agent/
- ./agent.md # /docs/tempo/<TEMPO_VERSION>/troubleshooting/agent.md
- ../agent.md # /docs/tempo/<TEMPO_VERSION>/troubleshooting/agent.md
- ../alloy/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/alloy/
---

# Troubleshoot Grafana Alloy
@@ -22,21 +23,21 @@ If your logs are showing no obvious errors, one of the following suggestions may
Alloy publishes a few Prometheus metrics that are useful to determine how much trace traffic it receives and successfully forwards.
These metrics are a good place to start when diagnosing Alloy tracing issues.

From the [`otelcol.receiver.otlp`](https://grafana.com/docs/alloy/<ALLOY_LATEST>/reference/components/otelcol/otelcol.receiver.otlp/) component:
From the [`otelcol.receiver.otlp`](https://grafana.com/docs/alloy/<ALLOY_VERSION>/reference/components/otelcol/otelcol.receiver.otlp/) component:
```
receiver_accepted_spans_ratio_total
receiver_refused_spans_ratio_total
```

From the [`otelcol.exporter.otlp`](https://grafana.com/docs/alloy/<ALLOY_LATEST>/reference/components/otelcol/otelcol.exporter.otlp/) component:
From the [`otelcol.exporter.otlp`](https://grafana.com/docs/alloy/<ALLOY_VERSION>/reference/components/otelcol/otelcol.exporter.otlp/) component:
```
exporter_sent_spans_ratio_total
exporter_send_failed_spans_ratio_total
```
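As a quick sanity check (a hedged example using the metric names above; adjust labels and the window for your setup), compare what Alloy accepts with what fails to export:

```
sum(rate(receiver_accepted_spans_ratio_total{}[1m]))
sum(rate(exporter_send_failed_spans_ratio_total{}[1m]))
```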

Alloy has a Prometheus scrape endpoint, `/metrics`, that you can use to check metrics locally by opening a browser to `http://localhost:12345/metrics`.
The `/metrics` HTTP endpoint of the Alloy HTTP server exposes the Alloy component and controller metrics.
Refer to the [Monitor the Grafana Alloy component controller](https://grafana.com/docs/alloy/latest/troubleshoot/controller_metrics/) documentation for more information.
Refer to the [Monitor the Grafana Alloy component controller](https://grafana.com/docs/alloy/<ALLOY_VERSION>/troubleshoot/controller_metrics/) documentation for more information.

### Check metrics in Grafana Cloud
