Add doc for max_span_attr_byte and restructure troubleshoot doc #4551

Merged
Fix aliases
knylander-grafana committed Jan 13, 2025
commit f7a0dfc6351c647034b6ea4e5f479597a7e07980
1 change: 0 additions & 1 deletion docs/sources/tempo/configuration/_index.md
@@ -263,7 +263,6 @@ To avoid these out-of-memory crashes, use `max_span_attr_byte` to limit the maxi
Any keys or values that exceed the configured limit are truncated before being stored.
The default value is `2048`.
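For reference, here's a minimal sketch of setting this option. Its placement in this file (just above the Ingester section) suggests it belongs under the `distributor` block, but verify against the configuration reference:

```yaml
# Sketch only: assumes max_span_attr_byte is set under the distributor block.
distributor:
  # Maximum size, in bytes, of any individual attribute key or value.
  # Keys and values over this limit are truncated before storing.
  max_span_attr_byte: 2048
```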


## Ingester

For more information on configuration options, refer to [this file](https://github.com/grafana/tempo/blob/main/modules/ingester/config.go).
16 changes: 10 additions & 6 deletions docs/sources/tempo/troubleshooting/_index.md
@@ -21,13 +21,17 @@ In addition, the [Tempo runbook](https://github.com/grafana/tempo/blob/main/oper

## Querying

- [Unable to find my traces in Tempo](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/unable-to-see-trace/)
- [Error message "Too many jobs in the queue"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/too-many-jobs-in-queue/)
- [Queries fail with 500 and "error using pageFinder"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/bad-blocks/)
- [I can search traces, but there are no service name or span name values available](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/search-tag)
- [Error message `response larger than the max (<number> vs <limit>)`](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/response-too-large/)
- [Search results don't match trace lookup results with long-running traces](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/long-running-traces/)
- [Unable to find my traces in Tempo](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/unable-to-see-trace/)
- [Error message "Too many jobs in the queue"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/too-many-jobs-in-queue/)
- [Queries fail with 500 and "error using pageFinder"](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/bad-blocks/)
- [I can search traces, but there are no service name or span name values available](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/search-tag)
- [Error message `response larger than the max (<number> vs <limit>)`](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/response-too-large/)
- [Search results don't match trace lookup results with long-running traces](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/querying/long-running-traces/)

## Metrics-generator

- [Metrics or service graphs seem incomplete](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/metrics-generator/)

## Out-of-memory errors

- [Set the max attribute size to help control out of memory errors](https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/)
10 changes: 5 additions & 5 deletions docs/sources/tempo/troubleshooting/metrics-generator.md
@@ -11,17 +11,17 @@ aliases:

If you are concerned with data quality issues in the metrics-generator, we'd first recommend:

- Reviewing your telemetry pipeline to determine the number of dropped spans. We are only looking for major issues here.
- Reviewing the [service graph documentation]({{< relref "../metrics-generator/service_graphs" >}}) to understand how they are built.
- Reviewing your telemetry pipeline to determine the number of dropped spans. You are only looking for major issues here.
- Reviewing the [service graph documentation](https://grafana.com/docs/tempo/<TEMPO_VERSION>/metrics-generator/service_graphs/) to understand how they are built.

If everything seems ok from these two perspectives, consider the following topics to help resolve general issues with all metrics and span metrics specifically.
If everything seems acceptable from these two perspectives, consider the following topics to help resolve general issues with all metrics and span metrics specifically.

## All metrics

### Dropped spans in the distributor

The distributor has a queue of outgoing spans to the metrics-generators. If that queue is full then the distributor
will drop spans before they reach the generator. Use the following metric to determine if that is happening:
The distributor has a queue of outgoing spans to the metrics-generators.
If the queue is full, then the distributor drops spans before they reach the generator. Use the following metric to determine if that's happening:

```
sum(rate(tempo_distributor_queue_pushes_failures_total{}[1m]))
```
12 changes: 12 additions & 0 deletions docs/sources/tempo/troubleshooting/querying/_index.md
@@ -0,0 +1,12 @@
---
title: Issues with querying
menuTitle: Querying
description: Troubleshoot issues related to querying.
weight: 300
---

# Issues with querying

Learn about issues related to querying.

{{< section withDescriptions="true">}}
4 changes: 2 additions & 2 deletions docs/sources/tempo/troubleshooting/querying/bad-blocks.md
@@ -4,7 +4,7 @@ description: Troubleshoot queries failing with an error message indicating bad b
weight: 475
aliases:
- ../../operations/troubleshooting/bad-blocks/
- ../troubleshooting/bad-blocks/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/bad-blocks/
- ../bad-blocks/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/bad-blocks/
---

# Bad blocks
@@ -27,7 +27,7 @@ To fix such a block, first download it onto a machine where you can run the `tem

Next, run the `tempo-cli` `gen index` or `gen bloom` command, depending on which file is corrupt or deleted.
The command will create a fresh index/bloom-filter from the data file at the required location (in the block folder).
To view all of the options for this command, see the [cli docs]({{< relref "../operations/tempo_cli" >}}).
To view all of the options for this command, see the [CLI docs](https://grafana.com/docs/tempo/<TEMPO_VERSION>/operations/tempo_cli/).

Finally, upload the generated index or bloom-filter onto the object store backend under the folder for the block.

docs/sources/tempo/troubleshooting/querying/long-running-traces.md
@@ -4,7 +4,7 @@ description: Troubleshoot search results when using long-running traces
weight: 479
aliases:
- ../../operations/troubleshooting/long-running-traces/
- ../troubleshooting/long-running-traces/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/long-running-traces/
- ../long-running-traces/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/long-running-traces/
---

# Long-running traces
docs/sources/tempo/troubleshooting/querying/response-too-large.md
@@ -4,7 +4,7 @@ description: Troubleshoot response larger than the max error message
weight: 477
aliases:
- ../operations/troubleshooting/response-too-large/
- ../troubleshooting/response-too-large/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/response-too-large/
- ../response-too-large/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/response-too-large/
---

# Response larger than the max
4 changes: 2 additions & 2 deletions docs/sources/tempo/troubleshooting/querying/search-tag.md
@@ -4,7 +4,7 @@ description: Troubleshoot No options found in Grafana tag search
weight: 476
aliases:
- ../../operations/troubleshooting/search-tag/
- ../troubleshooting/search-tag/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/search-tag/
- ../search-tag/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/search-tag/
---

# Tag search
@@ -26,4 +26,4 @@ when a query exceeds the configured value.
There are two main solutions to this issue:

* Reduce the cardinality of tags pushed to Tempo. Reducing the number of unique tag values will reduce the size returned by a tag search query.
* Increase the `max_bytes_per_tag_values_query` parameter in the [overrides]({{< relref "../configuration#overrides" >}}) block of your Tempo configuration to a value as high as 50MB.
* Increase the `max_bytes_per_tag_values_query` parameter in the [overrides](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#overrides) block of your Tempo configuration to a value as high as 50MB, as sketched below.
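For illustration, a minimal sketch of that override, assuming the flat overrides format (the linked configuration section describes the exact layout for your Tempo version):

```yaml
# Sketch only: assumes the flat overrides format; the value is an example.
overrides:
  # Allow tag-values queries to return up to roughly 50MB.
  max_bytes_per_tag_values_query: 50000000
```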
docs/sources/tempo/troubleshooting/querying/too-many-jobs-in-queue.md
@@ -4,7 +4,7 @@ description: Troubleshoot too many jobs in the queue
weight: 474
aliases:
- ../operations/troubleshooting/too-many-jobs-in-queue/
- ../troubleshooting/too-many-jobs-in-queue/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/too-many-jobs-in-queue/
- ../too-many-jobs-in-queue/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/too-many-jobs-in-queue/
---

# Too many jobs in the queue
docs/sources/tempo/troubleshooting/querying/unable-to-see-trace.md
@@ -5,7 +5,7 @@ weight: 473
aliases:
- ../../operations/troubleshooting/missing-trace/
- ../../operations/troubleshooting/unable-to-see-trace/
- ../troubleshooting/unable-to-see-trace/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/unable-to-see-trace/
- ../unable-to-see-trace/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/unable-to-see-trace/
---

# Unable to find traces
@@ -107,8 +107,8 @@ If the pipeline isn't reporting any dropped spans, check whether application spa
- If you require a higher ingest volume, increase the configuration for the rate limiting by adjusting the `max_traces_per_user` property in the [configured override limits](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#standard-overrides).

{{< admonition type="note" >}}
Check the [ingestion limits page]({{< relref "../configuration#ingestion-limits" >}}) for further information on limits.
{{% /admonition %}}
Check the [ingestion limits page](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#overrides) for further information on limits.
{{< /admonition >}}

## Section 3: Diagnose and fix issues with querying traces

12 changes: 12 additions & 0 deletions docs/sources/tempo/troubleshooting/send-traces/_index.md
@@ -0,0 +1,12 @@
---
title: Issues with sending traces
menuTitle: Sending traces
description: Troubleshoot issues related to sending traces.
weight: 200
---

# Issues with sending traces

Learn about issues related to sending traces.

{{< section withDescriptions="true">}}
2 changes: 1 addition & 1 deletion docs/sources/tempo/troubleshooting/send-traces/alloy.md
@@ -6,7 +6,7 @@ weight: 472
aliases:
- ../operations/troubleshooting/agent/
- ../agent.md # /docs/tempo/<TEMPO_VERSION>/troubleshooting/agent.md
- ../troubleshooting/alloy/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/alloy/
- ../alloy/ # https://grafana.com/docs/tempo/<TEMPO_VERSION>/troubleshooting/alloy/
---

# Troubleshoot Grafana Alloy
docs/sources/tempo/troubleshooting/send-traces/max-trace-limit-reached.md
@@ -12,7 +12,7 @@ aliases:
The two most likely causes of refused spans are unhealthy ingesters or trace limits being exceeded.

To log spans that are discarded, add the `--distributor.log_discarded_spans.enabled` flag to the distributor or
adjust the [distributor config]({{< relref "../configuration#distributor" >}}):
adjust the [distributor configuration](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#distributor):

```yaml
distributor:
  # Completed from the --distributor.log_discarded_spans.enabled flag named above.
  log_discarded_spans:
    enabled: true
```

@@ -37,28 +37,29 @@ If you have unhealthy ingesters, your log line will look something like this:
msg="pusher failed to consume trace data" err="at least 2 live replicas required, could only find 1"
```

In this case, you may need to visit the ingester [ring page]({{< relref "../operations/consistent_hash_ring" >}}) at `/ingester/ring` on the Distributors
and "Forget" the unhealthy ingesters. This will work in the short term, but the long term fix is to stabilize your ingesters.
In this case, you may need to visit the ingester [ring page](https://grafana.com/docs/tempo/<TEMPO_VERSION>/operations/consistent_hash_ring/) at `/ingester/ring` on the Distributors
and "Forget" the unhealthy ingesters.
This works in the short term, but the long term fix is to stabilize your ingesters.

## Trace limits reached

In high volume tracing environments, the default trace limits are sometimes not sufficient.
These limits exist to protect Tempo and prevent it from OOMing, crashing or otherwise allow tenants to not DOS each other.
If you are refusing spans due to limits, you will see logs like this at the distributor:
These limits exist to protect Tempo from out-of-memory crashes and to keep tenants from DOSing each other.
If you are refusing spans due to limits, you'll see logs like this at the distributor:

```
msg="pusher failed to consume trace data" err="rpc error: code = FailedPrecondition desc = TRACE_TOO_LARGE: max size of trace (52428800) exceeded while adding 15632 bytes to trace a0fbd6f9ac5e2077d90a19551dd67b6f for tenant single-tenant"
msg="pusher failed to consume trace data" err="rpc error: code = FailedPrecondition desc = LIVE_TRACES_EXCEEDED: max live traces per tenant exceeded: per-user traces limit (local: 60000 global: 0 actual local: 60000) exceeded"
msg="pusher failed to consume trace data" err="rpc error: code = ResourceExhausted desc = RATE_LIMITED: ingestion rate limit (15000000 bytes) exceeded while adding 10 bytes"
```

You will also see the following metric incremented. The `reason` label on this metric will contain information about the refused reason.
You'll also see the following metric incremented. The `reason` label on this metric indicates why the spans were refused.

```
tempo_discarded_spans_total
```
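For example, a rate query over this counter, grouped by `reason` (a sketch in the same style as the query above):

```
sum(rate(tempo_discarded_spans_total{}[1m])) by (reason)
```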

In this case, use available configuration options to [increase limits]({{< relref "../configuration#ingestion-limits" >}}).
In this case, use available configuration options to [increase limits](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#ingestion-limits).
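For illustration, raising these limits might look like the following sketch. It assumes the flat overrides format, and the values are examples only; the parameter names correspond to the errors in the log lines above:

```yaml
# Sketch only: example values, not recommendations.
overrides:
  # Per-tenant cap on live traces (LIVE_TRACES_EXCEEDED).
  max_traces_per_user: 100000
  # Per-tenant ingestion rate limit in bytes (RATE_LIMITED).
  ingestion_rate_limit_bytes: 30000000
  # Maximum size of a single trace in bytes (TRACE_TOO_LARGE).
  max_bytes_per_trace: 100000000
```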

## Client resets connection
