diff --git a/docs/sources/tempo/api_docs/_index.md b/docs/sources/tempo/api_docs/_index.md index ca2eae7f574..ec733b3f6da 100644 --- a/docs/sources/tempo/api_docs/_index.md +++ b/docs/sources/tempo/api_docs/_index.md @@ -595,7 +595,7 @@ If provided, the tag values returned by the API are filtered to only return valu Queries can be incomplete: for example, `{ resource.cluster = }`. Tempo extracts only the valid matchers and builds a valid query. -If an input is invalid, Tempo doesn't provide an error. Instead, +If an input is invalid, Tempo doesn't provide an error. Instead, you'll see the whole list when parsing the input fails. This behavior helps with backwards compatibility. Only queries with a single selector `{}` and AND `&&` operators are supported. @@ -670,7 +670,7 @@ For example the following request computes the total number of failed spans over {{< admonition type="note" >}} Actual API parameters must be url-encoded. This example is left unencoded for readability. -{{% /admonition %}} +{{< /admonition >}} ``` GET /api/metrics/query?q={status=error}|count_over_time()by(resource.service.name) @@ -686,7 +686,7 @@ Returns status code 200 and body `echo` when the query frontend is up and ready {{< admonition type="note" >}} Meant to be used in a Query Visualization UI like Grafana to test that the Tempo data source is working. -{{% /admonition %}} +{{< /admonition >}} ### Overrides API @@ -717,13 +717,13 @@ ingester service. {{< admonition type="note" >}} This is usually used at the time of scaling down a cluster. -{{% /admonition %}} +{{< /admonition >}} ### Usage metrics {{< admonition type="note" >}} This endpoint is only available when one or more usage trackers are enabled in [the distributor]({{< relref "../configuration#distributor" >}}). 
-{{% /admonition %}} +{{< /admonition >}} ``` GET /usage_metrics @@ -747,7 +747,7 @@ tempo_usage_tracker_bytes_received_total{service="service-A",tenant="single-tena {{< admonition type="note" >}} This endpoint is only available when Tempo is configured with [the global override strategy]({{< relref "../configuration#overrides" >}}). -{{% /admonition %}} +{{< /admonition >}} ``` GET /distributor/ring diff --git a/docs/sources/tempo/api_docs/metrics-summary.md b/docs/sources/tempo/api_docs/metrics-summary.md index 4fac045d58c..f3fd8ed2553 100644 --- a/docs/sources/tempo/api_docs/metrics-summary.md +++ b/docs/sources/tempo/api_docs/metrics-summary.md @@ -12,7 +12,7 @@ weight: 600 {{< admonition type="warning" >}} The metrics summary API is deprecated as of Tempo 2.7. Features powered by the metrics summary API, like the [Aggregate by table](https://grafana.com/docs/grafana//datasources/tempo/query-editor/traceql-search/#optional-use-aggregate-by), are also deprecated in Grafana Cloud and Grafana 11.3 and later. It will be removed in a future release. -{{% /admonition %}} +{{< /admonition >}} This document explains how to use the metrics summary API in Tempo. This API returns RED metrics (span count, erroring span count, and latency information) for `kind=server` spans sent to Tempo in the last hour, grouped by a user-specified attribute. @@ -122,7 +122,7 @@ The response is returned as JSON following [standard protobuf->JSON mapping rule {{< admonition type="note" >}} The `uint64` fields cannot be fully expressed by JSON numeric values so the fields are serialized as strings. 
-{{% /admonition %}} +{{< /admonition >}} Example: diff --git a/docs/sources/tempo/configuration/network/ipv6.md b/docs/sources/tempo/configuration/network/ipv6.md index 9a9b951ce1a..3caca765cc3 100644 --- a/docs/sources/tempo/configuration/network/ipv6.md +++ b/docs/sources/tempo/configuration/network/ipv6.md @@ -12,7 +12,7 @@ Tempo can be configured to communicate between the components using Internet Pro {{< admonition type="note" >}} The underlying infrastructure must support this address family. This configuration may be used in a single-stack IPv6 environment, or in a dual-stack environment where both IPv6 and IPv4 are present. In a dual-stack scenario, only one address family may be configured at a time, and all components must be configured for that address family. -{{% /admonition %}} +{{< /admonition >}} ## Protocol configuration diff --git a/docs/sources/tempo/configuration/network/tls.md b/docs/sources/tempo/configuration/network/tls.md index fc19048dcbf..ec031ce05a5 100644 --- a/docs/sources/tempo/configuration/network/tls.md +++ b/docs/sources/tempo/configuration/network/tls.md @@ -12,7 +12,7 @@ Tempo can be configured to communicate between the components using Transport La {{< admonition type="note" >}} The ciphers and TLS version here are for example purposes only. We are not recommending which ciphers or TLS versions for use in production environments. 
-{{% /admonition %}} +{{< /admonition >}} ## Server configuration diff --git a/docs/sources/tempo/configuration/use-trace-data.md b/docs/sources/tempo/configuration/use-trace-data.md index 0f3a6c3eff5..a5b9746d4af 100644 --- a/docs/sources/tempo/configuration/use-trace-data.md +++ b/docs/sources/tempo/configuration/use-trace-data.md @@ -16,7 +16,7 @@ If you are using Grafana on-prem, you need to [set up the Tempo data source](/do {{< admonition type="tip" >}} If you want to explore tracing data in Grafana, try the [Intro to Metrics, Logs, Traces, and Profiling example]({{< relref "../getting-started/docker-example" >}}). -{{% /admonition %}} +{{< /admonition >}} This video explains how to add data sources, including Loki, Tempo, and Mimir, to Grafana and Grafana Cloud. Tempo data source set up starts at 4:58 in the video. diff --git a/docs/sources/tempo/getting-started/_index.md b/docs/sources/tempo/getting-started/_index.md index 88bcb78aa9f..fd3b9a2a875 100644 --- a/docs/sources/tempo/getting-started/_index.md +++ b/docs/sources/tempo/getting-started/_index.md @@ -32,7 +32,7 @@ create and offload spans. {{< admonition type="note" >}} To learn more about instrumentation, read the [Instrument for tracing]({{< relref "./instrumentation" >}}) documentation to learn how to instrument your favorite language for distributed tracing. -{{% /admonition %}} +{{< /admonition >}} ## Pipeline (Grafana Alloy) @@ -54,7 +54,7 @@ refer to [Grafana Alloy configuration for tracing]({{< relref "../configuration/ The [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) / [Jaeger Agent](https://www.jaegertracing.io/docs/latest/deployment/) can also be used at the agent layer. Refer to [this blog post](/blog/2021/04/13/how-to-send-traces-to-grafana-clouds-tempo-service-with-opentelemetry-collector/) to see how the OpenTelemetry Collector can be used with Tempo. 
-{{% /admonition %}} +{{< /admonition >}} ## Backend (Tempo) @@ -72,7 +72,7 @@ Tempo offers different deployment options, depending upon your needs. Refer to t {{< admonition type="note" >}} Grafana Alloy is already set up to use Tempo. Refer to [Grafana Alloy configuration for tracing](https://grafana.com/docs/tempo//configuration/grafana-alloy). -{{% /admonition %}} +{{< /admonition >}} ## Visualization (Grafana) diff --git a/docs/sources/tempo/getting-started/instrumentation.md b/docs/sources/tempo/getting-started/instrumentation.md index 6303b2e6a93..85ab7efb514 100644 --- a/docs/sources/tempo/getting-started/instrumentation.md +++ b/docs/sources/tempo/getting-started/instrumentation.md @@ -60,7 +60,7 @@ information from a client application with minimal manual instrumentation of the {{< admonition type="note" >}} Jaeger client libraries have been deprecated. For more information, refer to the [Deprecating Jaeger clients article](https://www.jaegertracing.io/docs/1.50/client-libraries/#deprecating-jaeger-clients). Jaeger now recommends using OpenTelemetry SDKs. -{{% /admonition %}} +{{< /admonition >}} - [Jaeger Language Specific Instrumentation](https://www.jaegertracing.io/docs/latest/client-libraries/) diff --git a/docs/sources/tempo/getting-started/metrics-from-traces.md b/docs/sources/tempo/getting-started/metrics-from-traces.md index ee918193175..de616999a25 100644 --- a/docs/sources/tempo/getting-started/metrics-from-traces.md +++ b/docs/sources/tempo/getting-started/metrics-from-traces.md @@ -26,7 +26,7 @@ Span metrics are of particular interest if your system is not monitored with met {{< admonition type="note" >}} Metrics generation is disabled by default. Contact Grafana Support to enable metrics generation in your organization. 
-{{% /admonition %}} +{{< /admonition >}} After the metrics-generator is enabled in your organization, refer to [Metrics-generator configuration]({{< relref "../configuration" >}}) for information about metrics-generator options. diff --git a/docs/sources/tempo/getting-started/tempo-in-grafana.md b/docs/sources/tempo/getting-started/tempo-in-grafana.md index 14df9afb2a1..c0bc4d47425 100644 --- a/docs/sources/tempo/getting-started/tempo-in-grafana.md +++ b/docs/sources/tempo/getting-started/tempo-in-grafana.md @@ -70,7 +70,7 @@ The JSON data can be downloaded via the Tempo API or the [Inspector panel](/docs {{< admonition type="note" >}} To perform this action on Grafana 10.1 or later, select a Tempo data source, select **Explore** from the main menu, and then select **Import trace**. -{{% /admonition %}} +{{< /admonition >}} ## Link tracing data with profiles diff --git a/docs/sources/tempo/introduction/_index.md b/docs/sources/tempo/introduction/_index.md index f64ef0cf5e5..324ee9d70be 100644 --- a/docs/sources/tempo/introduction/_index.md +++ b/docs/sources/tempo/introduction/_index.md @@ -14,46 +14,54 @@ weight: 120 # Introduction A trace represents the whole journey of a request or an action as it moves through all the nodes of a distributed system, especially containerized applications or microservices architectures. -This makes them the ideal observability signal for discovering bottlenecks and interconnection issues. +Traces are the ideal observability signal for discovering bottlenecks and interconnection issues. Traces are composed of one or more spans. -A span is a unit of work within a trace that has a start time relative to the beginning of the trace, a duration and an operation name for the unit of work. -It usually has a reference to a parent span (unless it's the first span, the root span, in a trace). 
+A span is a unit of work within a trace that has a start time relative to the beginning of the trace, a duration, and an operation name for the unit of work. +It usually has a reference to a parent span, unless it's the first, or root, span in a trace. It frequently includes key/value attributes that are relevant to the span itself, for example the HTTP method used in the request, as well as other metadata such as the service name, sub-span events, or links to other spans. -By definition, traces are never complete. You can always push a new batch of spans, even if days have passed since the last one. +By definition, traces are never complete. +You can always push another batch of spans, even if days have passed since the last one. When receiving a query requesting a stored trace, tracing backends like Tempo find all the spans for that specific trace and collate them into a returned result. -For that reason, issues can arise on retrieval of the trace data if traces are extremely large. +Retrieving trace data can run into issues if traces are extremely large. {{< youtube id="ZirbR0ZJIOs" >}} ## Example of traces -Firstly, a user on your website enters their email address into a form to sign up for your mailing list. They click **Enter**. This initial transaction has a trace ID that's subsequently associated with every interaction in the chain of processes within the system. +Firstly, a user on your website enters their email address into a form to sign up for your mailing list. +They click **Enter**. This initial transaction has a trace ID that's subsequently associated with every interaction in the chain of processes within the system. Next, the user's email address is data that flows through your system. -In a cloud computing world, it's possible that clicking that one button triggers many downstream processes on various microservices operating across many different nodes in your compute infrastructure. 
+In a cloud computing world, it's possible that clicking that one button triggers many downstream processes on various microservices operating across many different nodes in your compute infrastructure. -As a result, the email address might be sent to a microservice responsible for verification. If the email passes this check, it is then stored in a database. +As a result, the email address goes to a microservice responsible for verification. If the email passes this check, then the database stores the address. -Along the way, an anonymization microservice strips personally identifying data from the address and adds additional metadata before sending it along to a marketing qualifying microservice which determines whether the request was sent from a targeted part of the internet. +Along the way, an anonymizing microservice strips personally identifying data from the address and adds additional metadata before sending it along to a marketing qualifying microservice. +This microservice determines whether the request came from a targeted part of the internet. -Services respond and data flows back from each, sometimes triggering new events across the system. Along the way, logs are written to the nodes on which those services run with a time stamp showing when the info passed through. +Services respond and data flows back from each, sometimes triggering additional events across the system. +Along the way, the nodes on which those services run write logs with a time stamp showing when the info passed through. -Finally, the request and response activity ends. No other spans are added to that TraceID. +Finally, the request and response activity ends. +No other spans append to that trace ID. ## Traces and trace IDs Setting up tracing adds an identifier, or trace ID, to all of these events. -The trace ID is generated when the request is initiated and that same trace ID is applied to every single span as the request and response generate activity across the system. 
+The trace ID is generated when the request initiates. +That same trace ID applies to every span as the request and response generate activity across the system. -That trace ID enables one to trace, or follow, a request as it flows from node to node, service to microservice to lambda function to wherever it goes in your chaotic, cloud computing system and back again. +The trace ID lets you trace, or follow, a request as it flows from node to node, service to microservice to lambda function to wherever it goes in your chaotic, cloud computing system and back again. This is recorded and displayed as spans. -Here's an example showing two pages in Grafana Cloud. The first, on the left (1), shows a query using the **Explore** feature. -In the query results you can see a **traceID** field that was added to an application. That field contains a **Tempo** trace ID. -The second page, on the right (2), uses the same Explore feature to perform a Tempo search using that **trace ID**. +Here's an example showing two pages in Grafana Cloud. +The first, numbered 1, shows a query using the **Explore** feature. +In the query results, you can see a **TraceID** field that was added to an application. +That field contains a **Tempo** trace ID. +The second page, numbered 2, uses the same **Explore** feature to perform a Tempo search using that **TraceID**. It then shows a set of spans as horizontal bars, each bar denoting a different part of the system. ![Traces example with query results and spans](/static/img/docs/tempo/screenshot-trace-explore-spans-g10.png) @@ -61,8 +69,8 @@ It then shows a set of spans as horizontal bars, each bar denoting a different p ## What are traces used for? Traces can help you find bottlenecks. -A trace can be visualized to give a graphic representation of how long it takes for each step in the data flow pathway to complete. -It can show where new requests are initiated and end, and how your system responds. 
+Applications like Grafana can visualize traces to give a graphic representation of how long it takes for each step in the data flow pathway to complete. +The visualization can show where additional requests initiate and end, and how your system responds. This data helps you locate problem areas, often in places you never would have anticipated or found without this ability to trace the request flow. @@ -72,6 +80,6 @@ This data helps you locate problem areas, often in places you never would have a For more information about traces, refer to: -* [Traces and telemetry]({{< relref "./telemetry" >}}) -* [User journeys: How tracing can help you]({{< relref "./solutions-with-traces" >}}) -* [Glossary]({{< relref "./glossary" >}}) \ No newline at end of file +* [Traces and telemetry](./telemetry) +* [User journeys: How tracing can help you](./solutions-with-traces) +* [Glossary](./glossary) \ No newline at end of file diff --git a/docs/sources/tempo/introduction/solutions-with-traces/_index.md b/docs/sources/tempo/introduction/solutions-with-traces/_index.md index 1e543f00638..26b064391ae 100644 --- a/docs/sources/tempo/introduction/solutions-with-traces/_index.md +++ b/docs/sources/tempo/introduction/solutions-with-traces/_index.md @@ -11,13 +11,12 @@ weight: 300 # Use traces to find solutions -Tracing is best used for analyzing the performance of your system, identifying bottlenecks, monitoring latency, and providing a complete picture of how requests are processed. - -* Decrease MTTR/MTTI: Tracing helps reduce Mean Time To Repair (MTTR) and Mean Time To Identify (MTTI) by pinpointing exactly where errors or latency are occurring within a transaction across multiple services. -* Optimization of bottlenecks and long-running code: By visualizing the path and duration of requests, tracing can help identify bottleneck operations and long-running pieces of code that could benefit from optimization. 
-* Metrics generation and RED signals: Tracing can help generate useful metrics related to Request rate, Error rate, and Duration of requests (RED). You can set alerts against these high-level signals to detect problems when they arise. -* Seamless telemetry correlation: Using tracing in conjunction with logs and metrics can help give you a comprehensive view of events over time during an active incident or postmorterm analysis by showing relationships between services and dependencies. -* Monitor compliance with policies: Business policy adherence ensures that services are correctly isolated using generated metrics and generated service graphs. +Tracing is best used for analyzing the performance of your system, identifying bottlenecks, monitoring latency, and providing a complete picture of how requests are processed. + +* Decrease mean time to repair and mean time to identify an issue by pinpointing exactly where errors or latency are occurring within a transaction across multiple services. +* Optimize bottlenecks and long-running code by visualizing the path and duration of requests. Tracing can help identify bottleneck operations and long-running pieces of code that could benefit from optimization. +* Detect issues with generated metrics. Tracing generates metrics related to request rate, error rate, and duration of requests. You can set alerts against these high-level signals to detect problems. +* Seamless telemetry correlation. Use tracing in conjunction with logs and metrics for a comprehensive view of events over time, during an active incident, or for root-cause analysis. Tracing shows relationships between services and dependencies. +* Monitor compliance with policies. Use generated metrics and service graphs to verify that services are correctly isolated and adhere to business policy. Each use case provides real-world examples, including the background of the use case and how tracing highlighted and helped resolve any issues. 
\ No newline at end of file diff --git a/docs/sources/tempo/introduction/solutions-with-traces/traces-app-insights.md b/docs/sources/tempo/introduction/solutions-with-traces/traces-app-insights.md index e912780367d..4e03804e99d 100644 --- a/docs/sources/tempo/introduction/solutions-with-traces/traces-app-insights.md +++ b/docs/sources/tempo/introduction/solutions-with-traces/traces-app-insights.md @@ -30,15 +30,17 @@ Handy Site Corp, a fake website company, runs an ecommerce application that incl ### Define realistic SLOs -Handy Site’s engineers start by establishing service level objectives (SLOs) around latency ensure that customers have a good experience when trying to complete the checkout process. + + +Handy Site’s engineers start by establishing service level objectives, or SLOs, around latency to ensure that customers have a good experience when trying to complete the checkout process. To do this, they use metrics generated from their span data. Their service level objective should be a realistic target based on previous history during times of normal operation. -Once they've agreed upon their service level objective, they will set up alerts to warn them when they are at risk of failing to meet that objective. +Once they've agreed upon their service level objective, they set up alerts to signal risk of failing to meet that objective. ### Utilize span metrics to define your SLO and SLI -After evaluating options, they decide to use [span metrics](ref:span-metrics) as a service level indicator (SLI) to measure SLO compliance. +After evaluating options, they decide to use [span metrics](ref:span-metrics) as a service-level indicator (SLI) to measure SLO compliance. 
![Metrics generator and exemplars](/media/docs/tempo/intro/traces-metrics-gen-exemplars.png) @@ -46,7 +48,6 @@ Tempo can generate metrics using the [metrics-generator component](ref:metrics-g These metrics are created based on spans from incoming traces and demonstrate immediate usefulness with respect to application flow and overview. This includes rate, error, and duration (RED) metrics. - Span metrics also make it easy to use exemplars. An [exemplar](https://grafana.com/docs/grafana//basics/exemplars/) serves as a detailed example of one of the observations aggregated into a metric. An exemplar contains the observed value together with an optional timestamp and arbitrary trace IDs, which are typically used to reference a trace. Since traces and metrics co-exist in the metrics-generator, exemplars can be automatically added to those metrics, allowing you to quickly jump from a metric showing aggregate latency over time into an individual trace that represents a low, medium, or high latency request. Similarly, you can quickly jump from a metric showing error rate over time into an individual erroring trace. @@ -54,10 +55,14 @@ Since traces and metrics co-exist in the metrics-generator, exemplars can be aut ### Monitor latency Handy Site decides they're most interested in monitoring the latency of requests processed by their checkout service and want to set an objective that 99.5% of requests in a given month should complete within 2 seconds. -To define a service level indicator (SLI) that they can use to track their progress against their objective, they use the `traces_spanmetrics_latency` metric with the proper label selectors, such as `service name = checkoutservice`. -The metrics-generator adds a default set of labels to the metrics it generates, including `span_kind` and `status_code`. 
However, if they were interested in calculating checkout service latency per endpoint or per version of the software, they could change the configuration of the Tempo metrics-generator to add these custom dimensions as labels to their spanmetrics. +To define a service-level indicator (SLI) that they can use to track their progress against their objective, they use the `traces_spanmetrics_latency` metric with the proper label selectors, such as `service name = checkoutservice`. +The metrics-generator adds a default set of labels to the metrics it generates, including `span_kind` and `status_code`. +If they want to calculate checkout service latency per endpoint or per version of the software, they can change the configuration of the Tempo metrics-generator to add these custom dimensions as labels to their span metrics. -With all of this in place, Handy Site now opens the [Grafana SLO](https://grafana.com/docs/grafana-cloud/alerting-and-irm/slo/) application and follows the setup flow to establish an [SLI](https://grafana.com/docs/grafana-cloud/alerting-and-irm/slo/create/) for their checkout service around the `traces_spanmetrics_latency` metric. -They can now be alerted to degradations in service quality that directly impact their end user experience. SLO-based alerting also ensures that they don't suffer from noisy alerts. Alerts are only triggered when the value of the SLI is such that the team is in danger of missing their SLO. +With all of this in place, Handy Site opens the [Grafana SLO](https://grafana.com/docs/grafana-cloud/alerting-and-irm/slo/) application and follows the setup flow to establish an [SLI](https://grafana.com/docs/grafana-cloud/alerting-and-irm/slo/create/) for their checkout service around the `traces_spanmetrics_latency` metric. +They can be alerted to degradations in service quality that directly impact their end user experience. +SLO-based alerting also ensures that they don't suffer from noisy alerts. 
+Alerts are only triggered when the value of the SLI is such that the team is in danger of missing their SLO. -![Latency SLO dashboard](/media/docs/tempo/intro/traces-metrics-gen-SLO.png) \ No newline at end of file +![Latency SLO dashboard](/media/docs/tempo/intro/traces-metrics-gen-SLO.png) + \ No newline at end of file diff --git a/docs/sources/tempo/introduction/solutions-with-traces/traces-diagnose-errors.md b/docs/sources/tempo/introduction/solutions-with-traces/traces-diagnose-errors.md index 01778459554..8c179353843 100644 --- a/docs/sources/tempo/introduction/solutions-with-traces/traces-diagnose-errors.md +++ b/docs/sources/tempo/introduction/solutions-with-traces/traces-diagnose-errors.md @@ -18,35 +18,41 @@ refs: # Diagnose errors with traces Traces allow you to quickly diagnose errors in your application, ensuring that you can perform Root Cause Analysis (RCA) on request failures. -Trace visualizations help you determine the spans in which errors occur, along with the context behind those errors, leading to a lower mean time to repair (MTTR). +Trace visualizations help you determine the spans in which errors occur, along with the context behind those errors, leading to a lower mean time to repair. ## Meet Handy Site Corp -Handy Site Corp, a fake website company, runs an ecommerce application that includes user authentication, a product catalog, order management, payment processing, and other services. +Handy Site Corp, a fake website company, runs an e-commerce application that includes user authentication, a product catalog, order management, payment processing, and other services. -Handy Site’s operations team receives several alerts for their error SLO for monitored endpoints in their services. Using their Grafana dashboards, they notice that there are several issues. 
The dashboard provides the percentages of errors: +Handy Site's operations team receives several alerts for their error service-level objectives, or SLOs, for monitored endpoints in their services. +Using their Grafana dashboards, they notice that there are several issues. +The dashboard provides the percentages of errors: ![Dashboard showing errors in services](/media/docs/tempo/intro/traces-error-SLO.png) More than 5% of the requests from users are resulting in an error across several endpoints, such as `/beholder`, `/owlbear`, and `/illithid`, which causes a degradation in performance and usability. -It’s imperative for the operations team at Handy Site to quickly troubleshoot the issue. The elevated error rates indicate that the Handy Site is unable to provide valid responses to their users’ requests, which in addition to threatening the operation team’s SLO error budget also affects profitability overall for Handy Site. +It's imperative for the operations team at Handy Site to quickly troubleshoot the issue. +The elevated error rates indicate that Handy Site is unable to provide valid responses to their users' requests. In addition to threatening the operations team's SLO error budget, this also affects Handy Site's overall profitability. ## Use TraceQL to query data -Tempo has a traces-first query language, [TraceQL](ref:traceql), that provides a unique toolset for selecting and searching tracing data. TraceQL can match traces based on span and resource attributes, duration, and ancestor<>descendant relationships. It also can compute aggregate statistics (e.g., `rate`) over a set of spans. +Tempo has a traces-first query language, [TraceQL](ref:traceql), that provides a unique toolset for selecting and searching tracing data. +TraceQL can match traces based on span and resource attributes, duration, and ancestor<>descendant relationships. +The language can also compute aggregate statistics, such as `rate`, over a set of spans. 
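+For example, a sketch of a TraceQL aggregate query (illustrative only; it assumes spans carry the conventional `resource.service.name` attribute) that computes the rate of erroring spans for each service: ```traceql { status = error } | rate() by (resource.service.name) ```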
-Handy Site’s services and applications are instrumented for tracing, so they can use TraceQL as a debugging tool. Using three TraceQL queries, the team identifies and validates the root cause of the issue. +Handy Site has instrumented their services and applications for tracing, so they can use TraceQL as a debugging tool. +Using three TraceQL queries, the team identifies and validates the root cause of the issue. ### Find HTTP errors -The top-level service, `mythical-requester` receives requests and returns responses to users. When it receives a request, it calls numerous downstream services whose responses it relies on in order to send a response to the user request. +The top-level service, `mythical-requester`, receives requests and returns responses to users. When it receives a request, it calls numerous downstream services whose responses it relies on to send a response to the user request. Using Grafana Explore, the operations team starts with a simple TraceQL query to find all traces from this top-level service, where an HTTP response code sent to a user is `400` or above. Status codes in this range include `Forbidden`, `Not found`, `Unauthorized`, and other client and server errors. ```traceql { resource.service.name = "mythical-requester" && span.http.status_code >= 400 } | select(span.http.target) ``` -The addition of the `select` statement to this query (after the `|`) ensures that the query response includes not only the set of matched spans, but also the `http.target` attribute (i.e. the endpoints for the SaaS service) for each of those spans. 
![Query results showing http.target attribute](/media/docs/tempo/intro/traceql-http-target-handy-site.png) @@ -56,7 +62,8 @@ Looking at the set of returned spans, the most concerning ones are those with th The team decides to use structural operators to follow an error chain from the top-level `mythical-requester` service to any descendant spans that also have an error status. Descendant spans can be any span that's descended from the parent span, such as a child or a further child at any depth. -Using this query, the team can pinpoint the downstream service that might be causing the issue. The query below says "Find me spans where `status = error` that that are descendants of spans from the `mythical-requester` service that have status code `500`." +Using this query, the team can pinpoint the downstream service that might be causing the issue. +The query below says "Find spans where `status = error` that are descendants of spans from the `mythical-requester` service that have status code `500`." ```traceql { resource.service.name = "mythical-requester" && span.http.status_code = 500 } >> { status = error } @@ -64,7 +71,7 @@ Using this query, the team can pinpoint the downstream service that might be cau ![TraceQL results showing expanded span](/media/docs/tempo/intro/traceql-error-insert-handy-site.png) -Expanding the erroring span that is a descendant of the span for the `mythical-server` service shows the team that there is a problem with the data being inserted into the database. +Expanding the span with errors that's a descendant of the span for the `mythical-server` service shows the team that there is a problem with the data inserted into the database. Specifically, the service is passing a `null` value for a column in a database table where `null` values are invalid. 
![Error span for INSERT](/media/docs/tempo/intro/traceql-insert-postgres-handy-site.png)

@@ -73,7 +80,7 @@ Specifically, the service is passing a `null` value for a column in a database t

After identifying the specific cause of this internal server error, the team wants to know if there are errors in any database operations other than the `null` `INSERT` error found above.

-Their updated query uses a negated regular expression to find any spans where the database statement either doesn’t exist, or doesn’t start with an `INSERT` clause.
+Their updated query uses a negated regular expression to find any spans where the database statement either doesn't exist, or doesn't start with an `INSERT` clause.
This should expose any other issues causing an internal server error and filter out the class of issues that they already diagnosed.

```traceql
@@ -81,7 +88,8 @@ This should expose any other issues causing an internal server error and filter
```

This query yields no results, suggesting that the root cause of the issues the operations team are seeing is exclusively due to the failing database `INSERT` statement.
-At this point, they can roll back to a known working version of the service, or deploy a fix to ensure that `null` data being passed to the service is rejected appropriately.
-Once that is complete, the issue can be marked resolved and the Handy team's error rate SLI should return back to acceptable levels.
+At this point, they can roll back to a known working version of the service, or deploy a fix to ensure that the service rejects `null` data appropriately.
+After the fix is in place, the team marks the issue resolved.
+The Handy team's error rate SLI should return to acceptable levels.
![Empty query results](/media/docs/tempo/intro/traceql-no-results-handy-site.png)

diff --git a/docs/sources/tempo/introduction/telemetry.md b/docs/sources/tempo/introduction/telemetry.md
index 8024cddf5f7..09b6c467d65 100644
--- a/docs/sources/tempo/introduction/telemetry.md
+++ b/docs/sources/tempo/introduction/telemetry.md
@@ -21,37 +21,42 @@ Correlating between the four pillars of observability helps create a holistic vi

## Metrics

Metrics provide a high-level picture of the state of a system.
-Because they are numeric values and therefore can easily be compared against known thresholds, metrics are the foundation of alerts, which constantly run in the background and trigger when a value is outside of an expected range. This is typically the first sign that something is going on and are where discovery first starts.
+Metrics are the foundation of alerts because metrics are numeric values and can be compared against known thresholds.
+Alerts constantly run in the background and trigger when a value is outside of an expected range.
+This is typically the first sign that something is going on, and where discovery first starts.

Metrics indicate that something is happening.

## Logs

Logs provide an audit trail of activity from a single process that creates informational context.
Logs act as atomic events, detailing what's occurring in the services in your application.

-Whereas metrics are quantitative (numeric) and structured, logs are qualitative (textual) and unstructured or semi-structured. They offer a higher degree of detail, but also at the expense of creating significantly higher data volumes.
+Whereas metrics are quantitative (numeric) and structured, logs are qualitative (textual) and unstructured or semi-structured.
+They offer a higher degree of detail, but at the expense of creating significantly higher data volumes.

Logs let you know what's happening to your application.
-
## Traces

-Traces add further to the observability picture by telling you what happens at each step or action in a data pathway. Traces provide the map–-the where–-something is going wrong.
-A trace provides a graphic representation of how long each step in the data flow pathway (for example, HTTP request, database lookup, call to a third party service) takes to complete.
-It can show where new requests are initiated and finished, as well as how your system responds.
+Traces add further to the observability picture by telling you what happens at each step or action in a data pathway. Traces provide the map, showing where something is going wrong.
+A trace provides a graphic representation of how long each step in the data flow pathway takes to complete. For example, how long an HTTP request, a database lookup, or a call to a third-party service takes.
+It can show where requests initiate and finish, as well as how your system responds.
This data helps you locate problem areas and assess their impact, often in places you never would have anticipated or found without this ability to trace the request flow.

## Profiles

Profiles help you understand how your applications utilize compute resources such as CPU time and memory.
-This allows you to identify specific lines of code or functions that can be optimized to improve application performance and efficiency.
+This helps identify specific lines of code or functions to optimize and improve performance and efficiency.

## Why traces?

Metrics in themselves aren't sufficient to find the root cause and solve complex issues.
The same can be said for logs, which can contain a significant amount of information but lack the context of the interactions and dependencies between the different components of your complex environment.

-Each pillar of observability (metrics, logs, traces, profiles) has its own unique strength when it comes to root causing issues.
+Each pillar of observability (metrics, logs, traces, profiles) has its own unique strength when it comes to root-causing issues.
To get the most value out of your observability strategy, you need to be able to correlate them.

-Traces have the unique ability to show relationships between services. They allow you to identify which services are upstream from your service, which is helpful when you want to understand which services might be negatively impacted by problems in your service. They also allow you to identify which services are downstream from your service; this is valuable since your application relies on their downstream services, and problems with those services may be the cause of elevated errors or latency being reported by your service.
+Traces have the unique ability to show relationships between services.
+They help identify which services are upstream from your service, which is helpful when you want to understand which services might be negatively impacted by problems in your service.
+Traces also help identify which services are downstream from your service.
+This is valuable since your application relies on these downstream services, and problems with those services may be the cause of elevated errors or latency reported by your service.
For example, you can directly see the failing database and all impacted failing edge endpoints.

Using traces and [exemplars](https://grafana.com/docs/grafana/next/fundamentals/exemplars/), you can go from a metric data point and get to an associated trace.
diff --git a/docs/sources/tempo/metrics-generator/service_graphs/_index.md b/docs/sources/tempo/metrics-generator/service_graphs/_index.md
index 4003d2ebe53..7c6d45dca44 100644
--- a/docs/sources/tempo/metrics-generator/service_graphs/_index.md
+++ b/docs/sources/tempo/metrics-generator/service_graphs/_index.md
@@ -29,10 +29,10 @@ The metrics-generator processes traces and generates service graphs in the form

Service graphs work by inspecting traces and looking for spans with parent-child relationships that represent a request.
The processor uses the [OpenTelemetry semantic conventions](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/trace.md) to detect a myriad of requests.
-It currently supports the following requests:
+It supports the following requests:

- A direct request between two services where the outgoing and the incoming span must have [`span.kind`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#spankind) values `client` and `server`, respectively.
- A request across a messaging system where the outgoing and the incoming span must have `span.kind` values `producer` and `consumer`, respectively.
-- A database request; in this case the processor looks for spans containing attributes `span.kind`=`client` as well as one of `db.name` or `db.system`. See below for how the name of the node is determined for a database request.
+- A database request; in this case the processor looks for spans containing attributes `span.kind`=`client` as well as one of `db.name` or `db.system`. See below for how the name of the node is determined for a database request.

Every span that can be paired up to form a request is kept in an in-memory store, until its corresponding pair span is received or the maximum waiting time has passed.
When either of these conditions is reached, the request is recorded and removed from the local store.
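The store-and-pair behavior described above can be sketched roughly as follows. This is a simplified illustration with hypothetical names (`EdgeStore`, `add_span`), not Tempo's actual implementation:

```python
import time

MAX_WAIT_SECONDS = 10.0  # stand-in for the configured maximum waiting time

class EdgeStore:
    """Holds unpaired spans until their counterpart arrives or they expire."""

    def __init__(self, max_wait=MAX_WAIT_SECONDS):
        self.max_wait = max_wait
        self.pending = {}  # edge key -> (span, arrival timestamp)

    def add_span(self, key, span, now=None):
        """Return a (client, server) pair when the edge completes, else None."""
        now = time.monotonic() if now is None else now
        self.expire(now)
        if key in self.pending:
            other, _ = self.pending.pop(key)
            # Order the pair so the client span always comes first.
            return (span, other) if span["kind"] == "client" else (other, span)
        self.pending[key] = (span, now)
        return None

    def expire(self, now):
        """Drop spans that waited too long; these count as unpaired."""
        stale = [k for k, (_, t) in self.pending.items() if now - t > self.max_wait]
        for k in stale:
            del self.pending[k]

store = EdgeStore()
client = {"kind": "client", "service": "checkout"}
server = {"kind": "server", "service": "payments"}
store.add_span(("trace-1", "span-a"), client, now=0.0)   # stored, no pair yet
edge = store.add_span(("trace-1", "span-a"), server, now=1.0)
print(edge[0]["service"], "->", edge[1]["service"])  # checkout -> payments
```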
@@ -46,14 +46,15 @@ Each emitted metrics series have the `client` and `server` label corresponding w

### Virtual nodes

Virtual nodes are nodes that form part of the lifecycle of a trace,
-but spans for them are not being collected because they're outside the user's reach (for example, an external service for payment processing) or are not instrumented (for example, a frontend application).
+but spans for them aren't collected because they're outside the user's reach or aren't instrumented.
+For example, you might not collect spans from an external payment-processing service that's outside your control, or from an uninstrumented frontend application.

Virtual nodes can be detected in two different ways:

- The root span has `span.kind` set to `server`. This indicates that the request was initiated by an external system that's not instrumented, like a frontend application or an engineer via `curl`.
-- A `client` span does not have its matching `server` span, but has a peer attribute present. In this case, we make the assumption that a call was made to an external service, for which Tempo won't receive spans.
+- A `client` span doesn't have its matching `server` span, but has a peer attribute present. In this case, Tempo assumes that a call was made to an external service, for which it won't receive spans.
  - The default peer attributes are `peer.service`, `db.name` and `db.system`.
-  - The order of the attributes is important, as the first one that is present will be used as the virtual node name.
+  - The order of the attributes is important, as the first attribute present is used as the virtual node name.

A database node is identified by the span having at least one of the `db.name` or `db.system` attributes.
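The peer-attribute precedence can be illustrated with a small sketch (a hypothetical helper, not Tempo code):

```python
# Documented default precedence: the first attribute present wins.
PEER_ATTRIBUTES = ["peer.service", "db.name", "db.system"]

def virtual_node_name(span_attributes):
    """Return the virtual node name for a client span, or None if no peer attribute."""
    for attr in PEER_ATTRIBUTES:
        if attr in span_attributes:
            return span_attributes[attr]
    return None

# db.name outranks db.system, so the node is named after the database.
print(virtual_node_name({"db.system": "postgresql", "db.name": "orders"}))  # orders
```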
@@ -63,15 +64,19 @@ The name of a database node is determined using the following span attributes in The following metrics are exported: + + | Metric | Type | Labels | Description | | ----------------------------------------------------- | --------- | ------------------------------- | ---------------------------------------------------------------------------------------------------------- | -| traces_service_graph_request_total | Counter | client, server, connection_type | Total count of requests between two nodes | -| traces_service_graph_request_failed_total | Counter | client, server, connection_type | Total count of failed requests between two nodes | -| traces_service_graph_request_server_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the server | -| traces_service_graph_request_client_seconds | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the client | -| traces_service_graph_request_messaging_system_seconds | Histogram | client, server, connection_type | (Off by default) Time between publisher and consumer for services communicating through a messaging system | -| traces_service_graph_unpaired_spans_total | Counter | client, server, connection_type | Total count of unpaired spans | -| traces_service_graph_dropped_spans_total | Counter | client, server, connection_type | Total count of dropped spans | +| `traces_service_graph_request_total` | Counter | client, server, connection_type | Total count of requests between two nodes | +| `traces_service_graph_request_failed_total` | Counter | client, server, connection_type | Total count of failed requests between two nodes | +| `traces_service_graph_request_server_seconds` | Histogram | client, server, connection_type | Time for a request between two nodes as seen from the server | +| `traces_service_graph_request_client_seconds` | Histogram | client, server, connection_type | Time for a request between 
two nodes as seen from the client | +| `traces_service_graph_request_messaging_system_seconds` | Histogram | client, server, connection_type | (Off by default) Time between publisher and consumer for services communicating through a messaging system | +| `traces_service_graph_unpaired_spans_total` | Counter | client, server, connection_type | Total count of unpaired spans | +| `traces_service_graph_dropped_spans_total` | Counter | client, server, connection_type | Total count of dropped spans | + + Duration is measured both from the client and the server sides. @@ -81,7 +86,7 @@ Additional labels can be included using the `dimensions` configuration option, o Since the service graph processor has to process both sides of an edge, it needs to process all spans of a trace to function properly. -If spans of a trace are spread out over multiple instances, spans are not paired up reliably. +If spans of a trace are spread out over multiple instances, spans aren't paired up reliably. #### Activate `enable_virtual_node_label` @@ -89,4 +94,4 @@ Activating this feature adds the following label and corresponding values: | Label | Possible Values | Description | |-------------------------|-----------------------------|--------------------------------------------------------------------------| -| virtual_node | `unset`, `client`, `server` | Explicitly indicates the side that is uninstrumented | +| `virtual_node` | `unset`, `client`, `server` | Explicitly indicates the uninstrumented side | diff --git a/docs/sources/tempo/metrics-generator/service_graphs/estimate-cardinality.md b/docs/sources/tempo/metrics-generator/service_graphs/estimate-cardinality.md index 3d0d6501d07..89931329947 100644 --- a/docs/sources/tempo/metrics-generator/service_graphs/estimate-cardinality.md +++ b/docs/sources/tempo/metrics-generator/service_graphs/estimate-cardinality.md @@ -44,7 +44,7 @@ Finally, we get the following cardinality estimation: Sum: [([2 * #hb] + 2) * #hops] + [2 * #services] ``` -{{% 
admonition type="note" %}} +{{< admonition type="note" >}} If `enable_messaging_system_latency_histogram` configuration is set to `true`, another histogram is produced: ``` @@ -57,8 +57,8 @@ In that case, the estimation formula would be: Sum: [([3 * #hb] + 2) * #hops] + [2 * #services] ``` -{{% /admonition %}} +{{< /admonition >}} {{< admonition type="note" >}} To estimate the number of metrics, refer to the [Dry run metrics generator]({{< relref "../cardinality" >}}) documentation. -{{% /admonition %}} \ No newline at end of file +{{< /admonition >}} \ No newline at end of file diff --git a/docs/sources/tempo/operations/cross_tenant_query.md b/docs/sources/tempo/operations/cross_tenant_query.md index e4946d978fa..4c233886b64 100644 --- a/docs/sources/tempo/operations/cross_tenant_query.md +++ b/docs/sources/tempo/operations/cross_tenant_query.md @@ -13,7 +13,7 @@ aliases: {{< admonition type="note" >}} You need to enable `multitenancy_enabled: true` in the cluster for multi-tenant querying to work. Refer to [Enable multi-tenancy](/docs/tempo/latest/operations/multitenancy/) for more details and implications of `multitenancy_enabled: true`. -{{% /admonition %}} +{{< /admonition >}} Tempo supports multi-tenant queries for search, search-tags, and trace-by-ID search operations. diff --git a/docs/sources/tempo/operations/generic_forwarding.md b/docs/sources/tempo/operations/generic_forwarding.md index b8596723f5c..deb08739d06 100644 --- a/docs/sources/tempo/operations/generic_forwarding.md +++ b/docs/sources/tempo/operations/generic_forwarding.md @@ -11,7 +11,7 @@ Generic forwarding allows asynchronous replication of ingested traces. The distr {{< admonition type="warning" >}} Generic forwarding does not work retroactively. Once enabled, the distributor only replicates freshly ingested spans. 
-{{% /admonition %}} +{{< /admonition >}} ## Configure generic forwarding diff --git a/docs/sources/tempo/operations/monitor/polling.md b/docs/sources/tempo/operations/monitor/polling.md index ff74771e084..351dccbfdb8 100644 --- a/docs/sources/tempo/operations/monitor/polling.md +++ b/docs/sources/tempo/operations/monitor/polling.md @@ -28,7 +28,7 @@ During normal operation, it will stale by at most twice the configured `blocklis {{< admonition type="note" >}} For details about configuring polling, refer to [polling configuration]({{< relref "../../configuration/polling" >}}). -{{% /admonition %}} +{{< /admonition >}} ## Monitor polling with dashboards and alerts diff --git a/docs/sources/tempo/operations/monitor/set-up-monitoring.md b/docs/sources/tempo/operations/monitor/set-up-monitoring.md index 06e595643c6..a5b99c6f972 100644 --- a/docs/sources/tempo/operations/monitor/set-up-monitoring.md +++ b/docs/sources/tempo/operations/monitor/set-up-monitoring.md @@ -24,7 +24,7 @@ Update any instructions in this document for your own deployment. If you use the [Kubernetes integration Grafana Alloy Helm chart](https://grafana.com/docs/alloy//set-up/install/kubernetes/), you can use the Kubernetes scrape annotations to automatically scrape Tempo. You’ll need to add the labels to all of the deployed components. -{{% /admonition %}} +{{< /admonition >}} ## Before you begin @@ -51,7 +51,7 @@ In addition, the test app instructions explain how to configure a Tempo data sou {{< admonition type="note" >}} If you already have a Tempo environment, then there is no need to create a test app. This guide assumes that the Tempo and Grafana Alloy configurations are the same as or based on [these instructions to create a test application](https://grafana.com/docs/tempo/latest/setup/set-up-test-app/), as you'll augment those configurations to enable Tempo metrics monitoring. -{{% /admonition %}} +{{< /admonition >}} In these examples, Tempo is installed in a namespace called `tempo`. 
Change this namespace name in the examples as needed to fit your own environment.

@@ -140,8 +140,8 @@ rule {

This lets you create a configuration that scrapes metrics from Tempo components and writes the data to a Mimir instance of your choice.

This example provides a Helm `values.yaml` file that you can use for [Alloy deployed on Kubernetes](https://grafana.com/docs/alloy//configure/kubernetes/).
-The file configures the options Alloy uses to scrap a running instance of Tempo. 
-Refer to the comments in the example for details. 
+The file configures the options Alloy uses to scrape a running instance of Tempo.
+Refer to the comments in the example for details.

```yaml
alloy:
@@ -265,7 +265,7 @@ This contains a compiled version of the alert and recording rules, as well as th

If you want to change any of the mixins, make your updates in the `operations/tempo-mixin` directory.
Use the instructions in the [README](https://github.com/grafana/tempo/tree/main/operations/tempo-mixin) in that directory to regenerate the files.
The mixins are generated in the `operations/tempo-mixin-compiled` directory.
-{{% /admonition %}}
+{{< /admonition >}}

### Import the dashboards to Grafana

@@ -276,7 +276,7 @@ Refer to [Import a dashboard ](https://grafana.com/docs/grafana/latest/dashboard

Install all six dashboards.
You can only import one dashboard at a time.
Create a new folder in the Dashboards area, for example “Tempo Monitoring”, as an easy location to save the imported dashboards.
-{{% /admonition %}}
+{{< /admonition >}}

To create a folder:

diff --git a/docs/sources/tempo/operations/multitenancy.md b/docs/sources/tempo/operations/multitenancy.md
index 08f9d032374..4780628aa85 100644
--- a/docs/sources/tempo/operations/multitenancy.md
+++ b/docs/sources/tempo/operations/multitenancy.md
@@ -18,7 +18,7 @@ in the repository.
This example uses the following settings to achieve multi-ten

{{< admonition type="note" >}}
Multi-tenancy on ingestion is currently [only working](https://github.com/grafana/tempo/issues/495) with gRPC and this may never change.
It's strongly recommended to use the OpenTelemetry Collector to support multi-tenancy.
-{{% /admonition %}}
+{{< /admonition >}}

## Configure multi-tenancy

diff --git a/docs/sources/tempo/operations/schema.md b/docs/sources/tempo/operations/schema.md
index ca3350b5fad..3db1827c85a 100644
--- a/docs/sources/tempo/operations/schema.md
+++ b/docs/sources/tempo/operations/schema.md
@@ -9,6 +9,11 @@ aliases:

# Apache Parquet schema

+
+
+
+
+
Starting with Tempo 2.0, Apache Parquet is used as the default column-formatted block format.
Refer to the [Parquet configuration options]({{< relref "../configuration/parquet.md" >}}) for more information.

@@ -51,6 +56,8 @@ The table below uses these abbreviations:
- `rs` - resource spans
- `ss` - scope spans

+
+
| | | |
|:----|:----|:----|
|Name|Type|Description|
@@ -118,6 +125,8 @@ The table below uses these abbreviations:
|rs.ss.Spans.Links|byte array|Protocol-buffer encoded span links if present, else null.|
|rs.ss.Spans.TraceState|string|The span's TraceState value if present, else empty string. https://opentelemetry.io/docs/reference/specification/trace/api/#tracestate|

+
+
To increase readability, the table omits the groups `list.element` that are added for nested list types in Parquet.

### Block Schema display in Parquet Message format

@@ -261,7 +270,7 @@ For speed and ease-of-use, we are projecting several values to columns at the tr

## Well-known attributes

Projecting attributes to their own columns has benefits for search speed and size.
-Therefore, we are taking an opinionated approach and store some well-known attributes to their own dedicated columns. 
+Therefore, we are taking an opinionated approach and store some well-known attributes in their own dedicated columns.
All other attributes are stored in the generic key/value maps and are still searchable, but not as quickly. We chose these attributes based on the [OTEL semantic-conventions](https://github.com/open-telemetry/semantic-conventions/tree/main) and what we commonly use ourselves (scratching our own itch), but we think they will be useful to most workloads. @@ -323,3 +332,8 @@ Parquet has robust support for many compression algorithms and data encodings. W ### Bloom filters Parquet has native support for bloom filters. However, Tempo does not use them at this time. Tempo already has sophisticated support for sharding and caching bloom filters. + + + + + \ No newline at end of file diff --git a/docs/sources/tempo/operations/tempo_cli.md b/docs/sources/tempo/operations/tempo_cli.md index 0c3374a6b2f..84c3410cacc 100644 --- a/docs/sources/tempo/operations/tempo_cli.md +++ b/docs/sources/tempo/operations/tempo_cli.md @@ -99,7 +99,7 @@ Options: {{< admonition type="note" >}} Streaming over HTTP requires the `stream_over_http_enabled` flag to be set. For more information, refer to [Tempo GRPC API documentation]({{< relref "../api_docs" >}}). -{{% /admonition %}} +{{< /admonition >}} ### Search tags Call the Tempo API and search attribute names. @@ -119,7 +119,7 @@ Options: {{< admonition type="note" >}} Streaming over HTTP requires the `stream_over_http_enabled` flag to be set. For more information, refer to [Tempo GRPC API documentation]({{< relref "../api_docs" >}}). -{{% /admonition %}} +{{< /admonition >}} ### Search tag values Call the Tempo API and search attribute values. @@ -140,7 +140,7 @@ Options: {{< admonition type="note" >}} Streaming over HTTP requires the `stream_over_http_enabled` flag to be set. For more information, refer to [Tempo GRPC API documentation]({{< relref "../api_docs" >}}). -{{% /admonition %}} +{{< /admonition >}} ### Metrics Call the Tempo API and generate metrics from traces using TraceQL. 
@@ -159,9 +159,9 @@ Options: - `--use-grpc` Use GRPC streaming - `--path-prefix ` String to prefix search paths with -{{% admonition type="note" %}} +{{< admonition type="note" >}} Streaming over HTTP requires the `stream_over_http_enabled` flag to be set. For more information, refer to [Tempo GRPC API documentation]({{< relref "../api_docs" >}}). -{{% /admonition %}} +{{< /admonition >}} ## Query blocks command @@ -172,7 +172,7 @@ tempo-cli query blocks ``` {{< admonition type="note" >}} This can be intense as it downloads every bloom filter and some percentage of indexes/trace data. - {{% /admonition %}} + {{< /admonition >}} Arguments: - `trace-id` Trace ID as a hexadecimal string. diff --git a/docs/sources/tempo/operations/user-configurable-overrides.md b/docs/sources/tempo/operations/user-configurable-overrides.md index 949ca707cad..e7f5b3c7b15 100644 --- a/docs/sources/tempo/operations/user-configurable-overrides.md +++ b/docs/sources/tempo/operations/user-configurable-overrides.md @@ -19,7 +19,7 @@ User-configurable overrides are stored in an object store bucket managed by Temp {{< admonition type="note" >}} We recommend using a different bucket for overrides and traces storage, but they can share a bucket if needed. When sharing a bucket, make sure any lifecycle rules are scoped correctly to not remove data of user-configurable overrides module. -{{% /admonition %}} +{{< /admonition >}} Overrides of every tenant are stored at `/{tenant name}/overrides.json`: @@ -44,7 +44,7 @@ When a field is set in both the user-configurable overrides and the runtime over {{< admonition type="note" >}} `processors` is an exception: Tempo will merge values from both user-configurable overrides and runtime overrides into a single list. 
-{{% /admonition %}}

```yaml
[forwarders: ]

diff --git a/docs/sources/tempo/release-notes/v2-2.md b/docs/sources/tempo/release-notes/v2-2.md
index fde619ac479..858089e1a08 100644
--- a/docs/sources/tempo/release-notes/v2-2.md
+++ b/docs/sources/tempo/release-notes/v2-2.md
@@ -7,6 +7,11 @@ weight: 50

# Version 2.2 release notes

+
+
+
+
+
The Tempo team is pleased to announce the release of Tempo 2.2.

This release gives you:

@@ -20,8 +25,8 @@ Tempo 2.2 makes vParquet2, a Parquet version designed to be more compatible with
Read the [Tempo 2.2 blog post](/blog/2023/08/02/grafana-tempo-2.2-release-traceql-structural-operators-are-here/) for more examples and details about these improvements.

{{< admonition type="note" >}}
-For a complete list of changes, enhancements, and bug fixes refer to the [Tempo 2.2 changelog](https://github.com/grafana/tempo/releases/tag/v2.2.0).
-{{% /admonition %}}
+For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo 2.2 CHANGELOG](https://github.com/grafana/tempo/releases/tag/v2.2.0).
+{{< /admonition >}}

## Features and enhancements

@@ -29,34 +34,37 @@ Some of the most important features and enhancements in Tempo 2.2 are highlighte

### Expanding the TraceQL language

-With this release, we’ve added to the [TraceQL language]({{< relref "../traceql" >}}).
+With this release, we've added to the [TraceQL language]({{< relref "../traceql" >}}).

TraceQL now offers:

-* Structural operators: descendant (>>), child (>), and sibling (~) ([documentation]({{< relref "../traceql#structural" >}})). Find relevant traces based on their structure and relationships among spans.
[PR [#2625](https://github.com/grafana/tempo/pull/2625) [#2660](https://github.com/grafana/tempo/pull/2660)]
-* A `select()` operation that allows you to specify arbitrary span attributes that you want included in the TraceQL response ([documentation]({{< relref "../traceql#selection" >}})) [PR [2494](https://github.com/grafana/tempo/pull/2494)]
-* A `by()` operation that groups span sets within a trace by an attribute of your choosing. This operation is not supported in the Grafana UI yet; you can only use `by()` when querying Tempo’s search API directly. ([documentation]({{< relref "../traceql#grouping" >}}) [PR [2490](https://github.com/grafana/tempo/pull/2490)]
-* New intrinsic attributes for use in TraceQL queries: `traceDuration`, `rootName`, and `rootServiceName` ([documentation]({{< relref "../traceql" >}})) [PR [#2503](https://github.com/grafana/tempo/pull/2503)]
+- Structural operators: descendant (`>>`), child (`>`), and sibling (`~`) ([documentation]({{< relref "../traceql#structural" >}})). Find relevant traces based on their structure and relationships among spans. [PR [#2625](https://github.com/grafana/tempo/pull/2625) [#2660](https://github.com/grafana/tempo/pull/2660)]
+- A `select()` operation that allows you to specify arbitrary span attributes that you want included in the TraceQL response ([documentation]({{< relref "../traceql#selection" >}})) [PR [2494](https://github.com/grafana/tempo/pull/2494)]
+- A `by()` operation that groups span sets within a trace by an attribute of your choosing. This operation is not supported in the Grafana UI yet; you can only use `by()` when querying the search API directly.
([documentation]({{< relref "../traceql#grouping" >}})) [PR [2490](https://github.com/grafana/tempo/pull/2490)]
+- New intrinsic attributes for use in TraceQL queries: `traceDuration`, `rootName`, and `rootServiceName` ([documentation]({{< relref "../traceql" >}})) [PR [#2503](https://github.com/grafana/tempo/pull/2503)]

Read the [Tempo 2.2 blog post](/blog/2023/08/02/grafana-tempo-2.2-release-traceql-structural-operators-are-here/) for examples of how to use these new language additions.

To learn more about the TraceQL syntax, see the [TraceQL documentation]({{< relref "../traceql" >}}).
-For information on planned future extensions to the TraceQL language, see [future work]({{< relref "../traceql/architecture" >}}).
+For information on planned future extensions to the TraceQL language, refer to [future work]({{< relref "../traceql/architecture" >}}).

### Get TraceQL results faster

-We’re always trying to reduce the time you spend waiting to get results to your TraceQL queries, and we’ve made some nice progress on this front with this release.
+We're always trying to reduce the time you spend waiting to get results to your TraceQL queries, and we've made some nice progress on this front with this release.

-We’ve added a [GRPC streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#bidirectional-streaming-rpc) endpoint to Tempo’s query frontend that allows a client to stream search results from Tempo. The Tempo CLI has been updated to use this new streaming endpoint [PR [#2366](https://github.com/grafana/tempo/pull/2366)] . As of version 10.1, Grafana supports it as well, though you must first enable the `traceQLStreaming` feature toggle [PR [#72288](https://github.com/grafana/grafana/pull/72288)].
+We've added a [gRPC streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#bidirectional-streaming-rpc) endpoint to the query frontend that allows a client to stream search results from Tempo.
The Tempo CLI has been updated to use this new streaming endpoint [PR [#2366](https://github.com/grafana/tempo/pull/2366)]. +As of version 10.1, Grafana supports it as well, though you must first enable the `traceQLStreaming` feature toggle [PR [#72288](https://github.com/grafana/grafana/pull/72288)]. By streaming results to the client, you can start to look at traces matching your query before the entire query completes. This is particularly helpful for long-running queries; while the total time to complete the query is the same, you can start looking at your first matches before the full set of matched traces is returned. -In addition to streaming partial results, we’ve merged a number of improvements to speed up TraceQL queries. Here are just a few: +In addition to streaming partial results, we've merged a number of improvements to speed up TraceQL queries. Here are just a few: -* Add support for query batching between frontend and queriers to improve throughput [PR [2677](https://github.com/grafana/tempo/pull/2677)] -* Improve performance of TraceQL regex [PR [2484](https://github.com/grafana/tempo/pull/2484)] -* Fully skip over Parquet row groups with no matches in the column dictionaries [PR [2676](https://github.com/grafana/tempo/pull/2676)] -* New synchronous read mode for vParquet and vParquet2 [PRs [2165](https://github.com/grafana/tempo/pull/2165), [2535](https://github.com/grafana/tempo/pull/2535)] -* Improved TraceQL throughput by asynchronously creating jobs. 
[PR [2530](https://github.com/grafana/tempo/pull/2530)] +- Add support for query batching between frontend and queriers to improve throughput [PR [2677](https://github.com/grafana/tempo/pull/2677)] +- Improve performance of TraceQL regular expressions [PR [2484](https://github.com/grafana/tempo/pull/2484)] +- Fully skip over Parquet row groups with no matches in the column dictionaries [PR [2676](https://github.com/grafana/tempo/pull/2676)] +- New synchronous read mode for vParquet and vParquet2 [PRs [2165](https://github.com/grafana/tempo/pull/2165), [2535](https://github.com/grafana/tempo/pull/2535)] +- Improved TraceQL throughput by asynchronously creating jobs. [PR [2530](https://github.com/grafana/tempo/pull/2530)] ### Metrics summary API (experimental) @@ -66,36 +74,33 @@ From here, you might see that spans from `namespace=A` have a significantly high As another example, you could use this API to compare latencies of your spans broken down by the `region` attribute. From here, you might notice that spans from `region=North-America` have higher latencies than those from `region=Asia-Pacific`. -This API is meant to enable ad-hoc analysis of your incoming spans; by segmenting your spans by attribute and looking for differences in RED metrics, you can more quickly isolate where problems like elevated error rates or higher latencies are coming from. +This API is meant to enable as-needed analysis of your incoming spans; by segmenting your spans by attribute and looking for differences in RED metrics, you can more quickly isolate where problems like elevated error rates or higher latencies are coming from. -Unlike RED metrics computed by Tempo’s [metrics-generator](({{< relref "../metrics-generator" >}}), the values returned by this API are not persisted as time series. This has the advantage that you do not need to provide your own time series databases for storing and querying these metrics.
It also allows you to compute RED metrics broken down by high cardinality attributes that would be too expensive to store in a time series database. Use the metrics generator if you want to store and visualize RED metrics over multi-hour or multi-day time ranges, or you want to alert on these metrics. +Unlike RED metrics computed by the [metrics-generator]({{< relref "../metrics-generator" >}}), the values returned by this API aren't persisted as time series. This has the advantage that you don't need to provide your own time series databases for storing and querying these metrics. It also allows you to compute RED metrics broken down by high cardinality attributes that would be too expensive to store in a time series database. Use the metrics generator if you want to store and visualize RED metrics over multi-hour or multi-day time ranges, or you want to alert on these metrics. To learn more about this API, refer to the [metrics summary API documentation.]({{< relref "../api_docs/metrics-summary" >}}) This work is represented in multiple PRs: [2368](https://github.com/grafana/tempo/pull/2368), [2418](https://github.com/grafana/tempo/pull/2418), [2424](https://github.com/grafana/tempo/pull/2424), [2442](https://github.com/grafana/tempo/pull/2442), [2480](https://github.com/grafana/tempo/pull/2480), [2481](https://github.com/grafana/tempo/pull/2481), [2501](https://github.com/grafana/tempo/pull/2501), [2579](https://github.com/grafana/tempo/pull/2579), and [2582](https://github.com/grafana/tempo/pull/2582). - ### Other enhancements -* Tempo’s [tag values]({{< relref "../api_docs#search-tag-values" >}}) and [tag names]({{< relref "../api_docs#search-tags" >}}) APIs now support filtering [PR [2253](https://github.com/grafana/tempo/pull/2253)]. This lets you retrieve all valid attribute values and names given certain criteria. 
For example, you can get a list of values for the attribute `namespace` seen on spans with attribute `resource=A.` This feature is off by default; to enable, configure `autocomplete_filtering_enabled`. ([documentation]({{< relref "../api_docs" >}})). Grafana’s autocomplete can make use of this filtering capability to provide better suggestions starting in v10.2 [PR [67845]](https://github.com/grafana/grafana/pull/67845). - -* Tempo’s metrics-generator now supports span filtering. Setting up filters allows you to compute metrics over the specific spans you care about, excluding others. It also can reduce the cardinality of generated metrics, and therefore the cost of storing those metrics in a Prometheus-compatible TSDB. ([documentation]({{< relref "../metrics-generator/span_metrics#filtering" >}})) [PR [2274](https://github.com/grafana/tempo/pull/2274)] +- The [tag values]({{< relref "../api_docs#search-tag-values" >}}) and [tag names]({{< relref "../api_docs#search-tags" >}}) APIs now support filtering [PR [2253](https://github.com/grafana/tempo/pull/2253)]. This lets you retrieve all valid attribute values and names given certain criteria. For example, you can get a list of values for the attribute `namespace` seen on spans with attribute `resource=A`. This feature is off by default; to enable it, configure `autocomplete_filtering_enabled` ([documentation]({{< relref "../api_docs" >}})). The autocomplete in Grafana can make use of this filtering capability to provide better suggestions starting in v10.2 [PR [67845](https://github.com/grafana/grafana/pull/67845)]. -* Tempo’s metrics-generator can now detect virtual nodes ([documentation]({{< relref "../metrics-generator/service_graphs#virtual-nodes" >}})) [PR [2365](https://github.com/grafana/tempo/pull/2365)]. As a result, you’ll now see these virtual nodes represented in your service graph.
For more information, refer to the [virtual nodes documentation]({{< relref "../metrics-generator/service_graphs#virtual-nodes" >}}). +- The metrics-generator now supports span filtering. Setting up filters allows you to compute metrics over the specific spans you care about, excluding others. It also can reduce the cardinality of generated metrics, and therefore the cost of storing those metrics in a Prometheus-compatible TSDB. ([documentation]({{< relref "../metrics-generator/span_metrics#filtering" >}})) [PR [2274](https://github.com/grafana/tempo/pull/2274)] +- The metrics-generator can now detect virtual nodes ([documentation]({{< relref "../metrics-generator/service_graphs#virtual-nodes" >}})) [PR [2365](https://github.com/grafana/tempo/pull/2365)]. As a result, you'll now see these virtual nodes represented in your service graph. For more information, refer to the [virtual nodes documentation]({{< relref "../metrics-generator/service_graphs#virtual-nodes" >}}). ## Upgrade considerations When [upgrading]({{< relref "../setup/upgrade" >}}) to Tempo 2.2, be aware of these breaking changes: -* JSonnet users only: We've converted the metrics-generator component from a k8s deployment to a k8s statefulset. Refer to the PR for seamless migration instructions. [PRs [#2533](https://github.com/grafana/tempo/pull/2533), [#2467](https://github.com/grafana/tempo/pull/2647)] -* Removed or renamed configuration parameters (see section below) +- JSonnet users only: We've converted the metrics-generator component from a k8s deployment to a k8s statefulset. Refer to the PR for seamless migration instructions. [PRs [#2533](https://github.com/grafana/tempo/pull/2533), [#2467](https://github.com/grafana/tempo/pull/2647)] +- Removed or renamed configuration parameters (see section below) -While not a breaking change, upgrading to Tempo 2.2 will by default change Tempo’s block format to vParquet2. 
+While not a breaking change, upgrading to Tempo 2.2 will by default change the block format to vParquet2. To stay on a previous block format, read the [Parquet configuration documentation]({{< relref "../configuration/parquet#choose-a-different-block-format" >}}). We strongly encourage upgrading to vParquet2 as soon as possible as this is required for using structural operators in your TraceQL queries and provides query performance improvements, in particular on queries using the `duration` intrinsic. - ### Removed or renamed configuration parameters The following fields were removed or renamed. @@ -136,31 +141,36 @@ The following fields were removed or renamed. ## Bug fixes -For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases). +For a complete list, refer to the [Tempo CHANGELOG](https://github.com/grafana/tempo/releases). ### 2.2.4 -* Updated Alpine image version to 3.18 to patch [CVE-2022-48174](https://nvd.nist.gov/vuln/detail/CVE-2022-48174) [PR 3046](https://github.com/grafana/tempo/pull/3046) -* Bumped Jaeger query docker image to 1.50.0 [PR 2998](https://github.com/grafana/tempo/pull/2998) +- Updated Alpine image version to 3.18 to patch [CVE-2022-48174](https://nvd.nist.gov/vuln/detail/CVE-2022-48174) [PR 3046](https://github.com/grafana/tempo/pull/3046) +- Bumped Jaeger query docker image to 1.50.0 [PR 2998](https://github.com/grafana/tempo/pull/2998) ### 2.2.3 -* Fixed S3 credentials providers configuration [PR 2889](https://github.com/grafana/tempo/pull/2889) +- Fixed S3 credentials providers configuration [PR 2889](https://github.com/grafana/tempo/pull/2889) ### 2.2.2 -* Fixed node role auth IDMSv1 [PR 2760](https://github.com/grafana/tempo/pull/2760) +- Fixed node role auth IDMSv1 [PR 2760](https://github.com/grafana/tempo/pull/2760) ### 2.2.1 -* Fixed incorrect metrics for index failures [PR 2781](https://github.com/grafana/tempo/pull/2781) -* Fixed a panic in the metrics-generator when using multiple 
tenants with default overrides [PR 2786](https://github.com/grafana/tempo/pull/2786) -* Restored `tenant_header_key` removed in [PR 2414](https://github.com/grafana/tempo/pull/2414) [PR 2786](https://github.com/grafana/tempo/pull/2795) -* Disabled streaming over HTTP by default [PR 2803](https://github.com/grafana/tempo/pull/2803) +- Fixed incorrect metrics for index failures [PR 2781](https://github.com/grafana/tempo/pull/2781) +- Fixed a panic in the metrics-generator when using multiple tenants with default overrides [PR 2786](https://github.com/grafana/tempo/pull/2786) +- Restored `tenant_header_key` removed in [PR 2414](https://github.com/grafana/tempo/pull/2414) [PR 2795](https://github.com/grafana/tempo/pull/2795) +- Disabled streaming over HTTP by default [PR 2803](https://github.com/grafana/tempo/pull/2803) ### 2.2 -* Fixed an issue in the metrics-generator that prevented scaling up parallelism when remote writing of metrics was lagging behind [PR [2463](https://github.com/grafana/tempo/issues/2463)] -* Fixed an issue where metrics-generator was setting wrong labels for `traces_target_info` [PR [2546](https://github.com/grafana/tempo/pull/2546)] -* Fixed an issue where matches and other spanset level attributes were not persisted to the TraceQL results. [PR [2490](https://github.com/grafana/tempo/pull/2490)] -* Fixed an issue where ingester search could occasionally fail with `file does not exist` error [PR [2534](https://github.com/grafana/tempo/issues/2534)] +- Fixed an issue in the metrics-generator that prevented scaling up parallelism when remote writing of metrics was lagging behind [PR [2463](https://github.com/grafana/tempo/issues/2463)] +- Fixed an issue where metrics-generator was setting wrong labels for `traces_target_info` [PR [2546](https://github.com/grafana/tempo/pull/2546)] +- Fixed an issue where matches and other spanset level attributes were not persisted to the TraceQL results.
[PR [2490](https://github.com/grafana/tempo/pull/2490)] +- Fixed an issue where ingester search could occasionally fail with `file does not exist` error [PR [2534](https://github.com/grafana/tempo/issues/2534)] + + + + + \ No newline at end of file diff --git a/docs/sources/tempo/release-notes/v2-3.md b/docs/sources/tempo/release-notes/v2-3.md index ca4236a28c9..6bf352b4779 100644 --- a/docs/sources/tempo/release-notes/v2-3.md +++ b/docs/sources/tempo/release-notes/v2-3.md @@ -7,6 +7,11 @@ weight: 45 # Version 2.3 release notes + + + + + The Tempo team is pleased to announce the release of Tempo 2.3. This release gives you: @@ -28,7 +33,7 @@ This block format improves query performance relative to previous formats. Read the [**Tempo 2.3 blog post**](/blog/2023/11/01/grafana-tempo-2.3-release-faster-trace-queries-traceql-upgrades/) for more examples and details about these improvements. -These release notes highlight the most important features and bugfixes. For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases). +These release notes highlight the most important features and bugfixes. For a complete list, refer to the [Tempo CHANGELOG](https://github.com/grafana/tempo/releases). {{< youtube id="2FWi9_dSBdM?rel=0" >}} @@ -58,7 +63,7 @@ Unique to Tempo, TraceQL is a query language that lets you perform custom querie To learn more about the TraceQL syntax, see the [TraceQL documentation]({{< relref "../traceql" >}}). For information on planned future extensions to the TraceQL language, refer to [future work]({{< relref "../traceql/architecture" >}}). 
-We’ve made the following improvements to TraceQL: +We've made the following improvements to TraceQL: * Added two structural operators, ancestor (`<<`) and parent (`<`) ([documentation]({{< relref "../traceql#experimental-structural" >}})) [[PR 2877](https://github.com/grafana/tempo/pull/2877)] @@ -68,13 +73,13 @@ We’ve made the following improvements to TraceQL: * Improved the performance of TraceQL [`select()` queries]({{< relref "../traceql#selection" >}}). Metrics-summary now also correctly handles missing attributes. [[PR 2765](https://github.com/grafana/tempo/pull/2765)] -* Added support for searching by OpenTelemetry’s [span status message ](https://github.com/open-telemetry/opentelemetry-proto/blob/afcd2aa7f728216d5891ffc0d83f09f0278a6611/opentelemetry/proto/trace/v1/trace.proto#L260)using `statusMessage` intrinsic attribute ([documentation]({{< relref "../traceql#intrinsic-fields" >}})) [[PR 2848](https://github.com/grafana/tempo/pull/2848)] +* Added support for searching by OpenTelemetry's [span status message](https://github.com/open-telemetry/opentelemetry-proto/blob/afcd2aa7f728216d5891ffc0d83f09f0278a6611/opentelemetry/proto/trace/v1/trace.proto#L260) using the `statusMessage` intrinsic attribute ([documentation]({{< relref "../traceql#intrinsic-fields" >}})) [[PR 2848](https://github.com/grafana/tempo/pull/2848)] * Fixed cases where an empty filter (`{}`) didn't return expected results [[PR 2498](https://github.com/grafana/tempo/issues/2498)] ### Metrics-generator -We’ve made the following improvements to metrics-generator: +We've made the following improvements to metrics-generator: * Added a scope query parameter to `/api/overrides` so users can choose between fetching the overrides stored by the API and the merged overrides (those actually used by Tempo) [[PR 2915](https://github.com/grafana/tempo/pull/2915), [#3018](https://github.com/grafana/tempo/pull/3018)] * Added `TempoUserConfigurableOverridesReloadFailing` alert [[PR
2784](https://github.com/grafana/tempo/pull/2784)] @@ -90,11 +95,11 @@ When [upgrading]({{< relref "../setup/upgrade" >}}) to Tempo 2.3, be aware of th Although the vParquet3 format isn't yet the default, it's production ready and we highly recommend switching to it for improved query performance and [dedicated attribute columns]({{< relref "../operations/dedicated_columns" >}}). -Upgrading to Tempo 2.3 doesn’t modify the Parquet block format. You can use Tempo 2.3 with vParquet2 or vParquet3. vParquet2 remains the default backend for Tempo 2.3; vParquet3 is available as a stable option. +Upgrading to Tempo 2.3 doesn't modify the Parquet block format. You can use Tempo 2.3 with vParquet2 or vParquet3. vParquet2 remains the default backend for Tempo 2.3; vParquet3 is available as a stable option. {{< admonition type="note" >}} -Tempo 2.2 can’t read data stored in vParquet3. -{{% /admonition %}} +Tempo 2.2 can't read data stored in vParquet3. +{{< /admonition >}} For information on upgrading, refer to [Change the block format to vParquet3]({{< relref "../setup/upgrade" >}}) upgrade documentation. @@ -116,7 +121,7 @@ distributor: ### Changes to the Overrides module configuration -We’ve added a new `defaults` block to the overrides module for configuring global or per-tenant settings. The Overrides change to indented syntax. For more information, read the [Overrides configuration documentation]({{< relref "../configuration#overrides" >}}). +We've added a new `defaults` block to the overrides module for configuring global or per-tenant settings. The overrides configuration now uses an indented syntax. For more information, read the [Overrides configuration documentation]({{< relref "../configuration#overrides" >}}). You can also use the Tempo CLI to migrate configurations. Refer to the [documentation]({{< relref "../operations/tempo_cli#migrate-overrides-config-command" >}}).
[[PR 2688](https://github.com/grafana/tempo/pull/2688)] @@ -181,13 +186,13 @@ The following vulnerabilities have been addressed: ## Bugfixes -For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases). +For a complete list, refer to the [Tempo CHANGELOG](https://github.com/grafana/tempo/releases/tag/v2.3.1). ### 2.3.1 * Include statusMessage intrinsic attribute in tag search. [PR 3084](https://github.com/grafana/tempo/pull/3084) * Fix compactor ignore configured S3 headers. [PR 3149](https://github.com/grafana/tempo/pull/3154) -* Readd session token to s3 credentials. [PR 3144](https://github.com/grafana/tempo/pull/3144) +* Re-add session token to S3 credentials. [PR 3144](https://github.com/grafana/tempo/pull/3144) ### 2.3 @@ -199,7 +204,7 @@ For a complete list, refer to the [Tempo changelog](https://github.com/grafana/t * Aligned `tempo_query_frontend_queries_total` and `tempo_query_frontend_queries_within_slo_total`. [PR 2840](https://github.com/grafana/tempo/pull/2840) This query now correctly tells you `%age` of requests that are within SLO: - ``` + ```promql sum(rate(tempo_query_frontend_queries_within_slo_total{}[1m])) by (op) / sum(rate(tempo_query_frontend_queries_total{}[1m])) by (op) @@ -209,5 +214,9 @@ For a complete list, refer to the [Tempo changelog](https://github.com/grafana/t * Respected spss on GRPC streaming. [PR 2971](https://github.com/grafana/tempo/pull/2840) * Moved empty root span substitution from `querier` to `query-frontend`. [PR 2671](https://github.com/grafana/tempo/issues/2671) * Ingester errors correctly propagate on the query path [PR 2935](https://github.com/grafana/tempo/issues/2935)
`native_aws_auth_enabled` is deprecated [PR 3006](https://github.com/grafana/tempo/pull/3006) \ No newline at end of file +* Fixed an issue where the ingester didn't stop a query after timeout [PR 3031](https://github.com/grafana/tempo/pull/3031) +* Reordered the S3 credential chain and upgraded `minio-go`. `native_aws_auth_enabled` is deprecated [PR 3006](https://github.com/grafana/tempo/pull/3006) + + + + \ No newline at end of file diff --git a/docs/sources/tempo/release-notes/v2-4.md b/docs/sources/tempo/release-notes/v2-4.md index 287701bb1f2..91d5c6f4a75 100644 --- a/docs/sources/tempo/release-notes/v2-4.md +++ b/docs/sources/tempo/release-notes/v2-4.md @@ -7,6 +7,11 @@ weight: 40 # Version 2.4 release notes + + + + + The Tempo team is pleased to announce the release of Tempo 2.4. This release gives you: @@ -15,11 +20,11 @@ This release gives you: * Performance enhancements, thanks to the addition of new caching tiers * Cost savings, thanks to polling improvements that reduce calls to object storage -As part of this release, vParquet3 has also been promoted to the new default storage format for traces. For more about why we’re so excited about vParquet3, refer to [Accelerate TraceQL queries at scale with dedicated attribute columns in Grafana Tempo](/blog/2024/01/22/accelerate-traceql-queries-at-scale-with-dedicated-attribute-columns-in-grafana-tempo/). +As part of this release, vParquet3 has also been promoted to the new default storage format for traces. For more about why we're so excited about vParquet3, refer to [Accelerate TraceQL queries at scale with dedicated attribute columns in Grafana Tempo](/blog/2024/01/22/accelerate-traceql-queries-at-scale-with-dedicated-attribute-columns-in-grafana-tempo/). Read the [Tempo 2.4 blog post](/blog/2024/02/29/grafana-tempo-2.4-release-traceql-metrics-tiered-caching-and-tco-improvements/) for more examples and details about these improvements. -These release notes highlight the most important features and bugfixes. 
For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases). +These release notes highlight the most important features and bug fixes. For a complete list, refer to the [Tempo CHANGELOG](https://github.com/grafana/tempo/releases/tag/v2.4). {{< youtube id="EYUx2DkNRas" >}} @@ -29,20 +34,21 @@ The most important features and enhancements in Tempo 2.4 are highlighted below. ### Multi-tenant queries -Tempo now allows you to query multiple tenants at once. We’ve made multi-tenant queries compatible with streaming ([first released in v2.2]({{< relref "./v2-2#get-traceql-results-faster" >}})) so you can get query results as fast as possible. To learn more, refer to [Cross-tenant federation]({{< relref "../operations/cross_tenant_query" >}}) and [Enable multi-tenancy]({{< relref "../operations/multitenancy" >}}). [PRs [3262](https://github.com/grafana/tempo/pull/3262), [3087](https://github.com/grafana/tempo/pull/3087)] +Tempo now allows you to query multiple tenants at once. We've made multi-tenant queries compatible with streaming ([first released in v2.2]({{< relref "./v2-2#get-traceql-results-faster" >}})) so you can get query results as fast as possible. +To learn more, refer to [Cross-tenant federation]({{< relref "../operations/cross_tenant_query" >}}) and [Enable multi-tenancy]({{< relref "../operations/multitenancy" >}}). [PRs [3262](https://github.com/grafana/tempo/pull/3262), [3087](https://github.com/grafana/tempo/pull/3087)] ### TraceQL metrics (experimental) -We’re excited to announce the addition of metrics queries to the TraceQL language. Metric queries extend trace queries by applying a function to trace query results. +We're excited to announce the addition of metrics queries to the TraceQL language. Metric queries extend trace queries by applying a function to trace query results. This powerful feature creates metrics from traces, much in the same way that LogQL metric queries create metrics from logs. 
-In this case, we are calculating the rate of the erroring spans coming from the service `foo`. Rate is a `spans/sec` quantity. +In this case, we're calculating the rate of the erroring spans coming from the service `foo`. Rate is a `spans/sec` quantity. -``` +```traceql { resource.service.name = "foo" && status = error } | rate() ``` -In addition, you can use Grafana Explore to [query and visualize the metrics]({{< relref "../operations/traceql-metrics" >}}) with the Tempo data soruce in Grafana or Grafana Cloud. +In addition, you can use Grafana Explore to [query and visualize the metrics]({{< relref "../operations/traceql-metrics" >}}) with the Tempo data source in Grafana or Grafana Cloud. ![Metrics visualization in Grafana](/media/docs/tempo/metrics-explore-sample-2.4.png) @@ -54,15 +60,18 @@ To learn more about the TraceQL syntax, see the [TraceQL documentation]({{< relr We continue to make query performance improvements so you spend less time waiting on results to your TraceQL queries. Below are some notable PRs that made it into this release: -* Improve TraceQL regex performance in certain queries. [PR [3139](https://github.com/grafana/tempo/pull/3139)] +* Improve TraceQL regular expression performance in certain queries. [PR [3139](https://github.com/grafana/tempo/pull/3139)] * Improve TraceQL performance in complex queries. [[PR 3113](https://github.com/grafana/tempo/pull/3113)] * TraceQL/Structural operators performance improvement. [[PR 3088](https://github.com/grafana/tempo/pull/3088)] + + ### vParquet3 is now the default block format Tempo 2.4 makes [vParquet3]({{< relref "../configuration/parquet" >}}) the default storage format. -We’re excited about [vParquet3]({{< relref "../configuration/parquet" >}}) relative to prior formats because of its support for [dedicated attribute columns]({{< relref "../operations/dedicated_columns" >}}), which help speed up queries on your largest and most queried attributes. 
We've seen excellent performance improvements when running it ourselves, and by promoting it to the default, we're signaling that it is ready for broad adoption. +We're excited about [vParquet3]({{< relref "../configuration/parquet" >}}) relative to prior formats because of its support for [dedicated attribute columns]({{< relref "../operations/dedicated_columns" >}}), which help speed up queries on your largest and most queried attributes. +We've seen excellent performance improvements when running it ourselves, and by promoting it to the default, we're signaling that it's ready for broad adoption. Dedicated attribute columns, available using vParquet3, improve query performance by storing the largest and most frequently used attributes in their own columns, rather than in the generic attribute key-value list. For more information, refer to @@ -72,15 +81,21 @@ If you had manually configured vParquet3, we recommend removing it to move forwa To read more about the design of vParquet3, refer to [the design proposal](https://github.com/grafana/tempo/blob/main/docs/design-proposals/2023-05%20vParquet3.md). For general information, refer to [the Apache Parquet schema]({{< relref "../operations/schema" >}}). + ### Additional caching layers -Tempo has added two new caches to improve TraceQL query performance. The frontend-search cache handles job search caching. The parquet-page cache handles page level caching. Refer to the [Cache section]({{< relref "../configuration#cache" >}}) of the Configuration documentation for how to configure these new caching layers. As part of adding these new caching layers, we’ve refactored our caching interface. This includes breaking changes described in “Breaking Changes”. [PRs [3166](https://github.com/grafana/tempo/pull/3166), [3225](https://github.com/grafana/tempo/pull/3225), [3196](https://github.com/grafana/tempo/pull/3196)] +Tempo has added two new caches to improve TraceQL query performance. 
The frontend-search cache handles job search caching. +The parquet-page cache handles page level caching. +Refer to the [Cache section]({{< relref "../configuration#cache" >}}) of the Configuration documentation for how to configure these new caching layers. -### Polling improvements for cost reduction +As part of adding these new caching layers, we've refactored our caching interface. +This includes breaking changes described in Breaking Changes. [PRs [3166](https://github.com/grafana/tempo/pull/3166), [3225](https://github.com/grafana/tempo/pull/3225), [3196](https://github.com/grafana/tempo/pull/3196)] -We’ve improved how Tempo polls object storage, ensuring that we reuse previous results. This has dramatically reduced the number of requests Tempo makes to the object store. Not only does this reduce the load on your object store, for many, it will save you money (since most hosted object storage solutions charge per request). +### Improved polling for cost reduction -We’ve also added the `list_blocks_concurrency` parameter to allow you to tune the number of list calls Tempo makes in parallel to object storage so you can select the value that works best for your environment. We’ve set the default value to `3`, which should work well for the average Tempo cluster. [[PR 2652](https://github.com/grafana/tempo/pull/2652)] +We've improved how Tempo polls object storage, ensuring that we reuse previous results. This has dramatically reduced the number of requests Tempo makes to the object store. Not only does this reduce the load on your object store, for many, it will save you money (since most hosted object storage solutions charge per request). + +We've also added the `list_blocks_concurrency` parameter to allow you to tune the number of list calls Tempo makes in parallel to object storage so you can select the value that works best for your environment. We've set the default value to `3`, which should work well for the average Tempo cluster. 
[[PR 2652](https://github.com/grafana/tempo/pull/2652)] ### Other enhancements and improvements @@ -88,14 +103,14 @@ In addition, the following improvements have been made in Tempo 2.4: * Improved Tempo error handling on writes, so that one erroring trace doesn't result in an entire batch of traces being dropped. [PR 2571](https://github.com/grafana/tempo/pull/2571) * Added per-tenant compaction window. [PR 3129](https://github.com/grafana/tempo/pull/3129) -* Added `--max-start-time` and `--min-start-time` flag to tempo-cli command `analyse blocks`. [PR 3250](https://github.com/grafana/tempo/pull/3250) +* Added `--max-start-time` and `--min-start-time` flag to `tempo-cli` command `analyse blocks`. [PR 3250](https://github.com/grafana/tempo/pull/3250) * Added per-tenant configurable `remote_write` headers to metrics-generator. [#3175](https://github.com/grafana/tempo/pull/3175) * Added variable expansion support to overrides configuration. [PR 3175](https://github.com/grafana/tempo/pull/3175) * Added HTML pages `/status/overrides` and `/status/overrides/{tenant}`. [PR 3244](https://github.com/grafana/tempo/pull/3244) [#3332](https://github.com/grafana/tempo/pull/3332) * Precalculate and reuse the vParquet3 schema before opening blocks. [PR 3367](https://github.com/grafana/tempo/pull/3367) * Made the trace ID label name configurable for remote written exemplars. [PR 3074](https://github.com/grafana/tempo/pull/3074) * Performance improvements in span filtering. [PR 3025](https://github.com/grafana/tempo/pull/3025) -* Introduced localblocks process configuration option to select only server spans. [PR 3303](https://github.com/grafana/tempo/pull/3303) +* Introduced `localblocks` process configuration option to select only server spans. 
[PR 3303](https://github.com/grafana/tempo/pull/3303) ## Upgrade considerations @@ -103,11 +118,11 @@ When [upgrading]({{< relref "../setup/upgrade" >}}) to Tempo 2.4, be aware of th ### Transition to vParquet 3 -vParquet3 format is now the default block format. It is production ready and we highly recommend switching to it for improved query performance and [dedicated attribute columns]({{< relref "../operations/dedicated_columns" >}}). +vParquet3 format is now the default block format. It's production ready and we highly recommend switching to it for improved query performance and [dedicated attribute columns]({{< relref "../operations/dedicated_columns" >}}). Upgrading to Tempo 2.4 modifies the Parquet block format. Although you can use Tempo 2.3 with vParquet2 or vParquet3, you can only use Tempo 2.4 with vParquet3. -With this release, the first version of our Parquet backend, vParquet, is being deprecated. Tempo 2.4 still reads vParquet1 blocks. However, Tempo will exit with error if they are manually configured. [[PR 3377](https://github.com/grafana/tempo/pull/3377/files#top)] +With this release, the first version of our Parquet backend, vParquet, is being deprecated. Tempo 2.4 still reads vParquet1 blocks. However, Tempo will exit with error if they're manually configured. [[PR 3377](https://github.com/grafana/tempo/pull/3377/files#top)] For information on upgrading, refer to [Upgrade to Tempo 2.4]({{< relref "../setup/upgrade" >}}) and [Choose a different block format]({{< relref "../configuration/parquet#choose-a-different-block-format" >}}) . @@ -214,7 +229,7 @@ This release addresses the following vulnerabilities: ## Bugfixes -For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases). +For a complete list, refer to the [Tempo CHANGELOG](https://github.com/grafana/tempo/releases). 
### 2.4.2 @@ -236,8 +251,8 @@ For a complete list, refer to the [Tempo changelog](https://github.com/grafana/t * Fixed autocomplete filters sometimes returning erroneous results. [PR 3339](https://github.com/grafana/tempo/pull/3339) * Fixed trace context propagation between query-frontend and querier. [PR 3387](https://github.com/grafana/tempo/pull/3387) * Fixed parsing of span.resource.xyz attributes in TraceQL. [PR 3284](https://github.com/grafana/tempo/pull/3284) -* Changed exit code if config is successfully verified. [PR 3174](https://github.com/grafana/tempo/pull/3174) -* The tempo-cli analyze blocks command no longer fails on compacted blocks. [PR 3183](https://github.com/grafana/tempo/pull/3183) +* Changed exit code if configuration is successfully verified. [PR 3174](https://github.com/grafana/tempo/pull/3174) +* The `tempo-cli analyze blocks` command no longer fails on compacted blocks. [PR 3183](https://github.com/grafana/tempo/pull/3183) * Moved waitgroup handling for poller error condition. [PR 3224](https://github.com/grafana/tempo/pull/3224) * Fixed head block excessive locking in ingester search. [PR 3328](https://github.com/grafana/tempo/pull/3328) * Fixed an issue where the ingester failed to write traces to disk after a crash or unclean restart. [PR 3346](https://github.com/grafana/tempo/issues/3346) + + + + + \ No newline at end of file diff --git a/docs/sources/tempo/release-notes/v2-5.md index c8e5c8a0b9f..34def386029 100644 --- a/docs/sources/tempo/release-notes/v2-5.md +++ b/docs/sources/tempo/release-notes/v2-5.md @@ -7,6 +7,11 @@ weight: 35 # Version 2.5 release notes + + + + + The Tempo team is pleased to announce the release of Tempo 2.5.
This release gives you: @@ -15,20 +20,21 @@ This release gives you: * TraceQL enhancements and performance improvements * Performance and stability enhancements -As part of this release, we’ve updated ownership for `/var/tempo` from `root:root` to a new `tempo:tempo` user with a UUID of `10001`. Learn about this breaking change in Upgrade considerations. +As part of this release, we've updated ownership for `/var/tempo` from `root:root` to a new `tempo:tempo` user with a UID of `10001`. Learn about this breaking change in Upgrade considerations. Read the [Tempo 2.5 blog post](https://grafana.com/blog/2024/06/03/grafana-tempo-2.5-release-vparquet4-streaming-endpoints-and-more-metrics/) for more examples and details about these improvements. -These release notes highlight the most important features and bugfixes. For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases). +These release notes highlight the most important features and bug fixes. For a complete list, refer to the [Tempo CHANGELOG](https://github.com/grafana/tempo/releases). {{< youtube id="c4gW9fwkLhc" >}} ## Features and enhancements The most important features and enhancements in Tempo 2.5 are highlighted below. + ### Additional TraceQL metrics (experimental) -In this release, we’ve added several [TraceQL metrics](https://grafana.com/docs/tempo/latest/operations/traceql-metrics/). Tempo 2.4 introduced the `rate()` function to view rates of spans. For this release, we’ve added `quantile_over_time` and `histogram_over_time`. [PR 3605](https://github.com/grafana/tempo/pull/3605), [PR 3633](https://github.com/grafana/tempo/pull/3633), [PR 3644](https://github.com/grafana/tempo/pull/3644)] +In this release, we've added several [TraceQL metrics](https://grafana.com/docs/tempo/latest/operations/traceql-metrics/). Tempo 2.4 introduced the `rate()` function to view rates of spans. For this release, we've added `quantile_over_time` and `histogram_over_time`.
[PR 3605](https://github.com/grafana/tempo/pull/3605), [PR 3633](https://github.com/grafana/tempo/pull/3633), [PR 3644](https://github.com/grafana/tempo/pull/3644)] You can use `quantile_over_time` to aggregate numerical values, such as the all-important span duration. Notice that you can specify multiple quantiles in the same query. @@ -48,7 +54,7 @@ With this feature, you can now see partial query results as they come in, so you This is perfect for big queries that take a long time to return a response. The Tempo API endpoints now support gRPC streaming for tag queries and metrics. -We’ve added new streaming endpoints for: +We've added new streaming endpoints for: * `SearchTags` * `SearchTagsV2` @@ -63,17 +69,17 @@ In the Tempo CLI, you can use the `--use-grpc` option to enable GRPC streaming. To learn more, refer to the [Tempo gRPC API](https://grafana.com/docs/tempo/latest/api_docs/#tempo-grpc-api) and [Tempo CLI](https://grafana.com/docs/tempo/latest/operations/tempo_cli/#search) documentation. [PR 3460](https://github.com/grafana/tempo/pull/3460) [[PR #3584](https://github.com/grafana/tempo/pull/3584)] {{< admonition type="note" >}} -NOTE: Streaming over HTTP requires the `stream_over_http_enabled` flag to be set. For more information, refer to [Tempo GRPC API documentation](https://grafana.com/docs/tempo/latest/api_docs/#tempo-grpc-api). +Streaming over HTTP requires the `stream_over_http_enabled` flag to be set. For more information, refer to [Tempo GRPC API documentation](https://grafana.com/docs/tempo/latest/api_docs/#tempo-grpc-api). {{< /admonition >}} -In addition, we’ve reduced memory consumption in the frontend for large traces. +In addition, we've reduced memory consumption in the frontend for large traces.
[[PR 3522](https://github.com/grafana/tempo/pull/3522)] ### New vParquet4 block format (experimental) New in Tempo 2.5, the vParquet4 block format is required for querying links, events, and arrays and improves query performance relative to previous formats. [[PR 3368](https://github.com/grafana/tempo/pull/3368)] -In addition, we’ve updated the OTLP schema to add attributes to instrumentation scope in vParquet4.[[PR 3649](https://github.com/grafana/tempo/pull/3649)] +In addition, we've updated the OTLP schema to add attributes to instrumentation scope in vParquet4. [[PR 3649](https://github.com/grafana/tempo/pull/3649)] While you can use vParquet4, keep in mind that it's experimental. If you choose to use vParquet4 and then opt to revert to vParquet3, any vParquet4 blocks would not be readable by vParquet3. @@ -86,7 +92,7 @@ To learn more about the TraceQL syntax, see the [TraceQL documentation](https:// For information on planned future extensions to the TraceQL language, refer to [future work](https://github.com/grafana/tempo/blob/main/docs/design-proposals/2023-11%20TraceQL%20Extensions.md). -We’ve made the following improvements to TraceQL: +We've made the following improvements to TraceQL: * Add support for scoped intrinsics using a colon (`:`). The available scoped intrinsics are `trace:duration`, `trace:rootName`, `trace:rootService`, `span:duration`, `span:kind`, `span:name`, `span:status`, `span:statusMessage`. [[PR 3629](https://github.com/grafana/tempo/pull/3629)] * Performance improvements on TraceQL and tag value search. [[PR 3650](https://github.com/grafana/tempo/pull/3650), [PR 3667](https://github.com/grafana/tempo/pull/3667)]
[[PR 3485](https://github.com/grafana/tempo/pull/3485)] * Clean Metrics Generator's Prometheus WAL before creating instance [[PR 3548](https://github.com/grafana/tempo/pull/3548)] -* Delete any remaining objects for empty tenants after a configurable duration, requires config enable [PR 3611](https://github.com/grafana/tempo/pull/3611)] +* Delete any remaining objects for empty tenants after a configurable duration; this must be enabled in the configuration. [[PR 3611](https://github.com/grafana/tempo/pull/3611)] ## Upgrade considerations @@ -119,23 +125,25 @@ The new user `10001` won't have access to the old files created by `root`. The ownership of `/var/tempo` changed from `root:root` to `tempo:tempo` with the UID/GID of `10001`. -The `ingester` and `metrics-generator` statefulsets may need to [run chown](https://opensource.com/article/19/8/linux-chown-command) to change ownership to start properly. +The `ingester` and `metrics-generator` statefulsets may need to [run `chown`](https://opensource.com/article/19/8/linux-chown-command) to change ownership to start properly. Refer to [PR 2265](https://github.com/grafana/tempo/pull/2265) to see a Jsonnet example of an `init` container. -This change doesn’t impact you if you used the Helm chart with the default security context set in the chart. +This change doesn't impact you if you used the Helm chart with the default security context set in the chart. All data should be owned by the `tempo` user already. -The UID won’t impact Helm chart users. +The UID won't impact Helm chart users. + ### Support for vParquet format removed The original vParquet format [has been removed](https://github.com/grafana/tempo/pull/3663) from Tempo 2.5. -Direct upgrades from Tempo 2.1 to Tempo 2.5 are not possible. +Direct upgrades from Tempo 2.1 to Tempo 2.5 aren't possible. You will need to upgrade to an intermediate version and wait for the old vParquet blocks to fall out of retention before upgrading to 2.5.
[PR 3663](https://github.com/grafana/tempo/pull/3663)] vParquet(1) won't be recognized as a valid encoding and any remaining vParquet(1) blocks won't be readable. -Installations running with historical defaults should not require any changes as the default has been migrated for several releases. +Installations running with historical defaults shouldn't require any changes as the default has been migrated for several releases. Installations with storage settings pinned to vParquet must run a previous release configured for vParquet2 or higher until all existing vParquet(1) blocks have expired and been deleted from the backend, or else you'll encounter read errors after upgrading to this release. + ### **Updated, removed, or renamed configuration parameters** @@ -159,7 +167,7 @@ Installations with storage settings pinned to vParquet must run a previous relea ### Additional considerations * Updating to OTLP 1.3.0 removes the deprecated `InstrumentationLibrary` and `InstrumentationLibrarySpan` from the OTLP receivers. [PR 3649](https://github.com/grafana/tempo/pull/3649)] -* Removes the addition of a tenant in multitenant trace id lookup. [PR 3522](https://github.com/grafana/tempo/pull/3522)] +* Removes the addition of a tenant in multi-tenant trace id lookup. [PR 3522](https://github.com/grafana/tempo/pull/3522)] ## Bugfixes @@ -180,3 +188,8 @@ For a complete list, refer to the [Tempo changelog](https://github.com/grafana/t * Fix span-metrics' subprocessors bug that applied wrong configurations when running multiple tenants. [PR 3612](https://github.com/grafana/tempo/pull/3612) * Fix panic in query-frontend when combining results [PR 3683](https://github.com/grafana/tempo/pull/3683) * Fix TraceQL queries involving non boolean operations between statics and attributes.
[PR 3698](https://github.com/grafana/tempo/pull/3698) + + + + + \ No newline at end of file diff --git a/docs/sources/tempo/release-notes/v2-6.md index 6a753fd6a47..4c04ba8be7f 100644 --- a/docs/sources/tempo/release-notes/v2-6.md +++ b/docs/sources/tempo/release-notes/v2-6.md @@ -6,6 +6,10 @@ weight: 30 --- # Version 2.6 release notes + + + + The Tempo team is pleased to announce the release of Tempo 2.6. @@ -27,7 +31,7 @@ The most important features and enhancements in Tempo 2.6 are highlighted below. ### Additional TraceQL metrics (experimental) -In this release, we’ve added several [TraceQL metrics](https://grafana.com/docs/tempo/latest/operations/traceql-metrics/). In Tempo 2.6, TraceQL metrics adds: +In this release, we've added several [TraceQL metrics](https://grafana.com/docs/tempo/latest/operations/traceql-metrics/). In Tempo 2.6, TraceQL metrics adds: * Exemplars [[PR 3824](https://github.com/grafana/tempo/pull/3824), [documentation](https://grafana.com/docs/tempo/next/traceql/metrics-queries/#exemplars)] * Instant metrics queries using `/api/metrics/query` [[PR 3859](https://github.com/grafana/tempo/pull/3859), [documentation](https://grafana.com/docs/tempo/next/api_docs/#traceql-metrics)] @@ -44,9 +48,9 @@ For more information, refer to the [TraceQL metrics queries](https://grafana.com Unique to Tempo, TraceQL is a query language that lets you perform custom queries into your tracing data. To learn more about the TraceQL syntax, refer to the [TraceQL documentation](https://grafana.com/docs/tempo/latest/traceql/). -We’ve added event attributes and link scopes. Like spans, they both have instrinsics and attributes. +We've added event attributes and link scopes. Like spans, they both have intrinsics and attributes. -The `event` scope lets you query events that happen within a span. A span event is a unique point in time during the span’s duration.
While spans help build the structural hierarchy of your services, span events can provide a deeper level of granularity to help debug your application faster and maintain optimal performance. To learn more about how you can use span events, read the [What are span events?](https://grafana.com/blog/2024/08/15/all-about-span-events-what-they-are-and-how-to-query-them/) blog post. [PRs [3708](https://github.com/grafana/tempo/pull/3708), [3708](https://github.com/grafana/tempo/pull/3748), [3908](https://github.com/grafana/tempo/pull/3908)] +The `event` scope lets you query events that happen within a span. A span event is a unique point in time during the span's duration. While spans help build the structural hierarchy of your services, span events can provide a deeper level of granularity to help debug your application faster and maintain optimal performance. To learn more about how you can use span events, read the [What are span events?](https://grafana.com/blog/2024/08/15/all-about-span-events-what-they-are-and-how-to-query-them/) blog post. [PRs [3708](https://github.com/grafana/tempo/pull/3708), [3748](https://github.com/grafana/tempo/pull/3748), [3908](https://github.com/grafana/tempo/pull/3908)] If you've instrumented your traces for span links, you can use the `link` scope to search for an attribute within a span link. A span link associates one span with one or more other spans. [PRs [3814](https://github.com/grafana/tempo/pull/3814), [3741](https://github.com/grafana/tempo/pull/3741)] @@ -60,7 +64,7 @@ You can search for an attribute in your link: ![A TraceQL example showing `link` scope](/media/docs/grafana/data-sources/tempo/query-editor/traceql-link-example.png) -We’ve also added autocomplete support for `events` and `links`. [[PR 3846](https://github.com/grafana/tempo/pull/3846)] +We've also added autocomplete support for `events` and `links`.
[[PR 3846](https://github.com/grafana/tempo/pull/3846)] Tempo 2.6 improves TraceQL performance with these updates: @@ -243,11 +247,11 @@ Storage:
### Other breaking changes -* **BREAKING CHANGE** tempo-query is no longer a Jaeger instance with grpcPlugin. It's now a standalone server. Serving a gRPC API for Jaeger on `0.0.0.0:7777` by default. [[PR 3840]](https://github.com/grafana/tempo/issues/3840) +* **BREAKING CHANGE** `tempo-query` is no longer a Jaeger instance with `grpcPlugin`. It's now a standalone server serving a gRPC API for Jaeger on `0.0.0.0:7777` by default. [[PR 3840]](https://github.com/grafana/tempo/issues/3840) ## Bugfixes -For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases). +For a complete list, refer to the [Tempo CHANGELOG](https://github.com/grafana/tempo/releases). ### 2.6.1 @@ -260,7 +264,7 @@ For a complete list, refer to the [Tempo changelog](https://github.com/grafana/t * Fix metrics query histograms and quantiles on `traceDuration`. [[PR 3879](https://github.com/grafana/tempo/pull/3879)] * Fix divide by 0 bug in query frontend exemplar calculations. [[PR 3936](https://github.com/grafana/tempo/pull/3936)] * Fix autocomplete of a query using scoped intrinsics. [[PR 3865](https://github.com/grafana/tempo/pull/3865)] -* Improved handling of complete blocks in localblocks processor after enabling flushing. [[PR 3805](https://github.com/grafana/tempo/pull/3805)] +* Improved handling of complete blocks in the `localblocks` processor after enabling flushing. [[PR 3805](https://github.com/grafana/tempo/pull/3805)] * Fix double appending the primary iterator on second pass with event iterator. [[PR 3903](https://github.com/grafana/tempo/pull/3903)] * Fix frontend parsing error on cached responses [[PR 3759](https://github.com/grafana/tempo/pull/3759)] * `max_global_traces_per_user`: take into account `ingestion.tenant_shard_size` when converting to local limit.
[[PR 3618](https://github.com/grafana/tempo/pull/3618)] @@ -271,3 +275,8 @@ For a complete list, refer to the [Tempo changelog](https://github.com/grafana/t * Correct block end time when the ingested traces are outside the ingestion slack. [[PR 3954](https://github.com/grafana/tempo/pull/3954)] * Fix race condition where a streaming response could be marshaled while being modified in the combiner resulting in a panic. [[PR 3961](https://github.com/grafana/tempo/pull/3961)] * Pass search options to the backend for `SearchTagValuesBlocksV2` requests. [[PR 3971](https://github.com/grafana/tempo/pull/3971)] + + + + + \ No newline at end of file diff --git a/docs/sources/tempo/release-notes/v2-7.md index a318c819009..8ec6c451e5b 100644 --- a/docs/sources/tempo/release-notes/v2-7.md +++ b/docs/sources/tempo/release-notes/v2-7.md @@ -7,12 +7,17 @@ weight: 25 # Version 2.7 release notes + + + + + The Tempo team is pleased to announce the release of Tempo 2.7. This release gives you: * Ability to precisely track ingested traffic and attribute costs based on custom labels -* A series of enhancements that significantly boost Tempo's performance and reduce its overall resource footprint. +* A series of enhancements that significantly boost Tempo performance and reduce its overall resource footprint * New TraceQL capabilities * Improvements to TraceQL metrics @@ -34,7 +39,7 @@ This functionality provides a more accurate alternative to existing size-based m Modern organizations are increasingly reliant on distributed traces for observability, yet reconciling the costs associated with different teams, services, or departments can be challenging. The existing size metric isn't accurate enough because it misses non-span data and can lead to under- or over-counting. -Tempo’s new usage tracking feature overcomes these issues by splitting resource-level data fairly and providing up to 99% accuracy—perfect for cost reconciliation.
+The new usage tracking feature overcomes these issues by splitting resource-level data fairly and providing up to 99% accuracy, which is perfect for cost reconciliation. Unlike the previous method, this new feature precisely accounts for every byte of trace data in the distributor—the only Tempo component with the original payload. A new API endpoint, `/usage_metrics`, exposes the per-tenant metrics on ingested data and cost attribution, and can be controlled with per-tenant configuration. @@ -45,13 +50,13 @@ For additional information, refer to the [Usage metrics documentation](https://g ### Major performance and memory usage improvements -We're excited to announce a series of enhancements that significantly boost Tempo's performance and reduce its overall resource footprint. +We're excited to announce a series of enhancements that significantly boost Tempo performance and reduce its overall resource footprint. **Better refuse large traces:** The ingester now reuses generator code to better detect and reject oversized traces. This change makes trace ingestion more reliable and prevents capacity overloads. Plus, two new metrics `tempo_metrics_generator_live_trace_bytes` and `tempo_ingester_live_trace_bytes` provide deeper visibility into per-tenant byte usage. ([#4365](https://github.com/grafana/tempo/pull/4365)) -**Reduced allocations:** We’ve refined how the query-frontend handles incoming traffic to eliminate unnecessary allocations when query demand is low. +**Reduced allocations:** We've refined how the query-frontend handles incoming traffic to eliminate unnecessary allocations when query demand is low. As part of these improvements, the `querier_forget_delay` configuration option has been removed because it no longer served a practical purpose. ([#3996](https://github.com/grafana/tempo/pull/3996)) This release also reduces the ingester working set by improving prealloc behavior.
It also adds tunable prealloc env variables `PREALLOC_BKT_SIZE`, `PREALLOC_NUM_BUCKETS`, `PREALLOC_MIN_BUCKET`, and metric `tempo_ingester_prealloc_miss_bytes_total` to observe and tune prealloc behavior. ([#4344](https://github.com/grafana/tempo/pull/4344), [#4369](https://github.com/grafana/tempo/pull/4369)) @@ -64,13 +69,13 @@ New in Tempo 2.7, TraceQL now allows you to query the [instrumentation scope](https://opentelemetry.io/docs/concepts/instrumentation-scope/) fields ([#3967](https://github.com/grafana/tempo/pull/3967)), letting you filter and explore your traces based on where and how they were instrumented. -We’ve extended TraceQL to automatically collect matches from array values ([#3867](https://github.com/grafana/tempo/pull/3867)), making it easier to parse spans containing arrays of attributes. +We've extended TraceQL to automatically collect matches from array values ([#3867](https://github.com/grafana/tempo/pull/3867)), making it easier to parse spans containing arrays of attributes. Query times are notably faster, thanks to multiple optimizations ([#4114](https://github.com/grafana/tempo/pull/4114), [#4163](https://github.com/grafana/tempo/pull/4163), [#4438](https://github.com/grafana/tempo/pull/4438)). -Whether you’re running standard queries or advanced filters, you should see a significant speed boost. +Whether you're running standard queries or advanced filters, you should see a significant speed boost. -Tempo now uses the Prometheus “fast regex” engine to accelerate regular expression-based filtering ([#4329](https://github.com/grafana/tempo/pull/4329)). -As part of this update, all regex matches are now fully anchored. +Tempo now uses the Prometheus "fast regex" engine to accelerate regular expression-based filtering ([#4329](https://github.com/grafana/tempo/pull/4329)). +As part of this update, all regular expression matches are now fully anchored.
This breaking change means `span.foo =~ "bar"` is evaluated as `span.foo =~ "^bar$"`. Update any affected queries accordingly. @@ -82,7 +87,7 @@ Refer to the [query v2 API endpoint](https://grafana.com/docs/tempo/ + + + \ No newline at end of file diff --git a/docs/sources/tempo/setup/deployment.md b/docs/sources/tempo/setup/deployment.md index 7d7df00a2e5..b0f7275c07c 100644 --- a/docs/sources/tempo/setup/deployment.md +++ b/docs/sources/tempo/setup/deployment.md @@ -19,7 +19,7 @@ which is the monolithic deployment mode. {{< admonition type="note" >}} _Monolithic mode_ was previously called _single binary mode_. Similarly _scalable monolithic mode_ was previously called _scalable single binary mode_. While the documentation has been updated to reflect this change, some URL names and deployment tooling (for example, Helm charts) do not yet reflect this change. -{{% /admonition %}} +{{< /admonition >}} ## Monolithic mode @@ -75,7 +75,7 @@ Tempo can be easily deployed through a number of tools, including Helm, Tanka, K {{< admonition type="note" >}} The Tanka and Helm examples are equivalent. They are both provided for people who prefer different configuration mechanisms. -{{% /admonition %}} +{{< /admonition >}} ### Helm diff --git a/docs/sources/tempo/setup/operator/multitenancy.md b/docs/sources/tempo/setup/operator/multitenancy.md index 57239a27b8f..5f3056eaf1d 100644 --- a/docs/sources/tempo/setup/operator/multitenancy.md +++ b/docs/sources/tempo/setup/operator/multitenancy.md @@ -19,7 +19,7 @@ The following Kubernetes Custom Resource (CR) deploys a multi-tenant Tempo insta {{< admonition type="note" >}} Jaeger query is not tenant aware and therefore is not supported in this configuration. 
-{{% /admonition %}} +{{< /admonition >}} ```yaml apiVersion: tempo.grafana.com/v1alpha1 diff --git a/docs/sources/tempo/setup/tanka.md index 826e8306790..f0ce265d449 100644 --- a/docs/sources/tempo/setup/tanka.md +++ b/docs/sources/tempo/setup/tanka.md @@ -22,7 +22,7 @@ To set up Tempo using Kubernetes with Tanka, you need to: {{< admonition type="note" >}} This configuration is not suitable for a production environment but can provide a useful way to learn about Tempo. -{{% /admonition %}} +{{< /admonition >}} ## Before you begin @@ -376,7 +376,7 @@ To change the resources requirements, follow these steps: {{< admonition type="note" >}} Lowering these requirements can impact overall performance. -{{% /admonition %}} +{{< /admonition >}} ## Deploy Tempo using Tanka @@ -393,7 +393,7 @@ If the ingesters don’t start after deploying Tempo with the Tanka command, thi pvc_storage_class: 'standard', }, ``` -{{% /admonition %}} +{{< /admonition >}} ## Next steps diff --git a/docs/sources/tempo/setup/upgrade.md index c04c888c84c..c3e87000cf3 100644 --- a/docs/sources/tempo/setup/upgrade.md +++ b/docs/sources/tempo/setup/upgrade.md @@ -7,8 +7,13 @@ weight: 310 # Upgrade your Tempo installation -You can upgrade an existing Tempo installation to the next version. -However, any new release has the potential to have breaking changes that should be tested in a non-production environment prior to rolling these changes to production. + + + + +You can upgrade a Tempo installation to the next version. +However, any release can include breaking changes. +We recommend testing in a non-production environment prior to rolling these changes to production. The upgrade process changes for each version, depending upon the changes made for the subsequent release.
@@ -18,7 +23,7 @@ For detailed information about any release, refer to the [Release notes](../rele {{< admonition type="tip" >}} You can check your configuration options using the [`status` API endpoint]({{< relref "../api_docs#status" >}}) in your Tempo installation. -{{% /admonition %}} +{{< /admonition >}} ## Upgrade to Tempo 2.7 @@ -73,13 +78,15 @@ query_frontend: ### Tempo serverless deprecation -Tempo serverless is now officially deprecated and will be removed in an upcoming release. Prepare to migrate any serverless workflows to alternative deployments. ([#4017](https://github.com/grafana/tempo/pull/4017), [documentation](https://grafana.com/docs/tempo/latest/operations/backend_search/#serverless-environment)) +Tempo serverless is officially deprecated and will be removed in an upcoming release. +Prepare to migrate any serverless workflows to alternative deployments. ([#4017](https://github.com/grafana/tempo/pull/4017), [documentation](https://grafana.com/docs/tempo/latest/operations/backend_search/#serverless-environment)) -There are no changes to this release for serverless. However, you’ll need to remove these configurations before the next release. +There are no changes to this release for serverless. However, you'll need to remove these configurations before the next release. -### Anchored regex matchers in TraceQL +### Anchored regular expression matchers in TraceQL -Regex matchers in TraceQL are now fully anchored using Prometheus’s fast regexp. For instance, `span.foo =~ "bar"` is interpreted as `span.foo =~ "^bar$"`. Adjust existing queries accordingly. ([#4329](https://github.com/grafana/tempo/pull/4329)) +Regular expression matchers in TraceQL are now fully anchored using Prometheus's fast regexp. +For instance, `span.foo =~ "bar"` is interpreted as `span.foo =~ "^bar$"`. Adjust existing queries accordingly.
([#4329](https://github.com/grafana/tempo/pull/4329)) For more information, refer to the [TraceQL comparison operators](https://grafana.com/docs/tempo/latest/traceql/#comparison-operators) documentation. @@ -146,8 +153,8 @@ Use the following settings: ### Other upgrade considerations * The Tempo CLI now targets the `/api/v2/traces` endpoint by default. Use the `--v1` flag if you still rely on the older `/api/traces` endpoint. ([#4127](https://github.com/grafana/tempo/pull/4127)) -* If you already set the `X-Scope-OrgID` header in per-tenant overrides or global Tempo config, it is now honored and not overwritten by Tempo. This may change behavior if you previously depended on automatic injection. ([#4021](https://github.com/grafana/tempo/pull/4021)) -* The AWS Lambda build output changes from main to bootstrap. Follow [AWS’s migration steps](https://aws.amazon.com/blogs/compute/migrating-aws-lambda-functions-from-the-go1-x-runtime-to-the-custom-runtime-on-amazon-linux-2/) to ensure your Lambda functions continue to work. ([#3852](https://github.com/grafana/tempo/pull/3852)) +* If you already set the `X-Scope-OrgID` header in per-tenant overrides or global Tempo configuration, it is now honored and not overwritten by Tempo. This may change behavior if you previously depended on automatic injection. ([#4021](https://github.com/grafana/tempo/pull/4021)) +* The AWS Lambda build output changes from `main` to `bootstrap`. Follow the [AWS migration steps](https://aws.amazon.com/blogs/compute/migrating-aws-lambda-functions-from-the-go1-x-runtime-to-the-custom-runtime-on-amazon-linux-2/) to ensure your Lambda functions continue to work.
([#3852](https://github.com/grafana/tempo/pull/3852)) ## Upgrade to Tempo 2.6 @@ -157,11 +164,11 @@ Tempo 2.6 has several considerations for any upgrade: * vParquet4 is now the default block format * Updated, removed, or renamed parameters -For a complete list of changes, refer to the [Temopo 2.6 changelog](https://github.com/grafana/tempo/releases/tag/v2.6.0). +For a complete list of changes, refer to the [Tempo 2.6 CHANGELOG](https://github.com/grafana/tempo/releases/tag/v2.6.0). ### Operational change for TraceQL metrics -We've changed to an RF1 (Replication Factor 1) pattern for TraceQL metrics as we were unable to hit performance goals for RF3 de-duplication. This requires some operational changes to query TraceQL metrics. +We've changed to an RF1 (Replication Factor 1) pattern for TraceQL metrics as we were unable to hit performance goals for RF3 deduplication. This requires some operational changes to query TraceQL metrics. TraceQL metrics are still considered experimental, but we hope to mark them GA soon when we productionize a complete RF1 write-read path. [PRs [3628](https://github.com/grafana/tempo/pull/3628), [3691](https://github.com/grafana/tempo/pull/3691), [3723](https://github.com/grafana/tempo/pull/3723), [3995](https://github.com/grafana/tempo/pull/3995)] @@ -178,7 +185,7 @@ The local-blocks processor must be enabled to start using metrics queries like `... - local-blocks ``` -* By default, for all tenants in the main config: +* By default, for all tenants in the main configuration: ```yaml overrides: @@ -274,22 +281,22 @@ For information on upgrading, refer to [Upgrade to Tempo 2.6](https://grafana.co -### tempo-query is a standalone server -With Tempo 2.6.1, tempo-query is no longer a Jaeger instance with grpcPlugin. -It’s now a standalone server.
+### `tempo-query` is a standalone server +With Tempo 2.6.1, `tempo-query` is no longer a Jaeger instance with `grpcPlugin`. +It's now a standalone server serving a gRPC API for Jaeger on `0.0.0.0:7777` by default. [PR 3840] ## Upgrade to Tempo 2.5 Tempo 2.5 has several considerations for any upgrade: -* Docker image runs as new UID +* Docker image runs as a new UID * Support for vParquet format removed * Experimental vParquet4 block format * Removed configuration parameters -For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo 2.5 changelog](https://github.com/grafana/tempo/releases/tag/v2.5.0). +For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo 2.5 CHANGELOG](https://github.com/grafana/tempo/releases/tag/v2.5.0). ### Docker image runs as new UID @@ -300,24 +307,22 @@ The new user `10001` won't have access to the old files created by `root`. The ownership of `/var/tempo` changed from `root:root` to `tempo:tempo` with the UID/GID of `10001`. -The `ingester` and `metrics-generator` statefulsets may need to [run chown](https://opensource.com/article/19/8/linux-chown-command) to change ownership to start properly. +The `ingester` and `metrics-generator` statefulsets may need to [run `chown`](https://opensource.com/article/19/8/linux-chown-command) to change ownership to start properly. Refer to [PR 2265](https://github.com/grafana/tempo/pull/2265) to see a Jsonnet example of an `init` container. -This change doesn’t impact you if you used the Helm chart with the default security context set in the chart. +This change doesn't impact you if you used the Helm chart with the default security context set in the chart. All data should be owned by the `tempo` user already. -The UID won’t impact Helm chart users. +The UID won't impact Helm chart users. + ### Support for vParquet format removed The original vParquet format [has been removed](https://github.com/grafana/tempo/pull/3663) from Tempo 2.5.
-Direct upgrades from Tempo 2.1 to Tempo 2.5 are not possible. +Direct upgrades from Tempo 2.1 to Tempo 2.5 aren't possible. You will need to upgrade to an intermediate version and wait for the old vParquet blocks to fall out of retention before upgrading to 2.5. [PR 3663](https://github.com/grafana/tempo/pull/3663)] -vParquet(1) won't be recognized as a valid encoding and any remaining vParquet(1) blocks will not be readable. +vParquet(1) won't be recognized as a valid encoding and any remaining vParquet(1) blocks won't be readable. Installations running with historical defaults should not require any changes as the default has been migrated for several releases. -Installations with storage settings pinned to vParquet must run a previous release configured for vParquet2 or higher until all existing vParquet(1) blocks have expired and been deleted from the backend, or else will encounter read errors after upgrading to this release. +Installations with storage settings pinned to vParquet must run a previous release configured for vParquet2 or higher until all existing vParquet(1) blocks have expired and been deleted from the backend, or else you'll encounter read errors after upgrading to this release. ### Experimental vParquet4 block format @@ -328,6 +336,7 @@ If you choose to use vParquet4 and then opt to revert to vParquet3, any vParquet To try vParquet4, refer to [Choose a block format](https://grafana.com/docs/tempo/latest/configuration/parquet/#choose-a-different-block-format). + ### Removed configuration parameters @@ -350,17 +359,17 @@ To try vParquet4, refer to [Choose a block format](https://grafana.com/docs/temp ### Additional considerations * Updating to OTLP 1.3.0 removes the deprecated `InstrumentationLibrary` and `InstrumentationLibrarySpan` from the OTLP receivers. [PR 3649](https://github.com/grafana/tempo/pull/3649)] -* Removes the addition of a tenant in multitenant trace id lookup. 
[PR 3522](https://github.com/grafana/tempo/pull/3522)] +* Removes the addition of a tenant in multi-tenant trace id lookup. [PR 3522](https://github.com/grafana/tempo/pull/3522)] ## Upgrade to Tempo 2.4 Tempo 2.4 has several considerations for any upgrade: - + * vParquet3 is now the default backend * Caches configuration was refactored * Updated, removed, and renamed configuration parameters -For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo 2.4 changelog](https://github.com/grafana/tempo/releases). +For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo 2.4 CHANGELOG](https://github.com/grafana/tempo/releases). ### Transition to vParquet3 as default block format @@ -369,14 +378,15 @@ vParquet3 format is now the default block format. It is production ready and we Upgrading to Tempo 2.4 modifies the Parquet block format. Although you can use Tempo 2.3 with vParquet2 or vParquet3, you can only use Tempo 2.4 with vParquet3. With this release, the first version of our Parquet backend, vParquet, is being deprecated. -Tempo 2.4 will still read vParquet1 blocks. +Tempo 2.4 still reads vParquet1 blocks. However, Tempo will exit with error if they are manually configured. [[PR 3377](https://github.com/grafana/tempo/pull/3377/files#top)] For information on changing the vParquet version, refer to [Choose a different block format](https://grafana.com/docs/tempo/next/configuration/parquet#choose-a-different-block-format). + ### Cache configuration refactored -The major cache refactor to allow multiple role-based caches to be configured. [[PR 3166](https://github.com/grafana/tempo/pull/3166)] +The major cache refactor lets you configure multiple role-based caches. [[PR 3166](https://github.com/grafana/tempo/pull/3166)] This change resulted in several fields being deprecated (refer to the old configuration). These fields have all been migrated to a top level `cache:` field. 
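As a sketch of the refactored top-level field, one backend can now be assigned several roles, and different roles can use different backends. The role names and backend fields below are assumptions for illustration; confirm them against the Tempo configuration reference before use:

```yaml
cache:
  caches:
    # A single memcached backend serving multiple cache roles.
    - roles:
        - bloom
        - parquet-footer
      memcached:
        host: memcached.tempo.svc:11211
    # A separate backend dedicated to frontend search results.
    - roles:
        - frontend-search
      redis:
        endpoint: redis.tempo.svc:6379
```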
@@ -448,7 +458,7 @@ Tempo 2.3 has several considerations for any upgrade: * New `defaults` block in Overrides module configuration * Several configuration parameters have been renamed or removed. -For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo 2.3 changelog](https://github.com/grafana/tempo/releases). +For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo 2.3 CHANGELOG](https://github.com/grafana/tempo/releases). ### Production-ready vParquet3 block format @@ -459,11 +469,11 @@ This block format is required for using dedicated attribute columns. While vParquet2 remains the default backend for Tempo 2.3, vParquet3 is available as a stable option. Both work with Tempo 2.3. -Upgrading to Tempo 2.3 doesn’t modify the Parquet block format. +Upgrading to Tempo 2.3 doesn't modify the Parquet block format. {{< admonition type="note" >}} -Tempo 2.2 can’t read data stored in vParquet3. -{{% /admonition %}} +Tempo 2.2 can't read data stored in vParquet3. +{{< /admonition >}} Recommended update process: @@ -550,11 +560,11 @@ Tempo 2.2 has several considerations for any upgrade: * vParquet2 is now the default block format * Several configuration parameters have been renamed or removed. -For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo 2.2 changelog](https://github.com/grafana/tempo/releases). +For a complete list of changes, enhancements, and bug fixes, refer to the [Tempo 2.2 CHANGELOG](https://github.com/grafana/tempo/releases). ### Default block format changed to vParquet2 -While not a breaking change, upgrading to Tempo 2.2 by default changes Tempo’s block format to vParquet2. +While not a breaking change, upgrading to Tempo 2.2 by default changes Tempo's block format to vParquet2. To stay on a previous block format, read the [Parquet configuration documentation]({{< relref "../configuration/parquet#choose-a-different-block-format" >}}). 
We strongly encourage upgrading to vParquet2 as soon as possible as this is required for using structural operators in your TraceQL queries and provides query performance improvements, in particular on queries using the `duration` intrinsic.

@@ -565,11 +575,11 @@ Tempo 2.2 updates the `microservices` JSonnet to support a `statefulset` for the

{{< admonition type="note" >}}
This update is important if you use the experimental `local-blocks` processor.
-{{% /admonition %}}
+{{< /admonition >}}

To support a new `processor`, the metrics-generator has been converted from a `deployment` into a `statefulset` with a PVC. This requires manual intervention to migrate successfully and avoid downtime.
-Note that currently both a `deployment` and a `statefulset` will be managed by the JSonnet for a period of time, after which we will delete the deployment from this repo and you will need to delete user-side references to the `tempo_metrics_generator_deployment`, as well as delete the deployment itself.
+For a transition period, the JSonnet manages both a `deployment` and a `statefulset`. After that, the deployment will be removed from this repository, and you'll need to delete user-side references to `tempo_metrics_generator_deployment` as well as the deployment itself.
Refer to the PR for seamless migration instructions. [PRs [2533](https://github.com/grafana/tempo/pull/2533), [2467](https://github.com/grafana/tempo/pull/2467)]

@@ -610,9 +620,9 @@ tempo_ingester_trace_search_bytes_discarded_total
```

### Upgrade path to maintain search from Tempo 1.x to 2.1
-
+
Removing support for search on v2 blocks means that if you upgrade directly from 1.9 to 2.1, you will not be able to search your v2 blocks. To avoid this, upgrade to 2.0 first, since 2.0 supports searching both v2 and vParquet blocks. You can let your old v2 blocks gradually age out while Tempo creates new vParquet blocks from incoming traces.
Once all of your v2 blocks have been deleted and you only have vParquet format-blocks, you can upgrade to Tempo 2.1. All of your blocks will be searchable.
-
+
Parquet files are no longer cached when carrying out searches.

### Breaking changes to metric names exposed by Tempo

@@ -625,27 +635,28 @@ The `query_frontend_result_metrics_inspected_bytes` metric was removed in favor

## Upgrade from Tempo 1.5 to 2.0

-Tempo 2.0 marks a major milestone in Tempo’s development. When planning your upgrade, consider these factors:
+Tempo 2.0 marks a major milestone in Tempo's development. When planning your upgrade, consider these factors:

- Breaking changes:
  - Renamed, removed, and moved configurations are described in section below.
  - The `TempoRequestErrors` alert was removed from mixin. Any Jsonnet users relying on this alert should copy this into their own environment.
- Advisory:
-  - Changed defaults – Are these updates relevant for your installation?
+  - Changed defaults. Are these updates relevant for your installation?
  - TraceQL editor needs to be enabled in Grafana to use the query editor.
  - Resource requirements have changed for Tempo 2.0 with the default configuration.

Once you upgrade to Tempo 2.0, there is no path to downgrade.

{{< admonition type="note" >}}
-There is a potential issue loading Tempo 1.5's experimental Parquet storage blocks. You may see errors or even panics in the compactors. We have only been able to reproduce this with interim commits between 1.5 and 2.0, but if you experience any issues please [report them](https://github.com/grafana/tempo/issues/new?assignees=&labels=&template=bug_report.md&title=) so we can isolate and fix this issue.
-{{% /admonition %}}
+There is a potential issue loading Tempo 1.5's experimental Parquet storage blocks. You may see errors or even panics in the compactors.
We have only been able to reproduce this with interim commits between 1.5 and 2.0, but if you experience any issues, [report them](https://github.com/grafana/tempo/issues/new?assignees=&labels=&template=bug_report.md&title=).
+{{< /admonition >}}

### Check Tempo installation resource allocation

-Parquet provides faster search and is required to enable TraceQL. However, the Tempo installation will require additional CPU and memory resources to use Parquet efficiently. Parquet is more costly due to the extra work of building the columnar blocks, and operators should expect at least 1.5x increase in required resources to run a Tempo 2.0 cluster. Most users will find these extra resources are negligible compared to the benefits that come from the additional features of TraceQL and from storing traces in an open format.
+Parquet provides faster search and is required to enable TraceQL. However, the Tempo installation requires additional CPU and memory resources to use Parquet efficiently. Parquet is more costly due to the extra work of building the columnar blocks, and operators should expect at least a 1.5x increase in required resources to run a Tempo 2.0 cluster. Most users find these extra resources are negligible compared to the benefits that come from the additional features of TraceQL and from storing traces in an open format.

-You can can continue using the previous `v2` block format using the instructions provided in the [Parquet configuration documentation]({{< relref "../configuration/parquet" >}}). Tempo will continue to support trace by id lookup on the `v2` format for the foreseeable future.
+You can continue using the previous `v2` block format using the instructions provided in the [Parquet configuration documentation]({{< relref "../configuration/parquet" >}}).
+Tempo continues to support trace by id lookup on the `v2` format for the foreseeable future.
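Staying on the previous block format is a small storage override. This sketch assumes the `storage.trace.block.version` key described in Tempo's Parquet configuration documentation:

```yaml
storage:
  trace:
    block:
      version: v2   # keep the previous non-Parquet block format
```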
### Enable TraceQL in Grafana

@@ -698,3 +709,6 @@ storage:
    storage_account_key:
    container_name:
```
+
+
+
diff --git a/docs/sources/tempo/shared/best-practices-traces.md b/docs/sources/tempo/shared/best-practices-traces.md
index 9a2a790c59f..cda1618ab2c 100644
--- a/docs/sources/tempo/shared/best-practices-traces.md
+++ b/docs/sources/tempo/shared/best-practices-traces.md
@@ -94,6 +94,6 @@ You can consider breaking up the spans in several ways:
- For long-running operations, you could create a new span for every predetermined interval of execution time.
  {{< admonition type="note" >}}
  This requires time-based tracking in your application's code and is more complex to implement.
-  {{% /admonition %}}
+  {{< /admonition >}}
- Use span linking - Should data flow hit bottlenecks where further operations on that data might be batched at a later time, the use of [span links](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#links-between-spans) can help keep traces constrained to an acceptable time range, while sharing context with other traces that work on the same data. This can also improve the readability of traces.
diff --git a/docs/sources/tempo/troubleshooting/send-traces/alloy.md b/docs/sources/tempo/troubleshooting/send-traces/alloy.md
index a5909aa6f79..993919ca4a2 100644
--- a/docs/sources/tempo/troubleshooting/send-traces/alloy.md
+++ b/docs/sources/tempo/troubleshooting/send-traces/alloy.md
@@ -12,7 +12,7 @@ aliases:
# Troubleshoot Grafana Alloy

Sometimes it can be difficult to tell what, if anything, Grafana Alloy is sending along to the backend.
-This document focuses on a few techniques to gain visibility on how many trace spans are pushed to Alloy and if they're making it to the backend.
+This document focuses on a few techniques to gain visibility into how many trace spans are pushed to Alloy and whether they're making it to the backend.
[OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) form the basis of the tracing pipeline, which does a fantastic job of logging network and other issues.
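One way to check how many spans the receivers have accepted is to read the Prometheus-format metrics that Alloy and the OpenTelemetry Collector expose on their own telemetry endpoint. The endpoint (commonly `http://localhost:12345/metrics` for Alloy) and the exact metric names such as `otelcol_receiver_accepted_spans` are assumptions that vary by version and configuration. A minimal sketch for totaling one counter from that output:

```python
import re

# Matches one sample line of the Prometheus text exposition format:
#   metric_name{optional="labels"} value
SAMPLE_LINE = re.compile(
    r'^([A-Za-z_:][A-Za-z0-9_:]*)'   # metric name
    r'(\{[^}]*\})?'                  # optional label set
    r'\s+([0-9eE.+-]+)'              # sample value
)

def metric_total(metrics_text: str, name: str) -> float:
    """Sum every sample of `name` across all label combinations."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        m = SAMPLE_LINE.match(line)
        if m and m.group(1) == name:
            total += float(m.group(3))
    return total
```

Fetching the metrics page periodically and comparing totals for the accepted and refused span counters shows whether spans are reaching Alloy at all, before investigating the exporter side.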