Commit

* Add service graph metrics queries
* Fix broken links
* Added intro
* Update docs/sources/tempo/metrics-generator/service_graphs/metrics-queries.md
* Update metrics-queries.md

  Tried a small language tweak
* Apply suggestions from code review

  Co-authored-by: Jennifer Villa <[email protected]>

---------

Co-authored-by: Jennifer Villa <[email protected]>
(cherry picked from commit f1c7e6b)

Co-authored-by: Kim Nylander <[email protected]>

1 parent a48e5bb, commit 39a1c6f
Showing 8 changed files with 180 additions and 91 deletions.

docs/sources/tempo/metrics-generator/service_graphs/enable-service-graphs.md (56 additions, 0 deletions)

---
aliases:
- /docs/tempo/latest/server_side_metrics/service_graphs/
- /docs/tempo/latest/metrics-generator/service_graphs/
title: Enable service graphs
description: Learn how to enable service graphs
weight: 300
---

## Enable service graphs

Service graphs are generated in Tempo and pushed to a metrics storage backend, such as Prometheus.
Grafana then reads those metrics and renders them as a graph.
You need both components to use service graphs fully.

{{% admonition type="note" %}}
Cardinality can pose a problem when you have lots of services.
To learn more about cardinality and how to perform a dry run of the metrics generator, see the [Cardinality documentation]({{< relref "../cardinality" >}}).
{{% /admonition %}}

### Enable service graphs in Tempo/GET

To enable service graphs in Tempo/GET, enable the metrics generator and add an overrides section that enables the `service-graphs` processor.
For more information, refer to the [configuration details]({{< relref "../../configuration#metrics-generator" >}}).

### Enable service graphs in Grafana

{{% admonition type="note" %}}
Since Grafana 9.0.4, service graphs have been enabled by default. Prior to Grafana 9.0.4, service graphs were hidden
under the [feature toggle](/docs/grafana/latest/setup-grafana/configure-grafana/#feature_toggles) `tempoServiceGraph`.
{{% /admonition %}}

Configure a Tempo data source's service graphs by linking to the Prometheus backend where metrics are being sent:

```yaml
apiVersion: 1
datasources:
  # Prometheus backend where metrics are sent
  - name: Prometheus
    type: prometheus
    uid: prometheus
    url: <prometheus-url>
    jsonData:
      httpMethod: GET
    version: 1
  - name: Tempo
    type: tempo
    uid: tempo
    url: <tempo-url>
    jsonData:
      httpMethod: GET
      serviceMap:
        datasourceUid: 'prometheus'
    version: 1
```
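
The link works through matching identifiers: each `tempo` data source's `jsonData.serviceMap.datasourceUid` must equal the `uid` of a `prometheus` data source. The following Python sketch checks that rule on the provisioning example once it has been loaded into Python lists and dicts (for example, with a YAML library). The helper `broken_service_map_links` is hypothetical and is not part of Grafana.

```python
from typing import Any


def broken_service_map_links(datasources: list[dict[str, Any]]) -> list[str]:
    """Return names of Tempo data sources whose serviceMap link is missing or dangling.

    Hypothetical helper for illustration only.
    """
    prometheus_uids = {
        ds["uid"] for ds in datasources if ds.get("type") == "prometheus" and "uid" in ds
    }
    broken = []
    for ds in datasources:
        if ds.get("type") != "tempo":
            continue
        service_map = ds.get("jsonData", {}).get("serviceMap", {})
        # The Tempo data source must point at the uid of a Prometheus data source.
        if service_map.get("datasourceUid") not in prometheus_uids:
            broken.append(ds.get("name", "<unnamed>"))
    return broken


# The provisioning example above as a Python structure (URLs are placeholders).
provisioned = [
    {"name": "Prometheus", "type": "prometheus", "uid": "prometheus", "url": "<prometheus-url>"},
    {
        "name": "Tempo",
        "type": "tempo",
        "uid": "tempo",
        "url": "<tempo-url>",
        "jsonData": {"serviceMap": {"datasourceUid": "prometheus"}},
    },
]

print(broken_service_map_links(provisioned))  # []
```

Changing `datasourceUid` to a value that no Prometheus data source uses would make the helper report `Tempo`.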

docs/sources/tempo/metrics-generator/service_graphs/estimate-cardinality.md (48 additions, 0 deletions)

---
title: Estimate cardinality from traces
menuTitle: Estimate cardinality
description: Service graphs help you understand the structure of a distributed system and the connections and dependencies between its components.
weight: 300
---

## Estimate cardinality from traces

Cardinality can pose a problem when you have lots of services.
There is no exact formula for the cardinality that service graphs generate, but the following guide helps you estimate it.

For more information on cardinality, refer to the [Cardinality]({{< relref "../cardinality" >}}) documentation.

### How to estimate the cardinality

The number of edges depends on the number of nodes (services) in the system and the direction of the requests between them.
We call each edge a hop. Every hop is a unique combination of client and server labels.

For example:
- A system with 3 nodes `(A, B, C)`, in which A only calls B and B only calls C, has 2 hops `(A → B, B → C)`.
- A system with 3 nodes `(A, B, C)` that all call each other (every pair is linked in both directions) has 6 hops `(A → B, B → A, B → C, C → B, A → C, C → A)`.
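
Because a hop is a unique client and server pair, counting hops amounts to counting distinct directed pairs. The following minimal Python sketch reproduces the two examples above; the `count_hops` helper and the call lists are illustrative and not part of Tempo.

```python
def count_hops(calls: list[tuple[str, str]]) -> int:
    """Count hops: distinct (client, server) pairs, so repeated calls add nothing."""
    return len(set(calls))


# Example 1: A calls B (twice) and B calls C.
chain = [("A", "B"), ("A", "B"), ("B", "C")]

# Example 2: every service calls every other service.
services = ["A", "B", "C"]
mesh = [(client, server) for client in services for server in services if client != server]

print(count_hops(chain))  # 2
print(count_hops(mesh))   # 6
```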

We can't calculate the number of hops automatically from the number of nodes alone.
For a connected system in which no service calls itself, it falls between `#services - 1` (a simple chain) and `#services * (#services - 1)` (every service calls every other service).

If we know the number of hops in a system, we can calculate the cardinality of the generated
[service graphs]({{< relref "../service_graphs" >}}):

```
traces_service_graph_request_total: #hops
traces_service_graph_request_failed_total: #hops
traces_service_graph_request_server_seconds: 3 buckets * #hops
traces_service_graph_request_client_seconds: 3 buckets * #hops
traces_service_graph_unpaired_spans_total: #services (absolute worst case)
traces_service_graph_dropped_spans_total: #services (absolute worst case)
```

Adding these together gives the following cardinality estimate:

```
Sum: 8 * #hops + 2 * #services
```
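
As a worked example of the sum, the Python sketch below adds up the per-metric counts listed above: 1 + 1 + 3 + 3 = 8 series per hop, plus 2 worst-case series per service. The `estimate_series` name is hypothetical, and `buckets=3` simply mirrors the "3 buckets" in the breakdown rather than reading any Tempo setting.

```python
def estimate_series(hops: int, services: int, buckets: int = 3) -> int:
    """Estimate series count following the breakdown above (hypothetical helper)."""
    per_hop = (
        1  # traces_service_graph_request_total
        + 1  # traces_service_graph_request_failed_total
        + buckets  # traces_service_graph_request_server_seconds
        + buckets  # traces_service_graph_request_client_seconds
    )
    # traces_service_graph_unpaired_spans_total and
    # traces_service_graph_dropped_spans_total, absolute worst case.
    per_service = 2
    return per_hop * hops + per_service * services


# 10 services in a simple chain: 9 hops.
print(estimate_series(hops=9, services=10))   # 92
# 10 services that all call each other: 10 * 9 = 90 hops.
print(estimate_series(hops=90, services=10))  # 740
```

The spread between the two cases shows why the call pattern, not only the number of services, drives cardinality.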

{{% admonition type="note" %}}
To estimate the number of metrics, refer to the [Dry run metrics generator]({{< relref "../cardinality" >}}) documentation.
{{% /admonition %}}

docs/sources/tempo/metrics-generator/service_graphs/metrics-queries.md (73 additions, 0 deletions)

---
title: Service graph metrics queries
menuTitle: Metrics queries
description: Use PromQL queries to access metrics from service graphs
weight: 300
---

# Service graph metrics queries

A collection of useful PromQL queries for service graphs.

In most cases, users want to see a visual representation of their service graph. Grafana uses the service graph metrics created by Tempo and builds that visual for the user. However, in some cases, users may want to interact with the metrics that define that service graph directly. They may want to, for example, programmatically analyze how their services are interconnected and build downstream applications that use this information.

To help with this, we've provided a collection of useful PromQL queries that can be used to explore service graph metrics.

## Instant queries

An instant query returns a single value for each series, evaluated at the end of the selected time range.
[Instant queries](https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries) are quicker to execute, and their results are often easier to understand, so we prefer them in some scenarios:

![Instant query in Grafana](screenshot-serv-graph-instant-query.png)
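
To run these queries programmatically instead of in Grafana, you can send them to the Prometheus HTTP API. The Python sketch below only builds the request URL for the instant-query endpoint, `/api/v1/query`, with its `query` parameter and optional `time` parameter. The helper name and the base URL `http://prometheus:9090` are assumptions; substitute the address of the Prometheus backend that your Tempo data source links to.

```python
from typing import Optional
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str, time: Optional[float] = None) -> str:
    """Build a GET URL for the Prometheus instant-query endpoint (hypothetical helper)."""
    params = {"query": promql}
    if time is not None:
        # Evaluation timestamp in Unix seconds; Prometheus uses the current time if omitted.
        params["time"] = str(time)
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


url = instant_query_url(
    "http://prometheus:9090",  # assumed address of the metrics backend
    "sum(increase(traces_service_graph_request_server_seconds_count{}[7d])) by (server, client) > 0",
)
print(url)
```

To send the request, you could pass the URL to `urllib.request.urlopen` and decode the JSON body with `json.loads`.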

### Connectivity between services

Show me the total calls in the last 7 days for every client/server pair:

```promql
sum(increase(traces_service_graph_request_server_seconds_count{}[7d])) by (server, client) > 0
```
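
An instant query like this one returns a vector result: one sample per client/server pair, with the value encoded as a string. The following Python sketch turns such a response body, already decoded from JSON, into a `(client, server) -> calls` mapping that downstream code could analyze. The helper `edges_from_vector` is hypothetical, and the example response follows the documented result format with made-up service names and numbers.

```python
from typing import Any


def edges_from_vector(response: dict[str, Any]) -> dict[tuple[str, str], float]:
    """Map (client, server) to the query value from an instant-query vector result."""
    if response.get("status") != "success":
        raise ValueError("query failed")
    data = response["data"]
    if data.get("resultType") != "vector":
        raise ValueError("expected a vector result from an instant query")
    edges = {}
    for sample in data["result"]:
        labels = sample["metric"]
        _timestamp, value = sample["value"]  # [unix_time, "value as string"]
        edges[(labels["client"], labels["server"])] = float(value)
    return edges


# Illustrative response shape; service names and numbers are made up.
example = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"client": "frontend", "server": "checkout"}, "value": [1700000000, "1520"]},
            {"metric": {"client": "checkout", "server": "payments"}, "value": [1700000000, "734"]},
        ],
    },
}

for (client, server), calls in edges_from_vector(example).items():
    print(f"{client} -> {server}: {calls:.0f} calls")
```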

To see only the calls in which a single service is the server:

```promql
sum(increase(traces_service_graph_request_server_seconds_count{server="foo"}[7d])) by (client) > 0
```

To see only the calls in which a single service is the client:

```promql
sum(increase(traces_service_graph_request_server_seconds_count{client="foo"}[7d])) by (server) > 0
```

In all of the above queries, you can change the range selector (`[7d]`) to change the time window that the result covers. For example, to run the same analysis over one day:

```promql
sum(increase(traces_service_graph_request_server_seconds_count{}[1d])) by (server, client) > 0
```
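
When you generate these queries from code, only three parts change: the range, the optional `server` or `client` matcher, and the `by` clause. The following hypothetical Python helper, `connectivity_query`, rebuilds the queries above from those inputs. It does not escape label values, so it assumes that service names contain no quotes or backslashes.

```python
from typing import Optional

METRIC = "traces_service_graph_request_server_seconds_count"


def connectivity_query(
    window: str = "7d", server: Optional[str] = None, client: Optional[str] = None
) -> str:
    """Build a total-calls query like the ones above (hypothetical helper).

    Label values are not escaped, so service names must not contain quotes or backslashes.
    """
    matchers = []
    if server is not None:
        matchers.append(f'server="{server}"')
    if client is not None:
        matchers.append(f'client="{client}"')
    # Group by whichever side of the edge is not pinned to a single service.
    group_by = [name for name, pinned in (("server", server), ("client", client)) if pinned is None]
    selector = METRIC + "{" + ",".join(matchers) + "}"
    return f"sum(increase({selector}[{window}])) by ({', '.join(group_by)}) > 0"


print(connectivity_query("1d"))
print(connectivity_query(server="foo"))
```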

## Range queries

Range queries are useful for calculating service graph metrics over a time range, instead of at a single point in time.

![Range query in Grafana](screenshot-serv-graph-range-query.png)
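
Outside Grafana, range queries go to the Prometheus `/api/v1/query_range` endpoint, which takes `query`, `start`, `end`, and `step` (a duration such as `60s`, or a number of seconds). This Python sketch builds such a URL. The helper name `range_query_url`, the base URL `http://prometheus:9090`, and the timestamps are assumptions for illustration.

```python
from urllib.parse import urlencode


def range_query_url(base_url: str, promql: str, start: float, end: float, step: str) -> str:
    """Build a GET URL for the Prometheus range-query endpoint (hypothetical helper)."""
    if end < start:
        raise ValueError("end must not be before start")
    params = {"query": promql, "start": str(start), "end": str(end), "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# One hour of data at one point per minute; the timestamps are arbitrary Unix seconds.
end_time = 1700003600
url = range_query_url(
    "http://prometheus:9090",  # assumed address of the metrics backend
    'sum(rate(traces_service_graph_request_server_seconds_count{server="foo"}[5m])) by (client) > 0',
    start=end_time - 3600,
    end=end_time,
    step="60s",
)
print(url)
```

The `step` sets the spacing of the returned points, independently of the `[5m]` window inside the query.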

### Rates over time between services

Adapting two of the queries above, we can request the per-second rate over time of calls in which a given service acts as the server:

```promql
sum(rate(traces_service_graph_request_server_seconds_count{server="foo"}[5m])) by (client) > 0
```

or as the client:

```promql
sum(rate(traces_service_graph_request_server_seconds_count{client="foo"}[5m])) by (server) > 0
```
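
A range query returns a matrix result: for each series, a list of `[timestamp, "value"]` pairs. The following Python sketch, using the hypothetical helper `series_by_label`, groups such a response by one label, for example `client` for the `server="foo"` query. The example response follows the documented result format with made-up clients and rates.

```python
from typing import Any


def series_by_label(response: dict[str, Any], label: str) -> dict[str, list[tuple[float, float]]]:
    """Map each series' `label` value to its (timestamp, value) points from a matrix result."""
    if response.get("status") != "success":
        raise ValueError("query failed")
    data = response["data"]
    if data.get("resultType") != "matrix":
        raise ValueError("expected a matrix result from a range query")
    return {
        series["metric"][label]: [(float(ts), float(value)) for ts, value in series["values"]]
        for series in data["result"]
    }


# Illustrative response for the server="foo" query; clients and rates are made up.
example = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"client": "frontend"}, "values": [[1700000000, "2.5"], [1700000060, "3"]]},
            {"metric": {"client": "batch"}, "values": [[1700000000, "0.25"]]},
        ],
    },
}

for client, points in series_by_label(example, "client").items():
    latest_time, latest_rate = points[-1]
    print(f"{client} -> foo: {latest_rate} req/s at {latest_time:.0f}")
```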

Notice that the range dropped to `[5m]`. Each point is then the per-second rate over the preceding 5 minutes, which makes the graph respond more quickly to changes.

### Latency percentiles over time between services

This query gives the latency percentile of calls from `foo` to each server it calls, based on the same requests counted by the rates above. Use it to see how the latency between two services changes over time. In the following query, `.9` means we're calculating the 90th percentile. Adjust this value to calculate a different latency percentile (for example, `.5` for p50, `.95` for p95, or `.99` for p99).

```promql
histogram_quantile(.9, sum(rate(traces_service_graph_request_server_seconds_bucket{client="foo"}[5m])) by (server, le))
```
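
`histogram_quantile` estimates a percentile from the cumulative `le` buckets by finding the bucket that contains the target rank and interpolating linearly inside it. The Python sketch below approximates that logic for a single classic histogram, to help you reason about the results. It is a simplification, not the Prometheus implementation, and it assumes `0 < q <= 1`, cumulative counts sorted by upper bound, and a final `+Inf` bucket. The bucket boundaries and counts in the usage example are made up.

```python
import bisect
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Approximate a quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: assumes 0 < q <= 1, buckets sorted by upper bound with
    cumulative counts, and a final +Inf bucket (like the `le` label values).
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    finite_counts = [count for _, count in buckets[:-1]]
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = bisect.bisect_left(finite_counts, rank)
    if b == len(finite_counts):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    lower, below = (0.0, 0.0) if b == 0 else buckets[b - 1]
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Made-up cumulative counts per upper bound in seconds.
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
print(histogram_quantile(0.75, buckets))  # about 0.35
```

Because of the interpolation, the result is only as precise as the bucket boundaries: any percentile that lands between 0.1 and 0.5 seconds is an estimate within that bucket.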

Binary file added: .../tempo/metrics-generator/service_graphs/screenshot-serv-graph-instant-query.png (+129 KB)
Binary file added: ...es/tempo/metrics-generator/service_graphs/screenshot-serv-graph-range-query.png (+333 KB)