From 7a52916d8952600179360fa051d67ae76a627d31 Mon Sep 17 00:00:00 2001 From: boruszak Date: Wed, 22 Feb 2023 21:56:59 +0000 Subject: [PATCH 01/10] backport of commit e8466bf47933e1292a770a66add5683243d7b264 --- .../troubleshoot/troubleshoot-services.mdx | 132 ++++++++++++++++++ website/data/docs-nav-data.json | 4 + 2 files changed, 136 insertions(+) create mode 100644 website/content/docs/troubleshoot/troubleshoot-services.mdx diff --git a/website/content/docs/troubleshoot/troubleshoot-services.mdx b/website/content/docs/troubleshoot/troubleshoot-services.mdx new file mode 100644 index 000000000000..7ce32beb5083 --- /dev/null +++ b/website/content/docs/troubleshoot/troubleshoot-services.mdx @@ -0,0 +1,132 @@ +--- +layout: docs +page_title: Service-to-service troubleshooting overview +description: >- + Consul includes a built-in tool for troubleshooting communication between services in a service mesh. Learn how to use the `consul troubleshoot` command to validate communication between upstream and downstream Envoy proxies on VM and Kubernetes deployments. +--- + +# Service-to-service troubleshooting overview + +This topic provides an overview of Consul’s built-in service-to-service troubleshooting capabilities. When communication between an upstream service and a downstream service in a service mesh fails, you can run the `consul troubleshoot` command to initiate a series of automated validation tests. + +For more information, refer to the [`consul troubleshoot` CLI documentation](/consul/commands/troubleshoot) or the [`consul-k8s troubleshoot` CLI reference](/consul/docs/k8s/k8s-cli#troubleshoot). 
+ +## Introduction + +When communication between upstream and downstream services in a service mesh fails, you can diagnose the cause manually with one or more of Consul’s built-in features, including [health check queries](/consul/docs/discovery/checks), [the UI topology view](/consul/docs/connect/observability/ui-visualization), and [agent telemetry metrics](/consul/docs/agent/telemetry#metrics-reference). + +The `consul troubleshoot` command performs several checks in sequence that enable you to discover issues that impede service-to-service communication. The process systematically queries the [Envoy administration interface API](https://www.envoyproxy.io/docs/envoy/latest/operations/admin) and the Consul API to determine the cause of the communication failure. + +The troubleshooting command validates service-to-service communication by checking for the following common issues: + +- Upstream service does not exist +- One or both hosts are unhealthy +- A filter affects the upstream service +- The CA has expired mTLS certificates +- The services have expired mTLS certificates + +Consul outputs the results of these validation checks to the terminal along with suggested actions to resolve the service communication failure. When it detects rejected configurations or connection failures, Consul also outputs Envoy metrics for services. + +### Envoy proxies in a service mesh + +Consul validates communication in a service mesh by checking the Envoy proxies that are deployed as sidecars for the upstream and downstream services. As a result, troubleshooting requires that [Consul’s service mesh features are enabled](/consul/docs/connect/configuration). + +For more information about using Envoy proxies with Consul, refer to [Envoy proxy configuration for service mesh](/consul/docs/connect/proxies/envoy). + +## Requirements + +- Consul v1.15 or later. +- For Kubernetes, the `consul-k8s` CLI must be installed. 
+ +### Technical constraints + +When troubleshooting service-to-service communication issues, be aware of the following constraints: + +- The troubleshooting tool does not check service intentions. For more information about intentions, including precedence and match order, refer to [service mesh intentions](/consul/docs/connect/intentions). +- The troubleshooting tool validates one direct connection between a downstream service and an upstream service. You must run the `consul troubleshoot` command with the Envoy ID for an individual upstream service. It does not support validating multiple connections simultaneously. +- Because it validates direct communication between two Envoy proxies, the troubleshooting tool does not support checks for service-to-service connections when communication between the services passes through a mesh gateway or a terminating gateway. + +## Usage + +Using the service-to-service troubleshooting tool is a two-step process: + +1. Find the identifier for the upstream service. +1. Use the upstream’s identifier to validate communication. + +In deployments without transparent proxies, the identifier is the _Envoy ID for the upstream service’s sidecar proxy_. If you use transparent proxies, the identifier is the _upstream service’s IP address_. For more information about using transparent proxies, refer to [Enable transparent proxy mode](/consul/docs/connect/transparent-proxy). + +### VMs + +To troubleshoot service-to-service communication issues in deployments that use VMs or bare-metal servers: + +1. Run the `consul troubleshoot upstreams` command to retrieve the upstream information for the service that is experiencing communication failures. Depending on your network’s configuration, the upstream information is either an Envoy ID or an IP address. 
+ + ```shell-session + $ consul troubleshoot upstreams + ==> Upstreams (explicit upstreams only) (0) + ==> Upstreams IPs (transparent proxy only) (1) + [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] + If you cannot find the upstream address or cluster for a transparent proxy upstream: + - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. + - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. + ``` + +1. Run the `consul troubleshoot proxy` command and specify the Envoy ID or IP address with the `-upstream-ip` flag to identify the proxy you want to perform the troubleshooting process on. The following example uses the upstream IP to validate communication with the upstream service `backend`: + + ```shell-session + $ consul troubleshoot proxy -upstream-ip 10.4.6.160 + ==> Validation + ✓ Certificates are valid + ✓ Envoy has 0 rejected configurations + ✓ Envoy has detected 0 connection failure(s) + ✓ Listener for upstream "backend" found + ✓ Route for upstream "backend" found + ✓ Cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ✓ Healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ✓ Cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ! 
No healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + -> Check that your upstream service is healthy and running + -> Check that your upstream service is registered with Consul + -> Check that the upstream proxy is healthy and running + -> If you are explicitly configuring upstreams, ensure the name of the upstream is correct + ``` + +In the example, troubleshooting upstream communication reveals that the `backend` service has two service instances running in datacenter `dc1`. One of the services is healthy, but Consul cannot detect healthy endpoints for the second service instance. + +The output from the troubleshooting process identifies service instances according to their [Consul DNS address](/consul/docs/discovery/dns#standard-lookup). Use the DNS information for failing services to diagnose the specific issues affecting the service instance. + +### Kubernetes + +To troubleshoot service-to-service communication issues in deployments that use Kubernetes, retrieve the upstream information for the pod that is experiencing communication failures and use the upstream information to identify the proxy you want to perform the troubleshooting process on. + +1. Run the `consul-k8s troubleshoot upstreams` command and specify the pod ID with the `-pod` flag to retrieve upstream information. Depending on your network’s configuration, the upstream information is either an Envoy ID or an IP address. The following example displays all transparent proxy upstreams in Consul service mesh from the given pod. 
+ + ```shell-session + $ consul-k8s troubleshoot upstreams -pod frontend-767ccfc8f9-6f6gx + ==> Upstreams (explicit upstreams only) (0) + ==> Upstreams IPs (transparent proxy only) (1) + [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] + If you cannot find the upstream address or cluster for a transparent proxy upstream: + - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. + - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. + ``` + +1. Run the `consul-k8s troubleshoot proxy` command and specify the pod ID and upstream IP address to identify the proxy you want to troubleshoot. The following example uses the upstream IP to validate communication with the upstream service `backend`: + +```shell-session + $ consul-k8s troubleshoot proxy -pod frontend-767ccfc8f9-6f6gx -upstream-ip 10.4.6.160 + ==> Validation + ✓ certificates are valid + ✓ Envoy has 0 rejected configurations + ✓ Envoy has detected 0 connection failure(s) + ✓ listener for upstream "backend" found + ✓ route for upstream "backend" found + ✓ cluster "backend.default.dc1.internal..consul" for upstream "backend" found + ✓ healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ✓ cluster "backend2.default.dc1.internal..consul" for upstream "backend" found + ! 
no healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ``` + +In the example, troubleshooting upstream communication reveals that the `backend` service has two clusters in datacenter `dc1`. One of the clusters returns healthy endpoints, but Consul cannot detect healthy endpoints for the second cluster. + +The output from the troubleshooting process identifies service instances according to their [Consul DNS address](/consul/docs/k8s/dns). Use the DNS information for failing services to diagnose the specific issues affecting the service instance. diff --git a/website/data/docs-nav-data.json b/website/data/docs-nav-data.json index a15b5169ac29..1f42b604f927 100644 --- a/website/data/docs-nav-data.json +++ b/website/data/docs-nav-data.json @@ -805,6 +805,10 @@ { "title": "FAQ", "path": "troubleshoot/faq" + }, + { + "title": "Service-to-Service Troubleshooting", + "path": "troubleshoot/troubleshoot-services" } ] }, From 962a7bdf02840146ae498c1b263f75e2c3426311 Mon Sep 17 00:00:00 2001 From: boruszak Date: Wed, 22 Feb 2023 22:27:54 +0000 Subject: [PATCH 02/10] backport of commit f919a6e77b8f5f6ba964e671873a82c2dad9a903 --- .../troubleshoot/troubleshoot-services.mdx | 20 +++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/website/content/docs/troubleshoot/troubleshoot-services.mdx b/website/content/docs/troubleshoot/troubleshoot-services.mdx index 7ce32beb5083..db1c67930a01 100644 --- a/website/content/docs/troubleshoot/troubleshoot-services.mdx +++ b/website/content/docs/troubleshoot/troubleshoot-services.mdx @@ -55,26 +55,28 @@ Using the service-to-service troubleshooting tool is a two-step process: In deployments without transparent proxies, the identifier is the _Envoy ID for the upstream service’s sidecar proxy_. If you use transparent proxies, the identifier is the _upstream service’s IP address_. 
For more information about using transparent proxies, refer to [Enable transparent proxy mode](/consul/docs/connect/transparent-proxy). -### VMs +### Troubleshoot on VMs To troubleshoot service-to-service communication issues in deployments that use VMs or bare-metal servers: 1. Run the `consul troubleshoot upstreams` command to retrieve the upstream information for the service that is experiencing communication failures. Depending on your network’s configuration, the upstream information is either an Envoy ID or an IP address. - ```shell-session +```shell-session $ consul troubleshoot upstreams + ==> Upstreams (explicit upstreams only) (0) ==> Upstreams IPs (transparent proxy only) (1) [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] If you cannot find the upstream address or cluster for a transparent proxy upstream: - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. - ``` +``` 1. Run the `consul troubleshoot proxy` command and specify the Envoy ID or IP address with the `-upstream-ip` flag to identify the proxy you want to perform the troubleshooting process on. 
The following example uses the upstream IP to validate communication with the upstream service `backend`: - ```shell-session +```shell-session $ consul troubleshoot proxy -upstream-ip 10.4.6.160 + ==> Validation ✓ Certificates are valid ✓ Envoy has 0 rejected configurations @@ -95,26 +97,28 @@ In the example, troubleshooting upstream communication reveals that the `backend The output from the troubleshooting process identifies service instances according to their [Consul DNS address](/consul/docs/discovery/dns#standard-lookup). Use the DNS information for failing services to diagnose the specific issues affecting the service instance. -### Kubernetes +### Troubleshoot on Kubernetes To troubleshoot service-to-service communication issues in deployments that use Kubernetes, retrieve the upstream information for the pod that is experiencing communication failures and use the upstream information to identify the proxy you want to perform the troubleshooting process on. 1. Run the `consul-k8s troubleshoot upstreams` command and specify the pod ID with the `-pod` flag to retrieve upstream information. Depending on your network’s configuration, the upstream information is either an Envoy ID or an IP address. The following example displays all transparent proxy upstreams in Consul service mesh from the given pod. - ```shell-session +```shell-session $ consul-k8s troubleshoot upstreams -pod frontend-767ccfc8f9-6f6gx + ==> Upstreams (explicit upstreams only) (0) ==> Upstreams IPs (transparent proxy only) (1) [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] If you cannot find the upstream address or cluster for a transparent proxy upstream: - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. 
- To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. - ``` +``` 1. Run the `consul-k8s troubleshoot proxy` command and specify the pod ID and upstream IP address to identify the proxy you want to troubleshoot. The following example uses the upstream IP to validate communication with the upstream service `backend`: ```shell-session $ consul-k8s troubleshoot proxy -pod frontend-767ccfc8f9-6f6gx -upstream-ip 10.4.6.160 + ==> Validation ✓ certificates are valid ✓ Envoy has 0 rejected configurations @@ -125,7 +129,7 @@ To troubleshoot service-to-service communication issues in deployments that use ✓ healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found ✓ cluster "backend2.default.dc1.internal..consul" for upstream "backend" found ! no healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found - ``` +``` In the example, troubleshooting upstream communication reveals that the `backend` service has two clusters in datacenter `dc1`. One of the clusters returns healthy endpoints, but Consul cannot detect healthy endpoints for the second cluster. 
From 95ea43252ff3c7df5d2fcaad7dc7f6e26359dc2d Mon Sep 17 00:00:00 2001 From: boruszak Date: Wed, 22 Feb 2023 22:42:03 +0000 Subject: [PATCH 03/10] backport of commit e5527649ae2c2c6b663dbcc813c69744609fb680 --- website/content/api-docs/operator/usage.mdx | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/website/content/api-docs/operator/usage.mdx b/website/content/api-docs/operator/usage.mdx index 82202b13fb82..56a3c1ec3c44 100644 --- a/website/content/api-docs/operator/usage.mdx +++ b/website/content/api-docs/operator/usage.mdx @@ -49,6 +49,7 @@ $ curl \ + ```json { "Usage": { @@ -74,8 +75,11 @@ $ curl \ "ResultsFilteredByACLs": false } ``` + + + ```json { "Usage": { From e877ba980d26c865577a5c8fdf71db874fb1ab41 Mon Sep 17 00:00:00 2001 From: boruszak Date: Wed, 22 Feb 2023 22:43:15 +0000 Subject: [PATCH 04/10] backport of commit 5c40ba5360dfcb27b501da44a7c075968f0a4eb2 --- website/content/api-docs/operator/usage.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/content/api-docs/operator/usage.mdx b/website/content/api-docs/operator/usage.mdx index 56a3c1ec3c44..1d7e2f023a69 100644 --- a/website/content/api-docs/operator/usage.mdx +++ b/website/content/api-docs/operator/usage.mdx @@ -77,7 +77,6 @@ $ curl \ ``` - ```json @@ -131,6 +130,7 @@ $ curl \ "ResultsFilteredByACLs": false } ``` + From 89019342c1428ae5c2358568213c3187a677bda3 Mon Sep 17 00:00:00 2001 From: boruszak Date: Thu, 23 Feb 2023 16:28:10 +0000 Subject: [PATCH 05/10] backport of commit 51b6f5009f7408e13162f8a9444fd98b8bbac21d --- website/content/commands/troubleshoot/index.mdx | 2 +- website/content/commands/troubleshoot/proxy.mdx | 2 +- website/content/commands/troubleshoot/upstreams.mdx | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/website/content/commands/troubleshoot/index.mdx b/website/content/commands/troubleshoot/index.mdx index 521981a77e0b..0c992aab15c9 100644 --- a/website/content/commands/troubleshoot/index.mdx +++ 
b/website/content/commands/troubleshoot/index.mdx @@ -9,7 +9,7 @@ description: >- Command: `consul troubleshoot` -Use the `troubleshoot` command to diagnose Consul service mesh configuration or network issues. +Use the `troubleshoot` command to diagnose Consul service mesh configuration or network issues. For additional information about using the `troubleshoot` command, including explanations, requirements, and usage instructions, refer to the [service-to-service troubleshooting overview](/consul/docs/troubleshoot/troubleshoot-services). ## Usage diff --git a/website/content/commands/troubleshoot/proxy.mdx b/website/content/commands/troubleshoot/proxy.mdx index 6c93581b155e..d9749c0c254f 100644 --- a/website/content/commands/troubleshoot/proxy.mdx +++ b/website/content/commands/troubleshoot/proxy.mdx @@ -9,7 +9,7 @@ description: >- Command: `consul troubleshoot proxy` -The `troubleshoot proxy` command diagnoses Consul service mesh configuration and network issues to an upstream. +The `troubleshoot proxy` command diagnoses Consul service mesh configuration and network issues to an upstream. For additional information about using the `troubleshoot proxy` command, including explanations, requirements, and usage instructions, refer to the [service-to-service troubleshooting overview](/consul/docs/troubleshoot/troubleshoot-services). ## Usage diff --git a/website/content/commands/troubleshoot/upstreams.mdx b/website/content/commands/troubleshoot/upstreams.mdx index 752bb0463c51..425ec39e4642 100644 --- a/website/content/commands/troubleshoot/upstreams.mdx +++ b/website/content/commands/troubleshoot/upstreams.mdx @@ -9,7 +9,7 @@ description: >- Command: `consul troubleshoot upstreams` -The `troubleshoot upstreams` lists the available upstreams in the Consul service mesh from the current service. +The `troubleshoot upstreams` command lists the available upstreams in the Consul service mesh from the current service. 
For additional information about using the `troubleshoot upstreams` command, including explanations, requirements, and usage instructions, refer to the [service-to-service troubleshooting overview](/consul/docs/troubleshoot/troubleshoot-services). ## Usage From 96befe5bc40a0edc2eae447bf7952425c1418071 Mon Sep 17 00:00:00 2001 From: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Date: Thu, 23 Feb 2023 16:32:11 +0000 Subject: [PATCH 06/10] backport of commit 00ec4e5ff3b1b18ab23ef0beedb4cbdfcada22b8 --- website/content/docs/troubleshoot/troubleshoot-services.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/content/docs/troubleshoot/troubleshoot-services.mdx b/website/content/docs/troubleshoot/troubleshoot-services.mdx index db1c67930a01..b6347189d5c7 100644 --- a/website/content/docs/troubleshoot/troubleshoot-services.mdx +++ b/website/content/docs/troubleshoot/troubleshoot-services.mdx @@ -44,7 +44,7 @@ When troubleshooting service-to-service communication issues, be aware of the fo - The troubleshooting tool does not check service intentions. For more information about intentions, including precedence and match order, refer to [service mesh intentions](/consul/docs/connect/intentions). - The troubleshooting tool validates one direct connection between a downstream service and an upstream service. You must run the `consul troubleshoot` command with the Envoy ID for an individual upstream service. It does not support validating multiple connections simultaneously. -- Because it validates direct communication between two Envoy proxies, the troubleshooting tool does not support checks for service-to-service connections when communication between the services passes through a mesh gateway or a terminating gateway. +- The troubleshooting tool only validates Envoy configurations for sidecar proxies. This means the troubleshooting tool does not validate Envoy configurations on upstream proxies such as mesh gateways and terminating gateways. 
## Usage From b09b711b82223cf12f87c92b7b8884b1f90f2023 Mon Sep 17 00:00:00 2001 From: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Date: Thu, 23 Feb 2023 16:32:28 +0000 Subject: [PATCH 07/10] backport of commit 1405edeff9d303ac22bbc8f59c8812da9920297f --- .../troubleshoot/troubleshoot-services.mdx | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/website/content/docs/troubleshoot/troubleshoot-services.mdx b/website/content/docs/troubleshoot/troubleshoot-services.mdx index b6347189d5c7..60a83eb0ff1f 100644 --- a/website/content/docs/troubleshoot/troubleshoot-services.mdx +++ b/website/content/docs/troubleshoot/troubleshoot-services.mdx @@ -61,16 +61,16 @@ To troubleshoot service-to-service communication issues in deployments that use 1. Run the `consul troubleshoot upstreams` command to retrieve the upstream information for the service that is experiencing communication failures. Depending on your network’s configuration, the upstream information is either an Envoy ID or an IP address. -```shell-session - $ consul troubleshoot upstreams - - ==> Upstreams (explicit upstreams only) (0) - ==> Upstreams IPs (transparent proxy only) (1) - [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] - If you cannot find the upstream address or cluster for a transparent proxy upstream: - - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. - - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. 
-``` + ```shell-session + $ consul troubleshoot upstreams + + ==> Upstreams (explicit upstreams only) (0) + ==> Upstreams IPs (transparent proxy only) (1) + [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] + If you cannot find the upstream address or cluster for a transparent proxy upstream: + - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. + - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. + ``` 1. Run the `consul troubleshoot proxy` command and specify the Envoy ID or IP address with the `-upstream-ip` flag to identify the proxy you want to perform the troubleshooting process on. 
The following example uses the upstream IP to validate communication with the upstream service `backend`: From 7b03886deaea6f427910dcf3cf9429a7dc333385 Mon Sep 17 00:00:00 2001 From: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Date: Thu, 23 Feb 2023 16:33:15 +0000 Subject: [PATCH 08/10] backport of commit 42e93d59f68926c64b4f23f77e6c4585ca6d799e --- website/content/docs/troubleshoot/troubleshoot-services.mdx | 2 ++ 1 file changed, 2 insertions(+) diff --git a/website/content/docs/troubleshoot/troubleshoot-services.mdx b/website/content/docs/troubleshoot/troubleshoot-services.mdx index 60a83eb0ff1f..dae6e4b727b7 100644 --- a/website/content/docs/troubleshoot/troubleshoot-services.mdx +++ b/website/content/docs/troubleshoot/troubleshoot-services.mdx @@ -97,6 +97,8 @@ In the example, troubleshooting upstream communication reveals that the `backend The output from the troubleshooting process identifies service instances according to their [Consul DNS address](/consul/docs/discovery/dns#standard-lookup). Use the DNS information for failing services to diagnose the specific issues affecting the service instance. +For more information, refer to the [`consul troubleshoot` CLI documentation](/consul/commands/troubleshoot). + ### Troubleshoot on Kubernetes To troubleshoot service-to-service communication issues in deployments that use Kubernetes, retrieve the upstream information for the pod that is experiencing communication failures and use the upstream information to identify the proxy you want to perform the troubleshooting process on. 
From 6c65224fdc20307f45f17ad6232cc11f85883bf4 Mon Sep 17 00:00:00 2001 From: boruszak Date: Thu, 23 Feb 2023 16:34:41 +0000 Subject: [PATCH 09/10] backport of commit f867d2edabc26f71f338d6fbb95e85f1cc341a86 --- website/data/docs-nav-data.json | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/website/data/docs-nav-data.json b/website/data/docs-nav-data.json index 1f42b604f927..c3942cd5ac73 100644 --- a/website/data/docs-nav-data.json +++ b/website/data/docs-nav-data.json @@ -798,6 +798,10 @@ { "title": "Troubleshoot", "routes": [ + { + "title": "Service-to-Service Troubleshooting", + "path": "troubleshoot/troubleshoot-services" + }, { "title": "Common Error Messages", "path": "troubleshoot/common-errors" @@ -805,10 +809,6 @@ { "title": "FAQ", "path": "troubleshoot/faq" - }, - { - "title": "Service-to-Service Troubleshooting", - "path": "troubleshoot/troubleshoot-services" } ] }, From 06baf2b82351cb39fa0930f20bc086964de1650d Mon Sep 17 00:00:00 2001 From: boruszak Date: Thu, 23 Feb 2023 17:01:59 +0000 Subject: [PATCH 10/10] backport of commit 948227199879c9451f248e8acfd504a4899092a6 --- .../troubleshoot/troubleshoot-services.mdx | 104 ++++++++++-------- 1 file changed, 58 insertions(+), 46 deletions(-) diff --git a/website/content/docs/troubleshoot/troubleshoot-services.mdx b/website/content/docs/troubleshoot/troubleshoot-services.mdx index dae6e4b727b7..3451d2e50672 100644 --- a/website/content/docs/troubleshoot/troubleshoot-services.mdx +++ b/website/content/docs/troubleshoot/troubleshoot-services.mdx @@ -44,7 +44,7 @@ When troubleshooting service-to-service communication issues, be aware of the fo - The troubleshooting tool does not check service intentions. For more information about intentions, including precedence and match order, refer to [service mesh intentions](/consul/docs/connect/intentions). - The troubleshooting tool validates one direct connection between a downstream service and an upstream service. 
You must run the `consul troubleshoot` command with the Envoy ID for an individual upstream service. It does not support validating multiple connections simultaneously. -- The troubleshooting tool only validates Envoy configurations for sidecar proxies. This means the troubleshooting tool does not validate Envoy configurations on upstream proxies such as mesh gateways and terminating gateways. +- The troubleshooting tool only validates Envoy configurations for sidecar proxies. As a result, the troubleshooting tool does not validate Envoy configurations on upstream proxies such as mesh gateways and terminating gateways. ## Usage @@ -61,39 +61,44 @@ To troubleshoot service-to-service communication issues in deployments that use 1. Run the `consul troubleshoot upstreams` command to retrieve the upstream information for the service that is experiencing communication failures. Depending on your network’s configuration, the upstream information is either an Envoy ID or an IP address. - ```shell-session - $ consul troubleshoot upstreams - - ==> Upstreams (explicit upstreams only) (0) - ==> Upstreams IPs (transparent proxy only) (1) - [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] - If you cannot find the upstream address or cluster for a transparent proxy upstream: - - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. - - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. 
- ``` + ```shell-session + $ consul troubleshoot upstreams + ==> Upstreams (explicit upstreams only) (0) + ==> Upstreams IPs (transparent proxy only) (1) + [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] + If you cannot find the upstream address or cluster for a transparent proxy upstream: + - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. + - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. + ``` 1. Run the `consul troubleshoot proxy` command and specify the Envoy ID or IP address with the `-upstream-ip` flag to identify the proxy you want to perform the troubleshooting process on. The following example uses the upstream IP to validate communication with the upstream service `backend`: - -```shell-session - $ consul troubleshoot proxy -upstream-ip 10.4.6.160 + ```shell-session + $ consul troubleshoot proxy -upstream-ip 10.4.6.160 ==> Validation - ✓ Certificates are valid - ✓ Envoy has 0 rejected configurations - ✓ Envoy has detected 0 connection failure(s) - ✓ Listener for upstream "backend" found - ✓ Route for upstream "backend" found - ✓ Cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found - ✓ Healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found - ✓ Cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found - ! 
No healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found - -> Check that your upstream service is healthy and running - -> Check that your upstream service is registered with Consul - -> Check that the upstream proxy is healthy and running - -> If you are explicitly configuring upstreams, ensure the name of the upstream is correct + ✓ Certificates are valid + ✓ Envoy has 0 rejected configurations + ✓ Envoy has detected 0 connection failure(s) + ✓ Listener for upstream "backend" found + ✓ Route for upstream "backend" found + ✓ Cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ✓ Healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ✓ Cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ! No healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + -> Check that your upstream service is healthy and running + -> Check that your upstream service is registered with Consul + -> Check that the upstream proxy is healthy and running + -> If you are explicitly configuring upstreams, ensure the name of the upstream is correct ``` -In the example, troubleshooting upstream communication reveals that the `backend` service has two service instances running in datacenter `dc1`. One of the services is healthy, but Consul cannot detect healthy endpoints for the second service instance. +In the example output, troubleshooting upstream communication reveals that the `backend` service has two service instances running in datacenter `dc1`. One of the services is healthy, but Consul cannot detect healthy endpoints for the second service instance. 
This information appears in the following lines of the example: + +```text hideClipboard + ✓ Cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ✓ Healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ✓ Cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ! No healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found +``` The output from the troubleshooting process identifies service instances according to their [Consul DNS address](/consul/docs/discovery/dns#standard-lookup). Use the DNS information for failing services to diagnose the specific issues affecting the service instance. @@ -105,34 +110,41 @@ To troubleshoot service-to-service communication issues in deployments that use 1. Run the `consul-k8s troubleshoot upstreams` command and specify the pod ID with the `-pod` flag to retrieve upstream information. Depending on your network’s configuration, the upstream information is either an Envoy ID or an IP address. The following example displays all transparent proxy upstreams in Consul service mesh from the given pod. -```shell-session + ```shell-session $ consul-k8s troubleshoot upstreams -pod frontend-767ccfc8f9-6f6gx - ==> Upstreams (explicit upstreams only) (0) ==> Upstreams IPs (transparent proxy only) (1) [10.4.6.160 240.0.0.3] true map[backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul] If you cannot find the upstream address or cluster for a transparent proxy upstream: - - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. 
- - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. -``` + - Check intentions: Tproxy upstreams are configured based on intentions. Make sure you have configured intentions to allow traffic to your upstream. + - To check that the right cluster is being dialed, run a DNS lookup for the upstream you are dialing. For example, run `dig backend.svc.consul` to return the IP address for the `backend` service. If the address you get from that is missing from the upstream IPs, it means that your proxy may be misconfigured. + ``` 1. Run the `consul-k8s troubleshoot proxy` command and specify the pod ID and upstream IP address to identify the proxy you want to troubleshoot. The following example uses the upstream IP to validate communication with the upstream service `backend`: -```shell-session + ```shell-session $ consul-k8s troubleshoot proxy -pod frontend-767ccfc8f9-6f6gx -upstream-ip 10.4.6.160 - ==> Validation - ✓ certificates are valid - ✓ Envoy has 0 rejected configurations - ✓ Envoy has detected 0 connection failure(s) - ✓ listener for upstream "backend" found - ✓ route for upstream "backend" found - ✓ cluster "backend.default.dc1.internal..consul" for upstream "backend" found - ✓ healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found - ✓ cluster "backend2.default.dc1.internal..consul" for upstream "backend" found - ! 
no healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found -``` + ✓ certificates are valid + ✓ Envoy has 0 rejected configurations + ✓ Envoy has detected 0 connection failure(s) + ✓ listener for upstream "backend" found + ✓ route for upstream "backend" found + ✓ cluster "backend.default.dc1.internal..consul" for upstream "backend" found + ✓ healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ✓ cluster "backend2.default.dc1.internal..consul" for upstream "backend" found + ! no healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ``` + +In the example output, troubleshooting upstream communication reveals that the `backend` service has two clusters in datacenter `dc1`. One of the clusters returns healthy endpoints, but Consul cannot detect healthy endpoints for the second cluster. This information appears in the following lines of the example: -In the example, troubleshooting upstream communication reveals that the `backend` service has two clusters in datacenter `dc1`. One of the clusters returns healthy endpoints, but Consul cannot detect healthy endpoints for the second cluster. + ```text hideClipboard + ✓ cluster "backend.default.dc1.internal..consul" for upstream "backend" found + ✓ healthy endpoints for cluster "backend.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found + ✓ cluster "backend2.default.dc1.internal..consul" for upstream "backend" found + ! no healthy endpoints for cluster "backend2.default.dc1.internal.e08fa6d6-e91e-dfe0-f6e1-ba097a828e31.consul" for upstream "backend" found +``` The output from the troubleshooting process identifies service instances according to their [Consul DNS address](/consul/docs/k8s/dns). 
Use the DNS information for failing services to diagnose the specific issues affecting the service instance. + +For more information, refer to the [`consul-k8s troubleshoot` CLI reference](/consul/docs/k8s/k8s-cli#troubleshoot). \ No newline at end of file