Difference in how connections are handled for draining listeners from v1.19.1 to v1.20.x, v1.21.x #20113
Comments
Could you share a config or steps to reproduce it?
@daixiang0 please refer to the following:

On v1.19.1:
Config:
Stats:

On v1.21.1:
Config:
Stats:
Since the listener is updated filter chain by filter chain, it's probably hitting the issue I found and analyzed here.
@lambdai thanks for sharing this. Based on #18677 (comment), it doesn't seem like there is an issue related to the bug you were fixing. Are you certain it's the same issue as this bug? Also note that I do not see this issue in v1.19.1, but consistently hit this when using v1.21.1. /cc @mattklein123
I can't reproduce this issue. I use a very simple listener, as below:

Then I remove the listener. Before reaching the drain time, I connect to the listener port; the connection doesn't hang, it is refused directly.

@shashankram I'm not sure I understand your case clearly; let me know if I didn't test the correct case.
@soulxu could you kindly use the exact build type and image? I noticed you are using a DEBUG build; please use the following in your test:
I am using the

I am also using https://github.com/envoyproxy/go-control-plane/releases/tag/v0.10.1 to program Envoy.
I don't believe that issue is fixed. The PR is closed because it's not critical, plus a more elegant patch is desired.
@lambdai Are you certain the issue I am seeing (connection hang when listener is draining) is related to the issue you pointed out? Also, why is this not seen in v1.19.1? With v1.21.1, I can consistently reproduce this in my internal environment.
I can confirm that this issue seems to be introduced by the change ba474ac (/cc @mattklein123). I cannot reproduce this issue with the previous commit e0380ad.
As @lambdai said, there is a known issue in the related issue. @lambdai I lost the thread on that since I thought you had fixed it. Should that issue be reopened so we can track it? In terms of this issue, I can't say whether it's the same or not. Can you provide a self-contained repro and I can take a look at some point? Thanks.
I tried to reproduce in a standalone environment using filesystem-based dynamic configuration. I used the example https://www.envoyproxy.io/docs/envoy/latest/start/quick-start/configuration-dynamic-filesystem and tried removing the listener so it starts to drain. I couldn't reproduce the issue exactly as in my original environment, because I don't see the same stat there. As per #20113 (comment), when this issue happens, I see the following stats when the listener is removed:

While using filesystem-based configuration, I see:

*Note: CDS and LDS configs for the filesystem-based setup are taken from https://www.envoyproxy.io/docs/envoy/latest/start/quick-start/configuration-dynamic-filesystem.

@mattklein123 Is there a way I can force the
@mattklein123 I was able to repro this in a standalone environment. Steps:

Ensure
OK thanks for taking the time to provide a repro. @lambdai are you interested in looking at this? Otherwise I can try to find time to take a look. |
@mattklein123 Sorry, I have too many things on my plate this week. I can look early next week.
@mattklein123, @lambdai were you able to repro this? |
Still blocked on this to be able to update to the latest Envoy version. Please let me know if there is a workaround. |
I got some bandwidth, but I still can't reproduce this. Are you using this image directly: https://hub.docker.com/r/envoyproxy/envoy-alpine/tags ?
See #20113 (comment) for repro steps. |
Yes, I tried those steps, but I can't reproduce the issue. I updated the config to remove the listener; then the connection is refused directly.
Hmm... I can't reproduce with this image either: https://hub.docker.com/layers/envoy-alpine/envoyproxy/envoy-alpine/v1.20-latest/images/sha256-7b35149b22a2a00f3c244853eea071bebde985882088fc9b7e8d0e8d198138b7?context=explore
I can still reproduce this using the image above. I can confirm that this issue seems to be introduced by the change ba474ac (/cc @mattklein123); I cannot reproduce it with the previous commit e0380ad. @soulxu please confirm you are following the steps exactly as described, and please share the detailed steps you followed. Note that you need to have multiple filter chains on the listener first, and then remove the entire listener to reproduce the issue.
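For context on that precondition, a minimal LDS sketch of a listener with two filter chains could look like the following. This is purely illustrative, not the reporter's actual config: the listener name, port, and SNI names are assumptions, the network filters are elided, and distinct filter chains need distinct match criteria (here, SNI via a `tls_inspector` listener filter).

```yaml
# Hypothetical lds-1.yaml: one listener, two filter chains matched by SNI.
# All names and ports are illustrative; network filters are elided for brevity.
resources:
- "@type": type.googleapis.com/envoy.config.listener.v3.Listener
  name: listener_0
  address:
    socket_address: { address: 0.0.0.0, port_value: 10000 }
  listener_filters:
  - name: envoy.filters.listener.tls_inspector
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
  filter_chains:
  - filter_chain_match: { server_names: ["foo.example.com"] }
    filters: []   # e.g. an HttpConnectionManager chain (elided)
  - filter_chain_match: { server_names: ["bar.example.com"] }
    filters: []   # e.g. a second HttpConnectionManager chain (elided)
```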
I can reproduce this issue when the dynamic lds.yaml is changed through the following sequence:

- Request does not hang:
- Request does not hang:
- Request hangs:
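Summarizing the triggering sequence as a sketch (the thread does not include the actual lds-*.yaml contents, so the per-step descriptions below are inferred, not the reporter's exact configs):

```yaml
# Inferred LDS transition sequence (file names from the thread, contents assumed):
#   1. lds-1.yaml: listener with multiple filter chains -> requests served
#   2. lds-2.yaml: same listener, fewer filter chains   -> request does not hang
#   3. listener removed entirely (empty resources)      -> request hangs until the
#      drain interval elapses instead of being refused
resources: []   # final state: removing the listener starts the drain
```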
Sorry for the late reply, and thanks for the repro steps! I can reproduce it now. Previously I only started up with lds-2.yaml and then removed the listener, which didn't reproduce the problem. With the additional first step using lds-1.yaml, it reproduces now. I will dig into it.
/assign @soulxu |
@soulxu, which release has this fix? |
Seems like this is available on v1.22.0 and v1.22.2 |
You are right, it is in v1.22.0 and later.
- Updates Envoy to its latest available version (v1.22.2 for Linux, v1.22.1 for Windows). The latest version includes the latest released security fix. We could not update Envoy previously due to a blocking bug: envoyproxy/envoy#20113
- Updates filter names to custom names as wellknown names are deprecated in Envoy (with 1 exception for the http.rbac filter). Envoy will use the TypeURL in the proto to determine which filter to use instead. Wellknown names are not required and using them is confusing because not all filters are defined in the legacy wellknown pkg (e.g. http.local_ratelimit). See: envoyproxy/envoy#21759 envoyproxy/envoy#21763 envoyproxy/go-control-plane#293 envoyproxy/go-control-plane#552
- Uses the distroless image as the alpine image has been discontinued: envoyproxy/envoy#21758
- Updates tests to use custom filter names
- Adds `proto_types.go` to aid dynamic proto resolution for typed configs using `any.Any()`. This helps resolve protos where dynamic resolution is necessary.
- Updated Prometheus' ConfigMap to reflect changes to Envoy metrics prefixes

Signed-off-by: Shashank Ram <[email protected]>
Title: Difference in how connections are handled for draining listeners from v1.19.1 to v1.20.x, v1.21.x
Description:
I am observing a behavior difference in how HTTP connections are handled while the listener is draining in v1.20+ compared to v1.19.1.
Scenario: Egress listener is removed and is in draining state
In v1.19.1: curl http://foo.bar returns
curl: (7) ... connection refused
In v1.21.1: curl http://foo.bar hangs, and returns only after the listener drain interval elapses.
See #20113 (comment) for more details.
What has changed between these 2 versions to result in a change in behavior for connections during listener drain?
UPDATE:
I can confirm that this issue seems to be introduced by the change ba474ac
I cannot reproduce this issue with the previous commit e0380ad