WASM: calling proxy_send_local_response twice leaves the remote HTTP client (e.g. curl) hanging until timeout or interruption
#28826
Comments
I can tackle this
/assign @keithmattix
/assign @krinkinmu
krinkinmu is not allowed to assign users.
/assign @krinkinmu
@krinkinmu cannot be assigned to this issue.
@alyssawilk any idea what the issue is here?
I was able to reproduce the issue on a recent head build with a C++ based filter. Going through the issue with a debugger suggests that roughly the following is happening:
As for the fix, I think the filter itself can avoid triggering a second local reply if it's processing a local reply already. That being said, I don't think that Envoy hanging in this case is desirable behavior, regardless of the filter logic, so I think Envoy could do better in this case. I see a couple of options:
NOTE: mostly for myself, a third option would be to not send local replies through the filter chain at all, but I don't think we should change behavior that has been around for years, so I'm dismissing this option right away. I find option 2 rather confusing: what would the desired behaviour be in this case? Would the second local reply overwrite the first one? Because of this I prefer option 1, and I'm starting to work on the code change to implement it.
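To illustrate the plugin-side mitigation mentioned above (the filter skipping the second local reply when one is already in flight), here is a rough sketch against the proxy-wasm Rust SDK. The GuardedFilter type, its boolean flag, and the response bodies are invented for this sketch, and the context registration boilerplate is omitted; this is not code from this issue or from Envoy.

```rust
use proxy_wasm::traits::{Context, HttpContext};
use proxy_wasm::types::Action;

// Per-stream guard: remember that this stream already asked for a local
// reply and do not ask for a second one when the locally generated reply
// comes back through on_http_response_headers.
#[derive(Default)]
struct GuardedFilter {
    local_reply_sent: bool,
}

impl Context for GuardedFilter {}

impl HttpContext for GuardedFilter {
    fn on_http_request_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        self.deny(b"denied on request\n")
    }

    fn on_http_response_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        self.deny(b"denied on response\n")
    }
}

impl GuardedFilter {
    fn deny(&mut self, body: &[u8]) -> Action {
        if self.local_reply_sent {
            // A local reply is already in flight for this stream; let it
            // continue through the filter chain instead of triggering a
            // second proxy_send_local_response.
            return Action::Continue;
        }
        self.local_reply_sent = true;
        self.send_http_response(403, vec![], Some(body));
        Action::Pause
    }
}
```

The same per-stream-flag idea applies to plugins written with the other proxy-wasm SDKs.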
One more thing to mention: the fix should be on the proxy-wasm-cpp-host side, I think.
/assign @krinkinmu
krinkinmu is not allowed to assign users.
@keithmattix I thought that it was a propagation delay of some sort, but that does not seem to be the case anymore. Looking at what repokitteh does, I think it is supposed to just call the GitHub API (see https://github.com/repokitteh/modules/blob/master/assign.star#L8-L10 and https://docs.github.com/en/rest/issues/assignees?apiVersion=2022-11-28#check-if-a-user-can-be-assigned). However, running the API check locally works for me:
and for reference, this is how I think it looks when I don't have permissions:
I'm confused about why it does not work, but maybe we should discuss the repo access issues outside of this bug?
For context see the Envoy issue envoyproxy/envoy#28826. Here is a shorter summary:

1. A wasm plugin calls proxy_send_local_response from both onRequestHeaders and onResponseHeaders.
2. When proxy_send_local_response is called from onRequestHeaders, it triggers a local reply and that reply goes through the filter chain in Envoy.
3. The same plugin is called again as part of the filter chain processing, but this time onResponseHeaders is called.
4. onResponseHeaders calls proxy_send_local_response, which ultimately does not generate a local reply, but it does stop filter chain processing.

As a result we end up with a stuck connection on Envoy - no local reply, and processing is stopped. I think that proxy-wasm plugins shouldn't use proxy_send_local_response this way, so ultimately whoever created such a plugin shot themselves in the foot. That being said, I think there are a few improvements that could be made on the Envoy/proxy-wasm side to handle this situation somewhat better:

1. We can avoid stopping processing in such cases, to prevent stuck connections on Envoy.
2. We can return errors from proxy_send_local_response instead of silently ignoring them.

Currently the Envoy implementation of sendLocalResponse can detect when a second local response is requested and returns an error in this case without actually trying to send a local response. However, even though Envoy reports an error, send_local_response ignores the result of the host-specific sendLocalResponse implementation, stops processing and returns success to the wasm plugin.

With this change, send_local_response will check the result of the host-specific implementation of sendLocalResponse. When sendLocalResponse fails, it will just propagate the error to the caller and do nothing else (including stopping processing). I think this behavior makes sense in principle: on the one hand we don't ignore the failure from sendLocalResponse, and on the other hand, when the failure happens we don't trigger any of the side effects expected from a successful proxy_send_local_response call.

NOTE: Even though I do think that this is a more reasonable behavior, it's still a change from the previous behavior and it might break existing proxy-wasm plugins. Specifically:

1. C++ plugins that proactively check the result of proxy_send_local_response will change behavior (before this change, proxy_send_local_response failed silently).
2. Rust plugins, due to the way the Rust SDK handles errors from proxy_send_local_response, will crash at runtime in this case.

On the bright side, the plugins affected by this change currently just cause stuck connections in Envoy, so we are trading one undesirable behavior for another, more explicit one.

Signed-off-by: Mikhail Krinkin <[email protected]>
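On the Rust point in particular: as the note above says, the SDK's convenience wrapper turns a hostcall error into a runtime crash, so a plugin that may legitimately see a rejected local reply could instead call the fallible hostcall directly and decide what to do with the error itself. A rough sketch, assuming proxy_wasm::hostcalls::send_http_response keeps its Result-returning signature; the try_local_reply helper is hypothetical:

```rust
use proxy_wasm::hostcalls;
use proxy_wasm::types::Action;

// Ask the host for a local reply, but treat a host-side refusal (for
// example, a second local reply requested on the same stream) as
// non-fatal instead of panicking.
fn try_local_reply(body: &[u8]) -> Action {
    match hostcalls::send_http_response(403, vec![], Some(body)) {
        Ok(()) => Action::Pause,          // reply accepted: stop iteration
        Err(_status) => Action::Continue, // host refused: keep the chain moving
    }
}
```

Whether falling back to Action::Continue is the right choice depends on the plugin; the point is only that the failure becomes observable instead of being silently ignored.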
I prepared a fix for proxy-wasm-cpp-host that resolves the issue: proxy-wasm/proxy-wasm-cpp-host#423. Will see if we can get consensus on the approach and get it merged there.
This change fixes envoyproxy#28826. Some additional discussion for context can be found in proxy-wasm/proxy-wasm-cpp-host#423.

The issue reported in envoyproxy#28826 happens when a proxy-wasm plugin calls proxy_send_local_response during both HTTP request processing and HTTP response processing. This happens because, in an attempt to mitigate a use-after-free issue (see envoyproxy#23049), we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now, when a proxy-wasm plugin calls proxy_send_local_response, only the first call results in sendLocalReply, while all subsequent calls get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration.

During HTTP request processing this leads to the following chain of events:

1. During the request, the proxy-wasm plugin calls proxy_send_local_response.
2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain.
3. The request processing filter chain gets aborted and Envoy sends the previously created local reply through the filter chain.
4. The proxy-wasm plugin gets called to process the response it generated, and it calls proxy_send_local_response.
5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm currently prevents multiple calls to sendLocalReply.
6. proxy-wasm stops iteration.

So in the end the filter chain iteration is stopped for the response, and because proxy_send_local_response does not actually call sendLocalReply, we don't send another locally generated response either. I think we can do slightly better and close the connection in this case.

This change includes the following parts:

1. Partial rollback of envoyproxy#23049.
2. Change to the Envoy implementation of failStream, used by proxy-wasm in case of critical errors.
3. Tests covering this case and some others, using the actual FilterManager.

The most important question is why rolling back envoyproxy#23049 is now safe. The reason is that since the introduction of prepareLocalReplyViaFilterChain in envoyproxy#24367, calling sendLocalReply multiple times is safe - that PR basically addresses the issue in a generic way for all plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change:

1. We don't end up with a stuck connection in case of errors, which is slightly better.
2. We remove a failure mode from proxy_send_local_response that was introduced in envoyproxy#23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing.

Why do we need to change the implementation of failStream? failStream gets called when the Wasm VM crashes (e.g., a null pointer dereference or abort inside the VM, or any other unrecoverable error with the VM). The current implementation just calls sendLocalReply in this case. Let's consider what happens when the Wasm VM crashes during HTTP request processing:

1. The Wasm VM crashes.
2. proxy-wasm calls failStream, which calls sendLocalReply.
3. Envoy prepares a local reply and eventually sends it through the filter chain.
4. The proxy-wasm plugin with a crashed VM is called to process the reply; proxy-wasm in this case can't really do anything and just stops the iteration.

Which is a fine way of dealing with the issue, but we can do slightly better and close the connection in this case instead of just pausing the iteration. And we are not losing anything by replacing sendLocalReply with resetStream, because the local reply doesn't get sent anyway.

> NOTE: The same issue does not happen if the VM crashes during response processing, because sendLocalReply in this case sends the response directly, ignoring the filter chain.

Finally, why replace the current mocks with a real FilterManager? The mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and that it will do the right thing, and we want to assert that the connection gets closed in the end - that cannot be tested by just checking that sendLocalReply gets called, or by relying on a simplistic mock implementation of sendLocalReply.

Signed-off-by: Mikhail Krinkin <[email protected]>
/assign @krinkinmu
…6809)

Commit Message: This change fixes #28826. Some additional discussion for context can be found in proxy-wasm/proxy-wasm-cpp-host#423.

The issue reported in #28826 happens when a proxy-wasm plugin calls proxy_send_local_response during both HTTP request processing and HTTP response processing. This happens because, in an attempt to mitigate a use-after-free issue (see #23049), we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now, when a proxy-wasm plugin calls proxy_send_local_response, only the first call results in sendLocalReply, while all subsequent calls get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration.

During HTTP request processing this leads to the following chain of events:

1. During the request, the proxy-wasm plugin calls proxy_send_local_response.
2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain.
3. The request processing filter chain gets aborted and Envoy sends the previously created local reply through the filter chain.
4. The proxy-wasm plugin gets called to process the response it generated, and it calls proxy_send_local_response.
5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm currently prevents multiple calls to sendLocalReply.
6. proxy-wasm stops iteration.

So in the end the filter chain iteration is stopped for the response, and because proxy_send_local_response does not actually call sendLocalReply, we don't send another locally generated response either. I think we can do slightly better and close the connection in this case.

This change includes the following parts:

1. Partial rollback of #23049.
2. Tests covering this case and some others, using the actual FilterManager.

The most important question is why rolling back #23049 is now safe. The reason is that since the introduction of prepareLocalReplyViaFilterChain in #24367, calling sendLocalReply multiple times is safe - that PR basically addresses the issue in a generic way for all plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change:

1. We don't end up with a stuck connection in case of errors, which is slightly better.
2. We remove a failure mode from proxy_send_local_response that was introduced in #23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing.

Finally, why replace the current mocks with a real FilterManager? The mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and that it will do the right thing, and we want to assert that the connection gets closed in the end - that cannot be tested by just checking that sendLocalReply gets called, or by relying on a simplistic mock implementation of sendLocalReply.

Additional Description:
Risk Level: low
Testing: Manually, by reproducing the case reported in #28826. I also added new unit tests and verified that they pass and aren't flaky:

```
bazel test --runs_per_test=1000 //test/extensions/common/wasm:all --config=docker-clang
```

Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Fixes #28826

---------

Signed-off-by: Mikhail Krinkin <[email protected]>
Title: WASM: calling proxy_send_local_response twice leaves the remote HTTP client (e.g. curl) hanging until timeout or interruption

Description:
I have a wasm plugin running on Envoy, and I found that the HTTP request gets stuck if I call proxy_send_local_response in both on_http_request_headers and on_http_response_headers.

Repro steps:
Here is the shortest wasm code:
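(The reporter's original snippet is not reproduced here. Purely as an illustration of the pattern, a minimal plugin written against a recent proxy-wasm Rust SDK might look like the sketch below; the DoubleLocalReply name, status codes, and header values are made up for this example.)

```rust
use proxy_wasm::traits::{Context, HttpContext};
use proxy_wasm::types::{Action, LogLevel};

// Minimal filter reproducing the pattern from the issue: it asks for a local
// reply from on_http_request_headers, and then asks for another one from
// on_http_response_headers when its own locally generated reply travels back
// through the filter chain.
struct DoubleLocalReply;

impl Context for DoubleLocalReply {}

impl HttpContext for DoubleLocalReply {
    fn on_http_request_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        // First local reply: this one is actually generated by Envoy.
        self.send_http_response(403, vec![("x-phase", "request")], Some(b"denied on request\n"));
        Action::Pause
    }

    fn on_http_response_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        // Second local reply: ignored by the host on affected versions, but
        // iteration stops, so the client never receives anything.
        self.send_http_response(403, vec![("x-phase", "response")], Some(b"denied on response\n"));
        Action::Pause
    }
}

proxy_wasm::main! {{
    proxy_wasm::set_log_level(LogLevel::Info);
    proxy_wasm::set_http_context(|_context_id, _root_context_id| -> Box<dyn HttpContext> {
        Box::new(DoubleLocalReply)
    });
}}
```

Routing a single curl request through Envoy with such a filter configured should be enough to hit the described hang on the affected versions.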
envoy docker-compose configuration:
and envoy.yaml:
Logs:
And here is the log on the client side:
I found out that it's caused by this PR: #23049
It occurs since Envoy 1.24+ (and Istio 1.16+).
If I revert those changes and rebuild, it works again.
I don't know whether this behaviour is expected or was changed by accident.
It only occurs in my QA cluster (which runs a newer version of Envoy, 1.24+).