-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix sendLocalResponse in wasm #23049
Conversation
/wait |
}); | ||
} | ||
local_reply_hold_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need both hold and sent? Doesn't your "hold" case already cover the sent case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
envoy/source/extensions/common/wasm/context.cc
Line 1609 in f09ed36
decoder_callbacks_->sendLocalReply(Envoy::Http::Code::ServiceUnavailable, "", nullptr, |
This is to avoid sending the local reply twice if the vm fails. I added another test for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer to @mathetake @lizan here, can you folks take a pass as CODEOWNERS? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah unfortunately I think it is inevitable to have both of these two booleans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnlanni this is fine I think, can you add some documentation in the form of comments for posterity, as the reasons we need this is subtle as per your above comment? Everything else LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@htuch done
Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
This reverts commit 01d9b4cbbb77f5508a285590e6f83b98c9e19496. Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
There seems to be a problem with this clang tidy check, this using will be used in the context.cc like:
|
}); | ||
} | ||
local_reply_hold_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer to @mathetake @lizan here, can you folks take a pass as CODEOWNERS? Thanks.
Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: johnlanni <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
This change fixes envoyproxy#28826. Some additional discussions for context can be found in proxy-wasm/proxy-wasm-cpp-host#423. The issue reported in envoyproxy#28826 happens when proxy-wasm plugin calls proxy_send_local_response during the HTTP request proessing and HTTP response processing. This happens because in attempt to mitigate a use-after-free issue (see envoyproxy#23049) we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now when proxy-wasm plugin calls proxy_send_local_response only the first call will result in sendLocalReply, while all subsequent calls will get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration. During HTTP request processing this leads to the following chain of events: 1. During request proxy-wasm plugin calls proxy_send_local_response 2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain 3. Request processing filter chain gets aborted and Envoy sends the previous created local reply though the filter chain 4. Proxy-wasm plugin gets called to process the response it generated and it calls proxy_send_local_response 5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm prevents multiple calls to sendLocalReply currently 6. proxy-wasm stops iteration So in the end the filter chain iteration is stopped for the response and because proxy_send_local_respose does not actually call sendLocalReply we don't send another locally generated response either. I think we can do slightly better and close the connection in this case. This change includes the following parts: 1. Partial rollback of envoyproxy#23049 2. Change to Envoy implementation of failStream used by proxy-wasm in case of critical errors 3. Tests covering this case and some other using the actual FilterManager. The most important question is why rolling back envoyproxy#23049 now is safe? The reason why it's safe, is that since introduction of prepareLocalReplyViaFilterChain in envoyproxy#24367, calling sendLocalReply multiple times is safe - that PR basically address the issue in a generic way for all the plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change: 1. We don't end up with a stuck connection in case of errors, which is slightly better 2. We remove a failure mode from proxy_send_local_response that was introduced in envoyproxy#23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing. Why do we need to change the implementation of the failStream? failStream gets called when Wasm VM crashes (e.g., null pointer derefernce or abort inside the VM or any other unrecoverable error with the VM). Current implementation just calls sendLocalReply in this case. Let's consider what happens during the HTTP request processing when Wasm VM crashes: 1. Wasm VM crashes 2. proxy-wasm calls failStream which calls sendLocalReply 3. Envoy prepares local reply and eventually sends it through the filter chain 4. proxy-wasm plugin with a crashed VM is called to process the reply proxy-wasm in this case can't really do anything and just stops the iteration. Which is a fine way of dealing with the issue, but we can do slightly better and close the connection in this case instead of just pausing the iteration. And we are not losing anything in this case by replacing sendLocalReply with resetStream, because the local reply doesn't get send anyways. > NOTE: The same issue does not happen if the VM crashes during response > processing, because sendLocalReply in this case will send the response > directly ignoring the filter chain. Finally, why replace the current mocks with a real FilterManager? Mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and it will do the right thing and we want to assert that the connection will get closed in the end - that cannot be tested by just checking that the sendLocalReply gets called or by relying on a simplistic mock implementation of sendLocalReply. Signed-off-by: Mikhail Krinkin <[email protected]>
This change fixes envoyproxy#28826. Some additional discussions for context can be found in proxy-wasm/proxy-wasm-cpp-host#423. The issue reported in envoyproxy#28826 happens when proxy-wasm plugin calls proxy_send_local_response during the HTTP request proessing and HTTP response processing. This happens because in attempt to mitigate a use-after-free issue (see envoyproxy#23049) we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now when proxy-wasm plugin calls proxy_send_local_response only the first call will result in sendLocalReply, while all subsequent calls will get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration. During HTTP request processing this leads to the following chain of events: 1. During request proxy-wasm plugin calls proxy_send_local_response 2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain 3. Request processing filter chain gets aborted and Envoy sends the previous created local reply though the filter chain 4. Proxy-wasm plugin gets called to process the response it generated and it calls proxy_send_local_response 5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm prevents multiple calls to sendLocalReply currently 6. proxy-wasm stops iteration So in the end the filter chain iteration is stopped for the response and because proxy_send_local_respose does not actually call sendLocalReply we don't send another locally generated response either. I think we can do slightly better and close the connection in this case. This change includes the following parts: 1. Partial rollback of envoyproxy#23049 2. Change to Envoy implementation of failStream used by proxy-wasm in case of critical errors 3. Tests covering this case and some other using the actual FilterManager. The most important question is why rolling back envoyproxy#23049 now is safe? The reason why it's safe, is that since introduction of prepareLocalReplyViaFilterChain in envoyproxy#24367, calling sendLocalReply multiple times is safe - that PR basically address the issue in a generic way for all the plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change: 1. We don't end up with a stuck connection in case of errors, which is slightly better 2. We remove a failure mode from proxy_send_local_response that was introduced in envoyproxy#23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing. Why do we need to change the implementation of the failStream? failStream gets called when Wasm VM crashes (e.g., null pointer derefernce or abort inside the VM or any other unrecoverable error with the VM). Current implementation just calls sendLocalReply in this case. Let's consider what happens during the HTTP request processing when Wasm VM crashes: 1. Wasm VM crashes 2. proxy-wasm calls failStream which calls sendLocalReply 3. Envoy prepares local reply and eventually sends it through the filter chain 4. proxy-wasm plugin with a crashed VM is called to process the reply proxy-wasm in this case can't really do anything and just stops the iteration. Which is a fine way of dealing with the issue, but we can do slightly better and close the connection in this case instead of just pausing the iteration. And we are not losing anything in this case by replacing sendLocalReply with resetStream, because the local reply doesn't get send anyways. > NOTE: The same issue does not happen if the VM crashes during response > processing, because sendLocalReply in this case will send the response > directly ignoring the filter chain. Finally, why replace the current mocks with a real FilterManager? Mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and it will do the right thing and we want to assert that the connection will get closed in the end - that cannot be tested by just checking that the sendLocalReply gets called or by relying on a simplistic mock implementation of sendLocalReply. Signed-off-by: Mikhail Krinkin <[email protected]>
This change fixes envoyproxy#28826. Some additional discussions for context can be found in proxy-wasm/proxy-wasm-cpp-host#423. The issue reported in envoyproxy#28826 happens when proxy-wasm plugin calls proxy_send_local_response during the HTTP request proessing and HTTP response processing. This happens because in attempt to mitigate a use-after-free issue (see envoyproxy#23049) we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now when proxy-wasm plugin calls proxy_send_local_response only the first call will result in sendLocalReply, while all subsequent calls will get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration. During HTTP request processing this leads to the following chain of events: 1. During request proxy-wasm plugin calls proxy_send_local_response 2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain 3. Request processing filter chain gets aborted and Envoy sends the previous created local reply though the filter chain 4. Proxy-wasm plugin gets called to process the response it generated and it calls proxy_send_local_response 5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm prevents multiple calls to sendLocalReply currently 6. proxy-wasm stops iteration So in the end the filter chain iteration is stopped for the response and because proxy_send_local_respose does not actually call sendLocalReply we don't send another locally generated response either. I think we can do slightly better and close the connection in this case. This change includes the following parts: 1. Partial rollback of envoyproxy#23049 2. Tests covering this case and some other using the actual FilterManager. The most important question is why rolling back envoyproxy#23049 now is safe? The reason why it's safe, is that since introduction of prepareLocalReplyViaFilterChain in envoyproxy#24367, calling sendLocalReply multiple times is safe - that PR basically address the issue in a generic way for all the plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change: 1. We don't end up with a stuck connection in case of errors, which is slightly better 2. We remove a failure mode from proxy_send_local_response that was introduced in envoyproxy#23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing. Finally, why replace the current mocks with a real FilterManager? Mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and it will do the right thing and we want to assert that the connection will get closed in the end - that cannot be tested by just checking that the sendLocalReply gets called or by relying on a simplistic mock implementation of sendLocalReply. Signed-off-by: Mikhail Krinkin <[email protected]>
…6809) Commit Message: This change fixes #28826. Some additional discussions for context can be found in proxy-wasm/proxy-wasm-cpp-host#423. The issue reported in #28826 happens when proxy-wasm plugin calls proxy_send_local_response during the HTTP request proessing and HTTP response processing. This happens because in attempt to mitigate a use-after-free issue (see #23049) we added logic to proxy-wasm that avoids calling sendLocalReply multiple times. So now when proxy-wasm plugin calls proxy_send_local_response only the first call will result in sendLocalReply, while all subsequent calls will get ignored. At the same time, when proxy-wasm plugins call proxy_send_local_response, because it's used to report an error in the plugin, proxy-wasm also stops iteration. During HTTP request processing this leads to the following chain of events: 1. During request proxy-wasm plugin calls proxy_send_local_response 2. proxy_send_local_response calls sendLocalReply, which schedules the local reply to be processed later through the filter chain 3. Request processing filter chain gets aborted and Envoy sends the previous created local reply though the filter chain 4. Proxy-wasm plugin gets called to process the response it generated and it calls proxy_send_local_response 5. proxy_send_local_response **does not** call sendLocalReply, because proxy-wasm prevents multiple calls to sendLocalReply currently 6. proxy-wasm stops iteration So in the end the filter chain iteration is stopped for the response and because proxy_send_local_respose does not actually call sendLocalReply we don't send another locally generated response either. I think we can do slightly better and close the connection in this case. This change includes the following parts: 1. Partial rollback of #23049 2. Tests covering this case and some other using the actual FilterManager. The most important question is why rolling back #23049 now is safe? The reason why it's safe, is that since introduction of prepareLocalReplyViaFilterChain in #24367, calling sendLocalReply multiple times is safe - that PR basically address the issue in a generic way for all the plugins, so a proxy-wasm specific fix is not needed anymore. On top of being safe, there are additional benefits to making this change: 1. We don't end up with a stuck connection in case of errors, which is slightly better 2. We remove a failure mode from proxy_send_local_response that was introduced in #23049 - which is good, because proxy-wasm plugins don't have a good fallback when proxy_send_local_response is failing. Finally, why replace the current mocks with a real FilterManager? Mock implementation of sendLocalReply works fine for tests that just need to assert that sendLocalReply gets called. However, in this case we rely on the fact that it's safe to call sendLocalReply multiple times and it will do the right thing and we want to assert that the connection will get closed in the end - that cannot be tested by just checking that the sendLocalReply gets called or by relying on a simplistic mock implementation of sendLocalReply. Additional Description: Risk Level: low Testing: Manually, by reproducing the case reported in #28826. I also added new unit tests and verified that they pass and aren't flaky: ``` bazel test --runs_per_test=1000 //test/extensions/common/wasm:all --config=docker-clang ``` Docs Changes: N/A Release Notes: N/A Platform Specific Features: N/A Fixes #28826 --------- Signed-off-by: Mikhail Krinkin <[email protected]>
fix #23047
Commit Message:
Additional Description:
Risk Level: low
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]