
fix sendLocalResponse in wasm #23049

Merged (11 commits, Oct 11, 2022)

Conversation

johnlanni
Contributor

fix #23047

Commit Message:
Additional Description:
Risk Level: low
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:

@johnlanni johnlanni requested a review from lizan as a code owner September 9, 2022 08:09
@mathetake
Member

/wait

@johnlanni johnlanni requested review from htuch and removed request for lizan September 30, 2022 02:17
test/extensions/common/wasm/wasm_test.cc (outdated review threads, resolved)
});
}
local_reply_hold_ = true;
Member

Why do we need both hold and sent? Doesn't your "hold" case already cover the sent case?

Contributor Author

decoder_callbacks_->sendLocalReply(Envoy::Http::Code::ServiceUnavailable, "", nullptr,

https://github.com/proxy-wasm/proxy-wasm-cpp-host/blob/4fcf895fa2433a1cdf20704926b8b7e4039a6a04/src/context.cc#L34

This is to avoid sending the local reply twice if the VM fails. I added another test for this case.
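
For readers following along, here is a minimal, self-contained sketch of the two-flag pattern being discussed. The member names (local_reply_hold_, local_reply_sent_) come from this thread, but the surrounding class and the deferral logic are simplified assumptions, not the actual diff in this PR:

```
#include <functional>
#include <iostream>
#include <string>

// Simplified stand-in for the wasm filter context; not the real Envoy class.
class WasmContextSketch {
public:
  // Entered before dispatching a wasm callback.
  void beginWasmCallback() { local_reply_hold_ = true; }

  // Host API backing proxy_send_local_response.
  void sendLocalResponse(int code, const std::string& body) {
    if (local_reply_sent_) {
      // A local reply was already dispatched (e.g. by a failed VM); sending a
      // second one is what could touch freed stream state, so drop it.
      return;
    }
    if (local_reply_hold_) {
      // Still inside a wasm callback: defer instead of re-entering the
      // filter manager from within the callback.
      deferred_reply_ = [this, code, body] { dispatch(code, body); };
      return;
    }
    dispatch(code, body);
  }

  // Entered after the wasm callback returns (or the VM fails).
  void endWasmCallback() {
    local_reply_hold_ = false;
    if (deferred_reply_) {
      auto reply = std::move(deferred_reply_);
      deferred_reply_ = nullptr;
      reply();
    }
  }

private:
  void dispatch(int code, const std::string& body) {
    local_reply_sent_ = true;
    std::cout << "local reply " << code << ": " << body << "\n";
  }

  bool local_reply_hold_{false};
  bool local_reply_sent_{false};
  std::function<void()> deferred_reply_;
};

int main() {
  WasmContextSketch ctx;
  ctx.beginWasmCallback();
  ctx.sendLocalResponse(503, "from wasm");   // deferred while held
  ctx.endWasmCallback();                     // dispatched exactly once
  ctx.sendLocalResponse(503, "vm failure");  // dropped: already sent
}
```

In this reading, the hold flag prevents re-entering the filter manager while a wasm callback is still running, and the sent flag prevents a second reply when the VM later fails.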

Member

I'll defer to @mathetake @lizan here, can you folks take a pass as CODEOWNERS? Thanks.

Member

Yeah, unfortunately I think it is inevitable to have both of these two booleans.

Member

@johnlanni this is fine I think; can you add some documentation in the form of comments for posterity, since the reason we need this is subtle per your comment above? Everything else LGTM.

Contributor Author

@htuch done

Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
This reverts commit 01d9b4cbbb77f5508a285590e6f83b98c9e19496.

Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
Signed-off-by: johnlanni <[email protected]>
@johnlanni
Contributor Author

https://dev.azure.com/cncf/envoy/_build/results?buildId=117861&view=logs&j=b7634614-24f3-5416-e791-4f3affaaed6c&t=21e6aa7d-f369-5abd-5e4e-e888cac18e9c

using proxy_wasm::ContextBase;

There seems to be a problem with this clang-tidy check; this using declaration is used in context.cc like:

Context::Context(Wasm* wasm) : ContextBase(wasm) {}

@johnlanni johnlanni requested a review from htuch October 1, 2022 14:18
test/extensions/common/wasm/test_data/test_context_cpp.cc (outdated review thread, resolved)
});
}
local_reply_hold_ = true;
Member

I'll defer to @mathetake @lizan here, can you folks take a pass as CODEOWNERS? Thanks.

Signed-off-by: johnlanni <[email protected]>
@johnlanni johnlanni requested review from htuch and mathetake and removed request for htuch and mathetake October 3, 2022 05:23
@johnlanni johnlanni requested review from mathetake and removed request for htuch October 6, 2022 13:42
test/extensions/common/wasm/wasm_test.cc (outdated review threads, resolved)
Signed-off-by: johnlanni <[email protected]>
@johnlanni johnlanni requested a review from mathetake October 7, 2022 05:03
Signed-off-by: johnlanni <[email protected]>
mathetake
mathetake previously approved these changes Oct 9, 2022
Member

@mathetake mathetake left a comment

LGTM

@johnlanni johnlanni requested a review from htuch October 10, 2022 02:09
Signed-off-by: johnlanni <[email protected]>
Member

@htuch htuch left a comment

LGTM, thanks!

@htuch htuch merged commit ff49762 into envoyproxy:main Oct 11, 2022
krinkinmu added a commit to krinkinmu/envoy that referenced this pull request Oct 24, 2024
This change fixes envoyproxy#28826.
Some additional discussions for context can be found in
proxy-wasm/proxy-wasm-cpp-host#423.

The issue reported in envoyproxy#28826
happens when a proxy-wasm plugin calls proxy_send_local_response during
both HTTP request processing and HTTP response processing.

This happens because, in an attempt to mitigate a use-after-free issue
(see envoyproxy#23049), we added logic
to proxy-wasm that avoids calling sendLocalReply multiple times.

So now, when a proxy-wasm plugin calls proxy_send_local_response, only
the first call results in sendLocalReply, while all subsequent
calls are ignored. At the same time, because proxy_send_local_response
is used to report an error in the plugin, proxy-wasm also stops
iteration when it is called.

During HTTP request processing this leads to the following chain
of events:

1. During request processing the proxy-wasm plugin calls proxy_send_local_response
2. proxy_send_local_response calls sendLocalReply, which schedules
   the local reply to be processed later through the filter chain
3. The request filter chain gets aborted and Envoy sends the
   previously created local reply through the filter chain
4. The proxy-wasm plugin gets called to process the response it generated
   and calls proxy_send_local_response again
5. proxy_send_local_response **does not** call sendLocalReply, because
   proxy-wasm currently prevents multiple calls to sendLocalReply
6. proxy-wasm stops iteration

So in the end the filter chain iteration is stopped for the response,
and because proxy_send_local_response does not actually call
sendLocalReply, we don't send another locally generated response
either.

I think we can do slightly better and close the connection in this
case. This change includes the following parts:

1. Partial rollback of envoyproxy#23049
2. Change to Envoy implementation of failStream used by proxy-wasm in
   case of critical errors
3. Tests covering this case and some others, using the actual FilterManager.

The most important question is: why is it now safe to roll back
envoyproxy#23049?

It is safe because, since the introduction of
prepareLocalReplyViaFilterChain in
envoyproxy#24367, calling sendLocalReply
multiple times is safe - that PR addresses the issue in a generic
way for all plugins, so a proxy-wasm-specific fix is no longer needed.

On top of being safe, there are additional benefits to making this change:

1. We don't end up with a stuck connection in case of errors, which is
   slightly better
2. We remove a failure mode from proxy_send_local_response that was
   introduced in envoyproxy#23049 - which
   is good, because proxy-wasm plugins don't have a good fallback when
   proxy_send_local_response fails.

Why do we need to change the implementation of failStream?

failStream gets called when the Wasm VM crashes (e.g., a null pointer
dereference, an abort inside the VM, or any other unrecoverable VM error).
The current implementation just calls sendLocalReply in this case. Let's
consider what happens when the Wasm VM crashes during HTTP request
processing:

1. Wasm VM crashes
2. proxy-wasm calls failStream, which calls sendLocalReply
3. Envoy prepares the local reply and eventually sends it through the filter
   chain
4. The proxy-wasm plugin with a crashed VM is called to process the reply

proxy-wasm in this case can't really do anything and just stops the
iteration. That is a fine way of dealing with the issue, but we can
do slightly better and close the connection in this case instead of
just pausing the iteration.

And we are not losing anything by replacing sendLocalReply
with resetStream, because the local reply doesn't get sent anyway.

> NOTE: The same issue does not happen if the VM crashes during response
> processing, because sendLocalReply in that case sends the response
> directly, bypassing the filter chain.
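
As a rough illustration of the failStream change described above (the callback type and method names here are stand-ins, not the exact Envoy interfaces):

```
#include <iostream>

// Stand-in for the stream callbacks available to the wasm context; the real
// Envoy callbacks expose richer sendLocalReply/resetStream signatures.
struct StreamCallbacksSketch {
  void sendLocalReply(int code, const char* body) {
    std::cout << "local reply " << code << ": " << body << "\n";
  }
  void resetStream() { std::cout << "stream reset\n"; }
};

// Roughly the old behaviour: a crashed VM answers with a local reply, which
// the same (now dead) plugin would later be asked to process on the response
// path.
void failStreamBefore(StreamCallbacksSketch& callbacks) {
  callbacks.sendLocalReply(503, "wasm vm failed");
}

// Roughly the new behaviour described above: just close the stream, since the
// local reply could not make it through the broken filter anyway.
void failStreamAfter(StreamCallbacksSketch& callbacks) {
  callbacks.resetStream();
}

int main() {
  StreamCallbacksSketch callbacks;
  failStreamBefore(callbacks);
  failStreamAfter(callbacks);
}
```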

Finally, why replace the current mocks with a real FilterManager?

A mock implementation of sendLocalReply works fine for tests that just need
to assert that sendLocalReply gets called. However, in this case we rely
on the fact that it is safe to call sendLocalReply multiple times and that
it will do the right thing, and we want to assert that the connection
gets closed in the end - that cannot be tested by just checking that
sendLocalReply gets called or by relying on a simplistic mock
implementation of sendLocalReply.
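
A minimal sketch of the distinction being drawn here, using plain GoogleTest and a stand-in type (FakeFilterManager is hypothetical, not an Envoy class):

```
#include <gtest/gtest.h>

// A behavioural fake, as opposed to a call-counting mock: it models the end
// state (stream closed) that the commit message expects when a second local
// reply arrives while one is already in flight.
struct FakeFilterManager {
  int local_replies = 0;
  bool stream_closed = false;

  void sendLocalReply() {
    if (local_replies++ > 0) {
      stream_closed = true;  // second reply ends up closing the stream
    }
  }
};

TEST(SendLocalReplyTwiceSketch, SecondReplyClosesStream) {
  FakeFilterManager fm;
  fm.sendLocalReply();
  fm.sendLocalReply();
  // A pure mock could only assert the call count; asserting the resulting
  // stream state is what requires real (or behavioural) filter-manager logic.
  EXPECT_EQ(fm.local_replies, 2);
  EXPECT_TRUE(fm.stream_closed);
}
```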

Signed-off-by: Mikhail Krinkin <[email protected]>
kyessenov pushed a commit that referenced this pull request Oct 31, 2024
…6809)

Commit Message:
This change fixes #28826. Some
additional discussions for context can be found in
proxy-wasm/proxy-wasm-cpp-host#423.

The issue reported in #28826
happens when a proxy-wasm plugin calls proxy_send_local_response during
both HTTP request processing and HTTP response processing.

This happens because, in an attempt to mitigate a use-after-free issue (see
#23049), we added logic to
proxy-wasm that avoids calling sendLocalReply multiple times.

So now, when a proxy-wasm plugin calls proxy_send_local_response, only the
first call results in sendLocalReply, while all subsequent calls are
ignored. At the same time, because proxy_send_local_response is used to
report an error in the plugin, proxy-wasm also stops iteration when it is
called.

During HTTP request processing this leads to the following chain of
events:

1. During request processing the proxy-wasm plugin calls proxy_send_local_response
2. proxy_send_local_response calls sendLocalReply, which schedules the
local reply to be processed later through the filter chain
3. The request filter chain gets aborted and Envoy sends the
previously created local reply through the filter chain
4. The proxy-wasm plugin gets called to process the response it generated
and calls proxy_send_local_response again
5. proxy_send_local_response **does not** call sendLocalReply, because
proxy-wasm currently prevents multiple calls to sendLocalReply
6. proxy-wasm stops iteration

So in the end the filter chain iteration is stopped for the response, and
because proxy_send_local_response does not actually call sendLocalReply,
we don't send another locally generated response either.

I think we can do slightly better and close the connection in this case.
This change includes the following parts:

1. Partial rollback of #23049
2. Tests covering this case and some others, using the actual
FilterManager.

The most important question is: why is it now safe to roll back
#23049?

It is safe because, since the introduction of
prepareLocalReplyViaFilterChain in
#24367, calling sendLocalReply
multiple times is safe - that PR addresses the issue in a generic way for
all plugins, so a proxy-wasm-specific fix is no longer needed.

On top of being safe, there are additional benefits to making this
change:

1. We don't end up with a stuck connection in case of errors, which is
slightly better
2. We remove a failure mode from proxy_send_local_response that was
introduced in #23049 - which is
good, because proxy-wasm plugins don't have a good fallback when
proxy_send_local_response fails.

Finally, why replace the current mocks with a real FilterManager?

A mock implementation of sendLocalReply works fine for tests that just
need to assert that sendLocalReply gets called. However, in this case we
rely on the fact that it is safe to call sendLocalReply multiple times and
that it will do the right thing, and we want to assert that the connection
gets closed in the end - that cannot be tested by just checking that
sendLocalReply gets called or by relying on a simplistic mock
implementation of sendLocalReply.

Additional Description:
Risk Level: low
Testing: Manually, by reproducing the case reported in
#28826. I also added new unit
tests and verified that they pass and aren't flaky:

```
bazel test --runs_per_test=1000 //test/extensions/common/wasm:all --config=docker-clang
```

Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Fixes #28826

---------

Signed-off-by: Mikhail Krinkin <[email protected]>
Successfully merging this pull request may close these issues.

Calling proxy_send_local_response twice in wasm code results in access to freed memory space