Skip to content

Commit

Permalink
Prevent stuck connections after local reply in proxy-wasm
Browse files Browse the repository at this point in the history
This change fixes envoyproxy#28826.
Some additional discussions for context can be found in
proxy-wasm/proxy-wasm-cpp-host#423.

The issue reported in envoyproxy#28826
happens when proxy-wasm plugin calls proxy_send_local_response during
the HTTP request proessing and HTTP response processing.

This happens because in attempt to mitigate a use-after-free issue
(see envoyproxy#23049) we added logic
to proxy-wasm that avoids calling sendLocalReply multiple times.

So now when proxy-wasm plugin calls proxy_send_local_response only
the first call will result in sendLocalReply, while all subsequent
calls will get ignored. At the same time, when proxy-wasm plugins
call proxy_send_local_response, because it's used to report an
error in the plugin, proxy-wasm also stops iteration.

During HTTP request processing this leads to the following chain
of events:

1. During request proxy-wasm plugin calls proxy_send_local_response
2. proxy_send_local_response calls sendLocalReply, which schedules
   the local reply to be processed later through the filter chain
3. Request processing filter chain gets aborted and Envoy sends the
   previous created local reply though the filter chain
4. Proxy-wasm plugin gets called to process the response it generated
   and it calls proxy_send_local_response
5. proxy_send_local_response **does not** call sendLocalReply, because
   proxy-wasm prevents multiple calls to sendLocalReply currently
6. proxy-wasm stops iteration

So in the end the filter chain iteration is stopped for the response
and because proxy_send_local_respose does not actually call
sendLocalReply we don't send another locally generated response
either.

I think we can do slightly better and close the connection in this
case. This change includes the following parts:

1. Partial rollback of envoyproxy#23049
2. Change to Envoy implementation of failStream used by proxy-wasm in
   case of critical errors
3. Tests covering this case and some other using the actual FilterManager.

The most important question is why rolling back
envoyproxy#23049 now is safe?

The reason why it's safe, is that since introduction of
prepareLocalReplyViaFilterChain in
envoyproxy#24367, calling sendLocalReply
multiple times is safe - that PR basically address the issue in a generic
way for all the plugins, so a proxy-wasm specific fix is not needed anymore.

On top of being safe, there are additional benefits to making this change:

1. We don't end up with a stuck connection in case of errors, which is
   slightly better
2. We remove a failure mode from proxy_send_local_response that was
   introduced in envoyproxy#23049 - which
   is good, because proxy-wasm plugins don't have a good fallback when
   proxy_send_local_response is failing.

Why do we need to change the implementation of the failStream?

failStream gets called when Wasm VM crashes (e.g., null pointer derefernce
or abort inside the VM or any other unrecoverable error with the VM).
Current implementation just calls sendLocalReply in this case. Let's
consider what happens during the HTTP request processing when Wasm VM
crashes:

1. Wasm VM crashes
2. proxy-wasm calls failStream which calls sendLocalReply
3. Envoy prepares local reply and eventually sends it through the filter
   chain
4. proxy-wasm plugin with a crashed VM is called to process the reply

proxy-wasm in this case can't really do anything and just stops the
iteration. Which is a fine way of dealing with the issue, but we can
do slightly better and close the connection in this case instead of
just pausing the iteration.

And we are not losing anything in this case by replacing sendLocalReply
with resetStream, because the local reply doesn't get send anyways.

> NOTE: The same issue does not happen if the VM crashes during response
> processing, because sendLocalReply in this case will send the response
> directly ignoring the filter chain.

Finally, why replace the current mocks with a real FilterManager?

Mock implementation of sendLocalReply works fine for tests that just need
to assert that sendLocalReply gets called. However, in this case we rely
on the fact that it's safe to call sendLocalReply multiple times and it
will do the right thing and we want to assert that the connection will
get closed in the end - that cannot be tested by just checking that the
sendLocalReply gets called or by relying on a simplistic mock
implementation of sendLocalReply.

Signed-off-by: Mikhail Krinkin <[email protected]>
  • Loading branch information
krinkinmu committed Oct 24, 2024
1 parent 742a3b0 commit 39251ed
Show file tree
Hide file tree
Showing 6 changed files with 220 additions and 73 deletions.
30 changes: 11 additions & 19 deletions source/extensions/common/wasm/context.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1562,12 +1562,17 @@ constexpr absl::string_view FailStreamResponseDetails = "wasm_fail_stream";
void Context::failStream(WasmStreamType stream_type) {
switch (stream_type) {
case WasmStreamType::Request:
if (decoder_callbacks_ && !local_reply_sent_) {
decoder_callbacks_->sendLocalReply(Envoy::Http::Code::ServiceUnavailable, "", nullptr,
Grpc::Status::WellKnownGrpcStatus::Unavailable,
FailStreamResponseDetails);
local_reply_sent_ = true;
}
// If Wasm crashes during the request processing we can't send local reply back to
// the client - the response will just get stuck.
//
// The problem is that this local reply will be send through the filter chain and
// Wasm plugin will be called to process the response. However, because Wasm plugin
// crashed it can't process it and will end up just asking Envoy to pause iteration
// and the local reply will never get sent.
//
// The same issue does not happen during response processing, because local replies
// generated during the response processing are sent ignoring filter chain.
closeStream(stream_type);
break;
case WasmStreamType::Response:
if (encoder_callbacks_ && !local_reply_sent_) {
Expand Down Expand Up @@ -1595,12 +1600,6 @@ void Context::failStream(WasmStreamType stream_type) {
WasmResult Context::sendLocalResponse(uint32_t response_code, std::string_view body_text,
Pairs additional_headers, uint32_t grpc_status,
std::string_view details) {
// This flag is used to avoid calling sendLocalReply() twice, even if wasm code has this
// logic. We can't reuse "local_reply_sent_" here because it can't avoid calling nested
// sendLocalReply() during encodeHeaders().
if (local_reply_hold_) {
return WasmResult::BadArgument;
}
// "additional_headers" is a collection of string_views. These will no longer
// be valid when "modify_headers" is finally called below, so we must
// make copies of all the headers.
Expand All @@ -1625,11 +1624,6 @@ WasmResult Context::sendLocalResponse(uint32_t response_code, std::string_view b
modify_headers = std::move(modify_headers), grpc_status,
details = StringUtil::replaceAllEmptySpace(
absl::string_view(details.data(), details.size()))] {
// When the wasm vm fails, failStream() is called if the plugin is fail-closed, we need
// this flag to avoid calling sendLocalReply() twice.
if (local_reply_sent_) {
return;
}
// C++, Rust and other SDKs use -1 (InvalidCode) as the default value if gRPC code is not set,
// which should be mapped to nullopt in Envoy to prevent it from sending a grpc-status trailer
// at all.
Expand All @@ -1640,10 +1634,8 @@ WasmResult Context::sendLocalResponse(uint32_t response_code, std::string_view b
}
decoder_callbacks_->sendLocalReply(static_cast<Envoy::Http::Code>(response_code), body_text,
modify_headers, grpc_status_code, details);
local_reply_sent_ = true;
});
}
local_reply_hold_ = true;
return WasmResult::Ok;
}

Expand Down
1 change: 0 additions & 1 deletion source/extensions/common/wasm/context.h
Original file line number Diff line number Diff line change
Expand Up @@ -431,7 +431,6 @@ class Context : public proxy_wasm::ContextBase,
bool buffering_response_body_ = false;
bool end_of_stream_ = false;
bool local_reply_sent_ = false;
bool local_reply_hold_ = false;
ProtobufWkt::Struct temporary_metadata_;

// MB: must be a node-type map as we take persistent references to the entries.
Expand Down
1 change: 1 addition & 0 deletions test/extensions/common/wasm/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ envoy_cc_test(
"//test/extensions/common/wasm/test_data:test_context_cpp_plugin",
"//test/extensions/common/wasm/test_data:test_cpp_plugin",
"//test/extensions/common/wasm/test_data:test_restriction_cpp_plugin",
"//test/mocks/local_reply:local_reply_mocks",
"//test/mocks/server:server_mocks",
"//test/test_common:environment_lib",
"//test/test_common:registry_lib",
Expand Down
58 changes: 51 additions & 7 deletions test/extensions/common/wasm/test_data/test_context_cpp.cc
Original file line number Diff line number Diff line change
Expand Up @@ -94,22 +94,60 @@ FilterDataStatus DupReplyContext::onRequestBody(size_t, bool) {
return FilterDataStatus::Continue;
}

class PanicReplyContext : public Context {
class LocalReplyInRequestAndResponseContext : public Context {
public:
explicit PanicReplyContext(uint32_t id, RootContext* root) : Context(id, root) {}
explicit LocalReplyInRequestAndResponseContext(uint32_t id, RootContext* root) : Context(id, root) {}
FilterHeadersStatus onRequestHeaders(uint32_t, bool) override;
FilterHeadersStatus onResponseHeaders(uint32_t, bool) override;
private:
EnvoyRootContext* root() { return static_cast<EnvoyRootContext*>(Context::root()); }
};

FilterHeadersStatus LocalReplyInRequestAndResponseContext::onRequestHeaders(uint32_t, bool) {
sendLocalResponse(200, "ok", "body", {});
return FilterHeadersStatus::Continue;
}

FilterHeadersStatus LocalReplyInRequestAndResponseContext::onResponseHeaders(uint32_t, bool) {
sendLocalResponse(200, "ok", "body", {});
return FilterHeadersStatus::Continue;
}

class PanicInRequestContext : public Context {
public:
explicit PanicInRequestContext(uint32_t id, RootContext* root) : Context(id, root) {}
FilterDataStatus onRequestBody(size_t body_buffer_length, bool end_of_stream) override;

private:
EnvoyRootContext* root() { return static_cast<EnvoyRootContext*>(Context::root()); }
};

FilterDataStatus PanicReplyContext::onRequestBody(size_t, bool) {
FilterDataStatus PanicInRequestContext::onRequestBody(size_t, bool) {
sendLocalResponse(200, "not send", "body", {});
int* badptr = nullptr;
*badptr = 0; // NOLINT(clang-analyzer-core.NullDereference)
abort();
return FilterDataStatus::Continue;
}

class PanicInResponseContext : public Context {
public:
explicit PanicInResponseContext(uint32_t id, RootContext* root) : Context(id, root) {}
FilterHeadersStatus onResponseHeaders(uint32_t, bool) override;
FilterHeadersStatus onRequestHeaders(uint32_t, bool) override;

private:
EnvoyRootContext* root() { return static_cast<EnvoyRootContext*>(Context::root()); }
};

FilterHeadersStatus PanicInResponseContext::onRequestHeaders(uint32_t, bool) {
sendLocalResponse(200, "ok", "body", {});
return FilterHeadersStatus::Continue;
}

FilterHeadersStatus PanicInResponseContext::onResponseHeaders(uint32_t, bool) {
abort();
return FilterHeadersStatus::Continue;
}

class InvalidGrpcStatusReplyContext : public Context {
public:
explicit InvalidGrpcStatusReplyContext(uint32_t id, RootContext* root) : Context(id, root) {}
Expand All @@ -127,9 +165,15 @@ FilterDataStatus InvalidGrpcStatusReplyContext::onRequestBody(size_t size, bool)
static RegisterContextFactory register_DupReplyContext(CONTEXT_FACTORY(DupReplyContext),
ROOT_FACTORY(EnvoyRootContext),
"send local reply twice");
static RegisterContextFactory register_PanicReplyContext(CONTEXT_FACTORY(PanicReplyContext),
static RegisterContextFactory register_LocalReplyInRequestAndResponseContext(CONTEXT_FACTORY(LocalReplyInRequestAndResponseContext),
ROOT_FACTORY(EnvoyRootContext),
"local reply in request and response");
static RegisterContextFactory register_PanicInRequestContext(CONTEXT_FACTORY(PanicInRequestContext),
ROOT_FACTORY(EnvoyRootContext),
"panic during request processing");
static RegisterContextFactory register_PanicInResponseContext(CONTEXT_FACTORY(PanicInResponseContext),
ROOT_FACTORY(EnvoyRootContext),
"panic after sending local reply");
"panic during response processing");

static RegisterContextFactory register_InvalidGrpcStatusReplyContext(CONTEXT_FACTORY(InvalidGrpcStatusReplyContext),
ROOT_FACTORY(EnvoyRootContext),
Expand Down
Loading

0 comments on commit 39251ed

Please sign in to comment.