
WASM: Calling proxy_send_local_response twice leaves the remote HTTP client (e.g. curl) stuck until it times out or is interrupted #28826

Closed
orangetangerine opened this issue Aug 4, 2023 · 15 comments · Fixed by #36809
@orangetangerine

orangetangerine commented Aug 4, 2023

Title: WASM: Calling proxy_send_local_response twice leaves the remote HTTP client (e.g. curl) stuck until it times out or is interrupted

Description:
I have a wasm plugin running on Envoy, and I found that the HTTP request gets stuck if I call proxy_send_local_response in both on_http_request_headers and on_http_response_headers.

Repro steps:
Here is the shortest wasm code that reproduces it:

use log::info;
use proxy_wasm::traits::*;
use proxy_wasm::types::*;

proxy_wasm::main! {{
    proxy_wasm::set_log_level(LogLevel::Debug);
    proxy_wasm::set_root_context(|_| -> Box<dyn RootContext> { Box::new(HttpAuthProxyRoot{})});
}}


struct HttpAuthProxyRoot {}

impl RootContext for HttpAuthProxyRoot {
    fn create_http_context(&self, context_id: u32) -> Option<Box<dyn HttpContext>> {
        Some(Box::new(HttpAuthProxy::new(context_id)))
    }

    fn get_type(&self) -> Option<ContextType> {
        Some(ContextType::HttpContext)
    }
}

impl Context for HttpAuthProxyRoot {}

#[derive(Default)]
struct HttpAuthProxy {
    context_id: u32,
}

impl HttpAuthProxy {
    fn new(context_id: u32) -> HttpAuthProxy {
        HttpAuthProxy { context_id }
    }

    fn test_response(&self) -> Action {
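        // Sends a local 401 response; the bug is reproduced by calling this
        // from both the request and the response header callbacks below.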
        self.send_http_response(401, vec![], None);
        Action::Continue
    }
}

impl HttpContext for HttpAuthProxy {
    fn on_http_request_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        let ret = self.test_response();
        info!("on_http_request_headers done");
        ret
    }

    fn on_http_response_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        let ret = self.test_response();
        info!("on_http_response_headers done");
        ret
    }

    fn on_log(&mut self) {
        info!("#{} completed", self.context_id);
    }
}

impl Context for HttpAuthProxy {}

Envoy docker-compose configuration:

services:
  envoy:
    image: envoyproxy/envoy:v1.24.0
    command: -c /etc/envoy/envoy.yaml -l debug
    hostname: envoy
    ports:
      - "10000:10000"
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
      - ./target/wasm32-wasi/release:/etc/envoy/proxy-wasm-plugins
    networks:
      - envoymesh
networks:
  envoymesh: {}

and envoy.yaml:

admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 15000
static_resources:
  clusters:
    - name: test_cluster
      type: STATIC
      connect_timeout: 0.250s
      load_assignment:
        cluster_name: test_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 15000
  listeners:
    - address:
        socket_address:
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                codec_type: AUTO
                route_config:
                  name: local_routes
                  virtual_hosts:
                    - name: local_service
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/"
                          direct_response:
                            status: 200
                            body:
                              inline_string: "Request /hello and be welcomed!\n"
                http_filters:
                  - name: envoy.filters.http.wasm
                    typed_config:
                      "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                      type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                      value:
                        config:
                          name: "http_headers"
                          vm_config:
                            runtime: "envoy.wasm.runtime.v8"
                            code:
                              local:
                                filename: "/etc/envoy/proxy-wasm-plugins/plugin.wasm"
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

Logs:
[Envoy debug log screenshot]

And here is the log on the client side:
[client-side log screenshot]

I found out that it's caused by PR #23049.

It has occurred since Envoy 1.24+ (also Istio 1.16+).
If I revert those changes and rebuild, it works again.

I don't know whether this behaviour is expected or was changed by accident.

It only occurs in my QA cluster (which runs a newer version of Envoy, 1.24+).

@orangetangerine orangetangerine added the bug and triage (Issue requires triage) labels Aug 4, 2023
@htuch htuch added the help wanted (Needs help!) and area/wasm labels and removed the triage (Issue requires triage) label Aug 4, 2023
@htuch
Member

htuch commented Aug 4, 2023

@mpwarres

@keithmattix
Contributor

I can tackle this

@keithmattix
Contributor

/assign @keithmattix

@krinkinmu
Contributor

/assign @krinkinmu


krinkinmu is not allowed to assign users.

🐱

Caused by: a #28826 (comment) was created by @krinkinmu.

see: more, trace.

@keithmattix
Contributor

/assign @krinkinmu


@krinkinmu cannot be assigned to this issue.

🐱

Caused by: a #28826 (comment) was created by @keithmattix.

see: more, trace.

@keithmattix
Contributor

@alyssawilk any idea what the issue is here?

@krinkinmu
Contributor

I was able to reproduce the issue on a recent head build with a C++-based filter. Going through the issue with a debugger suggests that roughly the following is happening:

  1. onRequestHeaders triggers a local reply
  2. the local reply goes through the filter chain
  3. this triggers onResponseHeaders, which attempts to send a local reply again while we are still processing the previous local reply
  4. the second local reply request gets ignored (as expected), but because the plugin tried to trigger a local reply, proxy-wasm pauses filter chain processing (because that's how it always reacts to local replies, see https://github.com/proxy-wasm/proxy-wasm-cpp-host/blob/3212034ee6e75d54cbf91b65cc659772166442f4/src/exports.cc#L158 and https://github.com/proxy-wasm/proxy-wasm-cpp-host/blob/3212034ee6e75d54cbf91b65cc659772166442f4/src/context.cc#L513-L519) and at that point Envoy gets stuck.

As for the fix, I think the filter itself can avoid triggering a second local reply if it's processing a local reply already. That being said, I don't think that Envoy hanging in this case is desirable behavior, regardless of the filter logic, so I think Envoy could do better in this case.

I see a couple of options:

  1. Given that the second local reply was ignored, maybe we should just avoid pausing filter chain processing in this case, since we didn't actually trigger a local reply
  2. Actually trigger the second local reply and not ignore it.

NOTE (mostly for myself): a third option would be to not send local replies through the filter chain, but I don't think we should change behavior that has been around for years, so I'm dismissing this option right away.

I find option 2 rather confusing: what would the desired behaviour be in this case? Would the second local reply overwrite the first one? Because of this I prefer option 1, so I'm starting to work on the code change to implement it.
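
For illustration, here is a minimal sketch of the filter-side workaround mentioned above, written against the Rust SDK used in the repro. The `local_reply_sent` guard and the 401 response are choices made for this example, not anything prescribed by the SDK:

```rust
use proxy_wasm::traits::*;
use proxy_wasm::types::*;

proxy_wasm::main! {{
    proxy_wasm::set_log_level(LogLevel::Debug);
    proxy_wasm::set_root_context(|_| -> Box<dyn RootContext> { Box::new(GuardedRoot {}) });
}}

struct GuardedRoot {}

impl Context for GuardedRoot {}

impl RootContext for GuardedRoot {
    fn create_http_context(&self, _context_id: u32) -> Option<Box<dyn HttpContext>> {
        Some(Box::new(GuardedProxy::default()))
    }

    fn get_type(&self) -> Option<ContextType> {
        Some(ContextType::HttpContext)
    }
}

#[derive(Default)]
struct GuardedProxy {
    // Remembers whether this stream already triggered a local reply, so the
    // plugin never calls send_http_response a second time for the same stream.
    local_reply_sent: bool,
}

impl Context for GuardedProxy {}

impl HttpContext for GuardedProxy {
    fn on_http_request_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        self.send_local_reply_once()
    }

    fn on_http_response_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        self.send_local_reply_once()
    }
}

impl GuardedProxy {
    fn send_local_reply_once(&mut self) -> Action {
        if self.local_reply_sent {
            // We are most likely seeing our own local reply; let it through.
            return Action::Continue;
        }
        self.local_reply_sent = true;
        self.send_http_response(401, vec![], None);
        // Pause this direction; the locally generated reply takes over the stream.
        Action::Pause
    }
}
```

Returning Action::Pause after send_http_response follows the pattern used in the SDK's own examples; with the guard in place, the response-path callback fired for the locally generated reply simply lets it through instead of asking for a second one.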

@krinkinmu
Contributor

One more thing to mention: the fix should be on the proxy-wasm-cpp-host side, I think.

@krinkinmu
Contributor

/assign @krinkinmu


krinkinmu is not allowed to assign users.

🐱

Caused by: a #28826 (comment) was created by @krinkinmu.

see: more, trace.

@krinkinmu
Contributor

> @alyssawilk any idea what the issue is here?

@keithmattix I thought that it was a propagation delay of some sort, but that does not seem to be the case anymore. Looking at what repokitteh does, I think that it is supposed to just call the GitHub API (see https://github.com/repokitteh/modules/blob/master/assign.star#L8-L10 and https://docs.github.com/en/rest/issues/assignees?apiVersion=2022-11-28#check-if-a-user-can-be-assigned).

However, running the API check locally works for me:

$ gh api \                                     
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/envoyproxy/envoy/assignees/krinkinmu
$ echo $?
0

and, for reference, this is how I think it looks when I don't have permissions:

$ gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /repos/proxy-wasm/proxy-wasm-cpp-sdk/assignees/krinkinmu
{
  "message": "Not Found",
  "documentation_url": "https://docs.github.com/rest/issues/assignees#check-if-a-user-can-be-assigned",
  "status": "404"
}
gh: Not Found (HTTP 404)
$ echo $?
1

I'm confused why it does not work, but maybe we should discuss the repo access issues outside of this bug?

krinkinmu added a commit to krinkinmu/proxy-wasm-cpp-host that referenced this issue Oct 16, 2024
For context see the Envoy issue envoyproxy/envoy#28826.
Here is a shorter summary:

1. A wasm plugin calls proxy_send_local_response from both onRequestHeaders and
   onResponseHeaders
2. When proxy_send_local_response is called from onRequestHeaders it triggers
   a local reply and that reply goes through the filter chain in Envoy
3. The same plugin is called again as part of the filter chain processing
   but this time onResponseHeaders is called
4. onResponseHeaders calls proxy_send_local_response which ultimately does
   not generate a local reply, but it stops filter chain processing.

As a result we end up with a stuck connection on Envoy - no local reply
and processing is stopped.

I think that proxy wasm plugins shouldn't use proxy_send_local_response this
way, so ultimately whoever created such a plugin shot themselves in the foot.
That being said, I think there are a few improvements that could be made here
on Envoy/proxy-wasm side to handle this situation somewhat better:

1. We can avoid stopping processing in such cases to prevent stuck connections
   on Envoy
2. We can return errors from proxy_send_local_response instead of silently
   ignoring them.

Currently Envoy implementation of sendLocalResponse can detect when a second
local response is requested and returns an error in this case without actually
trying to send a local response.

However, even though Envoy reports an error, send_local_response ignores the
result of the host-specific sendLocalResponse implementation and stops
processing and returns success to the wasm plugin.

With this change, send_local_response will check the result of the
host-specific implementation of the sendLocalResponse. In cases when
sendLocalResponse fails it will just propagate the error to the caller and
do nothing else (including stopping processing).

I think this behavior makes sense in principle because on the one hand we don't
ignore the failure from sendLocalResponse and on the other hand, when the
failure happens we don't trigger any side-effects expected from the successful
proxy_send_local_response call.

NOTE: Even though I do think that this is a more reasonable behavior, it's
still a change from the previous behavior and it might break existing
proxy-wasm plugins. Specifically:

1. C++ plugins that proactively check the result of proxy_send_local_response
   will change behavior (previously, proxy_send_local_response failed silently)
2. Rust plugins, due to the way the Rust SDK handles errors from
   proxy_send_local_response, will crash at runtime in this case.

On the bright side of things, the plugins that are affected by this change
currently just cause stuck connections in Envoy, so we are trading one
undesirable behavior for another, more explicit one.

Signed-off-by: Mikhail Krinkin <[email protected]>
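
Regarding the note above about Rust plugins crashing at runtime: a plugin that wants to tolerate this failure mode could call the SDK's lower-level hostcall wrapper directly and inspect the result, instead of going through the trait method that unwraps it. A minimal sketch, assuming the public proxy_wasm::hostcalls::send_http_response wrapper surfaces the host's error status (try_send_local_response is a hypothetical helper, not part of the SDK):

```rust
use log::warn;
use proxy_wasm::hostcalls;
use proxy_wasm::types::Status;

// Attempt to send a local response and propagate the host's verdict instead of
// panicking when the host returns an error (for example because a local reply
// is already in flight for this stream).
fn try_send_local_response(status_code: u32) -> Result<(), Status> {
    hostcalls::send_http_response(status_code, vec![], None).map_err(|status| {
        warn!("proxy_send_local_response was rejected by the host: {:?}", status);
        status
    })
}
```

The caller can then decide how to proceed (for example, simply continue the filter chain) when the host refuses a second local response.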
@krinkinmu
Contributor

I prepared a fix for proxy-wasm-cpp-host that resolves the issue: proxy-wasm/proxy-wasm-cpp-host#423. We'll see if we can get consensus on the approach and get it merged there.

krinkinmu added a commit to krinkinmu/proxy-wasm-cpp-host that referenced this issue Oct 18, 2024
krinkinmu added a commit to krinkinmu/proxy-wasm-cpp-host that referenced this issue Oct 18, 2024
krinkinmu added a commit to krinkinmu/envoy that referenced this issue Oct 24, 2024
This change fixes envoyproxy#28826.
Some additional discussions for context can be found in
proxy-wasm/proxy-wasm-cpp-host#423.

The issue reported in envoyproxy#28826
happens when a proxy-wasm plugin calls proxy_send_local_response during
both HTTP request processing and HTTP response processing.

This happens because, in an attempt to mitigate a use-after-free issue
(see envoyproxy#23049), we added logic
to proxy-wasm that avoids calling sendLocalReply multiple times.

So now, when a proxy-wasm plugin calls proxy_send_local_response, only
the first call will result in sendLocalReply, while all subsequent
calls will get ignored. At the same time, because proxy_send_local_response
is used to report an error in the plugin, proxy-wasm also stops iteration
whenever it is called.

During HTTP request processing this leads to the following chain
of events:

1. During the request, the proxy-wasm plugin calls proxy_send_local_response
2. proxy_send_local_response calls sendLocalReply, which schedules
   the local reply to be processed later through the filter chain
3. The request processing filter chain gets aborted and Envoy sends the
   previously created local reply through the filter chain
4. Proxy-wasm plugin gets called to process the response it generated
   and it calls proxy_send_local_response
5. proxy_send_local_response **does not** call sendLocalReply, because
   proxy-wasm prevents multiple calls to sendLocalReply currently
6. proxy-wasm stops iteration

So in the end the filter chain iteration is stopped for the response,
and because proxy_send_local_response does not actually call
sendLocalReply, we don't send another locally generated response
either.

I think we can do slightly better and close the connection in this
case. This change includes the following parts:

1. Partial rollback of envoyproxy#23049
2. Change to Envoy implementation of failStream used by proxy-wasm in
   case of critical errors
3. Tests covering this case and some others, using the actual FilterManager.

The most important question is: why is rolling back
envoyproxy#23049 now safe?

The reason it's safe is that, since the introduction of
prepareLocalReplyViaFilterChain in
envoyproxy#24367, calling sendLocalReply
multiple times is safe - that PR addresses the issue in a generic
way for all plugins, so a proxy-wasm-specific fix is no longer needed.

On top of being safe, there are additional benefits to making this change:

1. We don't end up with a stuck connection in case of errors, which is
   slightly better
2. We remove a failure mode from proxy_send_local_response that was
   introduced in envoyproxy#23049 - which
   is good, because proxy-wasm plugins don't have a good fallback when
   proxy_send_local_response is failing.

Why do we need to change the implementation of the failStream?

failStream gets called when the Wasm VM crashes (e.g., a null pointer dereference
or an abort inside the VM, or any other unrecoverable error in the VM).
The current implementation just calls sendLocalReply in this case. Let's
consider what happens when the Wasm VM crashes during HTTP request
processing:

1. Wasm VM crashes
2. proxy-wasm calls failStream which calls sendLocalReply
3. Envoy prepares local reply and eventually sends it through the filter
   chain
4. proxy-wasm plugin with a crashed VM is called to process the reply

proxy-wasm can't really do anything in this case and just stops the
iteration. That is a fine way of dealing with the issue, but we can
do slightly better and close the connection in this case instead of
just pausing the iteration.

And we are not losing anything in this case by replacing sendLocalReply
with resetStream, because the local reply doesn't get sent anyway.

> NOTE: The same issue does not happen if the VM crashes during response
> processing, because sendLocalReply in this case will send the response
> directly ignoring the filter chain.

Finally, why replace the current mocks with a real FilterManager?

The mock implementation of sendLocalReply works fine for tests that just need
to assert that sendLocalReply gets called. However, in this case we rely
on the fact that it's safe to call sendLocalReply multiple times and that it
will do the right thing, and we want to assert that the connection will
get closed in the end - that cannot be tested by just checking that
sendLocalReply gets called or by relying on a simplistic mock
implementation of sendLocalReply.

Signed-off-by: Mikhail Krinkin <[email protected]>
krinkinmu added a commit to krinkinmu/envoy that referenced this issue Oct 24, 2024
krinkinmu added a commit to krinkinmu/envoy that referenced this issue Oct 24, 2024
@krinkinmu
Contributor

/assign @krinkinmu

kyessenov pushed a commit that referenced this issue Oct 31, 2024
…6809)

Commit Message:
This change fixes #28826. Some
additional discussions for context can be found in
proxy-wasm/proxy-wasm-cpp-host#423.

The issue reported in #28826
happens when a proxy-wasm plugin calls proxy_send_local_response during
both HTTP request processing and HTTP response processing.

This happens because, in an attempt to mitigate a use-after-free issue (see
#23049), we added logic to
proxy-wasm that avoids calling sendLocalReply multiple times.

So now, when a proxy-wasm plugin calls proxy_send_local_response, only the
first call will result in sendLocalReply, while all subsequent calls
will get ignored. At the same time, because proxy_send_local_response is used
to report an error in the plugin, proxy-wasm also stops iteration whenever it
is called.

During HTTP request processing this leads to the following chain of
events:

1. During the request, the proxy-wasm plugin calls proxy_send_local_response
2. proxy_send_local_response calls sendLocalReply, which schedules the
local reply to be processed later through the filter chain
3. The request processing filter chain gets aborted and Envoy sends the
previously created local reply through the filter chain
4. Proxy-wasm plugin gets called to process the response it generated
and it calls proxy_send_local_response
5. proxy_send_local_response **does not** call sendLocalReply, because
proxy-wasm prevents multiple calls to sendLocalReply currently
6. proxy-wasm stops iteration

So in the end the filter chain iteration is stopped for the response, and
because proxy_send_local_response does not actually call sendLocalReply,
we don't send another locally generated response either.

I think we can do slightly better and close the connection in this case.
This change includes the following parts:

1. Partial rollback of #23049
2. Tests covering this case and some others, using the actual
FilterManager.

The most important question is: why is rolling back
#23049 now safe?

The reason it's safe is that, since the introduction of
prepareLocalReplyViaFilterChain in
#24367, calling sendLocalReply
multiple times is safe - that PR addresses the issue in a
generic way for all plugins, so a proxy-wasm-specific fix is no longer
needed.

On top of being safe, there are additional benefits to making this
change:

1. We don't end up with a stuck connection in case of errors, which is
slightly better
2. We remove a failure mode from proxy_send_local_response that was
introduced in #23049 - which is
good, because proxy-wasm plugins don't have a good fallback when
proxy_send_local_response is failing.

Finally, why replace the current mocks with a real FilterManager?

The mock implementation of sendLocalReply works fine for tests that just
need to assert that sendLocalReply gets called. However, in this case we
rely on the fact that it's safe to call sendLocalReply multiple times
and that it will do the right thing, and we want to assert that the connection
will get closed in the end - that cannot be tested by just checking that
sendLocalReply gets called or by relying on a simplistic mock
implementation of sendLocalReply.

Additional Description:
Risk Level: low
Testing: Manually, by reproducing the case reported in
#28826. I also added new unit
tests and verified that they pass and aren't flaky:

```
bazel test --runs_per_test=1000 //test/extensions/common/wasm:all --config=docker-clang
```

Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Fixes #28826

---------

Signed-off-by: Mikhail Krinkin <[email protected]>