
Envoy v1.20.0 slower to re-initialize dynamic listeners #18616

Closed
sunjayBhatia opened this issue Oct 13, 2021 · 13 comments · Fixed by #18686

@sunjayBhatia
Member

Description:
It appears Envoy v1.20.0 is a bit slower to re-initialize dynamic listeners whose configuration has changed.

As part of regular dependency bumps, when bumping Contour's dependency on Envoy from v1.19.1 to v1.20.0 (see PR) we have seen our end-to-end tests fail or take quite a lot longer to run to completion. These tests repeatedly create/delete k8s Ingress/Contour HTTPProxy/Gateway API *Route objects, which the Contour controller uses to program Envoy dynamically over xDS. The tests update these k8s resources and then assert that requests to Envoy succeed. We've seen the time it takes for requests to succeed after a config update increase substantially when bumping to v1.20.0 with the same Contour controller code.

With a git bisect of commits between v1.19.0 and v1.20.0, it seems we can narrow this slowdown down to this commit, which changes the default for Envoy to use the SO_REUSEPORT socket option and also changes how listener updates work. Granted, this is not the most exact/minimal test case, but it appears this commit might be related.

Repro steps:
I can try to work up a smaller repro, but for now running the Contour e2e tests has been how we have experimented, capturing logs and fiddling with timeouts, etc.

Theories/mitigations

  • We are using the default Go HTTP client in tests; logging request errors in our test suite, we see TLS handshake timeout errors for HTTPS requests, and plain HTTP requests hang forever (the default Go client has no request timeout, oops on our part for using that!)
  • It appears the updated listener update/worker logic holds open connections that never succeed, maybe as part of what is described in the linked change as the effort to not drop connections on draining listeners
  • In the Contour tests, we can mitigate the slowdowns by adding stricter timeouts to HTTP requests, though anecdotally listener updates still look much slower
@howardjohn
Contributor

We are seeing the same in Istio. Essentially, it looks like when a listener drains, some connections (maybe 50%, but I didn't get a large enough sample size to say it's exactly 50/50) hang. netstat shows them stuck in Recv-Q.

Suspecting the reuseport stuff, we disabled it via runtime flag. When we did this, we got a different error - NACKs from address already in use.

cc @mattklein123 @lambdai

@sunjayBhatia
Member Author

sunjayBhatia commented Oct 13, 2021

Suspecting the reuseport stuff, we disabled it via runtime flag. When we did this, we got a different error - NACKs from address already in use.

Same for Contour, except we set the enable_reuse_port listener setting to false rather than using runtime config.

@lambdai
Contributor

lambdai commented Oct 13, 2021

Regarding enable reuse port:

The underlying expectation is that the kernel should re-queue the sockets from the dying passive socket (old listener) to another passive socket (new listener).
More importantly, libevent should somehow be notified and wake one or more worker threads.

I am not sure which piece is broken.

@mattklein123
Member

mattklein123 commented Oct 13, 2021

The underlying expectation is that the kernel should re-queue the sockets from the dying passive socket (old listener) to another passive socket (new listener). More importantly, libevent should somehow be notified and wake one or more worker threads.

Unfortunately this is not the way it works. Sockets are not re-queued. A lot of care was put into making sure that we don't wind up with dead sockets that are never accepted, and I'm surprised to hear this is happening. I'm happy to look into this if we can come up with a more self-contained repro?

mattklein123 added the area/listener, investigate, and no stalebot labels and removed the bug and triage labels Oct 13, 2021
mattklein123 self-assigned this Oct 13, 2021
@lambdai
Contributor

lambdai commented Oct 13, 2021

I am interested in the behavior with reuse_port disabled.

Since 1.20 the underlying fd is duplicated anyway, regardless of whether the old listener's listen fd is closed or not. See also ListenerManagerImpl::setNewOrDrainingSocketFactory.
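
For context, a minimal standalone sketch (plain POSIX sockets on Linux, not Envoy code; the port and names are illustrative) of what duplicating a listen fd means: the duplicate refers to the same underlying kernel socket and accept queue, so closing the original fd does not drop connections queued on that socket.

// Standalone sketch, error handling omitted: a dup()'d listen fd shares the
// same kernel socket object and accept queue as the original fd.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
  int listen_fd = ::socket(AF_INET, SOCK_STREAM, 0);
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(8080);
  ::bind(listen_fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
  ::listen(listen_fd, 1024);

  int dup_fd = ::dup(listen_fd); // same socket, same accept queue
  ::close(listen_fd);            // connections queued on the socket are still acceptable via dup_fd

  std::printf("accepting on duplicated fd %d\n", dup_fd);
  int conn = ::accept(dup_fd, nullptr, nullptr); // blocks until a client connects
  if (conn >= 0) {
    ::close(conn);
  }
  ::close(dup_fd);
  return 0;
}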

@sunjayBhatia
Member Author

sunjayBhatia commented Oct 14, 2021

a more self-contained repro

Working on something smaller; I was able to get a little spike going that does what is happening with Contour, and was able to reproduce earlier in this smaller setup, but now it's not reproducible consistently enough to share yet 😅

@lambdai
Contributor

lambdai commented Oct 15, 2021

This is what I observed today. Ideally we want the new listener fds to always refer to the same underlying sockets, as with fd=64 and fd=58 below. However, I do see the unexpected scenario below.
To avoid confusion: the concurrency is 4 and reuse_port is enabled.

istio-proxy@istio-ingressgateway-85bfd47c66-rfv2r:/$ ss -nltp
State            Recv-Q           Send-Q                     Local Address:Port                       Peer Address:Port           Process           
// hitless listener update
LISTEN           0                1024                             0.0.0.0:8080                            0.0.0.0:*               users:(("envoy",pid=16,fd=64),("envoy",pid=16,fd=58))           
LISTEN           0                1024                             0.0.0.0:8080                            0.0.0.0:*               users:(("envoy",pid=16,fd=63),("envoy",pid=16,fd=57))           
LISTEN           0                1024                             0.0.0.0:8080                            0.0.0.0:*               users:(("envoy",pid=16,fd=62),("envoy",pid=16,fd=56))           
LISTEN           0                1024                             0.0.0.0:8080                            0.0.0.0:*               users:(("envoy",pid=16,fd=59),("envoy",pid=16,fd=55))           
// requests lost across the update: the 5 + 1 + 2 queued connections will not be served.
LISTEN           5                1024                             0.0.0.0:8080                            0.0.0.0:*               users:(("envoy",pid=16,fd=54))                                  
LISTEN           1                1024                             0.0.0.0:8080                            0.0.0.0:*               users:(("envoy",pid=16,fd=53))                                  
LISTEN           0                1024                             0.0.0.0:8080                            0.0.0.0:*               users:(("envoy",pid=16,fd=52))                                  
LISTEN           2                1024                             0.0.0.0:8080                            0.0.0.0:*               users:(("envoy",pid=16,fd=51))      

@mattklein123
Member

mattklein123 commented Oct 16, 2021

Ideally we want the new listener fds to always refer to the same underlying sockets, as with fd=64 and fd=58

I think what you are seeing here looks correct to me. Each worker has its own fd, which is duplicated.

@lambdai if you have a repro, can you check the following. In the case of a listener update, we should be "cloning" the socket factory which goes through this code:

for (auto& socket : factory_to_clone.sockets_) {
  // In the cloning case we always duplicate() the socket. This makes sure that during listener
  // update/drain we don't lose any incoming connections when using reuse_port. Specifically on
  // Linux the use of SO_REUSEPORT causes the kernel to allocate a separate queue for each socket
  // on the same address. Incoming connections are immediately assigned to one of these queues.
  // If connections are in the queue when the socket is closed, they are closed/reset, not sent to
  // another queue. So avoid making extra queues in the kernel, even temporarily.
  //
  // TODO(mattklein123): In the current code as long as the address matches, the socket factory
  // will be cloned, effectively ignoring any changed socket options. The code should probably
  // block any updates to listeners that use the same address but different socket options.
  // (It's possible we could handle changing some socket options, but this would be tricky and
  // probably not worth the difficulty.)
  sockets_.push_back(socket->duplicate());
}

Can you make sure this code is being invoked during the update and duplicating the socket for each worker? Assuming this is working, the only things I can think of are something like:

  1. There is some kernel bug around duplicating SO_REUSEPORT sockets such that connections are still getting dropped.
  2. Somehow events are getting dropped so that we aren't waking up the duplicated sockets to accept.

It's hard for me to tell from the above dump exactly what is going on (how many listeners, whether duplication occurred, etc.). If you can repro, perhaps you can add some more logging so we can get a better idea of what is going on?

@lambdai
Contributor

lambdai commented Oct 18, 2021

@lambdai if you have a repro, can you check the following. In the case of a listener update, we should be "cloning" the socket factory which goes through this code:

This piece seems good.

I may have found the root cause.

During the in-place update, a ListenerImpl is created and the ListenSocketFactory is cloned. The listen sockets are cloned, too.

However, with an in-place update the old listener is not entirely drained: the old ListenSocketFactory::closeAllSockets is not invoked immediately after the new listener is warmed up.

The old listen sockets are closed in RAII style, which happens only after the drain_timeout.

That is to say, the OS keeps assigning incoming connections to both the old listen fd and the new listen fd, but no handler is watching and accepting on the old listen fd. Those connections are lost.
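
To make the failure mode concrete, here is a minimal standalone sketch (plain POSIX sockets on Linux, not Envoy code; the port and names are illustrative) of two SO_REUSEPORT listeners on the same port where only one is being accepted from: the kernel spreads incoming connections across both accept queues, and the connections hashed to the unserviced queue just sit in Recv-Q, matching the ss output above.

// Standalone sketch, error handling omitted: two SO_REUSEPORT listeners on the
// same port each get their own kernel accept queue. Connections hashed to a
// queue that nobody accepts from show up as non-zero Recv-Q on that LISTEN
// socket in `ss -nltp` and are never served.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

static int make_reuseport_listener(uint16_t port) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  int one = 1;
  ::setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)); // Linux-specific
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);
  ::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
  ::listen(fd, 1024);
  return fd;
}

int main() {
  int old_fd = make_reuseport_listener(8080); // stays open, but nobody accepts from it
  int new_fd = make_reuseport_listener(8080); // the only fd being serviced

  // Connections to 127.0.0.1:8080 are distributed between both accept queues;
  // the ones landing in old_fd's queue hang until old_fd is finally closed.
  for (int i = 0; i < 10; ++i) {
    int conn = ::accept(new_fd, nullptr, nullptr);
    if (conn >= 0) {
      ::close(conn);
    }
  }
  ::close(old_fd);
  ::close(new_fd);
  return 0;
}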

@lambdai
Contributor

lambdai commented Oct 18, 2021

I have an experimental fix locally that takes ownership of the listener socket factory if the update is an in-place update.
This fix makes the Istio integration tests pass.

@mattklein123
Member

During the in-place update, a ListenerImpl is created and the ListenSocketFactory is cloned. The listen sockets are cloned, too.

OK, this makes sense. I'm OK with a targeted fix, but I remember when we originally reviewed the in-place update changes, I commented on how it seemed strange that we were creating an entire new listener that we don't really use. Can we do additional cleanups here to prevent this type of issue? For an in-place update perhaps there should be no new socket factory at all?

@lambdai
Contributor

lambdai commented Oct 19, 2021

During the in-place update, a ListenerImpl is created and the ListenSocketFactory is cloned. The listen sockets are cloned, too.

OK, this makes sense. I'm OK with a targeted fix, but I remember when we originally reviewed the in-place update changes, I commented on how it seemed strange that we were creating an entire new listener that we don't really use. Can we do additional cleanups here to prevent this type of issue? For an in-place update perhaps there should be no new socket factory at all?

I transferred ownership of the listen socket factory to the in-place-updated new listener; all looks good until I hit the case of updating a listener that is already being updated. Still fixing...
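
Roughly, the idea looks like this (a toy sketch with hypothetical types, not Envoy's actual classes): the in-place-updated listener takes over the existing socket factory by move instead of receiving duplicated sockets, so only one set of listen fds and accept queues ever exists for the address.

// Toy illustration with hypothetical types of the ownership-transfer idea.
#include <memory>
#include <utility>
#include <vector>

struct ListenSocketFactory {
  std::vector<int> listen_fds; // in the real code these are closed via RAII when the factory is destroyed
};

struct Listener {
  std::unique_ptr<ListenSocketFactory> socket_factory;
};

// The in-place-updated listener takes over the old listener's factory; the old
// listener keeps draining connections it already accepted but no longer owns
// any listen fds of its own.
Listener inPlaceUpdate(Listener& old_listener) {
  Listener new_listener;
  new_listener.socket_factory = std::move(old_listener.socket_factory);
  return new_listener;
}

int main() {
  Listener old_listener;
  old_listener.socket_factory = std::make_unique<ListenSocketFactory>();

  // After the update, exactly one set of listen fds (and accept queues) remains.
  Listener new_listener = inPlaceUpdate(old_listener);
  return 0;
}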

@mattklein123
Member

I transferred ownership of the listen socket factory to the in-place-updated new listener; all looks good until I hit the case of updating a listener that is already being updated. Still fixing...

OK lmk if you need help fixing. Just ping me on Slack.
