listener: make reuse port the default (#17259)
1) Deprecate existing reuse_port field
2) Add new enable_reuse_port field which uses a WKT
3) Make the new default hot restart aware so the default is
   not changed during hot restart.
4) Allow the default to be reverted using the
   "envoy.reloadable_features.listener_reuse_port_default_enabled"
   feature flag.
5) Change listener init so that almost all error handling occurs on
   the main thread. This a) vastly simplifies error handling and
   b) makes it so that we pre-create all sockets on the main thread
   and can use them all during hot restart.
6) Change hot restart to pass reuse port sockets by socket/worker
   index. This works around a race condition in which a draining
   listener has a new connection on its accept queue, but it's
   never accepted by the old process worker. It will be dropped.
   By passing all sockets (even reuse port sockets) we make sure
   the accept queue is fully processed.
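
Item 6 can be pictured with a small model. This is a sketch only, not Envoy's hot restart code: `ParentSockets`, `registerSocket`, and `duplicate` are hypothetical names, and plain integers stand in for file descriptors. It shows just the lookup idea, in which the new process asks for a socket by address and worker index and gets -1 when the parent has nothing bound there.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of the parent side of hot restart. Integers stand in
// for file descriptors; none of this is Envoy's real implementation.
class ParentSockets {
public:
  // Record that the parent holds `fd` for `address` on worker `worker_index`.
  void registerSocket(const std::string& address, uint32_t worker_index, int fd) {
    sockets_[{address, worker_index}] = fd;
  }

  // Follows the documented contract of duplicateParentListenSocket(): return
  // the fd for (address, worker_index), or -1 if the parent has none.
  int duplicate(const std::string& address, uint32_t worker_index) const {
    const auto it = sockets_.find({address, worker_index});
    return it == sockets_.end() ? -1 : it->second;
  }

private:
  std::map<std::pair<std::string, uint32_t>, int> sockets_;
};
```

Keying on the pair rather than on the address alone is what lets every reuse port socket, and therefore every accept queue, be handed over individually.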

Fixes #15794

Risk Level: High, scary stuff involving hot restart and listener init
Testing: New and existing tests. It was very hard to get the tests to pass which gives me more confidence.
Docs Changes: N/A
Release Notes: Added
Platform Specific Features: N/A

Signed-off-by: Matt Klein <[email protected]>
mattklein123 authored Jul 20, 2021
1 parent e0380ad commit ba474ac
Showing 97 changed files with 1,299 additions and 1,130 deletions.
29 changes: 21 additions & 8 deletions api/envoy/config/listener/v3/listener.proto
@@ -35,7 +35,7 @@ message ListenerCollection {
repeated xds.core.v3.CollectionEntry entries = 1;
}

// [#next-free-field: 29]
// [#next-free-field: 30]
message Listener {
option (udpa.annotations.versioning).previous_message_type = "envoy.api.v2.Listener";

@@ -255,17 +255,30 @@ message Listener {
// enable the balance config in Y1 and Y2 to balance the connections among the workers.
ConnectionBalanceConfig connection_balance_config = 20;

// Deprecated. Use `enable_reuse_port` instead.
bool reuse_port = 21 [deprecated = true, (envoy.annotations.deprecated_at_minor_version) = "3.0"];

// When this flag is set to true, listeners set the *SO_REUSEPORT* socket option and
// create one socket for each worker thread. This makes inbound connections
// distribute among worker threads roughly evenly in cases where there are a high number
// of connections. When this flag is set to false, all worker threads share one socket.
// of connections. When this flag is set to false, all worker threads share one socket. This field
// defaults to true.
//
// .. attention::
//
// Although this field defaults to true, it has different behavior on different platforms. See
// the following text for more information.
//
// Before Linux v4.19-rc1, new TCP connections may be rejected during hot restart
// (see `3rd paragraph in 'soreuseport' commit message
// <https://github.com/torvalds/linux/commit/c617f398edd4db2b8567a28e89>`_).
// This issue was fixed by `tcp: Avoid TCP syncookie rejected by SO_REUSEPORT socket
// <https://github.com/torvalds/linux/commit/40a1227ea845a37ab197dd1caffb60b047fa36b1>`_.
bool reuse_port = 21;
// * On Linux, reuse_port is respected for both TCP and UDP listeners. It also works correctly
// with hot restart.
// * On macOS, reuse_port for TCP does not do what it does on Linux. Instead of load balancing,
// the last socket wins and receives all connections/packets. For TCP, reuse_port is force
// disabled and the user is warned. For UDP, it is enabled, but only one worker will receive
// packets. For QUIC/H3, SW routing will send packets to other workers. For "raw" UDP, only
// a single worker will currently receive packets.
// * On Windows, reuse_port for TCP has undefined behavior. It is force disabled and the user
// is warned similar to macOS. It is left enabled for UDP with undefined behavior currently.
google.protobuf.BoolValue enable_reuse_port = 29;

// Configuration for :ref:`access logs <arch_overview_access_logs>`
// emitted by this listener.
30 changes: 20 additions & 10 deletions api/envoy/config/listener/v4alpha/listener.proto

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 0 additions & 1 deletion configs/envoyproxy_io_proxy_http3_downstream.yaml
@@ -56,7 +56,6 @@ static_resources:
protocol: UDP
address: 0.0.0.0
port_value: 10000
reuse_port: true
udp_listener_config:
quic_options: {}
downstream_socket_config:
@@ -7,7 +7,6 @@ admin:
static_resources:
listeners:
- name: listener_0
reuse_port: true
address:
socket_address:
protocol: UDP
15 changes: 15 additions & 0 deletions docs/root/intro/arch_overview/operations/hot_restart.rst
@@ -37,3 +37,18 @@ independently.
.. note::

This feature is not supported on Windows.

Socket handling
---------------

By default, Envoy uses :ref:`reuse_port
<envoy_v3_api_field_config.listener.v3.Listener.enable_reuse_port>` sockets on Linux for better
performance. This feature works correctly during hot restart because Envoy passes each socket
to the new process by worker index. Thus, no connections are dropped in the accept queues of
the draining process.

.. attention::

In the uncommon case in which concurrency changes during hot restart, no connections will be
dropped if concurrency increases. However, if concurrency decreases some connections may be
dropped in the accept queues of the old process workers.
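
The concurrency caveat above can be worked through with a small model. This is a sketch under stated assumptions, not Envoy code: `HandoffPlan` and `planHandoff` are hypothetical names, and the model assumes the new process creates a fresh socket for any worker index the parent cannot supply.

```cpp
#include <cstdint>

// Hypothetical summary of what happens to per-worker reuse_port sockets when
// the worker count changes across a hot restart.
struct HandoffPlan {
  uint32_t inherited;     // sockets the new process takes over from the parent
  uint32_t newly_created; // sockets the new process has to create itself
  uint32_t left_behind;   // parent sockets that no new worker asks for
};

HandoffPlan planHandoff(uint32_t old_concurrency, uint32_t new_concurrency) {
  HandoffPlan plan{0, 0, 0};
  // The new process requests one socket per worker index it will run.
  for (uint32_t worker_index = 0; worker_index < new_concurrency; ++worker_index) {
    if (worker_index < old_concurrency) {
      ++plan.inherited;
    } else {
      ++plan.newly_created;
    }
  }
  // Parent indexes beyond the new worker count are never requested, so
  // connections queued on those sockets may be dropped.
  if (old_concurrency > new_concurrency) {
    plan.left_behind = old_concurrency - new_concurrency;
  }
  return plan;
}
```

Going from 2 workers to 4 leaves nothing behind in this model, while going from 4 to 2 leaves two parent sockets unclaimed, which matches the warning about decreasing concurrency.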
13 changes: 13 additions & 0 deletions docs/root/version_history/current.rst
@@ -14,6 +14,16 @@ Minor Behavior Changes
which defines the minimal number of headers in a request/response/trailers required for using a
dictionary in addition to the list. Setting the `envoy.http.headermap.lazy_map_min_size` runtime
feature to a non-negative number will override the default value.
* listener: added the :ref:`enable_reuse_port <envoy_v3_api_field_config.listener.v3.Listener.enable_reuse_port>`
field and changed the default for reuse_port from false to true, as the feature is now well
supported on the majority of production Linux kernels in use. The default change is aware of hot
restart, as otherwise the change would not be backwards compatible between restarts. This means
that hot restarting onto a new binary will retain the default of false until the binary undergoes
a full restart. To retain the previous behavior, either explicitly set the new configuration
field to false, or set the runtime feature flag `envoy.reloadable_features.listener_reuse_port_default_enabled`
to false. As part of this change, the use of reuse_port for TCP listeners on both macOS and
Windows has been disabled due to suboptimal behavior. See the field documentation for more
information.

Bug Fixes
---------
@@ -38,4 +48,7 @@ Deprecated
* http: the HeaderMatcher fields :ref:`exact_match <envoy_v3_api_field_config.route.v3.HeaderMatcher.exact_match>`, :ref:`safe_regex_match <envoy_v3_api_field_config.route.v3.HeaderMatcher.safe_regex_match>`,
:ref:`prefix_match <envoy_v3_api_field_config.route.v3.HeaderMatcher.prefix_match>`, :ref:`suffix_match <envoy_v3_api_field_config.route.v3.HeaderMatcher.suffix_match>` and
:ref:`contains_match <envoy_v3_api_field_config.route.v3.HeaderMatcher.contains_match>` are deprecated by :ref:`string_match <envoy_v3_api_field_config.route.v3.HeaderMatcher.string_match>`.
* listener: :ref:`reuse_port <envoy_v3_api_field_config.listener.v3.Listener.reuse_port>` has been
deprecated in favor of :ref:`enable_reuse_port <envoy_v3_api_field_config.listener.v3.Listener.enable_reuse_port>`.
At the same time, the default has been changed from false to true. See above for more information.
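
The release notes above describe three inputs to the effective setting: the new field, the deprecated field, and a server-wide default. The sketch below shows one plausible way to combine them; the precedence order is an assumption, and `ListenerFlags` and `effectiveReusePort` are hypothetical names rather than Envoy's API. `std::optional<bool>` stands in for `google.protobuf.BoolValue`.

```cpp
#include <optional>

// Hypothetical, simplified view of a listener config. `enable_reuse_port`
// models the new wrapper field (unset, true, or false). `reuse_port` models
// the deprecated plain bool, where "unset" cannot be told apart from false.
struct ListenerFlags {
  std::optional<bool> enable_reuse_port;
  bool reuse_port = false;
};

// Assumed precedence: explicit new field, then deprecated field set to true,
// then whatever default the server is running with.
bool effectiveReusePort(const ListenerFlags& flags, bool server_default) {
  if (flags.enable_reuse_port.has_value()) {
    return *flags.enable_reuse_port;
  }
  if (flags.reuse_port) {
    return true;
  }
  return server_default;
}
```

The tri-state field is what makes a default of true workable: with a plain bool, an explicit false and an omitted field look identical, so an operator could not opt out.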

5 changes: 2 additions & 3 deletions envoy/event/dispatcher.h
@@ -241,12 +241,11 @@ class Dispatcher : public DispatcherBase, public ScopeTracker {
* @param socket supplies the socket to listen on.
* @param cb supplies the callbacks to invoke for listener events.
* @param bind_to_port controls whether the listener binds to a transport port or not.
* @param backlog_size controls listener pending connections backlog
* @return Network::ListenerPtr a new listener that is owned by the caller.
*/
virtual Network::ListenerPtr createListener(Network::SocketSharedPtr&& socket,
Network::TcpListenerCallbacks& cb, bool bind_to_port,
uint32_t backlog_size) PURE;
Network::TcpListenerCallbacks& cb,
bool bind_to_port) PURE;

/**
* Creates a logical udp listener on a specific port.
5 changes: 5 additions & 0 deletions envoy/network/connection_handler.h
@@ -218,6 +218,11 @@ class ActiveUdpListenerFactory {
* @return true if the UDP passing through listener doesn't form stateful connections.
*/
virtual bool isTransportConnectionless() const PURE;

/**
* @return socket options specific to this factory that should be applied to all sockets.
*/
virtual const Network::Socket::OptionsSharedPtr& socketOptions() const PURE;
};

using ActiveUdpListenerFactoryPtr = std::unique_ptr<ActiveUdpListenerFactory>;
10 changes: 5 additions & 5 deletions envoy/network/exception.h
@@ -6,20 +6,20 @@ namespace Envoy {
namespace Network {

/**
* Thrown when there is a runtime error creating/binding a listener.
* Thrown when a socket option cannot be applied.
*/
class CreateListenerException : public EnvoyException {
class SocketOptionException : public EnvoyException {
public:
CreateListenerException(const std::string& what) : EnvoyException(what) {}
SocketOptionException(const std::string& what) : EnvoyException(what) {}
};

/**
* Thrown when there is a runtime error binding a socket.
*/
class SocketBindException : public CreateListenerException {
class SocketBindException : public EnvoyException {
public:
SocketBindException(const std::string& what, int error_number)
: CreateListenerException(what), error_number_(error_number) {}
: EnvoyException(what), error_number_(error_number) {}

// This can't be called errno because otherwise the standard errno macro expansion replaces it.
int errorNumber() const { return error_number_; }
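
With `SocketBindException` now deriving directly from `EnvoyException`, a caller that cares about bind failures catches it by name and reads `errorNumber()`. The sketch below is self-contained, so `EnvoyException` is a stand-in built on `std::runtime_error`, and `bindOrThrow` and `tryBind` are hypothetical helpers that exist only for illustration.

```cpp
#include <cerrno>
#include <stdexcept>
#include <string>

// Stand-in for Envoy's base exception so the sketch compiles on its own.
class EnvoyException : public std::runtime_error {
public:
  explicit EnvoyException(const std::string& what) : std::runtime_error(what) {}
};

// Same shape as the class in the diff: carries the errno of the failed bind.
class SocketBindException : public EnvoyException {
public:
  SocketBindException(const std::string& what, int error_number)
      : EnvoyException(what), error_number_(error_number) {}
  int errorNumber() const { return error_number_; }

private:
  const int error_number_;
};

// Hypothetical bind step that always fails as if the address were taken.
void bindOrThrow() { throw SocketBindException("cannot bind", EADDRINUSE); }

// Returns 0 on success, or the errno reported by the bind failure.
int tryBind() {
  try {
    bindOrThrow();
  } catch (const SocketBindException& e) {
    return e.errorNumber();
  }
  return 0;
}
```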
26 changes: 21 additions & 5 deletions envoy/network/listener.h
@@ -25,6 +25,9 @@ namespace Network {
class ActiveUdpListenerFactory;
class UdpListenerWorkerRouter;

class ListenSocketFactory;
using ListenSocketFactoryPtr = std::unique_ptr<ListenSocketFactory>;

/**
* ListenSocketFactory is a member of ListenConfig to provide listen socket.
* Listeners created from the same ListenConfig instance have listening sockets
@@ -36,10 +39,12 @@ class ListenSocketFactory {

/**
* Called during actual listener creation.
* @param worker_index supplies the worker index to get the socket for. All sockets are created
* ahead of time.
* @return the socket to be used for a certain listener, which might be shared
* with other listeners of the same config on other worker threads.
*/
virtual SocketSharedPtr getListenSocket() PURE;
virtual SocketSharedPtr getListenSocket(uint32_t worker_index) PURE;

/**
* @return the type of the socket getListenSocket() returns.
@@ -53,12 +58,23 @@ class ListenSocketFactory {
virtual const Address::InstanceConstSharedPtr& localAddress() const PURE;

/**
* @return the socket shared by worker threads if any; otherwise return null.
* Clone this socket factory so it can be used by a new listener (e.g., if the address is shared).
*/
virtual SocketOptRef sharedSocket() const PURE;
};
virtual ListenSocketFactoryPtr clone() const PURE;

/**
* Close all sockets. This is used during draining scenarios.
*/
virtual void closeAllSockets() PURE;

using ListenSocketFactorySharedPtr = std::shared_ptr<ListenSocketFactory>;
/**
* Perform any initialization that must occur immediately prior to using the listen socket on
* workers. For example, the actual listen() call, post listen socket options, etc. This is done
* so that all error handling can occur on the main thread and the gap between performing these
* actions and using the socket is minimized.
*/
virtual void doFinalPreWorkerInit() PURE;
};

/**
* Configuration for a UDP listener.
13 changes: 10 additions & 3 deletions envoy/server/hot_restart.h
@@ -28,6 +28,11 @@ class HotRestart {
uint64_t parent_connections_ = 0;
};

struct AdminShutdownResponse {
time_t original_start_time_;
bool enable_reuse_port_default_;
};

virtual ~HotRestart() = default;

/**
@@ -40,9 +45,11 @@ class HotRestart {
* Retrieve a listening socket on the specified address from the parent process. The socket will
* be duplicated across process boundaries.
* @param address supplies the address of the socket to duplicate, e.g. tcp://127.0.0.1:5000.
* @param worker_index supplies the socket/worker index to fetch. When using reuse_port sockets
* each socket is fetched individually to ensure no connection loss.
* @return int the fd or -1 if there is no bound listen port in the parent.
*/
virtual int duplicateParentListenSocket(const std::string& address) PURE;
virtual int duplicateParentListenSocket(const std::string& address, uint32_t worker_index) PURE;

/**
* Initialize the parent logic of our restarter. Meant to be called after initialization of a
@@ -54,9 +61,9 @@ class HotRestart {
/**
* Shutdown admin processing in the parent process if applicable. This allows admin processing
* to start up in the new process.
* @param original_start_time will be filled with information from our parent, if retrieved.
* @return response if the parent is alive.
*/
virtual void sendParentAdminShutdownRequest(time_t& original_start_time) PURE;
virtual absl::optional<AdminShutdownResponse> sendParentAdminShutdownRequest() PURE;

/**
* Tell our parent process to gracefully terminate itself.
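
`sendParentAdminShutdownRequest()` now returns an optional response instead of filling an out-parameter, so the caller can tell "no parent" apart from "parent answered". A rough sketch of how the new process might use that to pick its reuse_port default follows. `resolveReusePortDefault` is a hypothetical helper, `std::optional` replaces `absl::optional` to keep the example dependency-free, and the fallback to the runtime flag is an assumption based on the commit message rather than a copy of the server code.

```cpp
#include <ctime>
#include <optional>

// Same fields as the struct added to HotRestart in the diff.
struct AdminShutdownResponse {
  time_t original_start_time_;
  bool enable_reuse_port_default_;
};

// Hypothetical helper. If a parent answered, keep the parent's default so it
// does not change in the middle of a hot restart; otherwise fall back to the
// value of the runtime feature flag.
bool resolveReusePortDefault(const std::optional<AdminShutdownResponse>& parent_response,
                             bool runtime_flag_enabled) {
  if (parent_response.has_value()) {
    return parent_response->enable_reuse_port_default_;
  }
  return runtime_flag_enabled;
}
```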
9 changes: 9 additions & 0 deletions envoy/server/instance.h
@@ -260,6 +260,15 @@ class Instance {
*/
virtual void
setDefaultTracingConfig(const envoy::config::trace::v3::Tracing& tracing_config) PURE;

/**
* Return the default for whether reuse_port is enabled or not. This was added as part of
* fixing https://github.com/envoyproxy/envoy/issues/15794. During hot restart, the new process
* must know which default the parent process used, because otherwise switching the
* default on the fly will break existing deployments.
* TODO(mattklein123): This can be removed when version 1.20.0 is no longer supported.
*/
virtual bool enableReusePortDefault() PURE;
};

} // namespace Server
30 changes: 13 additions & 17 deletions envoy/server/listener_manager.h
@@ -34,20 +34,6 @@ class LdsApi {

using LdsApiPtr = std::unique_ptr<LdsApi>;

struct ListenSocketCreationParams {
ListenSocketCreationParams(bool bind_to_port, bool duplicate_parent_socket = true)
: bind_to_port(bind_to_port), duplicate_parent_socket(duplicate_parent_socket) {}

// For testing.
bool operator==(const ListenSocketCreationParams& rhs) const;
bool operator!=(const ListenSocketCreationParams& rhs) const;

// whether to actually bind the socket.
bool bind_to_port;
// whether to duplicate socket from hot restart parent.
bool duplicate_parent_socket;
};

/**
* Factory for creating listener components.
*/
@@ -63,19 +49,29 @@ class ListenerComponentFactory {
virtual LdsApiPtr createLdsApi(const envoy::config::core::v3::ConfigSource& lds_config,
const xds::core::v3::ResourceLocator* lds_resources_locator) PURE;

enum class BindType {
// The listener will not bind.
NoBind,
// The listener will bind a socket shared by all workers.
NoReusePort,
// The listener will use reuse_port sockets independently on each worker.
ReusePort
};

/**
* Creates a socket.
* @param address supplies the socket's address.
* @param socket_type the type of socket (stream or datagram) to create.
* @param options to be set on the created socket just before calling 'bind()'.
* @param params used to control how a socket being created.
* @param bind_type supplies the bind type of the listen socket.
* @param worker_index supplies the socket/worker index of the new socket.
* @return Network::SocketSharedPtr an initialized and potentially bound socket.
*/
virtual Network::SocketSharedPtr
createListenSocket(Network::Address::InstanceConstSharedPtr address,
Network::Socket::Type socket_type,
const Network::Socket::OptionsSharedPtr& options,
const ListenSocketCreationParams& params) PURE;
const Network::Socket::OptionsSharedPtr& options, BindType bind_type,
uint32_t worker_index) PURE;

/**
* Creates a list of filter factories.
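
The `BindType` enum replaces the two booleans of the removed `ListenSocketCreationParams`. A caller might derive it from listener settings roughly as follows. `chooseBindType` is a hypothetical helper, and the enum is redeclared at namespace scope only so the sketch compiles on its own.

```cpp
// Redeclared here for a self-contained example; in the diff this enum lives
// inside ListenerComponentFactory.
enum class BindType {
  NoBind,      // the listener will not bind
  NoReusePort, // one bound socket shared by all workers
  ReusePort    // an independent reuse_port socket per worker
};

// Hypothetical mapping from two listener settings to a bind type.
BindType chooseBindType(bool bind_to_port, bool reuse_port) {
  if (!bind_to_port) {
    return BindType::NoBind;
  }
  return reuse_port ? BindType::ReusePort : BindType::NoReusePort;
}
```

A three-valued enum rules out the combination of "do not bind" with "reuse port", which a pair of booleans cannot express.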
4 changes: 1 addition & 3 deletions envoy/server/worker.h
@@ -19,10 +19,8 @@ class Worker {
/**
* Completion called when a listener has been added on a worker and is listening for new
* connections.
* @param success supplies whether the addition was successful or not. FALSE can be returned
* when there is a race condition between bind() and listen().
*/
using AddListenerCompletion = std::function<void(bool success)>;
using AddListenerCompletion = std::function<void()>;

/**
* Add a listener to the worker and replace the previous listener if any. If the previous listener
1 change: 0 additions & 1 deletion examples/udp/envoy.yaml
@@ -1,7 +1,6 @@
static_resources:
listeners:
- name: listener_0
reuse_port: true
address:
socket_address:
protocol: UDP