
A mechanism to delay client connection listener startup #13153

Open
michaelklishin opened this issue Jan 24, 2025 · 0 comments

@michaelklishin · Member

Is your feature request related to a problem? Please describe.

In certain scenarios, most (in)famously on Kubernetes with a sequential podManagementPolicy, a node can start but fail to sync its metadata store tables from its peers.

This can affect all kinds of environments. For example, #3837 and #13151 look like instances of the same scenario, even though we do not have conclusive evidence (or much information about those environments in general).

This can also be a matter of timing: a node does begin syncing its metadata store with peers upon restart, but syncing takes some time (often only fractions of a second). During that window, a client connection or a CLI tool may try to perform an operation that cannot possibly succeed until all metadata store tables are in place.

Client listeners were moved to the very last step of the boot step graph, and even that does not help in certain environments.

Describe the solution you'd like

A while ago, probably in late 2023, @dumbbell and I discussed this and agreed on the following feature:
a configurable startup delay for client listeners.

By default, the delay will be set to 0, so that node startup is not delayed and nothing that depends on it is affected: for example, the integration test suites of RabbitMQ itself and of its client libraries, the various tools that start a RabbitMQ node, and so on.

To introduce a non-zero delay, the user will have to opt in like so:

# a delay of ten seconds, applied once for all listeners on the node
listeners.startup_delay = 10
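To make the intended semantics concrete, here is a minimal Python sketch of "one delay for all listeners, zero by default". It is not RabbitMQ code (RabbitMQ is written in Erlang); the function name `start_listeners`, its parameters, and the injectable `sleep` are hypothetical and exist only for illustration.

```python
import time
from typing import Callable, Iterable


def start_listeners(
    listeners: Iterable[Callable[[], None]],
    startup_delay: float = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical sketch: start all listeners after one node-wide delay.

    Returns the number of listeners started.
    """
    if startup_delay < 0:
        raise ValueError("startup_delay must be >= 0")
    # A zero delay (the proposed default) skips the wait entirely,
    # so test suites and tools that boot a node see no change.
    if startup_delay > 0:
        sleep(startup_delay)  # applied once, not once per listener
    started = 0
    for start in listeners:
        start()
        started += 1
    return started


if __name__ == "__main__":
    events = []
    n = start_listeners(
        [lambda: events.append("amqp"), lambda: events.append("mqtt")],
        startup_delay=10,
        sleep=lambda s: events.append(f"sleep {s}"),
    )
    print(n, events)
```

Injecting `sleep` keeps the sketch testable without actually waiting; the key design point is that the delay is paid once before any listener starts, rather than per listener.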

Describe alternatives you've considered

There aren't that many alternatives.

We could consider checking for metadata table/tree presence before starting the listeners, but the risk of false positives would still exist.
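For contrast, the alternative above could look roughly like the polling loop below. This is a hypothetical Python sketch: `tables_ready` stands in for whatever presence check might be used, and it is not an existing RabbitMQ function. The loop can only trust what that predicate reports, which is where the false-positive risk lives: a table or tree that exists is not necessarily fully synced.

```python
import time
from typing import Callable


def wait_for_metadata(
    tables_ready: Callable[[], bool],
    timeout: float,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical sketch: poll a readiness predicate until it passes or times out.

    Returns True if the predicate reported readiness before the deadline.
    """
    deadline = clock() + timeout
    while True:
        # A True result here only means the check passed; it cannot prove
        # that the tables hold all of the data peers have.
        if tables_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Compared with a fixed delay, this approach adds a timeout to tune and a predicate whose correctness is hard to guarantee, which is part of why an opt-in delay is simpler to reason about.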

An opt-in delay is very easy to reason about, as we have seen in other parts of RabbitMQ (peer discovery, the Raft algorithm's randomized delay during leader elections).

Additional context

No response
