[Other] A quorum queue that was deleted and re-declared under the same name when one of the nodes was stopped, won't be able to "reconcile" and recover #13131
Replies: 3 comments 2 replies
-
@gomoripeti the Raft protocol is very explicit about what happens when an old leader comes online and discovers the new leader. It does not take (Raft) cluster deletion into account; all "incarnations" are tracked using the leader term. When a QQ is deleted and re-declared, the leader term is reset, and the Raft algorithm does not assume this can happen. As things stand right now, a leader that restarts and encounters a different QQ under the same name has no way of detecting that the Raft cluster has changed. Quorum queues are long-lived, so this should not affect many deployments.
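To make the problem concrete, here is a minimal, language-agnostic sketch (not Ra's actual data model; all names and numbers are illustrative) of why a term reset makes two incarnations indistinguishable to the protocol:

```python
# Hypothetical sketch: Raft peers identify a cluster only by name, so
# nothing in the protocol state distinguishes two incarnations of the
# same quorum queue. This is NOT Ra's actual implementation.

def looks_like_same_cluster(old_member, new_leader):
    """A peer has no field that could reveal a delete + re-declare."""
    return old_member["cluster_name"] == new_leader["cluster_name"]

# Member that was offline during the delete + re-declare:
old_member = {"cluster_name": "qq.orders", "term": 7, "last_index": 1200}
# Freshly re-declared queue under the same name (term starts over):
new_leader = {"cluster_name": "qq.orders", "term": 1, "last_index": 10}

assert looks_like_same_cluster(old_member, new_leader)      # indistinguishable
assert old_member["last_index"] > new_leader["last_index"]  # stale log "wins"
```

The stale member's log is longer than the new incarnation's, yet by name it looks like a legitimate peer, which is exactly the confusion described above.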
-
@gomoripeti I think we discussed a similar thing in another issue. I think the simplest fix would be to persist (in the queue record) not just the list of nodes for a given queue but also their local UIDs, so that the recovery phase can detect whether a locally found member is actually part of the cluster, and delete and re-create it as appropriate. Amending Ra to handle this would require an extension to the replication protocol, which is quite a tricky thing to roll out, and ideally I'd like to avoid it if possible.
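A rough sketch of that suggestion (field names like `member_uids` are hypothetical, not RabbitMQ's actual queue-record schema):

```python
# Hypothetical sketch of the proposed fix: persist each member's local
# UID in the queue record so recovery can tell incarnations apart.
import uuid

def declare_queue(nodes):
    # Each member gets a fresh UID at declaration time.
    return {"nodes": nodes,
            "member_uids": {n: uuid.uuid4().hex for n in nodes}}

def recover_member(queue_record, node, local_uid):
    expected = queue_record["member_uids"].get(node)
    if expected != local_uid:
        # Local data belongs to an older incarnation:
        # delete it and re-create the member from scratch.
        return "reset_member"
    return "recover"

record_v1 = declare_queue(["a", "b", "c"])
stale_uid = record_v1["member_uids"]["c"]     # node "c" goes offline here
record_v2 = declare_queue(["a", "b", "c"])    # queue deleted and re-declared

assert recover_member(record_v2, "c", stale_uid) == "reset_member"
assert recover_member(record_v2, "a", record_v2["member_uids"]["a"]) == "recover"
```

The key property is that the check happens at recovery time on the node itself, so no change to the replication protocol is needed.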
-
Thanks for your answers! The user did not get back to us about their use case for deleting and redeclaring quorum queues. They are not doing this continuously (thankfully), but I don't know how unlucky they were that during each of the 3 restarts of a 3-node rolling upgrade they had multiple queues deleted and redeclared. I can only guess they somehow do this when a client disconnects and reconnects (why?). I'm not suggesting implementing either of the below; these are more thought experiments:
1. What would the extension be? A new message type between the ra_servers, or additional info in some existing message? (I'm not familiar with the details of Ra, just asking out of curiosity.) I realise that if two members of a Raft cluster from two different incarnations join each other, that can lead to all kinds of confusion (not just a crash), and their states have nothing to do with each other. We can say that RabbitMQ does not support such a situation, so can it be disallowed?
2. Could this be prevented from outside Ra, e.g. by the metadata store? That is, a QQ whose members could not all be deleted cannot be redeclared until the missing RabbitMQ node comes online and does the cleanup (or is forgotten from the RabbitMQ cluster). E.g. if some members fail to terminate, a tombstone is left in the metadata store. This would be similar to how queues cannot be declared when Khepri does not have a quorum. Disallowing this might be inconvenient for users, but quorum queues should be long-lived, so it should not affect many deployments.
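The tombstone idea could be sketched roughly like this (all names are hypothetical; this is not the Khepri or metadata-store API, just a model of the rule being proposed):

```python
# Hypothetical sketch: refuse re-declaration while any member of the
# previous incarnation is still pending cleanup on an offline node.

def delete_queue(store, name, reachable_nodes, all_nodes):
    pending = [n for n in all_nodes if n not in reachable_nodes]
    if pending:
        # Some members could not be terminated: leave a tombstone.
        store[name] = {"tombstone": True, "pending": set(pending)}
    else:
        store.pop(name, None)

def declare_queue(store, name):
    entry = store.get(name)
    if entry and entry.get("tombstone"):
        raise RuntimeError("precondition_failed: awaiting member cleanup on "
                           + ", ".join(sorted(entry["pending"])))
    store[name] = {"tombstone": False}

store = {}
# Node "c" is down when the queue is deleted:
delete_queue(store, "qq.orders",
             reachable_nodes=["a", "b"], all_nodes=["a", "b", "c"])
try:
    declare_queue(store, "qq.orders")
except RuntimeError:
    pass  # re-declare is rejected until "c" cleans up or is forgotten
```

Once the missing node performs its cleanup (or is removed from the cluster), the tombstone would be cleared and declaration allowed again.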
-
Community Support Policy
RabbitMQ version used
4.0.5 and latest main
How is RabbitMQ deployed?
Debian package
Steps to reproduce the behavior in question
We observed on multiple production clusters that, during a rolling restart of a RabbitMQ cluster for an upgrade from 3.13.7 to 4.0.5, several quorum queues crashed.
As it turns out, this behaviour is not related to the upgrade or to those versions, and is reproducible on latest main.
A quorum queue which has already taken a snapshot is deleted and re-declared with the same name while one of the member nodes is stopped. After the stopped node is started and tries to sync, the leader crashes with the reason below. The leader crashes on both remaining nodes (the ones that were not restarted) until the supervisor gives up, leaving the queue in a mostly dead state.
The direct reason is that the restarted member has a higher index than the re-declared leader, and the leader does not have a snapshot: https://github.com/rabbitmq/ra/blob/v2.15.1/src/ra_server.erl#L1963-L1965
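A simplified model of the failing check (cf. the `ra_server.erl` lines linked above; this is an illustrative sketch, not Ra's actual code) shows why the leader has nothing it can send:

```python
# Hypothetical sketch: when a follower's next index is beyond the
# leader's log, the leader normally falls back to sending its snapshot.
# A freshly re-declared leader has a tiny log and no snapshot, so
# neither option exists and the leader process exits.

def next_action_for_follower(leader_last_index, leader_snapshot_index,
                             follower_next_index):
    if follower_next_index <= leader_last_index:
        return "send_entries"
    if leader_snapshot_index is not None:
        return "send_snapshot"
    raise RuntimeError("missing entry and no snapshot -> leader exits")

# Re-declared leader: short log, no snapshot. Restarted old member:
# high index from the previous incarnation.
try:
    next_action_for_follower(leader_last_index=10,
                             leader_snapshot_index=None,
                             follower_next_index=1201)
except RuntimeError:
    pass  # the observed crash, repeated until the supervisor gives up
```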
Reproduction
after a while...
and finally...
Expected behaviour
Ideally, a newer incarnation of the same quorum queue could detect that an old-incarnation member is trying to re-join, and reject it (or restart it?).
More practically, it would be nice if the queue at least remained available after such steps (i.e. handle the case of a missing entry and no snapshot).
Logs of repro:
[email protected]
[email protected]
[email protected]