[Other] A quorum queue that was deleted and re-declared under the same name when one of the nodes was stopped, won't be able to "reconcile" and recover #13131
Replies: 3 comments 2 replies
-
@gomoripeti the Raft protocol is very explicit about what happens when an old leader comes online and discovers the new leader. It does not take (Raft) cluster deletion into account; all "incarnations" are tracked using the leader term. When a QQ is deleted and re-declared, the leader term is reset, and the Raft algorithm does not assume this can happen. As things stand right now, a leader that restarts and encounters a different QQ under the same name has no way of detecting that the Raft cluster has changed. Quorum queues are long-lived, so this should not affect many deployments.
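To make the problem concrete, here is a minimal, language-agnostic sketch (not Ra's actual data model; all names and numbers are illustrative) of why a term reset makes two incarnations indistinguishable to the protocol:

```python
# Hypothetical sketch: Raft peers identify a cluster only by name, so
# nothing in the protocol state distinguishes two incarnations of the
# same quorum queue. This is NOT Ra's actual implementation.

def looks_like_same_cluster(old_member, new_leader):
    """A peer has no field that could reveal a delete + re-declare."""
    return old_member["cluster_name"] == new_leader["cluster_name"]

# Member that was offline during the delete + re-declare:
old_member = {"cluster_name": "qq.orders", "term": 7, "last_index": 1200}
# Freshly re-declared queue under the same name (term starts over):
new_leader = {"cluster_name": "qq.orders", "term": 1, "last_index": 10}

assert looks_like_same_cluster(old_member, new_leader)      # indistinguishable
assert old_member["last_index"] > new_leader["last_index"]  # stale log "wins"
```

The stale member's log is longer than the new incarnation's, yet by name it looks like a legitimate peer, which is exactly the confusion described above.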
-
@gomoripeti I think we discussed a similar thing in another issue. I think the simplest fix would be to persist (in the queue record) not just the list of nodes for a given queue but also their local UIDs, so that the recovery phase can detect whether a locally found member is actually part of the cluster, and delete and re-create it as appropriate. Amending Ra to handle this would require an extension to the replication protocol, which is quite a tricky thing to roll out, and ideally I'd like to avoid it if possible.
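A rough sketch of that suggestion (field names like `member_uids` are hypothetical, not RabbitMQ's actual queue-record schema):

```python
# Hypothetical sketch of the proposed fix: persist each member's local
# UID in the queue record so recovery can tell incarnations apart.
import uuid

def declare_queue(nodes):
    # Each member gets a fresh UID at declaration time.
    return {"nodes": nodes,
            "member_uids": {n: uuid.uuid4().hex for n in nodes}}

def recover_member(queue_record, node, local_uid):
    expected = queue_record["member_uids"].get(node)
    if expected != local_uid:
        # Local data belongs to an older incarnation:
        # delete it and re-create the member from scratch.
        return "reset_member"
    return "recover"

record_v1 = declare_queue(["a", "b", "c"])
stale_uid = record_v1["member_uids"]["c"]     # node "c" goes offline here
record_v2 = declare_queue(["a", "b", "c"])    # queue deleted and re-declared

assert recover_member(record_v2, "c", stale_uid) == "reset_member"
assert recover_member(record_v2, "a", record_v2["member_uids"]["a"]) == "recover"
```

The key property is that the check happens at recovery time on the node itself, so no change to the replication protocol is needed.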
-
Thanks for your answers! The user did not get back to us about their use case for deleting and redeclaring quorum queues. They are not doing this continuously (thankfully), but I don't know how unlucky they were that during each of the 3 restarts of a 3-node rolling upgrade they had multiple queues deleted and redeclared. I can only guess they somehow do this when a client disconnects and reconnects (why?). I'm not suggesting implementing either of the below; these are more thought experiments:
1. What would the extension be? A new message type between the ra_servers, or additional info in some existing message? (I'm not familiar with the details of Ra, just asking out of curiosity.) I realise that if two members of a Raft cluster from two different incarnations join each other, that can lead to all kinds of confusion (not just a crash), and their states have nothing to do with each other. We can say that RabbitMQ does not support such a situation, so can it be disallowed?
2. Could this be prevented from outside Ra, e.g. by the metadata store? That is, a QQ whose members could not all be deleted cannot be redeclared until the missing RabbitMQ node comes online and does the cleanup (or is forgotten from the RabbitMQ cluster). E.g. if some members fail to terminate, a tombstone is left in the metadata store. This would be similar to how queues cannot be declared when Khepri does not have a quorum. Disallowing this might be inconvenient for users, but quorum queues should be long-lived, so it should not affect many deployments.
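The tombstone idea could be sketched roughly like this (all names are hypothetical; this is not the Khepri or metadata-store API, just a model of the rule being proposed):

```python
# Hypothetical sketch: refuse re-declaration while any member of the
# previous incarnation is still pending cleanup on an offline node.

def delete_queue(store, name, reachable_nodes, all_nodes):
    pending = [n for n in all_nodes if n not in reachable_nodes]
    if pending:
        # Some members could not be terminated: leave a tombstone.
        store[name] = {"tombstone": True, "pending": set(pending)}
    else:
        store.pop(name, None)

def declare_queue(store, name):
    entry = store.get(name)
    if entry and entry.get("tombstone"):
        raise RuntimeError("precondition_failed: awaiting member cleanup on "
                           + ", ".join(sorted(entry["pending"])))
    store[name] = {"tombstone": False}

store = {}
# Node "c" is down when the queue is deleted:
delete_queue(store, "qq.orders",
             reachable_nodes=["a", "b"], all_nodes=["a", "b", "c"])
try:
    declare_queue(store, "qq.orders")
except RuntimeError:
    pass  # re-declare is rejected until "c" cleans up or is forgotten
```

Once the missing node performs its cleanup (or is removed from the cluster), the tombstone would be cleared and declaration allowed again.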
-
Community Support Policy
RabbitMQ version used
4.0.5 and latest main
How is RabbitMQ deployed?
Debian package
Steps to reproduce the behavior in question
We observed on multiple production clusters that, during a rolling restart of a RabbitMQ cluster for an upgrade from 3.13.7 to 4.0.5, several quorum queues crashed.
As it turns out, this behaviour is not related to the upgrade or to those versions, and is reproducible on latest main.
A quorum queue which has already taken a snapshot is deleted and re-declared with the same name while one of the member nodes is stopped. After the stopped node is started and tries to sync, the leader crashes with the reason below. The leader crashes on both remaining nodes (the ones that were not restarted) until the supervisor gives up, leaving the queue in a mostly dead state.
The direct reason is that the restarted member has a higher index than the re-declared leader, and the leader does not have a snapshot: https://github.com/rabbitmq/ra/blob/v2.15.1/src/ra_server.erl#L1963-L1965
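A simplified model of the failing check (cf. the `ra_server.erl` lines linked above; this is an illustrative sketch, not Ra's actual code) shows why the leader has nothing it can send:

```python
# Hypothetical sketch: when a follower's next index is beyond the
# leader's log, the leader normally falls back to sending its snapshot.
# A freshly re-declared leader has a tiny log and no snapshot, so
# neither option exists and the leader process exits.

def next_action_for_follower(leader_last_index, leader_snapshot_index,
                             follower_next_index):
    if follower_next_index <= leader_last_index:
        return "send_entries"
    if leader_snapshot_index is not None:
        return "send_snapshot"
    raise RuntimeError("missing entry and no snapshot -> leader exits")

# Re-declared leader: short log, no snapshot. Restarted old member:
# high index from the previous incarnation.
try:
    next_action_for_follower(leader_last_index=10,
                             leader_snapshot_index=None,
                             follower_next_index=1201)
except RuntimeError:
    pass  # the observed crash, repeated until the supervisor gives up
```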
Reproduction
after a while...
and finally...
Expected behaviour
Ideally, a newer incarnation of the same quorum queue could detect that an old-incarnation member is trying to re-join, and reject it (or restart it?).
More practically, it would be nice if the queue at least remained available after such steps (i.e. handle the case of a missing entry and no snapshot).
Logs of repro:
[email protected]
[email protected]
[email protected]