Update durable queues outside of Khepri transactions (backport #10742) #10807

Merged: 2 commits, Mar 20, 2024


@mergify mergify bot commented Mar 20, 2024

@mkuratczyk found a bug that happens when the khepri_db feature flag is enabled and a vhost fails to recover. Specifically this block:

```erlang
rabbit_amqqueue:mark_local_durable_queues_stopped(VHost),
rabbit_log:error("Unable to recover vhost ~tp data. Reason ~tp~n"
                 " Stacktrace ~tp",
                 [VHost, Reason, Stacktrace]),
{stop, Reason}
```

The `rabbit_amqqueue:mark_local_durable_queues_stopped/1` call fails when Khepri is enabled because it tries to create a transaction function that Khepri disallows: the function finds queues which are not alive (via an RPC call) and marks them as stopped. That is unsafe to do in a Khepri transaction because the function would be executed on each cluster member, so the value of `node()` would differ from member to member and side effects like RPC calls would be repeated by each one. So that function currently errors. (For example, start a 3.13.0 broker, enable `khepri_db` and execute `rabbit_amqqueue:mark_local_durable_queues_stopped(<<"/">>)`: this will error out.)

We can work around this by using Khepri's advanced API to read each queue together with its version number in the database, filtering and updating the queues outside of a transaction function, and then using a transaction to apply the updates, which fails if any queue has changed since it was read. This way we get the transaction-like behavior of atomically updating all queues or none, without the restrictions that Khepri transaction functions would place on `FilterFun` and `UpdateFun`.
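The idea can be illustrated without Khepri at all. The following is a minimal sketch in plain Erlang, assuming a map of `Name => {Version, Record}` as a stand-in for the database; the module name, `plan_updates/3`, `apply_updates/2` and the `version_mismatch` error are hypothetical and are not part of Khepri or RabbitMQ. It only shows the shape of the technique: filter and update freely against a snapshot, then commit everything or nothing based on the recorded versions.

```erlang
-module(versioned_update_sketch).
-export([plan_updates/3, apply_updates/2, demo/0]).

%% The store is a plain map standing in for the database:
%%   #{Name => {Version, Record}}
%% This is NOT Khepri's data model, only an illustration.

%% Read a snapshot, then filter and update outside of any transaction.
%% FilterFun and UpdateFun are unrestricted here: they could call node(),
%% perform an RPC, and so on. Each planned update remembers the version
%% that was read.
plan_updates(Snapshot, FilterFun, UpdateFun) ->
    maps:fold(
      fun(Name, {Vsn, Rec}, Acc) ->
              case FilterFun(Rec) of
                  true  -> [{Name, Vsn, UpdateFun(Rec)} | Acc];
                  false -> Acc
              end
      end, [], Snapshot).

%% Apply all planned updates or none. The batch is rejected as a whole
%% if any entry no longer has the version recorded in the plan.
apply_updates(Store, Updates) ->
    Unchanged = lists:all(
                  fun({Name, Vsn, _New}) ->
                          case Store of
                              #{Name := {Vsn, _}} -> true;
                              _ -> false
                          end
                  end, Updates),
    case Unchanged of
        true ->
            NewStore = lists:foldl(
                         fun({Name, Vsn, New}, Acc) ->
                                 Acc#{Name := {Vsn + 1, New}}
                         end, Store, Updates),
            {ok, NewStore};
        false ->
            {error, version_mismatch}
    end.

demo() ->
    Store0 = #{<<"q1">> => {1, #{state => live}},
               <<"q2">> => {1, #{state => live}}},
    Filter = fun(#{state := S}) -> S =/= stopped end,
    Update = fun(Rec) -> Rec#{state := stopped} end,
    Updates = plan_updates(Store0, Filter, Update),
    %% Nothing changed since the snapshot, so the whole batch applies.
    {ok, Store1} = apply_updates(Store0, Updates),
    %% Simulate a concurrent writer bumping q1 after the snapshot was
    %% taken: the stale plan is rejected and no queue is modified.
    Changed = Store0#{<<"q1">> := {2, #{state => live}}},
    {error, version_mismatch} = apply_updates(Changed, Updates),
    Store1.
```

In this sketch a caller that receives `{error, version_mismatch}` could take a fresh snapshot and try again. In the actual change the version check and the writes are delegated to Khepri rather than done by hand as above.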


This is an automatic backport of pull request #10742 done by Mergify.

`rabbit_db_queue:update_durable/2`'s caller
(`rabbit_amqqueue:mark_local_durable_queues_stopped/1`) passes a filter
function that performs some operations that aren't allowed within
Khepri transactions, like looking up and using the current node and
executing an RPC. Calling
`rabbit_amqqueue:mark_local_durable_queues_stopped/1` on a Rabbit with
the `khepri_db` feature flag enabled will result in an error.

We can safely update a number of queues by using Khepri's
`khepri_adv:get_many/3` advanced API which returns the internal version
number of each queue. We can filter and update the queues outside of
a transaction function and then perform all updates at once, failing if
any queue has changed since the `khepri_adv:get_many/3` query. So we
get the main benefits of a transaction but we can still execute any
update or filter function.

(cherry picked from commit 091d74c)
@the-mikedavis the-mikedavis merged commit 289b110 into v3.13.x Mar 20, 2024
16 checks passed
@the-mikedavis the-mikedavis deleted the mergify/bp/v3.13.x/pr-10742 branch March 20, 2024 14:57
@the-mikedavis the-mikedavis added this to the 3.13.1 milestone Mar 20, 2024