
Add force checkpoint functions for quorum queues and command line tool #13175

Open: wants to merge 3 commits into main


aaron-seo (Contributor) commented Jan 29, 2025

Proposed Changes

Addresses #13137.

Adds functions to force a checkpoint for a single quorum queue, or for all quorum queues matching a VhostSpec and QueueSpec.

Also adds a CLI command (rabbitmq-queues force_checkpoint) to invoke that function.
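A minimal sketch of the matching step, assuming queues are represented as hypothetical {Vhost, Name, Type} tuples and that VhostSpec and QueueSpec are regular expressions applied with unanchored re:run/3. The PR's actual matching code is not shown in this thread, so the module and function names are illustrative only. Classic queues are skipped, mirroring the CLI output below, where cq1 never appears:

```erlang
-module(qq_match_sketch).
-export([matching_quorum_queues/3]).

%% Hypothetical helper: returns {Vhost, Name} for every quorum queue whose
%% vhost matches VhostSpec and whose name matches QueueSpec.
%% Queues is a list of {Vhost :: binary(), Name :: binary(), Type :: atom()};
%% the generator pattern drops every entry whose type is not `quorum`.
matching_quorum_queues(VhostSpec, QueueSpec, Queues) ->
    [{Vhost, Name}
     || {Vhost, Name, quorum} <- Queues,
        re:run(Vhost, VhostSpec, [{capture, none}]) =:= match,
        re:run(Name, QueueSpec, [{capture, none}]) =:= match].
```

Because the sketch matches unanchored, a vhost pattern of "test" would also select a vhost named "test2"; wrapping patterns in ^...$ is the alternative if exact matches are wanted.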

Accompanied by the documentation PR rabbitmq/rabbitmq-website#2175.

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply.

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc.)
  • Cosmetic change (whitespace, formatting, etc)
  • Build system and/or CI

Checklist

Put an x in the boxes that apply.
You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask on the mailing list.
We're here to help!
This is simply a reminder of what we are going to look for before merging your code.

Further Comments

The test case for force_checkpoint_on_queue is somewhat sensitive to changes in the rabbit_fifo:aux... and rabbit_fifo:checkpoint records. Suggestions for a different testing approach are welcome.

Some local testing of the CLI tool

gmake run-broker RABBITMQ_NODENAME="force_checkpoint2"

Create the following vhost and queues:

aaronseo: ~/workplace/rabbitMq/rabbitmq-server/sbin $ ./rabbitmqctl -n force_checkpoint2 list_queues --vhost test name,type
Timeout: 60.0 seconds ...
Listing queues for vhost test ...
name    type
qwe     quorum
qq1     quorum
qq2     quorum
cq1     classic
aaronseo: ~/workplace/rabbitMq/rabbitmq-server/sbin $ ./rabbitmqctl -n force_checkpoint2 list_queues name,type
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name    type
qq2     quorum
qwe     quorum
qq1     quorum
cq1     classic
aaronseo: ~/workplace/rabbitMq/rabbitmq-server/sbin $ ./rabbitmq-queues -n force_checkpoint2 force_checkpoint
Forcing checkpoint for all matching quorum queues...
vhost   name    result
/       qq2     ok
/       qwe     ok
/       qq1     ok
test    qwe     ok
test    qq1     ok
test    qq2     ok

aaronseo: ~/workplace/rabbitMq/rabbitmq-server/sbin $ ./rabbitmq-queues -n force_checkpoint2 force_checkpoint --queue-pattern "qq.*"
Forcing checkpoint for all matching quorum queues...
vhost   name    result
/       qq2     ok
/       qq1     ok
test    qq1     ok
test    qq2     ok

aaronseo: ~/workplace/rabbitMq/rabbitmq-server/sbin $ ./rabbitmq-queues -n force_checkpoint2 force_checkpoint --vhost-pattern "test"
Forcing checkpoint for all matching quorum queues...
vhost   name    result
test    qwe     ok
test    qq1     ok
test    qq2     ok

Some local testing of the checkpoint value

  • gmake run-broker RABBITMQ_NODENAME="force_checkpoint2"
  • Create a quorum queue: testqq.
  • observer:start() --> Applications --> ra --> %2F_testqq --> State -->
aux =>
           {aux_v3,'%2F_testqq',
               {empty,true},
               {inactive,-576460727567095,1,1.0},
               {aux_gc,0},
               <0.1644.0>,#{},
               {checkpoint,0,1738181974685,2,0,4096,[]}},
  • Send some messages to testqq
  • Observe no change in checkpoint.
  • Force checkpoints:
(force_checkpoint2@b0f1d85a952f)2> rabbit_quorum_queue:force_checkpoint(".*", ".*").
[{{resource,<<"/">>,queue,<<"qq2">>},{ok}},
 {{resource,<<"/">>,queue,<<"qwe">>},{ok}},
 {{resource,<<"/">>,queue,<<"testqq">>},{ok}},
 {{resource,<<"/">>,queue,<<"qq1">>},{ok}},
 {{resource,<<"test">>,queue,<<"qwe">>},{ok}},
 {{resource,<<"test">>,queue,<<"qq1">>},{ok}},
 {{resource,<<"test">>,queue,<<"qq2">>},{ok}}]
(force_checkpoint2@b0f1d85a952f)3>

  • Observe the change in aux => checkpoint:

       aux =>
           {aux_v3,'%2F_testqq',
               {empty,false},
               {inactive,-576460749650951,1,1.0},
               {aux_gc,0},
               <0.1278.0>,#{},
               {checkpoint,15,1738182269854,4,4,4096,[]}},
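One possible answer to the testing question raised above, sketched under assumptions: rather than matching the full aux_v3 and checkpoint record layouts, a test could locate the element tagged checkpoint anywhere inside the aux tuple and only assert that it changed after forcing a checkpoint. The helper names are hypothetical; the tuple shapes are copied from the observer output above, and the meaning of the individual checkpoint fields is deliberately not assumed.

```erlang
-module(checkpoint_probe_sketch).
-export([find_checkpoint/1, checkpoint_changed/2]).

%% Hypothetical helper: returns {ok, C} for the first element of the aux
%% tuple that is itself a tuple tagged `checkpoint`, or `error` if there is
%% none. Its position inside aux_v3 is not assumed.
find_checkpoint(Aux) when is_tuple(Aux) ->
    case [E || E <- tuple_to_list(Aux),
               is_tuple(E),
               tuple_size(E) > 0,
               element(1, E) =:= checkpoint] of
        [C | _] -> {ok, C};
        [] -> error
    end.

%% True when the checkpoint tuple differs between two aux snapshots,
%% e.g. one taken before and one after forcing a checkpoint.
checkpoint_changed(AuxBefore, AuxAfter) ->
    {ok, Before} = find_checkpoint(AuxBefore),
    {ok, After} = find_checkpoint(AuxAfter),
    Before =/= After.
```

This keeps the test independent of where the checkpoint sits in the aux record, although it still relies on the checkpoint tuple being tagged with the atom checkpoint.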

aaron-seo (Contributor Author) commented Jan 29, 2025

Checking the failed test case (edit: an issue with meck usage; fixing now).

deps/rabbit/src/rabbit_quorum_queue.erl (outdated, resolved)
{error, classic_queue_not_supported};
{ok, Q} when ?amqqueue_is_quorum(Q) ->
{RaName, _} = amqqueue:get_pid(Q),
rpc:call(Node, ra, cast_aux_command, [{RaName, Node}, force_checkpoint]),
Member:

This is a call with an implicit timeout (most likely of 5s). Timeouts lower than 15s are very likely to cause false positives sooner or later.

Contributor Author:

Will look into how to invoke the aux command with an explicit 15s timeout.

Contributor Author:

I was not able to find a way to increase the implicit timeout, would appreciate any pointers here :)

Member:

There's also rabbit_misc:rpc_call/5, but I doubt it is very relevant on Erlang 26+.

Contributor Author:

Ah, I misunderstood "implicit timeout": I thought you meant a timeout inside ra, not the one accepted by rpc:call/5.

Member:

No, I was referring to the typical OTP timeout when it comes to rpc calls.
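A minimal sketch of passing an explicit 15-second timeout through rpc:call/5 instead of relying on the default. The module and function names are hypothetical, and mapping {badrpc, _} to {error, _} is an assumption about how a caller might want to report failures, not something the PR is known to do.

```erlang
-module(rpc_timeout_sketch).
-export([call_with_timeout/4]).

%% 15 seconds, the lower bound suggested in the review.
-define(RPC_TIMEOUT_MS, 15000).

%% Hypothetical wrapper around rpc:call/5 with an explicit timeout.
%% {badrpc, Reason} (including {badrpc, timeout}) becomes {error, Reason};
%% any other result is returned unchanged.
call_with_timeout(Node, Mod, Fun, Args) ->
    case rpc:call(Node, Mod, Fun, Args, ?RPC_TIMEOUT_MS) of
        {badrpc, Reason} -> {error, Reason};
        Result -> Result
    end.
```

In the code path under review this would be invoked as rpc_timeout_sketch:call_with_timeout(Node, ra, cast_aux_command, [{RaName, Node}, force_checkpoint]).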

{ok, Q} when ?amqqueue_is_quorum(Q) ->
{RaName, _} = amqqueue:get_pid(Q),
rpc:call(Node, ra, cast_aux_command, [{RaName, Node}, force_checkpoint]),
rabbit_log:debug("Sent command to force checkpoint ~ts", [QNameFmt]);
Member:

Suggest rewording the message as "to force ~ts to take a checkpoint".

deps/rabbit/src/rabbit_quorum_queue.erl (outdated, resolved)
deps/rabbit/test/quorum_queue_SUITE.erl (outdated, resolved)
use RabbitMQ.CLI.Core.RequiresRabbitAppRunning
use RabbitMQ.CLI.Core.AcceptsNoPositionalArguments

def run([], %{
Member:

The returned value can easily be formatted as JSON, but formatter: "json" is not supported in this PR.

Contributor Author:

It seems formatter: "json" is supported (presumably via RabbitMQ.CLI.DefaultOutput?):

aaronseo: ~/workplace/rabbitMq/rabbitmq-server/sbin $ ./rabbitmq-queues -n force_checkpoint2 force_checkpoint --formatter json
[
{"vhost":"/","name":"qq2","result":"ok"}
,{"vhost":"/","name":"qwe","result":"ok"}
,{"vhost":"/","name":"testqq","result":"ok"}
...

Member:

For tabular data (that is, lists of maps), yes: you can at least try DefaultOutput and see whether there is any reason to override what it defines.
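A minimal sketch of turning the result list printed in the shell session above into a list of maps, one map per table row, which is the tabular shape being discussed. The module name is hypothetical, and the {resource, Vhost, queue, Name} shape is taken from the printed output rather than from the resource record definition.

```erlang
-module(result_rows_sketch).
-export([to_rows/1]).

%% Hypothetical helper: converts [{{resource, Vhost, queue, Name}, Result}]
%% into [#{vhost => Vhost, name => Name, result => Result}], keeping order.
to_rows(Results) ->
    [#{vhost => Vhost, name => Name, result => Result}
     || {{resource, Vhost, queue, Name}, Result} <- Results].
```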

# Implementation
#

defp format_result({:ok}) do
michaelklishin (Member) commented Jan 29, 2025:

This reinvents the output interface used extensively by all existing commands.

:ok, {:error, :timeout}, {:error, _} are all handled by RabbitMQ.CLI.DefaultOutput, including JSON formatting of certain common returned values.

Contributor Author:

The custom format_result does have a cosmetic effect; it was adopted from a couple of other commands (namely grow and shrink). For example:

With just RabbitMQ.CLI.DefaultOutput:

aaronseo: ~/workplace/rabbitMq/rabbitmq-server/sbin $ ./rabbitmq-queues -n force_checkpoint2 force_checkpoint
Forcing checkpoint for all matching quorum queues...
vhost   name    result
/       qq2     {ok}
/       qwe     {ok}
/       testqq  {ok}
/       qq1     {ok}
test    qwe     {ok}
test    qq1     {ok}
test    qq2     {ok}

...
aaronseo: ~/workplace/rabbitMq/rabbitmq-server/sbin $ ./rabbitmq-queues -n force_checkpoint2 force_checkpoint --formatter json
[
{"vhost":"/","name":"qq2","result":["ok"]}
,{"vhost":"/","name":"qwe","result":["ok"]}
,{"vhost":"/","name":"testqq","result":["ok"]}
...

With the custom format_result:

Forcing checkpoint for all matching quorum queues...
vhost   name    result
/       qq2     ok
/       qwe     ok
/       testqq  ok
/       qq1     ok
test    qwe     ok
test    qq1     ok
test    qq2     ok

...
aaronseo: ~/workplace/rabbitMq/rabbitmq-server/sbin $ ./rabbitmq-queues -n force_checkpoint2 force_checkpoint --formatter json
[
{"vhost":"/","name":"qq2","result":"ok"}
,{"vhost":"/","name":"qwe","result":"ok"}
,{"vhost":"/","name":"testqq","result":"ok"}
...

I think the cosmetic changes are even more beneficial for error messages.

However, I'm also open to switching to the RabbitMQ.CLI.DefaultOutput formatting if that is still preferred.

Member:

You can override output/2 before the line where RabbitMQ.CLI.DefaultOutput is included, and rely on RabbitMQ.CLI.DefaultOutput as a catch-all for, say, rpc:call/4 error reporting.
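The same "specific clauses first, catch-all last" idea, sketched in Erlang as a hypothetical normalization step rather than as the CLI's actual output/2 override: the {ok} 1-tuples seen in the transcripts are mapped to the plain atom ok, and every other value is passed through untouched so the default handling still sees it. How DefaultOutput then renders the atom is not verified here.

```erlang
-module(result_normalize_sketch).
-export([normalize/1, normalize_rows/1]).

%% Specific shape first: the {ok} 1-tuple becomes the atom ok.
normalize({ok}) -> ok;
%% Catch-all last: errors such as {error, timeout} or {badrpc, _} pass
%% through unchanged for the default error reporting.
normalize(Other) -> Other.

%% Applies normalize/1 to the `result` field of each row map.
normalize_rows(Rows) ->
    [Row#{result := normalize(Result)} || #{result := Result} = Row <- Rows].
```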

aaron-seo (Contributor Author) commented Feb 4, 2025:

I decided to forgo the custom format_result. I think the default output is a fine alternative, and it allows better structuring of the error messages. Please let me know what you think.

Labels: none yet. Projects: none yet. 2 participants.