CQ: Merge lazy/default behavior into a unified mode #4522

lhoguin · 2022-04-11T13:08:26Z (edited by lukebakken)

CQs no longer reduce memory usage either (except for an explicit GC that I am considering removing).

Beyond simplifying the implementation, this branch also performs better than either the default or the lazy mode.

See this comment before reviewing/merging: #4522 (comment)

mergify · 2022-06-03T09:55:12Z

This pull request modifies the erlang.mk build only. If it is a deps update or PROJECT_ENV change, remember to sync any changes to the bazel files.

lhoguin · 2022-08-01T11:10:25Z

One very rare crash remains that I am still hunting for; almost all other issues are fixed. After that, I will clean up and prepare to merge to master.

Makefile

deps/rabbit/src/rabbit_amqqueue_process.erl

deps/rabbit/src/rabbit_classic_queue_store_v2.erl

deps/rabbit/test/backing_queue_SUITE.erl

lhoguin · 2022-09-08T11:00:46Z

At least some of the test failures are legitimate. It seems this broke something related to mirrored queues. Investigating...

lhoguin · 2022-09-09T12:24:55Z

OK, now that CI is green I am opening this for review. The goal is to ship this in 3.12, so it would make sense to merge it soon after 3.11 is released. I will work on updating the CQ documentation, especially since this removes the default/lazy distinction (both modes can still be configured, but they now behave the same) and a couple of settings no longer do anything (because CQs no longer reduce memory usage).

@mkuratczyk now would be a great time to compare 3.11 against this branch (it was rebased earlier this week).

This removes the use of delayed_write and instead uses an internal buffer that also acts as a cache, similar to how the index works. The offset and size of messages in the file are calculated using erlang:external_size/1, so the files may be a little bigger than before, but not significantly, especially since messages are mostly made of atoms and binaries. As a result of these changes, performance improved by around 10% on my machine.
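The buffering idea described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual Erlang implementation: the class `BufferedSegmentWriter`, its `flush_threshold` parameter and the use of `pickle` are all hypothetical, and `len(pickle.dumps(...))` merely stands in for the size computation that `erlang:external_size/1` performs in the real code.

```python
import io
import pickle


class BufferedSegmentWriter:
    """Hypothetical append-only writer whose pending buffer doubles as a read cache."""

    def __init__(self, fd, flush_threshold=64 * 1024):
        self.fd = fd                        # any seekable binary file object
        self.flush_threshold = flush_threshold
        self.file_size = 0                  # bytes already written to fd
        self.pending = []                   # serialized entries not yet written
        self.pending_size = 0
        self.cache = {}                     # msg_id -> message, for unflushed entries
        self.index = {}                     # msg_id -> (offset, size) in the file

    def write(self, msg_id, message):
        data = pickle.dumps(message)
        # The offset is known before any I/O happens: it is the current file
        # size plus everything already waiting in the buffer.
        offset = self.file_size + self.pending_size
        size = len(data)
        self.pending.append(data)
        self.pending_size += size
        self.cache[msg_id] = message
        self.index[msg_id] = (offset, size)
        if self.pending_size >= self.flush_threshold:
            self.flush()
        return offset, size

    def read(self, msg_id):
        if msg_id in self.cache:            # still buffered: no file read needed
            return self.cache[msg_id]
        offset, size = self.index[msg_id]
        self.fd.seek(offset)
        return pickle.loads(self.fd.read(size))

    def flush(self):
        # One write call for the whole batch instead of one per message.
        self.fd.seek(self.file_size)
        self.fd.write(b"".join(self.pending))
        self.file_size += self.pending_size
        self.pending.clear()
        self.pending_size = 0
        self.cache.clear()


if __name__ == "__main__":
    writer = BufferedSegmentWriter(io.BytesIO(), flush_threshold=1024)
    writer.write("m1", {"body": b"hello"})
    print(writer.read("m1"))            # served from the in-memory buffer
    writer.flush()
    print(writer.read("m1"))            # now read back from the file
```

Keeping unflushed entries in `cache` means a message consumed shortly after being published never has to be read back from disk, which is the "also used as a cache" part of the description.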

When purging the queue, we want to read the maximum number of messages from disk (2048) because these messages will quickly be gone. Using the outgoing rate could end up making us read one message at a time, which makes performance much worse.
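The effect of the batch size on a purge can be illustrated with a small Python model. Only the 2048 limit comes from the description above; the helper names `messages_to_read` and `reads_needed`, and the way the outgoing rate is turned into a batch size, are assumptions for illustration and not RabbitMQ's actual code.

```python
MAX_READ_BATCH = 2048  # upper bound on messages read from disk at once


def messages_to_read(purging, outgoing_rate, remaining_on_disk):
    """Hypothetical helper: how many messages to fetch from disk in one read."""
    if purging:
        # Messages read during a purge are discarded right away, so the
        # consumer rate is irrelevant: read as many as allowed.
        want = MAX_READ_BATCH
    else:
        # Otherwise follow the outgoing rate, clamped to [1, MAX_READ_BATCH].
        want = max(1, min(MAX_READ_BATCH, int(outgoing_rate)))
    return min(want, remaining_on_disk)


def reads_needed(total, purging, outgoing_rate):
    """Count the disk reads needed to drain `total` messages."""
    reads = 0
    while total > 0:
        total -= messages_to_read(purging, outgoing_rate, total)
        reads += 1
    return reads


if __name__ == "__main__":
    # Purging 10,000 messages from a queue whose consumer is slow or idle.
    print(reads_needed(10_000, purging=True, outgoing_rate=1))   # 5
    print(reads_needed(10_000, purging=False, outgoing_rate=1))  # 10000
```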

michaelklishin

In my tests, this CQv2 implementation has demonstrated impressive memory footprint stability with several workloads: with no consumers at all, with consumers that recover but are still outpaced by producers, and with a 10M-message backlog that recovering consumers catch up on.

Throughput numbers and node memory footprint are comparable to those of CQv2 in main, but memory stability seems to have improved further.

lhoguin · 2022-10-03T07:54:18Z

Thanks!

essen force-pushed the loic-cq-dont-reduce-memory-usage branch 2 times, most recently from 5fb04da to fd21f4a on May 11, 2022 13:11

mergify bot added the make label Jun 3, 2022

essen force-pushed the loic-cq-dont-reduce-memory-usage branch 2 times, most recently from 1cb78e0 to 5d9713a on June 8, 2022 14:56

essen force-pushed the loic-cq-dont-reduce-memory-usage branch from fd1d7f1 to 1db83a9 on July 22, 2022 09:59

lhoguin commented Sep 5, 2022

essen force-pushed the loic-cq-dont-reduce-memory-usage branch from 64328c6 to 3c1c83f on September 6, 2022 10:15

mergify bot added the bazel label Sep 9, 2022

lhoguin marked this pull request as ready for review September 9, 2022 12:25

lukebakken requested review from michaelklishin and kjnilsson September 11, 2022 00:07

lukebakken assigned lhoguin Sep 11, 2022

lukebakken requested a review from mkuratczyk September 11, 2022 00:07

essen force-pushed the loic-cq-dont-reduce-memory-usage branch 2 times, most recently from 85397f9 to a7d5444 on September 22, 2022 07:41

michaelklishin added this to the 3.12.0 milestone Sep 22, 2022

essen force-pushed the loic-cq-dont-reduce-memory-usage branch from 119e85a to a7d5444 on September 23, 2022 13:49

lhoguin added 8 commits September 27, 2022 12:00

CQ: Merge lazy/default behavior into a unified mode

341e908

No longer reduce memory usage as well (except an explicit GC that I am pondering about removing).

WIP putting back flush in reduce_memory_usage and other tweaks

892db25

WIP Remove unused callback

1aed8c4

Don't force writing to disk in publish_delivered1

397b051

When DeltaCount = 0 we can look at Len directly

42e5411

Don't reduce memory use

ace4987

Gets rid of an explicit GC that might have caused slower performance than v2-lazy on master.

Remove a missed distinction between default/lazy

a5bf555

Don't use q1 q2 and q4 at all anymore

ea0f4c1

lhoguin and others added 21 commits September 27, 2022 12:00

CQ: Enable read/write concurrency for old msg store ets

615e667

The rabbit_msg_store_flying ets table runs into lock trouble with large fan outs. This should help.

CQ: Enable read/write concurrency for old msg store ets

2b291b1

The rabbit_msg_store_cur_file ets table runs into lock trouble with large fan outs. This should help.

CQv2: Always do the CRC32 check if it was computed on write

3b8ee13

Brings the behavior in line with QQs and streams.

CQv2 store: Use raw for file:write_file for the file header

649ebbb

This is an attempt to fix a race condition.

CQv2: Read many messages at once from v2 store when possible

432f8d2

CQv2: Fix property suite

962cc0a

Also always check the CRC32 even if not currently configured to do so, if the CRC is available in the data.

CQv2: Sync/handle confirms before conversion

f3963a5

CQv2: Small fixes of and via the property suite

8051b00

CQ: Some cleanup

723cc54

CQ: Some more cleanup

23f1346

CQ: Enable broken checks in backing_queue_SUITE again

f590201

CQ: Fix test compilation error following rebase

f1ae007

CQ: Remove a couple more unneeded callbacks

1d7ce62

This callback was removed in a previous commit and was only used for bump_reduce_memory_use.

CQ: Make the resume/1 function sync to disk

0e0635f

This just restores behavior that was there before via reduce_memory_use. I am not sure if it is of any use, but it doesn't hurt to have it.

system_SUITE: wait for messages to be queued

ef25732

Somehow the CQ changes made one of the tests in this suite fail with the wrong message count. This is in essence a followup to d5e81c9, which already added a timeout to other tests in the suite.

rabbit_prometheus_http_SUITE: Update tests for new CQs

73dd0ac

CQs without consumers will have only one message in memory.

CQ: Update shards count for the property suite

69efad9

CQ: Fix channel_operation_timeout_SUITE mixed versions

e09cbeb

Since ram_pending_acks is now a map, the test must support both map and gb_trees to continue working. Also updated the state to reflect a field name change.

CQ: Update long description at the top of the module

1eb1710
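Commit 3b8ee13 ("CQv2: Always do the CRC32 check if it was computed on write") ties verification to what is stored in the data rather than to the current configuration. A minimal Python sketch of that rule follows; the record layout (a flags byte, the payload size and a CRC32 field) and the names `encode`, `decode` and `FLAG_HAS_CRC` are hypothetical and do not describe the real CQv2 on-disk format.

```python
import struct
import zlib

HEADER = struct.Struct(">BII")  # flags, payload size, crc32 (hypothetical layout)
FLAG_HAS_CRC = 0x01


def encode(payload: bytes, compute_crc: bool) -> bytes:
    """Build a record; the CRC is only computed when the writer is configured to."""
    flags, crc = 0, 0
    if compute_crc:
        flags |= FLAG_HAS_CRC
        crc = zlib.crc32(payload)
    return HEADER.pack(flags, len(payload), crc) + payload


def decode(record: bytes) -> bytes:
    """Return the payload, verifying the CRC whenever one was stored.

    There is deliberately no "check_crc" argument: whether verification
    happens depends only on the record, not on the reader's current settings.
    """
    flags, size, crc = HEADER.unpack_from(record)
    payload = record[HEADER.size:HEADER.size + size]
    if flags & FLAG_HAS_CRC and zlib.crc32(payload) != crc:
        raise ValueError("CRC32 mismatch")
    return payload
```

Because `decode` takes no configuration flag, turning the CRC setting off later does not stop records written with a CRC from being verified.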

essen force-pushed the loic-cq-dont-reduce-memory-usage branch from a7d5444 to 1eb1710 on September 27, 2022 10:00

lhoguin requested review from michaelklishin and removed request for michaelklishin and kjnilsson September 27, 2022 10:10

michaelklishin approved these changes Oct 1, 2022

michaelklishin merged commit 69b06d3 into main Oct 1, 2022

michaelklishin deleted the loic-cq-dont-reduce-memory-usage branch October 1, 2022 16:11

Ayanda-D mentioned this pull request Mar 28, 2024

Remove deprecated queue_explicit_gc_run_operation_threshold config #10880 (merged)
