Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CQ: Merge lazy/default behavior into a unified mode #4522

Merged
merged 39 commits into from
Oct 1, 2022

Conversation

lhoguin
Copy link
Contributor

@lhoguin lhoguin commented Apr 11, 2022

No longer reduce memory usage as well (except an explicit GC that I am pondering about removing).

Beyond simplifying the implementation, this branch also provides better performance than either default or lazy modes.

See this comment before reviewing/merging: #4522 (comment)

See #2832

@essen essen force-pushed the loic-cq-dont-reduce-memory-usage branch 2 times, most recently from 5fb04da to fd21f4a Compare May 11, 2022 13:11
@mergify mergify bot added the make label Jun 3, 2022
@mergify
Copy link

mergify bot commented Jun 3, 2022

This pull request modifies the erlang.mk build only. If it is a deps update or PROJECT_ENV change, remember to sync any changes to the bazel files.

@essen essen force-pushed the loic-cq-dont-reduce-memory-usage branch 2 times, most recently from 1cb78e0 to 5d9713a Compare June 8, 2022 14:56
@essen essen force-pushed the loic-cq-dont-reduce-memory-usage branch from fd1d7f1 to 1db83a9 Compare July 22, 2022 09:59
@lhoguin
Copy link
Contributor Author

lhoguin commented Aug 1, 2022

There's a very rare crash that remains and I am still hunting for. Almost all issues are fixed. After that, clean up, and prepare to merge to master.

Makefile Outdated Show resolved Hide resolved
deps/rabbit/src/rabbit_amqqueue_process.erl Outdated Show resolved Hide resolved
deps/rabbit/src/rabbit_classic_queue_store_v2.erl Outdated Show resolved Hide resolved
deps/rabbit/src/rabbit_classic_queue_store_v2.erl Outdated Show resolved Hide resolved
deps/rabbit/test/backing_queue_SUITE.erl Outdated Show resolved Hide resolved
deps/rabbit/test/backing_queue_SUITE.erl Outdated Show resolved Hide resolved
deps/rabbit/test/backing_queue_SUITE.erl Show resolved Hide resolved
deps/rabbit/test/backing_queue_SUITE.erl Outdated Show resolved Hide resolved
deps/rabbit/test/backing_queue_SUITE.erl Outdated Show resolved Hide resolved
@essen essen force-pushed the loic-cq-dont-reduce-memory-usage branch from 64328c6 to 3c1c83f Compare September 6, 2022 10:15
@lhoguin
Copy link
Contributor Author

lhoguin commented Sep 8, 2022

At least some of the test failures are legitimate. It seems this broke something related to mirrored queues. Investigating...

@mergify mergify bot added the bazel label Sep 9, 2022
@lhoguin
Copy link
Contributor Author

lhoguin commented Sep 9, 2022

OK now that CI is green I am opening this for review. The goal is to have this as part of 3.12 so it would make sense to merge it soon after 3.11 gets released. I will work on updating the CQ documentation, especially considering this removes default/lazy distinction (they can still be configured but they now act the same) and a couple settings no longer do anything (because CQs no longer reduce memory usage).

@mkuratczyk now would be a great time to do a compare of 3.11 against this branch (it was rebased earlier this week).

@lhoguin lhoguin marked this pull request as ready for review September 9, 2022 12:25
@essen essen force-pushed the loic-cq-dont-reduce-memory-usage branch 2 times, most recently from 85397f9 to a7d5444 Compare September 22, 2022 07:41
@michaelklishin michaelklishin added this to the 3.12.0 milestone Sep 22, 2022
@essen essen force-pushed the loic-cq-dont-reduce-memory-usage branch from 119e85a to a7d5444 Compare September 23, 2022 13:49
lhoguin and others added 21 commits September 27, 2022 12:00
The rabbit_msg_store_flying ets table runs into lock trouble
with large fan outs. This should help.
The rabbit_msg_store_cur_file ets table runs into lock trouble
with large fan outs. This should help.
This removes the use of delayed_write and instead uses an
internal buffer that's also used as a cache, similar to how
the index works. The offset and size of messages in the file
are calculated using erlang:external_size/1 and as a result
the files may be a little bigger than before, but they should
not be significantly, especially considering messages are
mostly made of atoms and binaries.

The performance is boosted by around 10% on my machine
as a result of these changes.
Brings the behavior in line with QQs and streams.
This is an attempt to fix a race condition.
Also always check the CRC32 even if not currently configured
to do so, if the CRC is available in the data.
This callback was removed in a previous commit and was only
used for bump_reduce_memory_use.
This just restores behavior that was there before via
reduce_memory_use. I am not sure if it is of any use
but it doesn't hurt to have it.
When purging the queue we want to read the maximum number
of messages from disk (2048) because these messages will
quickly be gone. Using the outgoing rate could end up
making us read 1 message at a time which makes the
performance much worse.
Somehow the CQ changes made one of the test in this suite
fail with the wrong message count. This is in essence a
followup to d5e81c9 which
already added a timeout to other tests in the suite.
CQs without consumers will have only one message in memory.
Since ram_pending_acks is now a map the test must support both
map and gb_trees to continue working. Also updated the state to
reflect a field name change.
@essen essen force-pushed the loic-cq-dont-reduce-memory-usage branch from a7d5444 to 1eb1710 Compare September 27, 2022 10:00
@lhoguin lhoguin requested review from michaelklishin and removed request for michaelklishin and kjnilsson September 27, 2022 10:10
Copy link
Member

@michaelklishin michaelklishin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my tests this CQv2 implementation has demonstrated impressive memory footprint stability with several workloads, including when there are no consumers at all or when consumers recover but are yet outpaced by producers, or when there is a 10M backlog of messages and consumers recover and catch up.

Throughput numbers and node memory footprint are comparable to those of CQv2 in main but memory stability seems to be improved further.

@michaelklishin michaelklishin merged commit 69b06d3 into main Oct 1, 2022
@michaelklishin michaelklishin deleted the loic-cq-dont-reduce-memory-usage branch October 1, 2022 16:11
@lhoguin
Copy link
Contributor Author

lhoguin commented Oct 3, 2022

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants