-
Describe the bug

After upgrading from 3.12.13 to 3.13.0, used disk space keeps growing, while the average message totals (count, bytes, publish and delivery rates) remained the same. After a restart of RabbitMQ the folder size drops to 500 GB (which is still too much) and then begins to grow again.

Reproduction steps

Expected behavior

Directory size =~ total size of all queues.

Additional context
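A minimal sketch of that comparison, assuming the management plugin is enabled and reachable with default credentials (the URL and credentials are placeholders): it sums `message_bytes` across all queues so the result can be compared with the on-disk size of the node data directory.

```python
import requests

MGMT_URL = "http://localhost:15672"   # assumed management listener
AUTH = ("guest", "guest")             # assumed credentials

# Sum the logical size of every queue as reported by the management API.
queues = requests.get(f"{MGMT_URL}/api/queues", auth=AUTH, timeout=10).json()
total_bytes = sum(q.get("message_bytes", 0) for q in queues)
print(f"{len(queues)} queues, {total_bytes / 1e9:.1f} GB of message bytes")

# Compare this figure with the on-disk size of the node data directory
# (for example `du -sh` of that directory); as the replies below explain,
# the two are not expected to match.
```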
-
@urusha thanks for using RabbitMQ. You have not provided enough information and are basically asking us to spend time guessing how to reproduce your issue.
Ideally, you would provide a docker compose project that represents your workload and starts with RabbitMQ 3.12.
-
@urusha I'm afraid the assumption that the directory size should roughly equal the total size of all queues does not hold. Quorum queues and streams can store a large amount of data on disk, and there is nothing new in 3.13 compared to 3.12 in that regard. Not only does every message carry protocol-level metadata and internal format metadata (which did change in 3.13, as the release notes state), but that data is not deleted the moment a message is consumed.

Specifically, quorum queues store the entire Raft log, including messages and other state machine transitions, until the oldest unacknowledged message in the log is consumed and acknowledged. Stuck or slow consumers can affect this a great deal, which is why a consumer delivery timeout exists: to make sure a stuck consumer does not prevent a quorum queue from reclaiming disk space. New clusters can use a different Raft WAL segment size; changing this on existing clusters is dangerous and must not be attempted.

Streams retain as much data as you configure.
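To illustrate the two points above, here is a hedged pika sketch (the broker address and queue names are placeholders, not from this thread): quorum queue consumers should acknowledge promptly so log segments can be reclaimed, and streams should be given explicit retention limits such as `x-max-age` or `x-max-length-bytes`. The node-level `consumer_timeout` setting in rabbitmq.conf bounds how long a delivery may stay unacknowledged.

```python
import pika

# Placeholder broker address and queue names; adjust to your environment.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Quorum queue: the Raft log (message bodies included) is only truncated up to
# the oldest delivered-but-unacknowledged message, so slow or stuck consumers
# keep disk space pinned.
channel.queue_declare(
    queue="orders",
    durable=True,
    arguments={"x-queue-type": "quorum"},
)

# Stream: retains as much data as configured, regardless of consumption.
channel.queue_declare(
    queue="events",
    durable=True,
    arguments={
        "x-queue-type": "stream",
        "x-max-age": "7D",                     # keep roughly a week of data
        "x-max-length-bytes": 20_000_000_000,  # and cap the total size on disk
    },
)

def handle(ch, method, properties, body):
    # ... process the message ...
    # Acknowledge quickly so the quorum queue can reclaim log segments.
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="orders", on_message_callback=handle)
channel.start_consuming()
```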
-
The seesaw pattern on the chart above is largely what I'd expect from quorum queues with many workloads, and streams as well. You have peaks and you have troughs; few workloads result in very even disk space use over time. It is hard to say more without inspecting the node data directory. Using CQv2 may result in higher peaks, but we do not see this in practice, or somehow I've missed those cases. @lhoguin and @mkuratczyk would know better than I do.
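If it helps to see which queue type dominates, one rough way to inspect the node data directory is to rank its top-level subdirectories by size. A generic sketch, with the data directory path as an assumption (use the value reported by `rabbitmq-diagnostics status`):

```python
import os

DATA_DIR = "/var/lib/rabbitmq/mnesia"  # assumed node data directory

def dir_size(path: str) -> int:
    """Recursive on-disk size of a directory, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # files can be deleted concurrently by the broker
    return total

# Rank top-level subdirectories (quorum, stream, classic message store, ...)
# to see where the disk space actually goes.
sizes = sorted(
    ((entry.name, dir_size(entry.path))
     for entry in os.scandir(DATA_DIR) if entry.is_dir()),
    key=lambda item: item[1],
    reverse=True,
)
for name, size in sizes:
    print(f"{size / 1e9:8.2f} GB  {name}")
```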
-
@urusha to confirm -
-
Hello, since space is not being reclaimed, it is possible there is an issue with GCing messages, which changed in 3.13. Please set the log levels to
-
The original issue reported here was resolved, and I do not recall it being reported elsewhere since March. @urusha please use the recommendations above (from today) and start a new one if you have something else to report, with a detailed set of steps to reproduce. Most likely we will need a data directory from the node that fails to start.
-
As I said above, a node that has run out of disk space won't always be able to recover. Overprovision free disk space and put adequate guardrails, such as queue length limits and a maximum message size, in place. The issue with CQ compaction falling behind, reported in this thread in March, has been addressed and has not been reported again. A recovering node cannot know what was not written to disk, so there will always be scenarios where it won't be able to safely recover. If your node has run out of disk space, consider it ready to be replaced.
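A sketch of such guardrails applied per queue through optional arguments (queue name, limits and broker address are placeholders); the same limits can be applied cluster-wide with policies via `rabbitmqctl set_policy`, and the maximum message size is the node-level `max_message_size` setting in rabbitmq.conf.

```python
import pika

# Placeholder connection details and limits; adjust to your workload.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

channel.queue_declare(
    queue="orders",
    durable=True,
    arguments={
        "x-queue-type": "quorum",
        "x-max-length": 1_000_000,       # cap the backlog at one million messages
        "x-overflow": "reject-publish",  # push back on publishers instead of growing
    },
)
connection.close()
```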
Thank you. This is consistent with what we are observing. I am working on a fix. For what it's worth, I do not think it is related to hardware anymore: we have found a small piece of code that is very inefficient when there are many messages, which makes compactions much slower than they should be. So for the time being we will put back a limit on the number of compactions in flight.