Node can use significant (e.g. 80GB) amounts of RAM during startup #2254
Comments
For more context, see rabbitmq/rabbitmq-server#2254 Signed-off-by: Gerhard Lazu <[email protected]>
Since everything is going to disk, I believe that it will make a big difference for memory usage, and I also suspect that it will help with rabbitmq/rabbitmq-server#2254. Let's try it out! Signed-off-by: Gerhard Lazu <[email protected]>
After doing some investigation, I believe this is caused by the rebuilding of message refcounts by rabbit_msg_store. We scan the disk asynchronously, and in a few limited tests I saw that we scan and enqueue faster than we consume, resulting in a very large backlog of messages in memory.
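To illustrate the failure mode in isolation (a hypothetical Erlang sketch, not the actual rabbit_msg_store or gatherer code; the module and function names are invented), an asynchronous scanner that enqueues every recovered message reference without any backpressure turns the consumer's mailbox into an unbounded in-memory buffer:

```erlang
%% Hypothetical sketch (not rabbit_msg_store itself) of how an asynchronous
%% disk scan can flood a slower consumer: the producer never waits, so the
%% consumer's mailbox becomes an unbounded buffer and memory use tracks the
%% backlog rather than the working set.
-module(scan_backlog).
-export([start/0]).

start() ->
    Consumer = spawn(fun() -> consume(0) end),
    spawn(fun() -> produce(Consumer, 10000000) end),
    Consumer.

%% "Scans" N message refs and enqueues each one immediately.
produce(_Consumer, 0) -> ok;
produce(Consumer, N) ->
    Consumer ! {msg_ref, N},
    produce(Consumer, N - 1).

%% Rebuilding a refcount is slower than enqueueing, so the mailbox grows.
consume(Count) ->
    receive
        {msg_ref, _Ref} ->
            timer:sleep(1),          % simulate per-ref work
            consume(Count + 1)
    end.
```

With per-message work that is even slightly slower than the enqueue, the backlog (and therefore resident memory) grows roughly linearly with the number of references scanned.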
@pjk25 we already have a mechanism for the producer-consumer problem and unbounded buffers in RabbitMQ: it's called credit flow. It's a producer-side solution, so it cannot be used if we don't control the "producer side" of this process interaction, but it sounds like we do. There's no need for blocking puts; producers can self-limit using credit flow.
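As a rough illustration of the idea (a generic credit-based sketch, not RabbitMQ's actual credit_flow module; all names here are invented), the producer spends one credit per message and stops sending when it runs out, while the consumer grants credit back in batches as it makes progress, which bounds the in-flight backlog to roughly the initial credit:

```erlang
%% Generic sketch of credit-based flow control (not RabbitMQ's credit_flow
%% module): the producer spends one credit per message and blocks when it
%% has none left; the consumer grants a fresh batch of credits as it makes
%% progress, bounding the in-flight backlog.
-module(credit_sketch).
-export([start/0]).

-define(INITIAL_CREDIT, 200).
-define(CREDIT_BATCH, 100).

start() ->
    Consumer = spawn(fun() -> consume(0) end),
    Producer = spawn(fun() -> produce(Consumer, 1000000, ?INITIAL_CREDIT) end),
    {Producer, Consumer}.

%% Producer: self-limits by waiting for {credit, N} when it has no credit.
produce(_Consumer, 0, _Credit) -> ok;
produce(Consumer, N, 0) ->
    receive
        {credit, More} -> produce(Consumer, N, More)
    end;
produce(Consumer, N, Credit) ->
    Consumer ! {msg_ref, N, self()},
    produce(Consumer, N - 1, Credit - 1).

%% Consumer: does the per-message work, then periodically bumps credit.
consume(Processed) ->
    receive
        {msg_ref, _Ref, Producer} ->
            timer:sleep(1),                          % simulate work
            case (Processed + 1) rem ?CREDIT_BATCH of
                0 -> Producer ! {credit, ?CREDIT_BATCH};
                _ -> ok
            end,
            consume(Processed + 1)
    end.
```

The initial credit and the batch size are the only knobs: larger values buy throughput at the cost of a larger worst-case in-flight buffer.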
After further exploration, I have learned that the …
@michaelklishin thanks for the insight. The queues I was looking at were these: rabbitmq-server/src/gatherer.erl, line 90 in f8d4797.
It's worth noting that even though we shut down the broker with rabbitmq-server/src/rabbit_msg_store.erl, line 1735 in abc0b52, … Also, the call to rabbitmq-server/src/rabbit_msg_store.erl, line 1737 in abc0b52, …
I've just tested this in the context of TGIR S01E03 and can confirm that #2274 fixes this. Memory usage doesn't go above 8GiB, and even though the recovery time didn't improve (it still took ~45 minutes for the node to restart), the memory usage was perfect: the ETS data structures used the expected amount of memory, and all other components only used what they needed. Watch the video if you're interested in seeing how this was confirmed 🙌🏻 TL;DW
Part of TGIR S01E02: Help! RabbitMQ ate my RAM! I came across the following unexpected behaviour when the rabbit app gets restarted via
rabbitmqctl stop_app ; rabbitmqctl start_app
A worker process is using 56GB of physical memory while reading through all queue segments, across all queues. This operation takes ~30 minutes to complete (13:44:16 - 14:15:46) with the disk reads starting at 410MiB/s and maxing out at 965MiB/s for 15 minutes straight:
This is what the system physical memory breakdown looks like:
I have captured the runtime thread stats to show the 88% idle regular schedulers (most time is spent in `send`) and 94% idle dirty_io schedulers. All other thread types are 98% or more idle. I am attaching the netdata export: netdata-c8a3200a7c2f-20200224-130158-6846.snapshot.zip
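For reference, this kind of per-thread-type utilisation comes from the runtime's microstate accounting. A minimal sketch using OTP's msacc module (part of runtime_tools), assuming it is run from a shell attached to the affected node:

```erlang
%% Minimal sketch: capture runtime thread stats with OTP's microstate
%% accounting (msacc, from the runtime_tools application). Assumes a shell
%% attached to the node under investigation.
-module(thread_stats).
-export([collect/1]).

%% Sample Millis milliseconds of activity, print the per-thread-type
%% breakdown (including sleep/idle time) and return the raw stats maps.
collect(Millis) ->
    msacc:start(Millis),
    msacc:print(),
    msacc:stats().
```

For example, thread_stats:collect(10000) samples ten seconds of activity and prints the time breakdown per thread type, which is the same kind of data shown in the stats above.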
There is no Prometheus metrics snapshot for the Erlang VM since this behaviour happens before the Prometheus app is started: