
Node can use significant (e.g. 80GB) amounts of RAM during startup #2254

Closed
gerhard opened this issue Feb 24, 2020 · 6 comments

@gerhard
Contributor

gerhard commented Feb 24, 2020

Part of TGIR S01E02: Help! RabbitMQ ate my RAM! I came across the following unexpected behaviour when the rabbit app gets restarted via rabbitmqctl stop_app ; rabbitmqctl start_app:

Screenshot 2020-02-24 at 13 57 31

Screenshot 2020-02-24 at 13 45 18

A worker process is using 56GB of physical memory while reading through all queue segments, across all queues. This operation takes ~30 minutes to complete (13:44:16 - 14:15:46) with the disk reads starting at 410MiB/s and maxing out at 965MiB/s for 15 minutes straight:

image

This is what the system physical memory breakdown looks like:

image

Notice:

  1. Slow 1.8GiB -> 7.7GiB memory growth spanning ~13 minutes (13:28:00 - 13:41:00)
  2. Large spike to 42GiB the next minute (13:42:00)
  3. Another spike to 60GiB for which cached memory is made available (13:43:00)
  4. Last spike to 80GiB (13:43:45) 🤯
  5. A drop to 60GiB for 30 minutes during which the disk gets hammered (13:44:15)
  6. Drop to 8GiB (14:16:00) followed by a slow ramp-up during which publishers/consumers get reconnected
  7. Regular 11.5GiB -> 17.5GiB spikes, as expected for this workload.

I have captured the runtime thread stats, which show the regular schedulers 88% idle (most of the non-idle time is spent in send) and the dirty_io schedulers 94% idle. All other thread types are 98% idle or above.

rabbitmq-diagnostics runtime_thread_stats --sample-interval 60 | grep -v async
Will collect runtime thread stats on rabbit@tgir-s01e02-gerhard-rabbitmq-durable for 60 seconds...
Average thread real-time    :  60152930 us
Accumulated system run-time : 162534659 us
Average scheduler run-time  :   7169362 us

        Thread    alloc      aux      bif busy_wait check_io emulator      ets       gc  gc_full      nif    other     port     send    sleep   timers

Stats per thread:
       aux( 1)    0.59%    0.64%    0.00%    0.00%    0.12%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   98.65%    0.00%
dirty_cpu_( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_( 5)    0.57%    0.00%    0.00%    2.62%    0.00%    0.02%    0.00%    1.75%    0.77%    0.00%    0.01%    0.00%    0.00%   94.26%    0.00%
dirty_cpu_( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_( 7)    0.56%    0.00%    0.00%    2.61%    0.00%    0.02%    0.00%    1.68%    0.72%    0.00%    0.01%    0.00%    0.00%   94.40%    0.00%
dirty_cpu_( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_(11)    0.66%    0.00%    0.00%    2.96%    0.00%    0.02%    0.00%    1.92%    0.83%    0.00%    0.01%    0.00%    0.00%   93.60%    0.00%
dirty_cpu_(12)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_(13)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_(14)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_(15)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_(16)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 2)    0.03%    0.00%    0.00%    1.22%    0.00%    0.02%    0.00%    0.00%    0.00%    6.10%    0.01%    0.00%    0.00%   92.63%    0.00%
dirty_io_s( 3)    0.01%    0.00%    0.00%    0.49%    0.00%    0.01%    0.00%    0.00%    0.00%    2.62%    0.00%    0.00%    0.00%   96.87%    0.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 5)    0.05%    0.00%    0.00%    2.14%    0.00%    0.03%    0.00%    0.00%    0.00%   10.72%    0.01%    0.00%    0.00%   87.05%    0.00%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 7)    0.06%    0.00%    0.00%    1.67%    0.00%    0.02%    0.00%    0.00%    0.00%    8.26%    0.01%    0.00%    0.00%   89.98%    0.00%
dirty_io_s( 8)    0.06%    0.00%    0.00%    2.36%    0.00%    0.03%    0.00%    0.00%    0.00%   11.10%    0.01%    0.00%    0.00%   86.44%    0.00%
dirty_io_s( 9)    0.06%    0.00%    0.00%    2.20%    0.00%    0.03%    0.00%    0.00%    0.00%   11.07%    0.01%    0.00%    0.00%   86.63%    0.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
      poll( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
 scheduler( 1)    3.07%    0.36%    0.07%    4.51%    0.01%   13.89%    0.69%    0.12%    0.00%    0.01%    0.08%    0.00%   21.67%   55.52%    0.01%
 scheduler( 2)    4.03%    0.29%    0.10%    4.62%    0.00%   18.43%    0.89%    0.16%    0.00%    0.01%    0.10%    0.00%   27.04%   44.34%    0.00%
 scheduler( 3)    1.93%    0.25%    0.07%    2.81%    0.00%    8.98%    0.48%    0.08%    0.00%    0.01%    0.05%    0.00%   23.87%   61.46%    0.00%
 scheduler( 4)    2.84%    0.27%    0.08%    3.69%    0.00%   13.01%    0.67%    0.12%    0.00%    0.01%    0.07%    0.00%   26.82%   52.40%    0.00%
 scheduler( 5)    0.00%    0.00%    0.00%    0.36%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.63%    0.00%
 scheduler( 6)    0.00%    0.01%    0.00%    0.43%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.56%    0.00%
 scheduler( 7)    0.00%    0.00%    0.00%    0.31%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.68%    0.00%
 scheduler( 8)    0.00%    0.00%    0.00%    0.31%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.68%    0.00%
 scheduler( 9)    0.00%    0.00%    0.00%    0.32%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.67%    0.00%
 scheduler(10)    0.00%    0.00%    0.00%    0.32%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.38%    0.00%    0.00%   99.30%    0.00%
 scheduler(11)    0.00%    0.00%    0.00%    0.30%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.70%    0.00%
 scheduler(12)    0.00%    0.00%    0.00%    0.40%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.59%    0.00%
 scheduler(13)    0.00%    0.00%    0.00%    0.33%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.67%    0.00%
 scheduler(14)    0.00%    0.00%    0.00%    0.32%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.67%    0.00%
 scheduler(15)    0.00%    0.00%    0.00%    0.31%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.68%    0.00%
 scheduler(16)    0.00%    0.00%    0.00%    0.28%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.71%    0.00%

Stats per type:
           aux    0.59%    0.64%    0.00%    0.00%    0.12%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   98.65%    0.00%
dirty_cpu_sche    0.11%    0.00%    0.00%    0.51%    0.00%    0.00%    0.00%    0.33%    0.14%    0.00%    0.00%    0.00%    0.00%   98.89%    0.00%
dirty_io_sched    0.03%    0.00%    0.00%    1.01%    0.00%    0.01%    0.00%    0.00%    0.00%    4.99%    0.01%    0.00%    0.00%   93.96%    0.00%
          poll    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     scheduler    0.74%    0.08%    0.02%    1.23%    0.00%    3.39%    0.17%    0.03%    0.00%    0.00%    0.04%    0.00%    6.21%   88.08%    0.00%

I am attaching the netdata export: netdata-c8a3200a7c2f-20200224-130158-6846.snapshot.zip

There is no Prometheus metrics snapshot for the Erlang VM since this behaviour happens before the Prometheus app is started:

image

@gerhard gerhard changed the title from "[WIP] Should a node with 50 mil durable messages @ 26KB each spread across 50 durable queues require 60GB of memory to boot?" to "Should a node with 50 mil persistent messages @ 26KB each spread across 50 durable queues require 60GB of memory to boot?" Feb 24, 2020
@gerhard gerhard changed the title from "Should a node with 50 mil persistent messages @ 26KB each spread across 50 durable queues require 60GB of memory to boot?" to "Should a node with 50mil persistent messages of size 26KB and spread across 50 durable queues (1mil msg per queue) require 60GB of memory to boot?" Feb 24, 2020
@gerhard gerhard changed the title from "Should a node with 50mil persistent messages of size 26KB and spread across 50 durable queues (1mil msg per queue) require 60GB of memory to boot?" to "Should a node with 50mil persistent messages of size 26KB require 60GB of RAM during startup?" Feb 24, 2020
@gerhard gerhard changed the title from "Should a node with 50mil persistent messages of size 26KB require 60GB of RAM during startup?" to "Should a node with 50mil persistent messages of size 26KB require 80GB of RAM during startup?" Feb 24, 2020
gerhard added a commit to rabbitmq/tgir that referenced this issue Feb 24, 2020
Since everything is going to disk, I believe that it will make a big
difference for memory usage, and I also suspect that it will help with
rabbitmq/rabbitmq-server#2254. Let's try it
out!

Signed-off-by: Gerhard Lazu <[email protected]>
gerhard added a commit to rabbitmq/tgir that referenced this issue Feb 26, 2020
Since everything is going to disk, I believe that it will make a big
difference for memory usage, and I also suspect that it will help with
rabbitmq/rabbitmq-server#2254. Let's try it
out!

Signed-off-by: Gerhard Lazu <[email protected]>
@HoloRin
Contributor

HoloRin commented Mar 11, 2020

After doing some investigation, I believe this is caused by the rebuilding of message refcounts by rabbit_msg_store. We scan the disk asynchronously, and in a few limited tests I saw that we scan and enqueue faster than we consume, resulting in a very large backlog of messages in memory.
The [Erlang] queues in question are unbounded, so my initial thought is to bound them in such a way that puts block, which would throttle the disk reads and limit how much data is held transiently in memory.
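For illustration only, here is a minimal sketch of that idea: a buffer whose puts block once a limit is reached, so producers are throttled to the consumer's pace. This is not RabbitMQ code; the module, record, and function names are all invented for this example.

```erlang
%% Illustrative sketch, not RabbitMQ code: a bounded buffer whose in/2
%% blocks the caller once Limit items are queued. Blocked producers are
%% released one at a time as items are taken out with out/1.
-module(bounded_buffer).
-behaviour(gen_server).

-export([start_link/1, in/2, out/1]).
-export([init/1, handle_call/3, handle_cast/2]).

-record(state, {limit, items, waiting}).

start_link(Limit) -> gen_server:start_link(?MODULE, Limit, []).

%% The gen_server call does not return until the item has been accepted,
%% so a full buffer naturally throttles the producer.
in(Pid, Item) -> gen_server:call(Pid, {in, Item}, infinity).
out(Pid)      -> gen_server:call(Pid, out, infinity).

init(Limit) ->
    {ok, #state{limit = Limit, items = queue:new(), waiting = queue:new()}}.

handle_call({in, Item}, From, State = #state{limit = Limit, items = Items,
                                             waiting = Waiting}) ->
    case queue:len(Items) >= Limit of
        true  ->
            %% Buffer full: park the caller, reply later from the 'out' clause.
            {noreply, State#state{waiting = queue:in({From, Item}, Waiting)}};
        false ->
            {reply, ok, State#state{items = queue:in(Item, Items)}}
    end;
handle_call(out, _From, State = #state{items = Items, waiting = Waiting}) ->
    case queue:out(Items) of
        {{value, Item}, Items1} ->
            %% Space freed up: admit one parked producer, if any.
            case queue:out(Waiting) of
                {{value, {Blocked, BItem}}, Waiting1} ->
                    gen_server:reply(Blocked, ok),
                    {reply, {ok, Item}, State#state{items   = queue:in(BItem, Items1),
                                                    waiting = Waiting1}};
                {empty, _} ->
                    {reply, {ok, Item}, State#state{items = Items1}}
            end;
        {empty, _} ->
            {reply, empty, State}
    end.

handle_cast(_Msg, State) -> {noreply, State}.
```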

@michaelklishin
Member

@pjk25 we already have a mechanism for the producer-consumer problem and unbounded buffers in RabbitMQ: it's called credit flow. It is a producer-side solution, so it cannot be used if we don't control the "producer" side of this process interaction, but it sounds like we do. No need for blocking puts; producers can self-limit using credit_flow.
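As a rough, self-contained illustration of the credit-flow idea (a toy example, not RabbitMQ's credit_flow module; every name and number below is invented): the producer starts with a fixed amount of credit, spends one unit per message, stops sending when it runs out, and the consumer grants more credit only as it finishes processing messages.

```erlang
%% Toy illustration of producer-side credit flow; not RabbitMQ's credit_flow.
-module(credit_demo).
-export([run/0]).

-define(INITIAL_CREDIT, 4).  %% credit the producer starts with
-define(GRANT_AFTER, 2).     %% credit the consumer grants per batch handled

run() ->
    Consumer = spawn(fun() -> consumer(0) end),
    producer(Consumer, ?INITIAL_CREDIT, 1).

%% Producer self-limits: it sends only while it has credit and otherwise
%% waits for a {bump, N} grant from the consumer.
producer(Consumer, _Credit, N) when N > 10 ->
    Consumer ! stop,
    ok;
producer(Consumer, 0, N) ->
    receive {bump, More} -> producer(Consumer, More, N) end;
producer(Consumer, Credit, N) ->
    Consumer ! {msg, self(), N},
    producer(Consumer, Credit - 1, N + 1).

%% Consumer grants more credit every ?GRANT_AFTER messages it handles, so at
%% most ?INITIAL_CREDIT unprocessed messages ever sit in its mailbox.
consumer(Handled) ->
    receive
        stop ->
            ok;
        {msg, From, _Payload} ->
            case (Handled + 1) rem ?GRANT_AFTER of
                0 -> From ! {bump, ?GRANT_AFTER};
                _ -> ok
            end,
            consumer(Handled + 1)
    end.
```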

@michaelklishin michaelklishin changed the title Should a node with 50mil persistent messages of size 26KB require 80GB of RAM during startup? Node can use significant (e.g. 80GB) amounts of RAM during startup Mar 11, 2020
@HoloRin
Contributor

HoloRin commented Mar 11, 2020

After further exploration, I have learned that the gatherer managing the above-mentioned queues blocks its publishers until messages are consumed, invalidating the above theory.

@HoloRin
Contributor

HoloRin commented Mar 12, 2020

@michaelklishin thanks for the insight. The queues I was looking at were these:

{ok, #gstate { forks = 0, values = queue:new(), blocked = queue:new() },

so not "rabbit" queues, although I'm sure I could have made my comment clearer.

@HoloRin
Contributor

HoloRin commented Mar 18, 2020

It's worth noting that even though we shut down the broker with rabbitmqctl stop_app first, we still follow this code path, which I believe corresponds to an unclean shutdown:

build_index(false, {MsgRefDeltaGen, MsgRefDeltaGenInit},

Also, the call to

ok = count_msg_refs(MsgRefDeltaGen, MsgRefDeltaGenInit, State),
takes about 10 minutes to complete, before we get to the point where memory is consumed rapidly.
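To make the refcount-rebuild step more concrete, here is a minimal, self-contained sketch of the general shape of that work: scan each store file, collect the message ids it references, and accumulate a reference count per id in an ETS table. This is not the rabbit_msg_store implementation; the module name, function name, and input format are invented, and the real count_msg_refs/build_index code deals with file formats, deletions, and flow control that are omitted here.

```erlang
%% Illustrative sketch only, not rabbit_msg_store code: accumulate message
%% reference counts in an ETS table from the ids recovered by scanning each
%% message-store file. Input is a list of lists of message ids, one per file.
-module(refcount_sketch).
-export([count_refs/1]).

count_refs(MsgIdsPerFile) ->
    Tid = ets:new(msg_refs, [set, private]),
    lists:foreach(
      fun(MsgIdsInFile) ->
              lists:foreach(
                fun(MsgId) ->
                        %% Increment the refcount, inserting {MsgId, 1} on first sight.
                        ets:update_counter(Tid, MsgId, {2, 1}, {MsgId, 0})
                end, MsgIdsInFile)
      end, MsgIdsPerFile),
    Tid.
```

For example, refcount_sketch:count_refs([[a, b], [b, c]]) produces a table containing {a,1}, {b,2}, {c,1}.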

@gerhard gerhard added this to the 3.8.4 milestone Mar 25, 2020
@gerhard
Contributor Author

gerhard commented Mar 26, 2020

I've just tested this in the context of TGIR S01E03 and can confirm that #2274 fixes this.

Memory usage doesn't go above 8GiB. Even though the recovery time didn't improve (it still took ~45 minutes for the node to restart), the memory usage was perfect: the ETS data structures used the expected amount of memory, and all other components used only what they needed.

Watch the video if you're interested in seeing how this was confirmed 🙌🏻

TL;DW

[screenshots]
