Memory use of a node spikes around the time of mass client disconnect #3226
-
We have a Rocky OpenStack deployment with 3 controllers and 500 computes. At one point, nova-compute detected that its RabbitMQ connection was broken and then reconnected. Within 15 minutes, memory consumption on rabbitmq-server increased abruptly from the original 3 GB to 150 GB, reaching the 40% memory watermark.

rabbitmq.log:

```
2021-07-05 15:58:28.633 8 ERROR oslo.messaging._drivers.impl_rabbit [req-a09d4a8b-c24b-4b30-b433-64fe4f6bace5 - - - - -] [8ed1f425-ad67-4b98-874c-e4516aaf3134] AMQP server on 145.247.103.16:5671 is unreachable: . Trying again in 1 seconds.: timeout
```

RabbitMQ then reported that a huge number of connections had been closed by clients:

```
=WARNING REPORT==== 5-Jul-2021::15:57:59 ===
```

After 10 minutes, the cluster was blocked by the 0.4 memory watermark:

```
=INFO REPORT==== 5-Jul-2021::16:19:29 ===
*** Publishers will be blocked until this alarm clears ***
```

However, even after publishers were blocked, the rabbitmq pod's memory kept growing; in the end the node hit OOM and the system forced the pod to restart.

rabbitmq-management: on
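For reference, the "40% watermark" above is RabbitMQ's `vm_memory_high_watermark`, which defaults to 0.4 of detected RAM; crossing it raises the memory alarm that blocks publishers. A minimal sketch of where this lives in the classic config format used by 3.6.x (the value shown is the default, not a recommendation):

```erlang
%% rabbitmq.config, classic Erlang-term format used by RabbitMQ 3.6.x.
%% 0.4 means the node raises a memory alarm (and blocks publishers)
%% once it uses 40% of the RAM it detects on the host.
[
  {rabbit, [
    {vm_memory_high_watermark, 0.4}
  ]}
].
```

Note that releases as old as 3.6 compute available RAM from the host, not from cgroup/pod limits, so in Kubernetes the alarm can fire far above the pod's actual memory limit.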
Replies: 3 comments
-
I will convert this issue to a GitHub discussion. Currently GitHub will automatically close and lock the issue even though your question will be transferred and responded to elsewhere. This is to let you know that we do not intend to ignore this, but this is how the current GitHub conversion mechanism makes it seem for the users :(
-
I'm afraid that's not a whole lot of evidence of a leak. Messages are not the only thing that consumes resources. Connections do, too, in particular in the case of high connection churn, which you have provided evidence of (mass client disconnections).

There are tools and metrics that would help you understand what exactly uses the memory.

RabbitMQ 3.6.16 has been out of support for over three years. Erlang memory allocators and GC have changed since Erlang 19 as well (latest releases are 24.x).

I'm afraid the only piece of advice we have is `rabbitmq-diagnostics memory_breakdown` (the equivalent information is available in `rabbitmqctl status` on 3.6).
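For example, a hypothetical invocation (the `--unit` flag applies to 3.7+ `rabbitmq-diagnostics`; on 3.6.x the same figures appear in the memory section of `rabbitmqctl status`):

```sh
# Per-category memory breakdown (connections, channels, queues, binaries,
# ETS tables, and so on) on a 3.7+ node:
rabbitmq-diagnostics memory_breakdown --unit "MB"

# Rough 3.6.x equivalent: inspect the {memory, [...]} section of:
rabbitmqctl status
```

If connection-related categories dominate right after the disconnect storm, that points at connection churn rather than a leak.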
-
I now recall that around the Erlang 18-19 series, difficult-to-explain massive heap allocations were relatively common. Heap fragmentation can still be observed with Erlang 23 and 24, but there were quite a few potentially relevant changes starting with Erlang 21, including around memory allocator behavior and the metrics available.
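One hedged way to tell live data from allocator overhead is to compare the runtime's own accounting with what its allocators have reserved. This sketch assumes `rabbitmqctl` can reach the node and that the `recon` library is on the code path (it ships with modern RabbitMQ distributions; on 3.6.x only the first command is a safe bet):

```sh
# The Erlang VM's own view of memory: total, processes, binary, ets, ...
rabbitmqctl eval 'erlang:memory().'

# With recon available: fraction of allocated memory actually in use.
# A low ratio while RSS stays high suggests fragmentation, not a leak.
rabbitmqctl eval 'recon_alloc:memory(usage).'
rabbitmqctl eval 'recon_alloc:memory(allocated).'
```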