[Questions] [Custom OCI image] Node reports multiple schema database tables as missing after a restart #13151
-
Community Support Policy
RabbitMQ version used: 4.0.5
Erlang version used: 27.2.x
Operating system (distribution) used: Rocky Linux 8.10
How is RabbitMQ deployed? Community Docker image
rabbitmq-diagnostics status output: See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out): See https://www.rabbitmq.com/docs/logging to learn how to collect logs
rabbitmq.conf: See https://www.rabbitmq.com/docs/configure#config-location to learn how to find the rabbitmq.conf file location

Steps to deploy RabbitMQ cluster: N/A; we use a single standalone node:

services:
  rabbitmq:
    image: <company>-rabbitmq # in-house image that just copies in the rabbitmq.conf and sets the env var to use it
    restart: unless-stopped
    hostname: "rabbitmq.node-1"
    ports:
      - "5672:5672"
      - "15672:15672"
    env_file:
      - mca.env
      - rabbitmq.env
    volumes:
      - rabbitmq4:/var/lib/rabbitmq

Steps to reproduce the behavior in question:
Purpose: to bring down our application for redeployment, and to restart it with new Docker images used for other services.

advanced.config: See https://www.rabbitmq.com/docs/configure#config-location to learn how to find the advanced.config file location
Application code: # PASTE CODE HERE, BETWEEN BACKTICKS
Kubernetes deployment file: # Relevant parts of K8S deployment that demonstrate how RabbitMQ is deployed
# PASTE YAML HERE, BETWEEN BACKTICKS

What problem are you trying to solve? When RabbitMQ restarts, it gets into a state where it boots and is running, but no connections can be accepted due to a vhost-related issue. Manually restarting RabbitMQ a SECOND time fixes the issue.
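The diagnostics output and logs that the template asks for can be collected from a Compose setup like this one roughly as follows. This is only a sketch: it assumes the service name `rabbitmq` from the compose file above, and the log file name is arbitrary.

```bash
# Run the requested diagnostics inside the running broker container
docker compose exec rabbitmq rabbitmq-diagnostics status

# Capture the node's log output into a file for the report
docker compose logs rabbitmq > rabbitmq-node-1.log
```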
Replies: 3 comments 1 reply
-
@PaarthShah we cannot suggest anything with a single log line.
-
Multiple error messages in the log suggest that several internal metadata store (Mnesia, in this case) tables are missing. This is not at all common to see; most likely something is wrong with this installation's node data directory, such as directory permissions, so the schema data store could not perform its usual initialization and default data seeding. Unless you can provide clear evidence of a problem in RabbitMQ itself, all Docker image questions should be directed to the respective image repository's Discussions.
In-house images are yours to troubleshoot. Don't expect the community to do it for you.
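If the data directory theory needs to be ruled out, a quick inspection in a setup like the one above could look like the following sketch. It assumes the compose service name `rabbitmq` and the default data location `/var/lib/rabbitmq` used by the community image.

```bash
# Which user the server runs as inside the container
docker compose exec rabbitmq id

# Ownership and permissions of the node data directory and its contents
docker compose exec rabbitmq ls -ld /var/lib/rabbitmq
docker compose exec rabbitmq ls -la /var/lib/rabbitmq
```

If the directory or the files under it are not readable and writable by the user the server runs as, that would be consistent with the failed schema initialization described above.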
-
#3837 looks distantly similar, and there you can see evidence of a node that hasn't synced its data from its peers yet but already has a client trying to connect and perform operations. This can be a manifestation of a long-documented behavior around node restarts that usually affects Kubernetes, but in general it can affect any environment where the tool responsible for stopping/restarting nodes assumes that nodes do not depend on each other when they are restarted (which is not the case; see the doc guides linked to earlier). In that scenario a specific node does not have its metadata store tables yet, e.g. because it is waiting for its last known peer to come online; the peer does not come online because of how the deployment tool operates; clients that connect won't be able to perform any operation on that node, and neither will most CLI commands, which usually need a running metadata store. Both the problematic sequence of events and the recommended solution (besides "use our cluster Operator") for Kubernetes have long been documented.

What's the best option for …
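For a single-node Docker Compose deployment like the one in the question, one commonly used way to keep clients from connecting before the node has fully booted is a container healthcheck that dependent services wait on. The sketch below extends the compose file from the question; the `app` service name and the interval/retry values are made up for illustration.

```yaml
services:
  rabbitmq:
    image: <company>-rabbitmq
    # ... other settings as in the compose file above ...
    healthcheck:
      # check_running passes only once the node has booted and the broker application is running
      test: ["CMD", "rabbitmq-diagnostics", "-q", "check_running"]
      interval: 10s
      timeout: 10s
      retries: 12

  app:
    # hypothetical client service; Compose will not start it until the broker reports healthy
    depends_on:
      rabbitmq:
        condition: service_healthy
```

This does not change how RabbitMQ itself behaves; it only delays dependent services until `rabbitmq-diagnostics check_running` succeeds on the node.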