Khepri on K8S: Khepri gives up on trying to form a cluster (timeout_waiting_for_leader) after a cluster-wide shutdown and an attempt at an ordered restart #13182
Community Support Policy
RabbitMQ version used: 4.0.5
Erlang version used: 27.2.x
Operating system (distribution) used: OpenShift
How is RabbitMQ deployed? Community Docker image
Logs from node 1 (with sensitive values edited out)
Logs from node 2 (if applicable, with sensitive values edited out)
Logs from node 3 (if applicable, with sensitive values edited out)
rabbitmq.conf
Steps to deploy RabbitMQ cluster: Deployed via a legacy OpenShift template, which deploys a StatefulSet to Kubernetes.
Steps to reproduce the behavior in question: Turn off all nodes in the cluster, starting with node 2, then node 1, then node 0. Try to turn them back on again, starting with node 0. Node 0 will not start.
advanced.config
Kubernetes deployment file:
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: rabbitmq-cluster
  namespace: <rabbitmq-namespace>
  labels:
    app: rabbitmq-cluster
spec:
  serviceName: rabbitmq-cluster
  revisionHistoryLimit: 10
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: rabbitmq-storage
        creationTimestamp: null
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: san-storage
        volumeMode: Filesystem
      status:
        phase: Pending
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rabbitmq-cluster
    spec:
      restartPolicy: Always
      serviceAccountName: rabbitmq-discovery
      schedulerName: default-scheduler
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - rabbitmq-cluster
                topologyKey: datacenter
      terminationGracePeriodSeconds: 30
      securityContext: {}
      containers:
        - resources:
            limits:
              cpu: '1'
              memory: 6000Mi
            requests:
              cpu: '1'
              memory: 6000Mi
          readinessProbe:
            exec:
              command:
                - rabbitmq-diagnostics
                - ping
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 30
          terminationMessagePath: /dev/termination-log
          name: rabbitmq
          command:
            - sh
          env:
            - name: RABBITMQ_DEFAULT_USER
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-cluster-secret
                  key: username
            - name: RABBITMQ_DEFAULT_PASS
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-cluster-secret
                  key: password
            - name: RABBITMQ_ERLANG_COOKIE
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-cluster-secret
                  key: cookie
            - name: K8S_SERVICE_NAME
              value: rabbitmq-cluster
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: RABBITMQ_USE_LONGNAME
              value: 'true'
            - name: RABBITMQ_NODENAME
              value: rabbit@$(POD_NAME).rabbitmq-cluster.$(POD_NAMESPACE).svc.cluster.local
            - name: RABBITMQ_CONFIG_FILE
              value: /var/lib/rabbitmq/rabbitmq
            - name: RABBITMQ_ADVANCED_CONFIG_FILE
              value: /var/lib/rabbitmq/advanced.config
          ports:
            - name: http
              containerPort: 15672
              protocol: TCP
            - name: amqp
              containerPort: 5672
              protocol: TCP
            - name: amqptls
              containerPort: 5671
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: config-volume
              mountPath: /etc/rabbitmq
            - name: rabbitmq-storage
              mountPath: /var/lib/rabbitmq
            - name: rabbitmq-cluster-server-certs-volume
              readOnly: true
              mountPath: /etc/rabbitmq-cluster-server-certs
          terminationMessagePolicy: File
          image: '<rabbitmq-management:4.0.5 but with a SSL cert added>'
          args:
            - '-c'
            - 'chmod 400 /var/lib/rabbitmq/.erlang.cookie; cp -v /etc/rabbitmq/rabbitmq.conf ${RABBITMQ_CONFIG_FILE}.conf; cp -v /etc/rabbitmq/advanced.config ${RABBITMQ_ADVANCED_CONFIG_FILE}; exec docker-entrypoint.sh rabbitmq-server'
      serviceAccount: rabbitmq-discovery
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-cluster-config
            items:
              - key: rabbitmq.conf
                path: rabbitmq.conf
              - key: advanced.config
                path: advanced.config
              - key: definitions.json
                path: definitions.json
              - key: enabled_plugins
                path: enabled_plugins
            defaultMode: 420
        - name: rabbitmq-cluster-server-certs-volume
          secret:
            secretName: rabbitmq-cluster-server-certs
            defaultMode: 420
      dnsPolicy: ClusterFirst
  podManagementPolicy: Parallel
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: rabbitmq-cluster
What problem are you trying to solve? After turning the nodes off and on again, the cluster no longer starts up. I'd quite like it to start up again! We have the Khepri database enabled. I am aware that this OpenShift StatefulSet is quite basic and would quite like to replace it with the Operator - if the answer is "well, use the Operator, bugs are fixed there", that's an option for me. Originally, the cluster used podManagementPolicy: OrderedReady.
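For context, the setting mentioned in the last sentence is a single StatefulSet field. The fragment below is an illustrative sketch of the two values, not a change anyone in this thread confirmed making; the field is immutable on an existing StatefulSet, so switching it generally means recreating the object.

spec:
  # OrderedReady starts pods strictly one at a time and waits for each one to
  # become Ready before starting the next; after a full-cluster outage this can
  # deadlock with RabbitMQ nodes that are waiting for their peers to come back.
  # podManagementPolicy: OrderedReady
  # Parallel lets all pods of the StatefulSet start at the same time.
  podManagementPolicy: Parallel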
Replies: 4 comments, 8 replies
-
@evolvedlight this is documented in not one but two places: I don't have much to add. Nodes await their previously known peers before they continue booting; the only exception is the last node to stop, which remembers that there were no online peers and proceeds to boot. With default settings this must happen within 5 minutes (10 retries with a 30-second delay each), or nodes will voluntarily stop. We will not troubleshoot your OpenShift cluster for you. If you have to DIY on Kubernetes, that's entirely on you. Our team has put a lot of effort into the Operator and the docs.
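For reference, a minimal rabbitmq.conf sketch of the knobs behind those defaults, assuming the classic (Mnesia-era) keys; whether a Khepri-enabled node honours exactly these settings should be verified against the documentation for your version:

# Number of times a booting node retries waiting for its previously known
# peers before giving up (default: 10).
mnesia_table_loading_retry_limit = 10
# Delay between those retries, in milliseconds (default: 30000, i.e. 30 seconds).
mnesia_table_loading_retry_timeout = 30000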
-
If you suspect Khepri specifically: we are not aware of any scenarios where Khepri would not form the cluster after a cluster-wide restart. Khepri is based on the same Raft library as quorum queues and streams, which has been battle-tested over the last seven years. Without logs from all nodes, we will not guess at what may be going on in this cluster. The burden of proof is on the users of free open source software.
-
I'm afraid these are not the steps to reproduce. "Stop all cluster nodes" is specific enough, but we won't guess how exactly you are "turning them back on again". It matters a great deal whether you boot all nodes at once or one by one, as described in the docs in my first response.
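To make that distinction concrete, here is a hedged sketch of the two restart styles against a StatefulSet named rabbitmq-cluster; the names and namespace placeholder are assumptions taken from the manifest above, not commands anyone in this thread confirmed running.

# Option A: bring all nodes back at the same time. With
# podManagementPolicy: Parallel, all pods start together and the nodes can
# find each other during boot.
kubectl -n <rabbitmq-namespace> scale statefulset rabbitmq-cluster --replicas=3

# Option B: bring nodes back one by one (pod 0, then 1, then 2), waiting for
# each pod to become Ready. Unless pod 0 happens to be the last node that
# stopped, it will sit waiting for its known peers and can eventually give up,
# which is the behaviour described in the first reply.
for i in 0 1 2; do
  kubectl -n <rabbitmq-namespace> scale statefulset rabbitmq-cluster --replicas=$((i + 1))
  kubectl -n <rabbitmq-namespace> wait --for=condition=Ready pod/rabbitmq-cluster-$i --timeout=10m
done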
-
The solution I used (this does result in data loss): on node 0, run:
Everything will work again; definitions will all sync from node 0.
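As a general precaution around any destructive recovery like the one described above (an illustrative sketch, not the exact commands used in this thread), definitions can be exported beforehand and cluster membership verified afterwards:

# Take a definitions backup on the surviving node before doing anything destructive.
rabbitmqctl export_definitions /tmp/definitions-backup.json

# After the remaining nodes have rejoined, confirm that all three are back.
rabbitmq-diagnostics cluster_status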