[Questions] [Custom OCI image] Node reports multiple schema database tables as missing after a restart #13151
-
Community Support Policy
RabbitMQ version used: 4.0.5
Erlang version used: 27.2.x
Operating system (distribution) used: Rocky Linux 8.10
How is RabbitMQ deployed? Community Docker image
rabbitmq-diagnostics status output: See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out): See https://www.rabbitmq.com/docs/logging to learn how to collect logs
rabbitmq.conf: See https://www.rabbitmq.com/docs/configure#config-location to learn how to find the rabbitmq.conf file location

Steps to deploy RabbitMQ cluster: N/A; we use a single standalone node:

services:
  rabbitmq:
    image: <company>-rabbitmq # in-house image that just copies in the rabbitmq.conf and sets the env var to use it
    restart: unless-stopped
    hostname: "rabbitmq.node-1"
    ports:
      - "5672:5672"
      - "15672:15672"
    env_file:
      - mca.env
      - rabbitmq.env
    volumes:
      - rabbitmq4:/var/lib/rabbitmq

Steps to reproduce the behavior in question:
Purpose: to bring down our application for redeployment, and to restart it with new Docker images used for other services.

advanced.config: See https://www.rabbitmq.com/docs/configure#config-location to learn how to find the advanced.config file location
Application code: # PASTE CODE HERE, BETWEEN BACKTICKS
Kubernetes deployment file: # Relevant parts of K8S deployment that demonstrate how RabbitMQ is deployed
# PASTE YAML HERE, BETWEEN BACKTICKS

What problem are you trying to solve? When RabbitMQ restarts, it gets into a state where it boots and is running, but no connections can be accepted due to a vhost-related issue. Manually restarting RabbitMQ a SECOND time fixes the issue.
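The diagnostics output and logs that the template asks for can be collected from a Compose setup like this one roughly as follows. This is only a sketch: it assumes the service name `rabbitmq` from the compose file above, and the log file name is arbitrary.

```bash
# Run the requested diagnostics inside the running broker container
docker compose exec rabbitmq rabbitmq-diagnostics status

# Capture the node's log output into a file for the report
docker compose logs rabbitmq > rabbitmq-node-1.log
```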
Replies: 3 comments 1 reply
-
@PaarthShah we cannot suggest anything with a single log line.
-
Multiple error messages in the log suggest that several internal metadata store (Mnesia, in this case) tables are missing. This is not at all common to see; most likely something is wrong with this installation's node data directory, such as directory permissions, so the schema data store could not perform its usual initialization and default data seeding. Unless you can provide clear evidence of a problem in RabbitMQ itself, all Docker image questions should be directed to the respective image repository's Discussions.
In-house images are yours to troubleshoot. Don't expect the community to do it for you.
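If the data directory theory needs to be ruled out, a quick inspection in a setup like the one above could look like the following sketch. It assumes the compose service name `rabbitmq` and the default data location `/var/lib/rabbitmq` used by the community image.

```bash
# Which user the server runs as inside the container
docker compose exec rabbitmq id

# Ownership and permissions of the node data directory and its contents
docker compose exec rabbitmq ls -ld /var/lib/rabbitmq
docker compose exec rabbitmq ls -la /var/lib/rabbitmq
```

If the directory or the files under it are not readable and writable by the user the server runs as, that would be consistent with the failed schema initialization described above.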
-
#3837 looks distantly similar, and there you can see evidence of a node that hasn't synced its data from its peers yet but already has a client trying to connect and perform operations. This can be a manifestation of a long-documented behavior around node restarts that usually affects Kubernetes, but in general it can affect any environment where the tool responsible for stopping/restarting nodes assumes that nodes do not depend on each other when they are restarted (which is not the case; see the doc guides linked to earlier). In that scenario a specific node does not have its metadata store tables yet, e.g. because it is waiting for its last known peer to come online; the peer does not come online because of how the deployment tool operates; clients that connect won't be able to perform any operation on that node, and neither will most CLI commands, which usually need a running metadata store. Both the problematic sequence of events and the recommended solution (besides "use our cluster Operator") for Kubernetes have long been documented.

What's the best option for …
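For a single-node Docker Compose deployment like the one in the question, one commonly used way to keep clients from connecting before the node has fully booted is a container healthcheck that dependent services wait on. The sketch below extends the compose file from the question; the `app` service name and the interval/retry values are made up for illustration.

```yaml
services:
  rabbitmq:
    image: <company>-rabbitmq
    # ... other settings as in the compose file above ...
    healthcheck:
      # check_running passes only once the node has booted and the broker application is running
      test: ["CMD", "rabbitmq-diagnostics", "-q", "check_running"]
      interval: 10s
      timeout: 10s
      retries: 12

  app:
    # hypothetical client service; Compose will not start it until the broker reports healthy
    depends_on:
      rabbitmq:
        condition: service_healthy
```

This does not change how RabbitMQ itself behaves; it only delays dependent services until `rabbitmq-diagnostics check_running` succeeds on the node.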