Khepri on K8S: Khepri gives up on trying to form a cluster (timeout_waiting_for_leader) after a cluster-wide shutdown and an attempt at an ordered restart #13182
Community Support Policy
RabbitMQ version used: 4.0.5
Erlang version used: 27.2.x
Operating system (distribution) used: OpenShift
How is RabbitMQ deployed? Community Docker image
Logs from node 1 (with sensitive values edited out)
Logs from node 2 (if applicable, with sensitive values edited out)
Logs from node 3 (if applicable, with sensitive values edited out)
rabbitmq.conf
Steps to deploy RabbitMQ cluster: Deployed via a legacy OpenShift template, which deploys a StatefulSet to Kubernetes.
Steps to reproduce the behavior in question: Turn off all nodes in the cluster, starting with node 2, then node 1, then node 0. Try to turn them back on again, starting with node 0. Node 0 will not start.
advanced.config
Kubernetes deployment file:
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: rabbitmq-cluster
  namespace: <rabbitmq-namespace>
  labels:
    app: rabbitmq-cluster
spec:
  serviceName: rabbitmq-cluster
  revisionHistoryLimit: 10
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  volumeClaimTemplates:
    - kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: rabbitmq-storage
        creationTimestamp: null
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: san-storage
        volumeMode: Filesystem
      status:
        phase: Pending
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rabbitmq-cluster
    spec:
      restartPolicy: Always
      serviceAccountName: rabbitmq-discovery
      schedulerName: default-scheduler
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - rabbitmq-cluster
                topologyKey: datacenter
      terminationGracePeriodSeconds: 30
      securityContext: {}
      containers:
        - resources:
            limits:
              cpu: '1'
              memory: 6000Mi
            requests:
              cpu: '1'
              memory: 6000Mi
          readinessProbe:
            exec:
              command:
                - rabbitmq-diagnostics
                - ping
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 30
          terminationMessagePath: /dev/termination-log
          name: rabbitmq
          command:
            - sh
          env:
            - name: RABBITMQ_DEFAULT_USER
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-cluster-secret
                  key: username
            - name: RABBITMQ_DEFAULT_PASS
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-cluster-secret
                  key: password
            - name: RABBITMQ_ERLANG_COOKIE
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-cluster-secret
                  key: cookie
            - name: K8S_SERVICE_NAME
              value: rabbitmq-cluster
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: RABBITMQ_USE_LONGNAME
              value: 'true'
            - name: RABBITMQ_NODENAME
              value: rabbit@$(POD_NAME).rabbitmq-cluster.$(POD_NAMESPACE).svc.cluster.local
            - name: RABBITMQ_CONFIG_FILE
              value: /var/lib/rabbitmq/rabbitmq
            - name: RABBITMQ_ADVANCED_CONFIG_FILE
              value: /var/lib/rabbitmq/advanced.config
          ports:
            - name: http
              containerPort: 15672
              protocol: TCP
            - name: amqp
              containerPort: 5672
              protocol: TCP
            - name: amqptls
              containerPort: 5671
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: config-volume
              mountPath: /etc/rabbitmq
            - name: rabbitmq-storage
              mountPath: /var/lib/rabbitmq
            - name: rabbitmq-cluster-server-certs-volume
              readOnly: true
              mountPath: /etc/rabbitmq-cluster-server-certs
          terminationMessagePolicy: File
          image: '<rabbitmq-management:4.0.5 but with a SSL cert added>'
          args:
            - '-c'
            - 'chmod 400 /var/lib/rabbitmq/.erlang.cookie; cp -v /etc/rabbitmq/rabbitmq.conf ${RABBITMQ_CONFIG_FILE}.conf; cp -v /etc/rabbitmq/advanced.config ${RABBITMQ_ADVANCED_CONFIG_FILE}; exec docker-entrypoint.sh rabbitmq-server'
      serviceAccount: rabbitmq-discovery
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-cluster-config
            items:
              - key: rabbitmq.conf
                path: rabbitmq.conf
              - key: advanced.config
                path: advanced.config
              - key: definitions.json
                path: definitions.json
              - key: enabled_plugins
                path: enabled_plugins
            defaultMode: 420
        - name: rabbitmq-cluster-server-certs-volume
          secret:
            secretName: rabbitmq-cluster-server-certs
            defaultMode: 420
      dnsPolicy: ClusterFirst
  podManagementPolicy: Parallel
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: rabbitmq-cluster
What problem are you trying to solve? After turning the nodes off and on again, the cluster no longer starts up. I'd quite like it to start up again! We have the Khepri database enabled. I am aware that this OpenShift StatefulSet is quite basic and would quite like to replace it with the Operator - if the answer is "well, use the Operator, bugs are fixed there", that's an option for me. Originally, the cluster used podManagementPolicy: OrderedReady.
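For context, the setting mentioned in the last sentence is a single StatefulSet field. The fragment below is an illustrative sketch of the two values, not a change anyone in this thread confirmed making; the field is immutable on an existing StatefulSet, so switching it generally means recreating the object.

spec:
  # OrderedReady starts pods strictly one at a time and waits for each one to
  # become Ready before starting the next; after a full-cluster outage this can
  # deadlock with RabbitMQ nodes that are waiting for their peers to come back.
  # podManagementPolicy: OrderedReady
  # Parallel lets all pods of the StatefulSet start at the same time.
  podManagementPolicy: Parallel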
Replies: 4 comments, 8 replies
-
@evolvedlight this is documented in not one but two places: I don't have much to add. Nodes await their previously known peers before they continue booting; the only exception is the last node to stop, which remembers that there were no online peers and proceeds to boot. With default settings this must happen within 5 minutes (10 retries with a 30-second delay each), or nodes will voluntarily stop. We will not troubleshoot your OpenShift cluster for you. If you have to DIY on Kubernetes, that's entirely on you. Our team has put a lot of effort into the Operator and the docs.
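For reference, a minimal rabbitmq.conf sketch of the knobs behind those defaults, assuming the classic (Mnesia-era) keys; whether a Khepri-enabled node honours exactly these settings should be verified against the documentation for your version:

# Number of times a booting node retries waiting for its previously known
# peers before giving up (default: 10).
mnesia_table_loading_retry_limit = 10
# Delay between those retries, in milliseconds (default: 30000, i.e. 30 seconds).
mnesia_table_loading_retry_timeout = 30000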
-
If you suspect Khepri specifically: we are not aware of any scenarios where Khepri would not form the cluster after a cluster-wide restart. Khepri is based on the same Raft library as quorum queues and streams, which has been battle-tested over the last seven years. Without logs from all nodes, we will not guess at what may be going on in this cluster. The burden of proof is on the users of free open source software.
-
I'm afraid these are not the steps to reproduce. "Stop all cluster nodes" is specific enough, but we won't guess how exactly you are "turning them back on again". It matters a great deal whether you boot all nodes at once or one by one, as described in the docs in my first response.
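To make that distinction concrete, here is a hedged sketch of the two restart styles against a StatefulSet named rabbitmq-cluster; the names and namespace placeholder are assumptions taken from the manifest above, not commands anyone in this thread confirmed running.

# Option A: bring all nodes back at the same time. With
# podManagementPolicy: Parallel, all pods start together and the nodes can
# find each other during boot.
kubectl -n <rabbitmq-namespace> scale statefulset rabbitmq-cluster --replicas=3

# Option B: bring nodes back one by one (pod 0, then 1, then 2), waiting for
# each pod to become Ready. Unless pod 0 happens to be the last node that
# stopped, it will sit waiting for its known peers and can eventually give up,
# which is the behaviour described in the first reply.
for i in 0 1 2; do
  kubectl -n <rabbitmq-namespace> scale statefulset rabbitmq-cluster --replicas=$((i + 1))
  kubectl -n <rabbitmq-namespace> wait --for=condition=Ready pod/rabbitmq-cluster-$i --timeout=10m
done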
-
The solution I used (this does result in data loss): on node 0, run:
Everything will work again; definitions will all sync from node 0.
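As a general precaution around any destructive recovery like the one described above (an illustrative sketch, not the exact commands used in this thread), definitions can be exported beforehand and cluster membership verified afterwards:

# Take a definitions backup on the surviving node before doing anything destructive.
rabbitmqctl export_definitions /tmp/definitions-backup.json

# After the remaining nodes have rejoined, confirm that all three are back.
rabbitmq-diagnostics cluster_status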