Helm install of CockroachDB on Digital Ocean fails #109995
Labels
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-community: Originated from the community.
X-blathers-triaged: blathers was able to find an owner.
I tried installing CockroachDB on a DigitalOcean Kubernetes cluster using the Helm chart included in Rancher. The main change from the defaults was to use the DigitalOcean storage class, StorageClass: 'do-block-storage'.
To Reproduce
Install CockroachDB with Helm on a DigitalOcean Kubernetes cluster.
Additional data / screenshots
kubectl describe pods cockroachdb-0 -n cockroachdb
Name: cockroachdb-0
Namespace: cockroachdb
Priority: 0
Service Account: cockroachdb
Node: staging-yy92h/10.106.0.4
Start Time: Mon, 04 Sep 2023 21:57:32 +0100
Labels: app.kubernetes.io/component=cockroachdb
app.kubernetes.io/instance=cockroachdb
app.kubernetes.io/name=cockroachdb
controller-revision-hash=cockroachdb-695ff69b67
statefulset.kubernetes.io/pod-name=cockroachdb-0
Annotations:
Status: Running
IP: 10.244.0.93
IPs:
IP: 10.244.0.93
Controlled By: StatefulSet/cockroachdb
Init Containers:
copy-certs:
Container ID: containerd://811423a6ff8a550b20b9d9991ad7e9fb9f52bebc99a47d85dba0862150de7866
Image: busybox
Image ID: docker.io/library/busybox@sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79
Port:
Host Port:
Command:
/bin/sh
-c
cp -f /certs/* /cockroach-certs/; chmod 0400 /cockroach-certs/*.key
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Sep 2023 21:57:39 +0100
Finished: Mon, 04 Sep 2023 21:57:39 +0100
Ready: True
Restart Count: 0
Environment:
POD_NAMESPACE: cockroachdb (v1:metadata.namespace)
Mounts:
/certs/ from certs-secret (rw)
/cockroach-certs/ from certs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4c6b (ro)
Containers:
db:
Container ID: containerd://a248855282c32c2e6aaa39b871d1bf5b27c8f9a50e10218bb6cfb31200f0bd43
Image: cockroachdb/cockroach:v23.1.8
Image ID: docker.io/cockroachdb/cockroach@sha256:c02c58d9c6c1ed623369f7b5890ed81f623b50dedd4d1800472016f4b07b9c80
Ports: 26257/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
Args:
shell
-ecx
exec /cockroach/cockroach start --join=${STATEFULSET_NAME}-0.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-1.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-2.${STATEFULSET_FQDN}:26257 --advertise-host=$(hostname).${STATEFULSET_FQDN} --certs-dir=/cockroach/cockroach-certs/ --http-port=8080 --port=26257 --cache=25% --max-sql-memory=25% --logtostderr=INFO
State: Running
Started: Mon, 04 Sep 2023 21:57:40 +0100
Ready: False
Restart Count: 0
Liveness: http-get https://:http/health delay=30s timeout=1s period=5s #success=1 #failure=3
Readiness: http-get https://:http/health%3Fready=1 delay=10s timeout=1s period=5s #success=1 #failure=2
Environment:
STATEFULSET_NAME: cockroachdb
STATEFULSET_FQDN: cockroachdb.cockroachdb.svc.cluster.local
COCKROACH_CHANNEL: kubernetes-helm
Mounts:
/cockroach/cockroach-certs/ from certs (rw)
/cockroach/cockroach-data/ from datadir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4c6b (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: datadir-cockroachdb-0
ReadOnly: false
certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
certs-secret:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: cockroachdb-node-secret
SecretOptionalName:
kube-api-access-d4c6b:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/component=cockroachdb,app.kubernetes.io/instance=cockroachdb,app.kubernetes.io/name=cockroachdb
Events:
Type     Reason                  Age                     From                     Message
Warning  FailedScheduling        8m46s                   default-scheduler        0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Normal   Scheduled               8m44s                   default-scheduler        Successfully assigned cockroachdb/cockroachdb-0 to staging-yy92h
Normal   SuccessfulAttachVolume  8m39s                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-78bbba7e-5a3b-43a3-81a8-6e6a2691c826"
Normal   Pulled                  8m38s                   kubelet                  Container image "busybox" already present on machine
Normal   Created                 8m38s                   kubelet                  Created container copy-certs
Normal   Started                 8m37s                   kubelet                  Started container copy-certs
Normal   Pulled                  8m37s                   kubelet                  Container image "cockroachdb/cockroach:v23.1.8" already present on machine
Normal   Created                 8m37s                   kubelet                  Created container db
Normal   Started                 8m36s                   kubelet                  Started container db
Warning  Unhealthy               3m33s (x63 over 8m23s)  kubelet                  Readiness probe failed: HTTP probe failed with statuscode: 503
LOGS:
kubectl logs cockroachdb-0 --all-containers=true -n cockroachdb
I230904 21:07:54.549571 32 server/init.go:421 ⋮ [T1,n?] 973 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 ‹[core]›‹[Channel #1849 SubChannel #1850] grpc: addrConn.createTransport failed to connect to {›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Addr": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "ServerName": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Attributes": null,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "BalancerAttributes": null,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Type": 0,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Metadata": null›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹}. Err: connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
W230904 21:07:55.529085 32 server/init.go:423 ⋮ [T1,n?] 975 outgoing join rpc to ‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
I230904 21:07:56.539170 32 server/init.go:421 ⋮ [T1,n?] 976 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 ‹[core]›‹[Channel #1855 SubChannel #1856] grpc: addrConn.createTransport failed to connect to {›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Addr": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "ServerName": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Attributes": null,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "BalancerAttributes": null,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Type": 0,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Metadata": null›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹}. Err: connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
W230904 21:07:57.528165 32 server/init.go:423 ⋮ [T1,n?] 978 outgoing join rpc to ‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
I230904 21:07:58.538910 32 server/init.go:421 ⋮ [T1,n?] 979 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
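The log excerpt above repeats two distinct conditions: one peer answers but "is itself waiting for init", and another peer fails DNS lookup with "no such host". As a rough aid for reading a longer log, the following sketch tallies both conditions per peer host. It assumes the logs were saved as plain text; the function name `summarize_join_log` is hypothetical, and the patterns are copied from the messages in this report, so other CockroachDB versions may word them differently.

```python
import re
from collections import Counter

# Patterns taken from the log lines in this report (assumption: the wording
# is the same in the log being analyzed).
WAITING = re.compile(r"‹([^›:]+):\d+› is itself waiting for init")
NO_HOST = re.compile(
    r"outgoing join rpc to ‹([^›:]+):\d+› unsuccessful: .*no such host"
)


def summarize_join_log(lines):
    """Count, per peer host, how often each join failure mode appears."""
    summary = {"waiting_for_init": Counter(), "dns_no_such_host": Counter()}
    for line in lines:
        match = WAITING.search(line)
        if match:
            summary["waiting_for_init"][match.group(1)] += 1
            continue
        match = NO_HOST.search(line)
        if match:
            summary["dns_no_such_host"][match.group(1)] += 1
    return summary


if __name__ == "__main__":
    # Two lines shortened from the excerpt above, for illustration only.
    sample = [
        "I230904 21:07:54.549571 32 server/init.go:421 ⋮ [T1,n?] 973 "
        "‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› "
        "is itself waiting for init, will retry",
        "W230904 21:07:55.529085 32 server/init.go:423 ⋮ [T1,n?] 975 "
        "outgoing join rpc to "
        "‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› "
        "unsuccessful: ‹rpc error: code = Unavailable desc = no such host›",
    ]
    for kind, counts in summarize_join_log(sample).items():
        for host, count in counts.items():
            print(f"{kind}: {host} x{count}")
```

Run against the excerpt in this report, the tally would separate cockroachdb-1 (reachable, not initialized) from cockroachdb-2 (hostname does not resolve). It only classifies the messages and does not by itself identify the root cause.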
Jira issue: CRDB-31208