Helm install of CockroachDB on Digital Ocean fails #109995

chokosabe · 2023-09-04T21:08:44Z

I tried installing CockroachDB on a DigitalOcean Kubernetes cluster using the Helm chart included with Rancher. The main change from the defaults is the storage class, which is set to the DigitalOcean one: StorageClass: 'do-block-storage'.

To Reproduce

Install the CockroachDB Helm chart on a DigitalOcean Kubernetes cluster (with the storage class change described above).

Additional data / screenshots

kubectl describe pods cockroachdb-0 -n cockroachdb

Name: cockroachdb-0
Namespace: cockroachdb
Priority: 0
Service Account: cockroachdb
Node: staging-yy92h/10.106.0.4
Start Time: Mon, 04 Sep 2023 21:57:32 +0100
Labels: app.kubernetes.io/component=cockroachdb
app.kubernetes.io/instance=cockroachdb
app.kubernetes.io/name=cockroachdb
controller-revision-hash=cockroachdb-695ff69b67
statefulset.kubernetes.io/pod-name=cockroachdb-0
Annotations:
Status: Running
IP: 10.244.0.93
IPs:
IP: 10.244.0.93
Controlled By: StatefulSet/cockroachdb
Init Containers:
copy-certs:
Container ID: containerd://811423a6ff8a550b20b9d9991ad7e9fb9f52bebc99a47d85dba0862150de7866
Image: busybox
Image ID: docker.io/library/busybox@sha256:3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79
Port:
Host Port:
Command:
/bin/sh
-c
cp -f /certs/* /cockroach-certs/; chmod 0400 /cockroach-certs/*.key
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Sep 2023 21:57:39 +0100
Finished: Mon, 04 Sep 2023 21:57:39 +0100
Ready: True
Restart Count: 0
Environment:
POD_NAMESPACE: cockroachdb (v1:metadata.namespace)
Mounts:
/certs/ from certs-secret (rw)
/cockroach-certs/ from certs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4c6b (ro)
Containers:
db:
Container ID: containerd://a248855282c32c2e6aaa39b871d1bf5b27c8f9a50e10218bb6cfb31200f0bd43
Image: cockroachdb/cockroach:v23.1.8
Image ID: docker.io/cockroachdb/cockroach@sha256:c02c58d9c6c1ed623369f7b5890ed81f623b50dedd4d1800472016f4b07b9c80
Ports: 26257/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP
Args:
shell
-ecx
exec /cockroach/cockroach start --join=${STATEFULSET_NAME}-0.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-1.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-2.${STATEFULSET_FQDN}:26257 --advertise-host=$(hostname).${STATEFULSET_FQDN} --certs-dir=/cockroach/cockroach-certs/ --http-port=8080 --port=26257 --cache=25% --max-sql-memory=25% --logtostderr=INFO
State: Running
Started: Mon, 04 Sep 2023 21:57:40 +0100
Ready: False
Restart Count: 0
Liveness: http-get https://:http/health delay=30s timeout=1s period=5s #success=1 #failure=3
Readiness: http-get https://:http/health%3Fready=1 delay=10s timeout=1s period=5s #success=1 #failure=2
Environment:
STATEFULSET_NAME: cockroachdb
STATEFULSET_FQDN: cockroachdb.cockroachdb.svc.cluster.local
COCKROACH_CHANNEL: kubernetes-helm
Mounts:
/cockroach/cockroach-certs/ from certs (rw)
/cockroach/cockroach-data/ from datadir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d4c6b (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: datadir-cockroachdb-0
ReadOnly: false
certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
certs-secret:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: cockroachdb-node-secret
SecretOptionalName:
kube-api-access-d4c6b:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/component=cockroachdb,app.kubernetes.io/instance=cockroachdb,app.kubernetes.io/name=cockroachdb
Events:
Type Reason Age From Message

Warning FailedScheduling 8m46s default-scheduler 0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Normal Scheduled 8m44s default-scheduler Successfully assigned cockroachdb/cockroachdb-0 to staging-yy92h
Normal SuccessfulAttachVolume 8m39s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-78bbba7e-5a3b-43a3-81a8-6e6a2691c826"
Normal Pulled 8m38s kubelet Container image "busybox" already present on machine
Normal Created 8m38s kubelet Created container copy-certs
Normal Started 8m37s kubelet Started container copy-certs
Normal Pulled 8m37s kubelet Container image "cockroachdb/cockroach:v23.1.8" already present on machine
Normal Created 8m37s kubelet Created container db
Normal Started 8m36s kubelet Started container db
Warning Unhealthy 3m33s (x63 over 8m23s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
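
For reference, the `--join` list in the container args above is assembled from the `STATEFULSET_NAME` and `STATEFULSET_FQDN` environment variables shown in the same output. The following is a minimal sketch of that expansion, not part of the chart; the helper name `join_addresses` is hypothetical, and the replica count of 3 and port 26257 are taken from the args line.

```python
# Hypothetical helper: reproduces the host list that the shell expands
# for --join, using the values printed by `kubectl describe`.
def join_addresses(statefulset_name, statefulset_fqdn, replicas=3, port=26257):
    """Return one '<name>-<i>.<fqdn>:<port>' address per StatefulSet replica."""
    return [
        f"{statefulset_name}-{i}.{statefulset_fqdn}:{port}"
        for i in range(replicas)
    ]


# Values copied from the Environment section of the pod description.
addrs = join_addresses(
    "cockroachdb",
    "cockroachdb.cockroachdb.svc.cluster.local",
)

for addr in addrs:
    print(addr)
```

These are the peer hostnames the node tries to contact at startup, so each one has to resolve in cluster DNS before a join attempt against it can succeed.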

LOGS:

kubectl logs cockroachdb-0 --all-containers=true -n cockroachdb

I230904 21:07:54.549571 32 server/init.go:421 ⋮ [T1,n?] 973 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 ‹[core]›‹[Channel #1849 SubChannel #1850] grpc: addrConn.createTransport failed to connect to {›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Addr": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "ServerName": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Attributes": null,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "BalancerAttributes": null,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Type": 0,›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹ "Metadata": null›
W230904 21:07:55.528823 7561 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 974 +‹}. Err: connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
W230904 21:07:55.529085 32 server/init.go:423 ⋮ [T1,n?] 975 outgoing join rpc to ‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
I230904 21:07:56.539170 32 server/init.go:421 ⋮ [T1,n?] 976 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 ‹[core]›‹[Channel #1855 SubChannel #1856] grpc: addrConn.createTransport failed to connect to {›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Addr": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "ServerName": "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257",›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Attributes": null,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "BalancerAttributes": null,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Type": 0,›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹ "Metadata": null›
W230904 21:07:57.527923 7568 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 977 +‹}. Err: connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
W230904 21:07:57.528165 32 server/init.go:423 ⋮ [T1,n?] 978 outgoing join rpc to ‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp: lookup cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host"›
I230904 21:07:58.538910 32 server/init.go:421 ⋮ [T1,n?] 979 ‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› is itself waiting for init, will retry
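
The excerpt above repeats two distinct messages: one peer answers but "is itself waiting for init", and the join RPC to another peer fails with "no such host". A small sketch for tallying these per peer from a saved log is shown below. The function name `summarize` and the sample list are illustrative assumptions; the patterns are matched only against the message text visible in this excerpt and may not cover other CockroachDB log formats.

```python
import re
from collections import Counter

# Patterns based only on the two message shapes visible in the excerpt.
WAITING = re.compile(r"‹([^›]+)› is itself waiting for init")
JOIN_FAILED = re.compile(r"outgoing join rpc to ‹([^›]+)› unsuccessful: ‹(.*)›")


def summarize(lines):
    """Count (peer, reason) pairs found in an iterable of log lines."""
    summary = Counter()
    for line in lines:
        match = WAITING.search(line)
        if match:
            summary[(match.group(1), "waiting for init")] += 1
            continue
        match = JOIN_FAILED.search(line)
        if match:
            detail = match.group(2)
            reason = "no such host" if "no such host" in detail else "join rpc failed"
            summary[(match.group(1), reason)] += 1
    return summary


# Two lines copied from the log excerpt in this issue.
SAMPLE = [
    "I230904 21:07:54.549571 32 server/init.go:421 ⋮ [T1,n?] 973 "
    "‹cockroachdb-1.cockroachdb.cockroachdb.svc.cluster.local:26257› "
    "is itself waiting for init, will retry",
    "W230904 21:07:55.529085 32 server/init.go:423 ⋮ [T1,n?] 975 outgoing join rpc to "
    "‹cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local:26257› unsuccessful: "
    "‹rpc error: code = Unavailable desc = connection error: desc = "
    "\"transport: error while dialing: dial tcp: lookup "
    "cockroachdb-2.cockroachdb.cockroachdb.svc.cluster.local: no such host\"›",
]

summary = summarize(SAMPLE)
for (peer, reason), count in sorted(summary.items()):
    print(f"{peer}: {reason} x{count}")
```

To run it against the full output, pass the lines of a file saved from the `kubectl logs` command above instead of `SAMPLE`.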

Jira issue: CRDB-31208

blathers-crl · 2023-09-04T21:08:48Z

Hello, I am Blathers. I am here to help you get the issue triaged.

It looks like you have not filled out the issue in the format of any of our templates. To best assist you, we advise you to use one of these templates.

I have CC'd a few people who may be able to assist you:

@cockroachdb/kv (found keywords: Liveness)
@bdarnell (author of When cleaning cache on circleci, only delete files. #1849, commented on Default test timeout to 70 seconds #1856)
@tbg (author of Test failure in CI build 5137 #1855)
@BramGruneir (assigned to Test failure in CI build 5137 #1855, commented on Default test timeout to 70 seconds #1856)

If we have not gotten back to your issue within a few business days, you can try the following:

Join our community slack channel and ask on #cockroachdb.
Try to find someone from here if you know they worked closely on the area, and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

chokosabe added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Sep 4, 2023

blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Sep 4, 2023
