Install Scylla fails with Init:InvalidImageName on pod/scylla-manager-manager-dc-manager-rack-0 #1617

Closed
jbolila opened this issue Dec 5, 2023 · 6 comments · Fixed by #1619
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments


jbolila commented Dec 5, 2023

What happened?

I'm starting with Scylla and following the instructions in:

In both cases, it fails with Init:InvalidImageName on scylla-manager-manager-dc-manager-rack-0

The other two pods of scylla-manager are running:

kubectl -n scylla-manager get pods -l "app.kubernetes.io/name=scylla-manager"
kubectl -n scylla-manager get pods -l "app.kubernetes.io/name=scylla-manager-controller"

After following the instructions below, these two pods end up in the same state:

scylla-manager    scylla-manager-manager-dc-manager-rack-0     0/2     Init:InvalidImageName
scylla            scylla-us-east-1-us-east-1a-0                0/2     Init:InvalidImageName

What did you expect to happen?

Instructions provided on operator.docs.scylladb.com result in a working database.

How can we reproduce it (as minimally and precisely as possible)?

# Using Helm
git clone https://github.com/scylladb/scylla-operator.git && cd scylla-operator/

helm repo add scylla https://scylla-operator-charts.storage.googleapis.com/stable
helm repo update

kubectl apply -f examples/common/cert-manager.yaml 
kubectl wait --for condition=established crd/certificates.cert-manager.io crd/issuers.cert-manager.io
kubectl -n cert-manager rollout status deployment.apps/cert-manager-webhook

helm install scylla-operator scylla/scylla-operator --create-namespace --namespace scylla-operator
kubectl wait --for condition=established crd/scyllaclusters.scylla.scylladb.com
kubectl -n scylla-operator rollout status deployment.apps/scylla-operator

helm install scylla-manager scylla/scylla-manager --create-namespace --namespace scylla-manager
kubectl get pods -n scylla-manager
# NAME                                             READY   STATUS                  RESTARTS   AGE
# pod/scylla-manager-56557d5698-64lw8              0/1     Running                 0          30s
# pod/scylla-manager-controller-57f87c5b9c-6gx29   1/1     Running                 0          30s
# pod/scylla-manager-controller-57f87c5b9c-gtlr7   1/1     Running                 0          30s
# pod/scylla-manager-manager-dc-manager-rack-0     0/2     Init:InvalidImageName   0          30s    <---

helm install scylla scylla/scylla --create-namespace --namespace scylla

Scylla Operator version

master and v1.11.0

Kubernetes platform name and version

Platform: minikube
Minikube version: 1.32.0
Kubernetes version: 1.27.7
Scylla-operator version: master and v1.11.0

Please attach the must-gather archive.

This same issue is reported on #1441, in my test I'm using minikube and able to see the following details:

"Error syncing pod, skipping" err="failed to \"StartContainer\" for \"sidecar-injection\" with InvalidImageName: \"Failed to apply default image tag \\\"docker-pullable://scylladb/scylla-operator@sha256:8c1e45dea1814b678cdd2f52029e09257803f467d2a26ebe58c5ee730b03d358\\\": couldn't parse image reference \\\"docker-pullable://scylladb/scylla-operator@sha256:8c1e45dea1814b678cdd2f52029e09257803f467d2a26ebe58c5ee730b03d358\\\": invalid reference format\"" pod="scylla/scylla-us-east-1-us-east-1a-0" podUID=d0a0fcd9-5eb1-48b5-b102-c19e7edf43ef
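The kubelet rejects this string because an image reference cannot carry a URI-style scheme such as `docker-pullable://`. A minimal sketch of that check, assuming a hypothetical helper name `has_scheme_prefix` (it is not part of kubectl or the operator), could look like this:

```shell
# Hypothetical helper: succeeds when the argument contains a URI-style
# scheme ("something://"), which is not allowed in an image reference.
has_scheme_prefix() {
  case "$1" in
    *://*) return 0 ;;
    *)     return 1 ;;
  esac
}

# The value from the kubelet error above.
ref='docker-pullable://scylladb/scylla-operator@sha256:8c1e45dea1814b678cdd2f52029e09257803f467d2a26ebe58c5ee730b03d358'

if has_scheme_prefix "$ref"; then
  echo "not a valid image reference: $ref"
fi
```

This only tests for the scheme separator; it does not validate the rest of the reference.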

Anything else we need to know?

No response

jbolila added the kind/bug label on Dec 5, 2023
scylla-operator-bot (bot) added the needs-priority label (Indicates a PR lacks a `priority/foo` label and requires one.) on Dec 5, 2023
Contributor

tnozicka commented Dec 5, 2023

> Please attach the must-gather archive.
>
> This same issue is reported on #1441, in my test I'm using minikube and able to see the following details:

Please attach the must-gather archive. None of the info above actually shows which value is set on the CRD and which one is on the pod.

tnozicka added the triage/needs-information label on Dec 5, 2023
Author

jbolila commented Dec 5, 2023

Thank you for the prompt reply @tnozicka

scylla-operator-must-gather-mb6gz6bj8fvj.tar.gz


Sammers21 commented Dec 6, 2023

I'm having exactly the same issue when following the steps from https://operator.docs.scylladb.com/stable/helm.html. At the installation step, after running:

helm install scylla scylla/scylla --values examples/helm/values.cluster.yaml --create-namespace --namespace scylla

I get Init:InvalidImageName (screenshot not preserved).

Describe:

Name:             scylla-us-east-1-us-east-1b-0
Namespace:        scylla
Priority:         0
Service Account:  scylla-member
Node:             docker-desktop/192.168.65.3
Start Time:       Wed, 06 Dec 2023 07:42:06 +0300
Labels:           app=scylla
                  app.kubernetes.io/managed-by=scylla-operator
                  app.kubernetes.io/name=scylla
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=scylla-us-east-1-us-east-1b-64bb5499db
                  scylla/cluster=scylla
                  scylla/datacenter=us-east-1
                  scylla/rack=us-east-1b
                  scylla/rack-ordinal=0
                  scylla/scylla-version=5.2.11
                  statefulset.kubernetes.io/pod-name=scylla-us-east-1-us-east-1b-0
Annotations:      prometheus.io/port: 9180
                  prometheus.io/scrape: true
Status:           Pending
IP:               10.1.0.13
IPs:
  IP:           10.1.0.13
Controlled By:  StatefulSet/scylla-us-east-1-us-east-1b
Init Containers:
  sidecar-injection:
    Container ID:
    Image:         docker-pullable://scylladb/scylla-operator@sha256:418ed93e3c201bf3e9322b66afeb2ca215f0b821c08ee4dd09620703491da5dc
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      cp -a /usr/bin/scylla-operator /mnt/shared
    State:          Waiting
      Reason:       InvalidImageName
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     10m
      memory:  50Mi
    Requests:
      cpu:        10m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /mnt/shared from shared (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ljw4d (ro)
Containers:
  scylla:
    Container ID:
    Image:         scylladb/scylla:5.2.11
    Image ID:
    Ports:         7000/TCP, 7001/TCP, 7199/TCP, 9180/TCP, 9100/TCP, 9042/TCP, 9142/TCP, 9160/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /mnt/shared/scylla-operator
      sidecar
      --feature-gates=AllAlpha=false,AllBeta=false,AutomaticTLSCertificates=true
      --nodes-broadcast-address-type=ServiceClusterIP
      --clients-broadcast-address-type=ServiceClusterIP
      --service-name=$(SERVICE_NAME)
      --cpu-count=$(CPU_COUNT)
      --loglevel=2
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      1
      memory:   1Gi
    Liveness:   http-get http://:8080/healthz delay=0s timeout=10s period=10s #success=1 #failure=12
    Readiness:  http-get http://:8080/readyz delay=0s timeout=30s period=10s #success=1 #failure=1
    Startup:    http-get http://:8080/healthz delay=0s timeout=30s period=10s #success=1 #failure=40
    Environment:
      SERVICE_NAME:  scylla-us-east-1-us-east-1b-0 (v1:metadata.name)
      CPU_COUNT:     1 (limits.cpu)
    Mounts:
      /mnt/scylla-client-config from scylla-client-config-volume (ro)
      /mnt/scylla-config from scylla-config-volume (ro)
      /mnt/shared from shared (ro)
      /var/lib/scylla from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ljw4d (ro)
      /var/run/secrets/scylla-operator.scylladb.com/scylladb/client-ca from scylladb-client-ca (ro)
      /var/run/secrets/scylla-operator.scylladb.com/scylladb/serving-certs from scylladb-serving-certs (ro)
      /var/run/secrets/scylla-operator.scylladb.com/scylladb/user-admin from scylladb-user-admin (ro)
  scylla-manager-agent:
    Container ID:
    Image:         scylladb/scylla-manager-agent:3.1.2
    Image ID:
    Port:          10001/TCP
    Host Port:     0/TCP
    Args:
      -c
      /etc/scylla-manager-agent/scylla-manager-agent.yaml
      -c
      /mnt/scylla-agent-config/scylla-manager-agent.yaml
      -c
      /mnt/scylla-agent-config/auth-token.yaml
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        50m
      memory:     10M
    Environment:  <none>
    Mounts:
      /mnt/scylla-agent-config/auth-token.yaml from scylla-agent-auth-token-volume (ro,path="auth-token.yaml")
      /mnt/scylla-agent-config/scylla-manager-agent.yaml from scylla-agent-config-volume (ro,path="scylla-manager-agent.yaml")
      /var/lib/scylla from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ljw4d (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-scylla-us-east-1-us-east-1b-0
    ReadOnly:   false
  shared:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  scylla-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      scylla-config
    Optional:  true
  scylla-agent-config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scylla-agent-config-secret
    Optional:    true
  scylla-client-config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scylla-client-config-secret
    Optional:    true
  scylla-agent-auth-token-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scylla-auth-token
    Optional:    false
  scylladb-serving-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scylla-local-serving-certs
    Optional:    false
  scylladb-client-ca:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scylla-local-client-ca
    Optional:    false
  scylladb-user-admin:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scylla-local-user-admin
    Optional:    false
  kube-api-access-ljw4d:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  17m                  default-scheduler  0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled         17m                  default-scheduler  Successfully assigned scylla/scylla-us-east-1-us-east-1b-0 to docker-desktop
  Warning  FailedMount       16m (x5 over 17m)    kubelet            MountVolume.SetUp failed for volume "scylladb-serving-certs" : secret "scylla-local-serving-certs" not found
  Warning  Failed            15m (x10 over 16m)   kubelet            Error: InvalidImageName
  Warning  InspectFailed     109s (x72 over 16m)  kubelet            Failed to apply default image tag "docker-pullable://scylladb/scylla-operator@sha256:418ed93e3c201bf3e9322b66afeb2ca215f0b821c08ee4dd09620703491da5dc": couldn't parse image name "docker-pullable://scylladb/scylla-operator@sha256:418ed93e3c201bf3e9322b66afeb2ca215f0b821c08ee4dd09620703491da5dc": invalid reference format

I'm doing this on WSL2 with Kubernetes enabled in Docker Desktop.

@Sammers21

It's actually crazy how many steps and moving parts there are. I wonder if that could be more simple: why would we need a cert-manager to run a database(WTF?)

Contributor

tnozicka commented Dec 6, 2023

> I wonder if that could be more simple: why would we need a cert-manager to run a database(WTF?)

@Sammers21 people usually prefer secure, encrypted traffic, especially for valuable data, but that part is handled internally by the operator. We don't need cert-manager per se, but something needs to set up serving certificates for the webhooks so that kube-apiserver can validate them when it connects over TLS.
https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#service-reference

> scylla-operator-must-gather-mb6gz6bj8fvj.tar.gz

@jbolila thanks, unfortunately it seems like the ScyllaCluster and its namespace are no longer there. Looking at the scylla-operator pod, though, I suspect this may be a CRI runtime issue.

  containerStatuses:
  - containerID: docker://1426116e1640da210e4193b954832167c163a40e9964a5043b24ab2bfaec22fb
    image: scylladb/scylla-operator:1.11.0
    imageID: docker-pullable://scylladb/scylla-operator@sha256:8c1e45dea1814b678cdd2f52029e09257803f467d2a26ebe58c5ee730b03d358

imageID is where your runtime should have put the image reference. We later use it to inject the operator as a sidecar into the Scylla pods, to make sure it's the same image (#1425). I think we could add a fallback for the broken runtime. Can you give us more info about the runtime? Is this containerd? Which version?
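For illustration only, one possible shape of such a fallback is to strip the `docker-pullable://` prefix from the reported imageID before reusing it. The sketch below is an assumption and not the operator's actual code; the thread does not show how the fix was implemented, and `normalize_image_id` is a hypothetical name:

```shell
# Hypothetical helper: drop a leading "docker-pullable://" from an imageID.
# Values without that prefix are printed unchanged.
normalize_image_id() {
  # ${1#pattern} removes the shortest match of pattern from the start of $1.
  printf '%s\n' "${1#docker-pullable://}"
}

normalize_image_id 'docker-pullable://scylladb/scylla-operator@sha256:8c1e45dea1814b678cdd2f52029e09257803f467d2a26ebe58c5ee730b03d358'
# prints: scylladb/scylla-operator@sha256:8c1e45dea1814b678cdd2f52029e09257803f467d2a26ebe58c5ee730b03d358
```

The result keeps the digest, so it still pins the exact image that the operator pod is running.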

Can you try adding this env to scylla-operator/scylla-operator deployment and do kubectl -n=scylla-operator rollout restart deploy/scylla-operator?

env:
        - name: SCYLLA_OPERATOR_IMAGE
          value: docker.io/scylladb/scylla-operator:1.11.0
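The same variable can also be set from the command line instead of editing the manifest. This is a minimal sketch, assuming the deployment name and namespace from the comment above; `set_operator_image` is a hypothetical wrapper name. `kubectl set env` changes the deployment's pod template, which starts a new rollout by itself, so the sketch only waits for that rollout to finish:

```shell
# Hypothetical wrapper: set SCYLLA_OPERATOR_IMAGE on the operator deployment
# and wait until the resulting rollout completes.
set_operator_image() {
  image="$1"
  kubectl -n scylla-operator set env deployment/scylla-operator \
    "SCYLLA_OPERATOR_IMAGE=${image}" &&
    kubectl -n scylla-operator rollout status deployment/scylla-operator
}

# Usage (needs access to the cluster):
# set_operator_image docker.io/scylladb/scylla-operator:1.11.0
```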

tnozicka added the priority/important-soon label on Dec 6, 2023
tnozicka self-assigned this on Dec 6, 2023
scylla-operator-bot (bot) removed the needs-priority label on Dec 6, 2023
Author

jbolila commented Dec 6, 2023

Thanks @tnozicka, I have applied the env you suggested on v1.11.0, and it's working now.

helm/scylla-operator/templates/operator.deployment.yaml --- Text (2 YAML parse errors, exceeded DFT_PARSE_ERROR_LIMIT)
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
+         - name: SCYLLA_OPERATOR_IMAGE
+           value: docker.io/scylladb/scylla-operator:1.11.0
          args:
          - operator
          - --loglevel={{ .Values.logLevel }}
