
Operator crashes in face of an incomplete TLS configuration #1054

Closed
hoyhbx opened this issue Jul 19, 2022 · 6 comments

hoyhbx commented Jul 19, 2022

What did you do to encounter the bug?
I enabled TLS with a very incomplete TLS configuration. The operator crashed at the "Ensuring TLS is correctly configured" step.

Steps to reproduce the behavior:

  1. Apply the following CR:
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: test-cluster
  namespace: mongodb
spec:
  members: 3
  security:
    authentication:
      modes:
      - SCRAM
    tls:
      enabled: true
      optional: true
  statefulSet:
    spec:
      template:
        spec:
          containers:
          - name: mongod
            resources:
              limits:
                cpu: '0.2'
                memory: 250M
              requests:
                cpu: '0.2'
                memory: 200M
          - name: mongodb-agent
            resources:
              limits:
                cpu: '0.2'
                memory: 250M
              requests:
                cpu: '0.2'
                memory: 200M
  type: ReplicaSet
  users:
  - db: admin
    name: my-user
    passwordSecretRef:
      name: my-user-password
    roles:
    - db: admin
      name: clusterAdmin
    - db: admin
      name: userAdminAnyDatabase
    scramCredentialsSecretName: my-scram
  version: 4.4.0

What did you expect?
The operator should not crash; instead, it should report an error/warning for the incomplete input.

What happened instead?
The operator crashes with the following message:

2022-07-07T06:15:47.794Z INFO controllers/mongodb_tls.go:40 Ensuring TLS is correctly configured {"ReplicaSet": "mongodb/test-cluster"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x130e4ee]

Operator Information

  • Operator Version - 0.7.3
  • MongoDB Image used - 4.4.0

Kubernetes Cluster Information

$ kubectl version --short --output=yaml
clientVersion:
  buildDate: "2022-05-24T12:26:19Z"
  compiler: gc
  gitCommit: 3ddd0f45aa91e2f30c70734b175631bec5b5825a
  gitTreeState: clean
  gitVersion: v1.24.1
  goVersion: go1.18.2
  major: "1"
  minor: "24"
  platform: linux/amd64
kustomizeVersion: v4.5.4
serverVersion:
  buildDate: "2021-05-21T23:01:33Z"
  compiler: gc
  gitCommit: 5e58841cce77d4bc13713ad2b91fa0d961e69192
  gitTreeState: clean
  gitVersion: v1.21.1
  goVersion: go1.16.4
  major: "1"
  minor: "21"
  platform: linux/amd64

Additional context
It seems that this nil pointer dereference occurs while ensuring that the CA cert is configured during TLS config validation, when the following condition is checked:

if mdb.Spec.Security.TLS.CaCertificateSecret != nil {
	caResourceName = mdb.TLSCaCertificateSecretNamespacedName()
	caData, err = secret.ReadStringData(secretGetter, caResourceName)
} else {
	caResourceName = mdb.TLSConfigMapNamespacedName()
	caData, err = configmap.ReadData(cmGetter, caResourceName)
}

Since Spec.Security.TLS.CaCertificateSecret is nil, mdb.TLSConfigMapNamespacedName is called:

func (m MongoDBCommunity) TLSConfigMapNamespacedName() types.NamespacedName {
	return types.NamespacedName{Name: m.Spec.Security.TLS.CaConfigMap.Name, Namespace: m.Namespace}
}

However, Spec.Security.TLS.CaConfigMap is also nil, so accessing its Name field dereferences a nil pointer and causes the runtime panic.
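
For illustration, a minimal standalone sketch of the failure mode (using simplified stand-in types, not the operator's actual definitions from mongodbcommunity_types.go) looks like this:

package main

import "fmt"

// Simplified stand-ins for the operator's TLS spec types; illustrative only.
type localObjectReference struct {
	Name string
}

type tlsSpec struct {
	Enabled             bool
	Optional            bool
	CaCertificateSecret *localObjectReference
	CaConfigMap         *localObjectReference
}

func main() {
	// Mirrors the CR above: TLS is enabled but neither CA reference is set,
	// so both pointers remain nil.
	tls := tlsSpec{Enabled: true, Optional: true}

	// Like TLSConfigMapNamespacedName reading CaConfigMap.Name, this
	// dereferences a nil pointer and panics with
	// "runtime error: invalid memory address or nil pointer dereference".
	fmt.Println(tls.CaConfigMap.Name)
}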

Additional Information (description/logs)

  • kubectl describe output
Name:         mongodb-kubernetes-operator-698bc456cc-k5sqm
Namespace:    mongodb
Priority:     0
Node:         kind-control-plane/172.18.0.2
Start Time:   Thu, 14 Jul 2022 11:47:45 +0500
Labels:       name=mongodb-kubernetes-operator
              pod-template-hash=698bc456cc
Annotations:  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:       Running
IP:           10.244.0.5
IPs:
  IP:           10.244.0.5
Controlled By:  ReplicaSet/mongodb-kubernetes-operator-698bc456cc
Containers:
  mongodb-kubernetes-operator:
    Container ID:  containerd://d6d75f3c8998c43484433ffd09c7483d76807377b2064133c717367bce9e8e77
    Image:         quay.io/mongodb/mongodb-kubernetes-operator:0.7.3
    Image ID:      quay.io/mongodb/mongodb-kubernetes-operator@sha256:4b2edb911c05ccaaa09a3e4d30c50891e6170b652cd53f0de19c3c13a01eb465
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/entrypoint
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Thu, 14 Jul 2022 11:50:29 +0500
      Finished:     Thu, 14 Jul 2022 11:50:30 +0500
    Ready:          False
    Restart Count:  3
    Limits:
      cpu:     1100m
      memory:  1Gi
    Requests:
      cpu:     500m
      memory:  200Mi
    Environment:
      WATCH_NAMESPACE:             mongodb (v1:metadata.namespace)
      POD_NAME:                    mongodb-kubernetes-operator-698bc456cc-k5sqm (v1:metadata.name)
      OPERATOR_NAME:               mongodb-kubernetes-operator
      AGENT_IMAGE:                 quay.io/mongodb/mongodb-agent:11.12.0.7388-1
      VERSION_UPGRADE_HOOK_IMAGE:  quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.4
      READINESS_PROBE_IMAGE:       quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.8
      MONGODB_IMAGE:               mongo
      MONGODB_REPO_URL:            docker.io
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gws5w (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-gws5w:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  3m11s                default-scheduler  Successfully assigned mongodb/mongodb-kubernetes-operator-698bc456cc-k5sqm to kind-control-plane
  Normal   Pulled     101s                 kubelet            Successfully pulled image "quay.io/mongodb/mongodb-kubernetes-operator:0.7.3" in 1m30.138677351s
  Normal   Pulled     69s                  kubelet            Successfully pulled image "quay.io/mongodb/mongodb-kubernetes-operator:0.7.3" in 1.329111544s
  Normal   Pulled     55s                  kubelet            Successfully pulled image "quay.io/mongodb/mongodb-kubernetes-operator:0.7.3" in 1.099131675s
  Normal   Pulling    28s (x4 over 3m11s)  kubelet            Pulling image "quay.io/mongodb/mongodb-kubernetes-operator:0.7.3"
  Normal   Created    27s (x4 over 100s)   kubelet            Created container mongodb-kubernetes-operator
  Normal   Started    27s (x4 over 100s)   kubelet            Started container mongodb-kubernetes-operator
  Normal   Pulled     27s                  kubelet            Successfully pulled image "quay.io/mongodb/mongodb-kubernetes-operator:0.7.3" in 1.229023453s
  Warning  BackOff    0s (x6 over 68s)     kubelet            Back-off restarting failed container
  • log files for the operator
Running ./manager
2022-07-14T07:20:14.759Z	INFO	manager/main.go:74	Watching namespace: mongodb
2022-07-14T07:20:15.363Z	INFO	manager/main.go:91	Registering Components.
2022-07-14T07:20:15.364Z	INFO	manager/main.go:104	Starting the Cmd.
2022-07-14T07:20:15.467Z	INFO	controllers/replica_set_controller.go:140	Reconciling MongoDB	{"ReplicaSet": "mongodb/test-cluster"}
2022-07-14T07:20:15.467Z	DEBUG	controllers/replica_set_controller.go:142	Validating MongoDB.Spec	{"ReplicaSet": "mongodb/test-cluster"}
2022-07-14T07:20:15.467Z	DEBUG	controllers/replica_set_controller.go:151	Ensuring the service exists	{"ReplicaSet": "mongodb/test-cluster"}
2022-07-14T07:20:15.575Z	INFO	controllers/replica_set_controller.go:475	Create/Update operation succeeded	{"ReplicaSet": "mongodb/test-cluster", "operation": "updated"}
2022-07-14T07:20:15.575Z	DEBUG	controllers/replica_set_controller.go:160	Ensuring the service for Arbiters exists	{"ReplicaSet": "mongodb/test-cluster"}
2022-07-14T07:20:15.581Z	INFO	controllers/replica_set_controller.go:475	Create/Update operation succeeded	{"ReplicaSet": "mongodb/test-cluster", "operation": "updated"}
2022-07-14T07:20:15.581Z	INFO	controllers/mongodb_tls.go:40	Ensuring TLS is correctly configured	{"ReplicaSet": "mongodb/test-cluster"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x130e4ee]

goroutine 153 [running]:
github.com/mongodb/mongodb-kubernetes-operator/api/v1.MongoDBCommunity.TLSConfigMapNamespacedName(...)
	/workspace/api/v1/mongodbcommunity_types.go:716
github.com/mongodb/mongodb-kubernetes-operator/controllers.getCaCrt({_, _}, {_, _}, {{{0x134105c, 0x10}, {0xc0002ea000, 0x1f}}, {{0xc0002f1d20, 0xc}, ...}, ...})
	/workspace/controllers/mongodb_tls.go:167 +0x12e
github.com/mongodb/mongodb-kubernetes-operator/controllers.(*ReplicaSetReconciler).validateTLSConfig(_, {{{0x134105c, 0x10}, {0xc0002ea000, 0x1f}}, {{0xc0002f1d20, 0xc}, {0x0, 0x0}, {0xc0002f1d39, ...}, ...}, ...})
	/workspace/controllers/mongodb_tls.go:43 +0x132
github.com/mongodb/mongodb-kubernetes-operator/controllers.ReplicaSetReconciler.Reconcile({{0x18bf290, 0xc000605790}, 0xc000176cb0, 0xc00000e250, 0xc00046a128, 0xc00046a130}, {0x187fc98, 0xc00063e0f0}, {{{0xc0002f1d39, 0x7}, ...}})
	/workspace/controllers/replica_set_controller.go:169 +0xaba
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0003408c0, {0x187fc98, 0xc00063e090}, {{{0xc0002f1d39, 0x14df7a0}, {0xc0002f1d20, 0xc0004350c0}}})
	/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114 +0x222
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0003408c0, {0x187fbf0, 0xc0000bf180}, {0x1495720, 0xc00004b5e0})
	/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311 +0x2f2
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0003408c0, {0x187fbf0, 0xc0000bf180})
	/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:223 +0x356

Possible Fix
I think the CRD is inconsistent with the operator code: both caConfigMapRef and caCertificateSecretRef are optional in the CRD, yet the operator expects one of them to be present when validating the TLS config. We could tighten the CRD, or consider the following alternative fix.

Currently, there is no validation of the TLS spec in validateSpec:

func validateSpec(mdb mdbv1.MongoDBCommunity) error {
	if err := validateUsers(mdb); err != nil {
		return err
	}

	if err := validateArbiterSpec(mdb); err != nil {
		return err
	}

	if err := validateAuthModeSpec(mdb); err != nil {
		return err
	}

	return nil
}

I think we can add the following check to validateSpec, along with the function definition below in controllers/validation/validation.go:

if err := validateTLSSpec(mdb); err != nil {
	return err
}

// validateTLSSpec checks that a CA certificate source is configured when TLS is enabled.
func validateTLSSpec(mdb mdbv1.MongoDBCommunity) error {
	// Skip the check when TLS is not enabled, so CRs without TLS are not rejected.
	if !mdb.Spec.Security.TLS.Enabled {
		return nil
	}
	if mdb.Spec.Security.TLS.CaConfigMap == nil && mdb.Spec.Security.TLS.CaCertificateSecret == nil {
		return fmt.Errorf("TLS is enabled but neither CaCertificateSecretRef nor CaConfigMapRef is set")
	}
	return nil
}
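
A rough test sketch for the proposed validator could look like the following (assuming it lives next to the other validators in controllers/validation and that the TLS spec exposes an Enabled field, as the operator code suggests):

func TestValidateTLSSpec_RequiresCaReference(t *testing.T) {
	mdb := mdbv1.MongoDBCommunity{}
	mdb.Spec.Security.TLS.Enabled = true
	// Neither CaConfigMap nor CaCertificateSecret is set, mirroring the CR in this report.
	if err := validateTLSSpec(mdb); err == nil {
		t.Fatal("expected validateTLSSpec to reject a TLS spec without a CA reference")
	}
}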
@github-actions

This issue is being marked stale because it has been open for 60 days with no activity. Please comment if this issue is still affecting you. If there is no change, this issue will be closed in 30 days.

github-actions bot added the stale label Sep 21, 2022

hoyhbx commented Sep 22, 2022

Remove stale

github-actions bot removed the stale label Sep 23, 2022
slaskawi self-assigned this Oct 3, 2022

slaskawi commented Oct 3, 2022

Thanks for raising this, @hoyhbx.

As for hoyhbx#3, it seems the PR has been opened against a fork. Could you please explain why?

Also, could you please tell me whether #1119 contains a proper fix for this? I'm a bit puzzled by both of those Pull Requests.


hoyhbx commented Oct 5, 2022

@slaskawi
hoyhbx#3 is a duplicate and we will close it.

#1119 would also fix this issue, like #1115; the difference is that #1115 rejects the invalid input up front, while #1119 returns an error when reconciling the CA secret. Both fixes work.

@slaskawi

Closed via #1115

@dan-mckean

Hi, I'm Dan and I'm the Product Manager for MongoDB's support of Kubernetes.

I'm doing some work right now to try and identify how the Community Operator is being used. The Community Operator is something I inherited when I started at MongoDB, but it doesn't get as much attention from us as we'd like and we're trying to understand how it's used in order to establish its future. It will help us prioritize future issues and PRs raised by the community 🙂

Here's a super short survey (it's much easier for us to review all the feedback that way!): https://docs.google.com/forms/d/e/1FAIpQLSfwrwyxBSlUyJ6AmC-eYlgW_3JEdfA48SB2i5--_WpiynMW2w/viewform?usp=sf_link

If you'd rather email me instead: [email protected]

Thank you in advance!
Dan
