
Cluster.kafka cannot be created #1611

Closed
haorenfsa opened this issue Dec 26, 2022 · 12 comments · Fixed by #1612
Labels
bug Something isn't working

Comments

@haorenfsa
Contributor

What happened?

The resource Cluster.kafka cannot be created

How can we reproduce it?

Just use the example provided (https://github.com/crossplane-contrib/provider-aws/blob/master/examples/kafka/cluster.yaml). You can change some fields to more reasonable values, but it doesn't make any difference: the reconcile fails before creation can even start.

What environment did it happen in?

Crossplane version: (doesn't matter)

Here is why it goes wrong:

The initializer of the managed resource sets the external-name annotation to the resource's name.
The Observe function then uses the external-name as the ARN of the cluster for DescribeCluster, but a user is only allowed to describe ARNs under their own account.

Say the account is 670409000460 and the region is ap-southeast-1.
Then the user can only describe resources under "arn:aws:kafka:ap-southeast-1:670409000460:cluster/bla/bla",
so Observe fails with a 403.
I tried setting the external-name myself to avoid this, but then I ran into another error:

  - lastTransitionTime: "2022-12-26T09:15:17Z"
    message: |-
      observe failed: failed to describe Cluster: NotFoundException: The requested resource doesn’t exist.
      {
        RespMetadata: {
          StatusCode: 404,
          RequestID: ""
        },
        InvalidParameter: "clusterArn",
        Message_: "The requested resource doesn’t exist."
      }
    reason: ReconcileError
    status: "False"
    type: Synced

So here's the conclusion: the reconciler only works correctly when the external-name is either unset or set to the correct ARN, while the initializer sets it to the wrong value by default.

func defaultMRManaged(m manager.Manager) mrManaged {
	return mrManaged{
		CriticalAnnotationUpdater: NewRetryingCriticalAnnotationUpdater(m.GetClient()),
		Finalizer:                 resource.NewAPIFinalizer(m.GetClient(), FinalizerName),
		Initializer:               NewNameAsExternalName(m.GetClient()), // sets external-name to metadata.name by default
		ReferenceResolver:         NewAPISimpleReferenceResolver(m.GetClient()),
		ConnectionPublisher: PublisherChain([]ConnectionPublisher{
			NewAPISecretPublisher(m.GetClient(), m.GetScheme()),
			&DisabledSecretStoreManager{},
		}),
	}
}

//...

// Initialize the given managed resource.
func (a *NameAsExternalName) Initialize(ctx context.Context, mg resource.Managed) error {
	if meta.GetExternalName(mg) != "" {
		return nil
	}
	meta.SetExternalName(mg, mg.GetName())
	return errors.Wrap(a.client.Update(ctx, mg), errUpdateManaged)
}
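
For context, here is a rough sketch, not the provider's exact generated code, of the Observe path described above. It assumes the aws-sdk-go kafka client (as svcsdk), aws.String from the SDK, and crossplane-runtime's meta package; the point is that the external-name annotation is used verbatim as the ClusterArn.

func (e *external) observe(ctx context.Context, cr *svcapitypes.Cluster) error {
	input := &svcsdk.DescribeClusterInput{
		// meta.GetExternalName returns whatever NameAsExternalName wrote,
		// e.g. "example" instead of a real ARN such as
		// "arn:aws:kafka:ap-southeast-1:670409000460:cluster/bla/bla".
		ClusterArn: aws.String(meta.GetExternalName(cr)),
	}
	_, err := e.client.DescribeClusterWithContext(ctx, input)
	return err // fails with 403 or 404 depending on what the annotation holds
}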

Solution

We should add a WithInitializers option to override the unintended default initializers.
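
A minimal sketch of what that could look like when wiring up the controller, assuming crossplane-runtime's managed reconciler API (github.com/crossplane/crossplane-runtime/pkg/reconciler/managed and .../pkg/resource as cpresource) and controller-runtime as ctrl; the setup function name and the omitted options are illustrative, not the provider's actual code.

func SetupCluster(mgr ctrl.Manager) error {
	r := managed.NewReconciler(mgr,
		cpresource.ManagedKind(svcapitypes.ClusterGroupVersionKind),
		// Passing WithInitializers with no arguments replaces the default
		// NameAsExternalName initializer, so the external-name annotation
		// stays unset until the controller learns the real ARN.
		managed.WithInitializers(),
		// external connecter and other options omitted for brevity
	)
	return ctrl.NewControllerManagedBy(mgr).
		For(&svcapitypes.Cluster{}).
		Complete(r)
}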

@haorenfsa haorenfsa added the bug Something isn't working label Dec 26, 2022
@haorenfsa haorenfsa mentioned this issue Dec 26, 2022
@haorenfsa
Contributor Author

#1551 seems to provide a more complete solution.

@haorenfsa
Contributor Author

Updated according to #1551. If #1551 makes no progress, please merge this one; I'm hoping to use this feature soon.

@haorenfsa
Contributor Author

/reopen

@HotThoughts

Hey, I am still unable to create a Kafka cluster with v0.36.0, which includes the Kafka initializer fix. I use Crossplane v1.10.1 and EKS version v1.23.13-eks-fb459a0.

The error message is a bit different:

        cannot update managed resource: Cluster.kafka.aws.crossplane.io
        "aws-kafka-dev" is invalid: spec.forProvider.clusterName: Required value

Is anyone experiencing the same? Note that a value for spec.forProvider.clusterName is given; I have no idea why it complains about it.

The output of kubectl describe cluster.kafka.aws.crossplane.io/example:

Name:         example
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  kafka.aws.crossplane.io/v1alpha1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2023-01-16T13:17:53Z
  Generation:          1
  Managed Fields:
    API Version:  kafka.aws.crossplane.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:crossplane.io/external-name:
      f:status:
        .:
        f:atProvider:
        f:conditions:
    Manager:      crossplane-aws-provider
    Operation:    Update
    Subresource:  status
    Time:         2023-01-16T13:17:53Z
    API Version:  kafka.aws.crossplane.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:deletionPolicy:
        f:forProvider:
          .:
          f:brokerNodeGroupInfo:
            .:
            f:clientSubnets:
            f:instanceType:
            f:securityGroupRefs:
            f:storageInfo:
              .:
              f:ebsStorageInfo:
                .:
                f:volumeSize:
          f:clusterName:
          f:configurationInfo:
            .:
            f:arnRef:
              .:
              f:name:
            f:revision:
          f:kafkaVersion:
          f:numberOfBrokerNodes:
          f:region:
          f:tags:
            .:
            f:myKey:
        f:providerConfigRef:
          .:
          f:name:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2023-01-16T13:17:53Z
  Resource Version:  179928293
  UID:               393158b1-9ea6-42dd-850c-583e6a0e60f8
Spec:
  Deletion Policy:  Delete
  For Provider:
    Broker Node Group Info:
      Client Subnets:
        subnet-XXXXXXX
        subnet-XXXXXXX
      Instance Type:  kafka.t3.small
      Security Group Refs:
        Name:  aws-kafka-dev-msk-sg
      Storage Info:
        Ebs Storage Info:
          Volume Size:  1
    Cluster Name:       example
    Configuration Info:
      Arn Ref:
        Name:                aws-kafka-dev-configuration
      Revision:              1
    Kafka Version:           3.2.0
    Number Of Broker Nodes:  2
    Region:                  eu-west-1
    Tags:
      My Key:  myValue
  Provider Config Ref:
    Name:  staging
Status:
  At Provider:
  Conditions:
    Last Transition Time:  2023-01-16T13:17:53Z
    Message:               cannot update managed resource: Cluster.kafka.aws.crossplane.io "example" is invalid: spec.forProvider.clusterName: Required value
    Reason:                ReconcileError
    Status:                False
    Type:                  Synced
Events:
  Type     Reason                           Age                From                                     Message
  ----     ------                           ----               ----                                     -------
  Warning  CannotInitializeManagedResource  25s (x6 over 54s)  managed/cluster.kafka.aws.crossplane.io  cannot update managed resource: Cluster.kafka.aws.crossplane.io "example" is invalid: spec.forProvider.clusterName: Required value

@haarchri
Member

Let me test again; I will update later today.

@haorenfsa
Contributor Author

Seems to be a logic problem in Create:

func (e *external) Create(ctx context.Context, mg cpresource.Managed) (managed.ExternalCreation, error) {
	cr, ok := mg.(*svcapitypes.Cluster)
	if !ok {
		return managed.ExternalCreation{}, errors.New(errUnexpectedObject)
	}
	cr.Status.SetConditions(xpv1.Creating())
	input := GenerateCreateClusterInput(cr)
	if err := e.preCreate(ctx, cr, input); err != nil {
		return managed.ExternalCreation{}, errors.Wrap(err, "pre-create failed")
	}
	resp, err := e.client.CreateClusterWithContext(ctx, input)
	if err != nil {
		return managed.ExternalCreation{}, awsclient.Wrap(err, errCreate)
	}

	if resp.ClusterArn != nil {
		cr.Status.AtProvider.ClusterARN = resp.ClusterArn
	} else {
		cr.Status.AtProvider.ClusterARN = nil
	}
	// here we use the response to overwrite the spec
	if resp.ClusterName != nil {
		cr.Spec.ForProvider.ClusterName = resp.ClusterName
	} else {
		cr.Spec.ForProvider.ClusterName = nil
	}
	if resp.State != nil {
		cr.Status.AtProvider.State = resp.State
	} else {
		cr.Status.AtProvider.State = nil
	}

	return e.postCreate(ctx, cr, resp, managed.ExternalCreation{}, err)
}

Below the commented line, we use the response to overwrite the spec, which is somewhat unreasonable. I suspect the response didn't include the clusterName on the first create, so the controller tried to overwrite clusterName with nil and the update failed.

The error message says cannot update managed resource, so this case is very likely.
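
A possible fix sketch (not necessarily the patch that was merged): only copy the response's ClusterName back into the spec when the API actually returned one, instead of overwriting a required field with nil.

	if resp.ClusterName != nil {
		cr.Spec.ForProvider.ClusterName = resp.ClusterName
	}
	// otherwise leave the user-supplied spec.forProvider.clusterName untouched,
	// so the API server's "Required value" validation keeps passing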

@haorenfsa
Contributor Author

@HotThoughts
Could you check in your AWS console whether the MSK cluster has already been created?

@HotThoughts

Hi @haorenfsa, yes, I checked the AWS console. No MSK cluster is being created by Crossplane.

@haorenfsa
Contributor Author

Adding a reference to the related patch: #1624

@haorenfsa
Contributor Author

haorenfsa commented Jan 18, 2023

@HotThoughts According to others' tests, creation should be okay, so the creation failure of your Kafka resource is very likely caused by invalid or missing parameters in your CR.
But there is indeed a bug in updating the CR status after CreateCluster is called, so we can only find out for sure once that bug is fixed.

My guess is that clientAuthentication should include at least one kind of authentication, for example:

spec:
  clientAuthentication:
    unauthenticated:
      enabled: true

You can try it out or just wait for the patch.

@haarchri
Member

We will publish a new fix release tomorrow (#1638), in which Observe / describe is working!

@drewwells
Contributor

I am getting the same issue in v0.36.1 / #1638.

Creation is not surfacing any errors from calling AWS with the provided configuration. It would be helpful to update the CR status when AWS API calls fail.
