Add more retries to resource group deletion #5537

Closed
ocofaigh opened this issue Jul 30, 2024 · 8 comments
Labels
service/Resource Management: Issues related to Resource Manager or Resource controller

Comments

@ocofaigh
Contributor

ocofaigh commented Jul 30, 2024

A common use case is to provision a resource group and an OCP VPC cluster as part of the same Terraform configuration.
When you provision an OCP VPC cluster, it automatically provisions a VPC load balancer. Terraform does not know about this load balancer (it's not in the state file).
So when you run a terraform destroy, it almost always fails on the first attempt with the error:

 2024/07/22 13:36:19 Terraform destroy |     "Result": {
 2024/07/22 13:36:19 Terraform destroy |         "errors": [
 2024/07/22 13:36:19 Terraform destroy |             {
 2024/07/22 13:36:19 Terraform destroy |                 "code": "NOT_EMPTY",
 2024/07/22 13:36:19 Terraform destroy |                 "message": "Resource groups with active instances can't be deleted. Use the CLI command \"ibmcloud resource service-instances --type all -g \u003cresource-group\u003e\" to check for remaining instances, then delete the instances and try again.",
 2024/07/22 13:36:19 Terraform destroy |                 "more_info": "n/a"
 2024/07/22 13:36:19 Terraform destroy |             }
 2024/07/22 13:36:19 Terraform destroy |         ],

By running the command ibmcloud resource service-instances --type all -g <resource-group>, I can confirm that the group does indeed still contain a VPC load balancer, for example:

[
  {
    "guid": "crn:v1:bluemix:public:containers-kubernetes:us-south:a/abac0df06b644a9cabc6e44f55b3880e:cqjqkvvd0c64fpr2h9j0:nlb:nlb-con2-workload-cluster-3b5bf5f75003778663c521c8c35ad277-i000.us-south.containers.appdomain.cloud",
    "id": "crn:v1:bluemix:public:containers-kubernetes:us-south:a/abac0df06b644a9cabc6e44f55b3880e:cqjqkvvd0c64fpr2h9j0:nlb:nlb-con2-workload-cluster-3b5bf5f75003778663c521c8c35ad277-i000.us-south.containers.appdomain.cloud",
    "url": "/v2/resource_instances/crn:v1:bluemix:public:containers-kubernetes:us-south:a%2Fabac0df06b644a9cabc6e44f55b3880e:cqjqkvvd0c64fpr2h9j0:nlb:nlb-con2-workload-cluster-3b5bf5f75003778663c521c8c35ad277-i000.us-south.containers.appdomain.cloud",
    "created_at": "2024-07-29T15:38:22Z",
    "updated_at": "2024-07-29T15:38:22Z",
    "deleted_at": null,
    "name": "nlb-con2-workload-cluster-3b5bf5f75003778663c521c8c35ad277-i000.us-south.containers.appdomain.cloud",
    "region_id": "us-south",
    "account_id": "abac0df06b644a9cabc6e44f55b3880e",
    "resource_plan_id": "containers.kubernetes.multizone.load.balancer",
    "resource_group_id": "0ed9fc69d01c48a092dd1600f63de2fa",
    "crn": "crn:v1:bluemix:public:containers-kubernetes:us-south:a/abac0df06b644a9cabc6e44f55b3880e:cqjqkvvd0c64fpr2h9j0:nlb:nlb-con2-workload-cluster-3b5bf5f75003778663c521c8c35ad277-i000.us-south.containers.appdomain.cloud",
    "create_time": 1722267502000,
    "created_by": "iam-ServiceId-1829dcf6-eb99-4760-81ad-6ca95cbab194",
    "state": "active",
    "type": "service_instance",
    "resource_id": "containers-kubernetes",
    "dashboard_url": null,
    "allow_cleanup": false,
    "locked": false,
    "last_operation": {
      "type": "create",
      "state": "succeeded",
      "description": "Instance provisioning is completed.",
      "updated_at": null,
      "cancelable": false
    },
    "account_url": "",
    "resource_plan_url": "",
    "resource_bindings_url": "/v2/resource_instances/crn:v1:bluemix:public:containers-kubernetes:us-south:a%2Fabac0df06b644a9cabc6e44f55b3880e:cqjqkvvd0c64fpr2h9j0:nlb:nlb-con2-workload-cluster-3b5bf5f75003778663c521c8c35ad277-i000.us-south.containers.appdomain.cloud/resource_bindings",
    "resource_aliases_url": "/v2/resource_instances/crn:v1:bluemix:public:containers-kubernetes:us-south:a%2Fabac0df06b644a9cabc6e44f55b3880e:cqjqkvvd0c64fpr2h9j0:nlb:nlb-con2-workload-cluster-3b5bf5f75003778663c521c8c35ad277-i000.us-south.containers.appdomain.cloud/resource_aliases",
    "siblings_url": "",
    "target_crn": "crn:v1:bluemix:public:globalcatalog::::deployment:containers.kubernetes.multizone.load.balancer%3Aus-south"
  }
]

If I wait some time, this instance eventually gets deleted and the resource group deletion passes. I propose that the Terraform provider be updated to add more retries when attempting to delete a resource group, to cover this use case.
An even nicer enhancement would be to output the contents remaining in the resource group that are preventing the deletion.
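
For illustration, here is a minimal sketch of what such a retry could look like on the provider side, using the terraform-plugin-sdk retry helper; the deleteResourceGroup stub, the error matching, and the timeout value are assumptions for this sketch, not the provider's actual code:

package provider

import (
	"context"
	"errors"
	"strings"
	"time"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource"
)

// deleteResourceGroup stands in for the real Resource Manager delete call
// (hypothetical stub for illustration only).
func deleteResourceGroup(ctx context.Context, groupID string) error {
	return errors.New("NOT_EMPTY: resource groups with active instances can't be deleted")
}

// deleteWithRetry keeps retrying the delete while the group still reports
// active instances (the NOT_EMPTY error) instead of failing on the first attempt.
func deleteWithRetry(ctx context.Context, groupID string) error {
	return resource.RetryContext(ctx, 20*time.Minute, func() *resource.RetryError {
		err := deleteResourceGroup(ctx, groupID)
		if err == nil {
			return nil
		}
		// NOT_EMPTY means instances (such as the VPC load balancer) are still
		// being cleaned up asynchronously, so keep retrying until the timeout.
		if strings.Contains(err.Error(), "NOT_EMPTY") {
			return resource.RetryableError(err)
		}
		return resource.NonRetryableError(err)
	})
}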

Affected Resource(s)

  • ibm_resource_group

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please share a link to the ZIP file.

Debug Output

Panic Output

Expected Behavior

Actual Behavior

Steps to Reproduce

  1. terraform apply

@github-actions github-actions bot added the service/Resource Management label Jul 30, 2024
@hkantare
Collaborator

hkantare commented Jul 30, 2024

@ocofaigh
As part of cluster deletion, we already have a check that waits for the load balancer to be deleted.

We need to analyze why, even after this wait, the resource group still can't be disassociated from that particular instance.

@hkantare
Collaborator

Second approach: as part of resource group deletion, add conditional logic to check for any remaining instance associations and wait for a certain amount of time.
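
For illustration, a minimal sketch of that conditional check, assuming the Resource Controller v2 Go SDK's ListResourceInstances call filtered by resource group; the client wiring and the poll interval are assumptions, not actual provider code:

package provider

import (
	"context"
	"time"

	"github.com/IBM/platform-services-go-sdk/resourcecontrollerv2"
)

// waitForGroupEmpty polls the Resource Controller until no instances remain
// in the resource group, or the context is cancelled.
func waitForGroupEmpty(ctx context.Context, controller *resourcecontrollerv2.ResourceControllerV2, groupID string) error {
	opts := controller.NewListResourceInstancesOptions()
	opts.SetResourceGroupID(groupID)
	for {
		list, _, err := controller.ListResourceInstancesWithContext(ctx, opts)
		if err != nil {
			return err
		}
		if len(list.Resources) == 0 {
			return nil // group is empty, safe to attempt the delete
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(30 * time.Second): // illustrative poll interval
		}
	}
}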

@ocofaigh
Contributor Author

@hkantare Thanks for the feedback. It sounds like isWaitForLBDeleted is not working as expected, so that should probably be debugged. I'm able to reproduce very easily using this code (which is the same as the Red Hat OpenShift Container Platform on VPC landing zone tile in the IBM Cloud catalog).

+1 for the second approach too, as I have seen other resources with similar issues. PAG is another one, as it provisions an sDNLB that the Terraform state does not know about.

@ocofaigh
Contributor Author

@hkantare Do you think this is something that could be prioritised?

> As part of resource group deletion, add conditional logic to check for any remaining instance associations and wait for a certain amount of time.

It's something that consumers keep hitting, especially since most of the Deployable Architectures available in the IBM Cloud catalog support creating a resource group. When people run a destroy (especially when an OCP cluster is destroyed), the resource group delete fails very frequently with:

 2024/08/27 11:40:06 Terraform destroy |       "Result": {
 2024/08/27 11:40:06 Terraform destroy |           "errors": [
 2024/08/27 11:40:06 Terraform destroy |               {
 2024/08/27 11:40:06 Terraform destroy |                   "code": "NOT_EMPTY",
 2024/08/27 11:40:06 Terraform destroy |                   "message": "Resource groups with active instances can't be deleted. Use the CLI command \"ibmcloud resource service-instances --type all -g \u003cresource-group\u003e\" to check for remaining instances, then delete the instances and try again.",
 2024/08/27 11:40:06 Terraform destroy |                   "more_info": "n/a"
 2024/08/27 11:40:06 Terraform destroy |               }
 2024/08/27 11:40:06 Terraform destroy |           ],
 2024/08/27 11:40:06 Terraform destroy |           "trace": "80e645c8-e323-4893-b0c1-b0d8a82ee0b6"
 2024/08/27 11:40:06 Terraform destroy |       },
 2024/08/27 11:40:06 Terraform destroy |       "RawResult": null
 2024/08/27 11:40:06 Terraform destroy |   }

@hkantare
Collaborator

@ocofaigh
We plan to add some retry logic for resource group deletion. Can you share the status code associated with the above error?

@ocofaigh
Contributor Author

@hkantare "StatusCode": 500

Full output:

2024/08/27 11:40:06 Terraform destroy | Error: [ERROR] Error Deleting resource group: Resource groups with active instances can't be deleted. Use the CLI command "ibmcloud resource service-instances --type all -g <resource-group>" to check for remaining instances, then delete the instances and try again. with response code  {
 2024/08/27 11:40:06 Terraform destroy |     "StatusCode": 500,
 2024/08/27 11:40:06 Terraform destroy |     "Headers": {
 2024/08/27 11:40:06 Terraform destroy |         "Cache-Control": [
 2024/08/27 11:40:06 Terraform destroy |             "max-age=0, no-cache, no-store"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Content-Length": [
 2024/08/27 11:40:06 Terraform destroy |             "332"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Content-Type": [
 2024/08/27 11:40:06 Terraform destroy |             "application/json; charset=utf-8"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Date": [
 2024/08/27 11:40:06 Terraform destroy |             "Tue, 27 Aug 2024 11:40:06 GMT"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Etag": [
 2024/08/27 11:40:06 Terraform destroy |             "W/\"14c-POn/BpsPEJ94sjfRFJOtr4bZwxc\""
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Expires": [
 2024/08/27 11:40:06 Terraform destroy |             "Tue, 27 Aug 2024 11:40:06 GMT"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Pragma": [
 2024/08/27 11:40:06 Terraform destroy |             "no-cache"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Server": [
 2024/08/27 11:40:06 Terraform destroy |             "istio-envoy"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Strict-Transport-Security": [
 2024/08/27 11:40:06 Terraform destroy |             "max-age=31536000; includeSubDomains"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Transaction-Id": [
 2024/08/27 11:40:06 Terraform destroy |             "80e645c8-e323-4893-b0c1-b0d8a82ee0b6"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "Vary": [
 2024/08/27 11:40:06 Terraform destroy |             "Accept-Encoding"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "X-Content-Type-Options": [
 2024/08/27 11:40:06 Terraform destroy |             "nosniff"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "X-Envoy-Upstream-Service-Time": [
 2024/08/27 11:40:06 Terraform destroy |             "169"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "X-Ratelimit-Limit": [
 2024/08/27 11:40:06 Terraform destroy |             "60"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "X-Ratelimit-Remaining": [
 2024/08/27 11:40:06 Terraform destroy |             "59"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "X-Ratelimit-Reset": [
 2024/08/27 11:40:06 Terraform destroy |             "0"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "X-Request-Id": [
 2024/08/27 11:40:06 Terraform destroy |             "80e645c8-e323-4893-b0c1-b0d8a82ee0b6"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "X-Response-Time": [
 2024/08/27 11:40:06 Terraform destroy |             "166.360ms"
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "_request_id": [
 2024/08/27 11:40:06 Terraform destroy |             "80e645c8-e323-4893-b0c1-b0d8a82ee0b6"
 2024/08/27 11:40:06 Terraform destroy |         ]
 2024/08/27 11:40:06 Terraform destroy |     },
 2024/08/27 11:40:06 Terraform destroy |     "Result": {
 2024/08/27 11:40:06 Terraform destroy |         "errors": [
 2024/08/27 11:40:06 Terraform destroy |             {
 2024/08/27 11:40:06 Terraform destroy |                 "code": "NOT_EMPTY",
 2024/08/27 11:40:06 Terraform destroy |                 "message": "Resource groups with active instances can't be deleted. Use the CLI command \"ibmcloud resource service-instances --type all -g \u003cresource-group\u003e\" to check for remaining instances, then delete the instances and try again.",
 2024/08/27 11:40:06 Terraform destroy |                 "more_info": "n/a"
 2024/08/27 11:40:06 Terraform destroy |             }
 2024/08/27 11:40:06 Terraform destroy |         ],
 2024/08/27 11:40:06 Terraform destroy |         "trace": "80e645c8-e323-4893-b0c1-b0d8a82ee0b6"
 2024/08/27 11:40:06 Terraform destroy |     },
 2024/08/27 11:40:06 Terraform destroy |     "RawResult": null
 2024/08/27 11:40:06 Terraform destroy | }

@hkantare
Collaborator

@ocofaigh Added retry logic for deletion of the resource group, with a default timeout of 20 minutes.
This should mostly address the deletion of cluster ALBs and PAG.

@ocofaigh
Contributor Author

ocofaigh commented Sep 4, 2024

Thanks, I see it was released in 1.69.0, so I'm going to close this issue. If I see any problems, I'll let you know.

@ocofaigh ocofaigh closed this as completed Sep 4, 2024