
[Bug]: Unable to remove RDS Global Cluster and associated RDS Clusters at one go #39909

Closed
ktrenchev opened this issue Oct 28, 2024 · 10 comments · Fixed by #40333
Labels
bug Addresses a defect in current functionality. service/rds Issues and PRs that pertain to the rds service.
Milestone
v5.81.0

Comments

@ktrenchev

Terraform Core Version

0.13.7

AWS Provider Version

4.53.0

Affected Resource(s)

aws_rds_global_cluster
aws_rds_cluster

Expected Behavior

I want to be able to delete both the RDS Global Cluster and the associated RDS Clusters with a single terraform destroy invocation.

Actual Behavior

When terraform destroy is called it:

  1. Detaches the replica RDS Cluster from the Global RDS Cluster, thus triggering a promotion.
  2. Terraform waits for the replica RDS Cluster to be deleted, but the replica must first be promoted and only then deleted, and the promotion takes longer than the deletion timeout, so the wait times out.
  3. The replica RDS Cluster is eventually deleted from AWS, but the terraform destroy operation fails to delete the other RDS Cluster and the RDS Global Cluster.
  4. A second run of terraform destroy deletes the leftover RDS Global Cluster and RDS Cluster.

Relevant Error/Panic Output Snippet

waiting for RDS Cluster (XXXXXXX) delete: unexpected state 'promoting', wanted target ''. last error: %!s(<nil>

Terraform Configuration Files

N/A, setup is way too complicated to extract the exact configuration.

Steps to Reproduce

  1. Create a new RDS Global Cluster.
  2. Attach an RDS Cluster (primary).
  3. Attach an RDS Cluster (replica).
  4. Run terraform destroy (a minimal configuration sketch follows below).
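
For reference, a minimal Terraform sketch of the setup described above. This is an illustrative reconstruction, not the reporter's actual configuration: the regions, identifiers, engine version, and credentials are placeholders, and the cluster instances are omitted for brevity.

```hcl
# Illustrative reconstruction only — names, regions, engine version, and
# credentials are placeholders, not the reporter's configuration.

provider "aws" {
  alias  = "primary"
  region = "us-west-2"
}

provider "aws" {
  alias  = "secondary"
  region = "us-east-2"
}

resource "aws_rds_global_cluster" "example" {
  global_cluster_identifier = "example-global"
  engine                    = "aurora-postgresql"
  engine_version            = "14.9"
}

resource "aws_rds_cluster" "primary" {
  provider                  = aws.primary
  cluster_identifier        = "example-primary"
  engine                    = aws_rds_global_cluster.example.engine
  engine_version            = aws_rds_global_cluster.example.engine_version
  global_cluster_identifier = aws_rds_global_cluster.example.id
  master_username           = "exampleuser"
  master_password           = "change-me-please1"
  skip_final_snapshot       = true
}

resource "aws_rds_cluster" "secondary" {
  provider                  = aws.secondary
  cluster_identifier        = "example-secondary"
  engine                    = aws_rds_global_cluster.example.engine
  engine_version            = aws_rds_global_cluster.example.engine_version
  global_cluster_identifier = aws_rds_global_cluster.example.id
  skip_final_snapshot       = true

  # The secondary must be created after the primary has joined the global cluster.
  depends_on = [aws_rds_cluster.primary]
}
```

Running terraform destroy against a configuration like this is what triggers the replica detachment and promotion described in the Actual Behavior section.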

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

@ktrenchev ktrenchev added the bug Addresses a defect in current functionality. label Oct 28, 2024

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Oct 28, 2024
@justinretzolk
Member

Hey @ktrenchev 👋 Thank you for taking the time to raise this! While we understand Terraform configurations can get pretty complicated, it's often quite difficult to reproduce scenarios like this without any logging or configuration samples. Since you indicated you're unable to provide a configuration, are you able to provide debug logs (redacted as necessary) instead?

One thing that came up when taking a quick look at this while triaging was the force_destroy argument of the aws_rds_global_cluster resource, which I believe is meant to help with this scenario. Are you able to confirm whether that argument has been configured?
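
For readers landing here, a minimal sketch of how that suggested mitigation is set (the identifier and engine below are illustrative placeholders):

```hcl
# Minimal sketch of the suggested mitigation — the identifier and engine are
# placeholders. force_destroy removes DB Cluster members from the Global
# Cluster when the global cluster itself is destroyed.
resource "aws_rds_global_cluster" "example" {
  global_cluster_identifier = "example-global"
  engine                    = "aurora-postgresql"
  force_destroy             = true
}
```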

@justinretzolk justinretzolk added waiting-response Maintainers are waiting on response from community or contributor. service/rds Issues and PRs that pertain to the rds service. labels Oct 28, 2024
@ktrenchev
Author

Greetings @justinretzolk!

Unfortunately I'm unable to provide debug logs. I did play around with the force_destroy argument of the RDS Global Cluster resource, but it had no effect. I dug around cluster.go myself, and my best estimate is that either:

  1. The destruction of the RDS Global Cluster and associated RDS Clusters at one go is intentionally unsupported (the AWS docs state something along the lines of "there is no 'one button push' deletion process, as RDS databases are usually mission critical"), or
  2. The timeout in waitDBClusterDelete() (called in resourceClusterDelete()) is insufficient, because earlier in resourceClusterDelete() RemoveFromGlobalClusterWithContext() is called on the replica and a promotion is triggered (a possible timeout workaround is sketched below).

I'd be happy with a confirmation that the deletion of a Global RDS Cluster and associated RDS Clusters at one go is supported (meaning there is something wrong with my setup, which, unfortunately, is not unlikely).
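
If the timeout hypothesis in point 2 is correct, one possible (untested) workaround is to extend the delete timeout on the replica cluster so the implicit promotion can complete before Terraform gives up. A sketch, reusing the secondary cluster from the earlier example; the 4h value is an arbitrary assumption, not a verified fix:

```hcl
resource "aws_rds_cluster" "secondary" {
  provider                  = aws.secondary
  cluster_identifier        = "example-secondary"
  engine                    = "aurora-postgresql"
  global_cluster_identifier = aws_rds_global_cluster.example.id
  skip_final_snapshot       = true

  # Give the delete enough headroom for the promotion triggered by removal
  # from the global cluster (the provider default delete timeout is 120m).
  timeouts {
    delete = "4h"
  }
}
```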

@github-actions github-actions bot removed the waiting-response Maintainers are waiting on response from community or contributor. label Oct 29, 2024
@justinretzolk
Member

justinretzolk commented Oct 29, 2024

Thanks for the additional information here @ktrenchev 👍 Completely understand re: logging and configuration samples. I'll let someone from the team or community speak to the specifics here.

Edit: I had a thought that using a later provider version may help, given that we've migrated most of the provider to use AWS SDK for Go V2. While looking into that, I noticed a relevant bug fix in the release notes for 5.24.0.

It may be worth upgrading to at least provider version 5.24.0 and testing again to see if that bug fix resolves your particular issue.

@justinretzolk justinretzolk added waiting-response Maintainers are waiting on response from community or contributor. and removed needs-triage Waiting for first response or review from a maintainer. labels Oct 29, 2024
@github-actions github-actions bot removed the waiting-response Maintainers are waiting on response from community or contributor. label Nov 3, 2024
@Fadih

Fadih commented Nov 3, 2024

@justinretzolk do you know what was changed? I'm still using the same AWS provider, 5.0.0, as before, but since October 13 it has started failing. I can't upgrade the provider to a newer version because that would require a lot of changes in my Terraform infrastructure.

@Fadih

Fadih commented Nov 3, 2024

Steps to reproduce:

  1. Create an AWS global DB.
  2. Add a cluster in the west region with one instance.
  3. Add a replica in the east region with one instance.
  4. Try to restack the complete cluster using a snapshot.

You can see that it starts deleting the instance in the east region, and then, when promoting the east cluster out of the global DB, it doesn't wait for the promotion to finish and starts the deletion directly, so it fails with:
Error: waiting for RDS Cluster (xxxx-dr-global-region-us-east-2-cluster) delete: unexpected state 'promoting', wanted target ''. last error: %!s()

@Fadih

Fadih commented Nov 4, 2024

@justinretzolk I already have force_destroy on the aws_rds_global_cluster resource and it still happens.


github-actions bot commented Dec 5, 2024

Warning

This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome; however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

@github-actions github-actions bot added this to the v5.81.0 milestone Dec 5, 2024

This functionality has been released in v5.81.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 12, 2025