Terraform plan or destroy fails when an aws resource that is referenced by another resource is removed outside of terraform. #19932

Closed
overlordchin opened this issue Jun 23, 2021 · 6 comments · Fixed by #26553
Labels
bug Addresses a defect in current functionality. provider Pertains to the provider itself, rather than any interaction with AWS.

overlordchin commented Jun 23, 2021

Problem statement:
When a terraform apply is interrupted, or when someone manually deletes an AWS resource in the console, any subsequent plan or destroy fails with an error message that depends on the specific resource in question, usually along the lines of "resource x cannot be found".

More to the point, this happens when Resource A exists and references Resource B, which no longer exists. For example, a security group rule that references a security group that no longer exists, or an ALB listener rule that references a target group that was deleted.

Example 1: Error: Error deleting Glue Catalog Table: EntityNotFoundException: Database myfancy_database not found.
Example 2: Error: No security group with ID "sg-myfancysecuritygroup"

Expected behavior:
Destroy: since the goal of a destroy is to remove the item in question, and previous documentation implied this, a WARN statement should be output indicating that no action was needed on the resource, and the destroy should continue on to remove the "parent" resource(s).
Plan: should attempt to recreate the resource if it is still defined in the tf files. It would be understandable if refresh=true were needed here, but that doesn't work either.

I would expect both to work under these circumstances, at least after a refresh to correct the known state of the world, but for whatever reason that does not seem to help.

The current workaround isn't practical. You have to match the version of Terraform the state was previously planned on and make sure the tf files mirror the live state as closely as possible. Then run terraform init, terraform workspace select, terraform state list, terraform state rm 'bad resource', and terraform destroy. This is a manual, error-prone, painstaking process, and when you have over 100 states in an environment, "tedious" isn't really a fitting descriptor anymore.
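The command sequence in this workaround could be scripted per state. Below is a minimal sketch in Python, assuming one workspace per state and that the stale resource addresses are already known. The function name `cleanup_commands`, the workspace name, and the example address are hypothetical. The function only builds the argument lists and runs nothing.

```python
from typing import List


def cleanup_commands(workspace: str, stale_addresses: List[str]) -> List[List[str]]:
    """Build the Terraform CLI calls for the manual workaround, in order.

    Hypothetical helper: it only returns argument lists, so each one can be
    reviewed (or passed to subprocess.run) before anything touches the state.
    """
    commands = [
        ["terraform", "init"],
        ["terraform", "workspace", "select", workspace],
        ["terraform", "state", "list"],
    ]
    # Drop each resource whose real-world counterpart no longer exists.
    for address in stale_addresses:
        commands.append(["terraform", "state", "rm", address])
    commands.append(["terraform", "destroy"])
    return commands


if __name__ == "__main__":
    # Workspace name and resource address are made-up examples.
    for argv in cleanup_commands("dev", ["aws_security_group_rule.example"]):
        print(" ".join(argv))
```

This does not remove the need to match the Terraform version or to keep the tf files in sync with the live state; it only makes the repeated steps less error-prone.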

github-actions bot added the needs-triage label Jun 23, 2021
overlordchin changed the title from "Terraform plan or destroy fails when an aws resource is removed outside of terraform." to "Terraform plan or destroy fails when an aws resource that is referenced by another resource is removed outside of terraform." Jun 23, 2021
ewbankkit added the waiting-response label and removed the needs-triage label Jun 28, 2021
ewbankkit (Contributor) commented

@overlordchin Thanks for raising this issue.
You are correct that any Read or Delete of a resource should handle the non-existence of the resource gracefully.
It is a bug for the provider to return an error (and hence terminate the workflow) in these cases.
If you have specific examples the best approach is to open a separate issue for each case.
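To illustrate what handling non-existence gracefully could look like, here is a language-neutral sketch in Python. It is not the provider's actual code, which is written in Go. `NotFoundError`, `FakeCatalog`, `read_resource`, and `delete_resource` are hypothetical names standing in for a remote API and a resource's Read and Delete operations.

```python
class NotFoundError(Exception):
    """Hypothetical stand-in for a remote 'entity not found' API error."""


class FakeCatalog:
    """Hypothetical in-memory stand-in for a remote API."""

    def __init__(self, tables):
        self.tables = dict(tables)

    def get_table(self, name):
        if name not in self.tables:
            raise NotFoundError(name)
        return self.tables[name]

    def delete_table(self, name):
        if name not in self.tables:
            raise NotFoundError(name)
        del self.tables[name]


def read_resource(api, name):
    """Read: a missing resource yields None (drop it from state), not an error."""
    try:
        return api.get_table(name)
    except NotFoundError:
        return None


def delete_resource(api, name):
    """Delete: a resource that is already gone counts as successfully deleted."""
    try:
        api.delete_table(name)
    except NotFoundError:
        pass  # Nothing left to remove; the desired end state already holds.


if __name__ == "__main__":
    api = FakeCatalog({"events": {"columns": 3}})
    del api.tables["events"]             # simulate deletion outside Terraform
    print(read_resource(api, "events"))  # missing resource reads as None
    delete_resource(api, "events")       # does not raise
```

Only the not-found case is swallowed here; any other error would still propagate and stop the workflow.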

overlordchin (Author) commented Jul 1, 2021

As requested, I have created several sub-tickets covering the observed destroy failure cases and referenced this ticket in them to keep them better organized. Keep in mind this is only what I have log data on; there may be more cases out there.
Unfortunately, I do not have any log data related to failed plans, so you guys will need to dig into that.

Side note:
I have also observed this behavior outside of the AWS provider, with Datadog monitors, but that falls outside the scope of this ticket given it's a completely separate provider.

ewbankkit (Contributor) commented

A way to test these is to use 2 sequential calls to testAccCheckResourceDisappears() (#13527) in acceptance tests.
The first call will delete the resource, the second will then attempt to delete the deleted resource.
This could be done in the _disappears acceptance tests.
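The double-delete idea can be sketched outside the provider's test framework. The Python below is only an illustration: `FakeStore`, `disappear`, and `check_disappears_twice` are hypothetical names and do not reflect the signature or behavior of testAccCheckResourceDisappears().

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a remote API."""

    def __init__(self, ids):
        self.ids = set(ids)

    def delete(self, resource_id):
        # Mimic an API that errors when the target does not exist.
        if resource_id not in self.ids:
            raise KeyError(resource_id)
        self.ids.remove(resource_id)


def disappear(store, resource_id):
    """Delete the resource, tolerating the case where it is already gone."""
    try:
        store.delete(resource_id)
    except KeyError:
        pass


def check_disappears_twice(store, resource_id):
    """First call removes the resource; second call must not raise."""
    disappear(store, resource_id)
    disappear(store, resource_id)
    return resource_id not in store.ids


if __name__ == "__main__":
    print(check_disappears_twice(FakeStore({"sg-123"}), "sg-123"))  # True
```

If `disappear` did not tolerate the missing resource, the second call would raise and the check would fail, which is the regression such a test is meant to catch.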

ewbankkit added the bug label Jul 1, 2021
divmgl (Contributor) commented Jan 22, 2022

This is unfortunate, but I'm gonna bump this bug. I'm running into this right now with an Athena database I removed manually:

MetadataException: Database aggregator not found. (Service: AmazonDataCatalog; Status Code: 400; Error Code: EntityNotFoundException; Request ID: 3bef06a6-abf8-4566-bd4f-96ac2e581f69; Proxy: null)

I'm not sure what to do here. I'm going to try recreating the database by hand, but it feels like Terraform should handle this.

Edit: I was able to fix this by doing

terraform state rm module.dev.aws_athena_database.aggregator

github-actions bot commented Sep 2, 2022

This functionality has been released in v4.29.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions bot commented Oct 3, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Oct 3, 2022