aws_security_group_rule.*: Error finding matching Security Group Rule #5529

Closed
mildred opened this issue Aug 13, 2018 · 2 comments
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service. stale Old or inactive issues managed by automation, if no further action taken these will get closed.

Comments


mildred commented Aug 13, 2018

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Affected Resource(s)

  • aws_security_group_rule

(but the same problem can probably occur with many other resources)

Important Factoids

The Terraform code is nothing special: a security group and a security group rule. Looking at the debug log, I found that the following sequence of events occurred in the function resourceAwsSecurityGroupRuleCreate in resource_aws_security_group_rule.go:

  • DescribeSecurityGroups
  • AuthorizeSecurityGroupIngress
  • compute security group rule id sgrule-*
  • DescribeSecurityGroups
    • if the security group is not found, fatal error (this is what happened)
    • if the security group exists but the rule does not exist, retry until the read timeout
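
The read-back step at the end of that sequence can be sketched roughly as follows. This is a minimal illustration of the behavior seen in the debug log, not the provider's actual code: `waitForRule`, `describeFunc`, and the two sentinel errors are hypothetical names, and the real function inspects AWS error codes rather than Go sentinel values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinel errors standing in for the two outcomes described
// above. The real provider checks AWS error codes instead.
var (
	errGroupNotFound = errors.New("InvalidGroup.NotFound")
	errRuleNotFound  = errors.New("security group rule not found in group")
)

// describeFunc is a stand-in for "call DescribeSecurityGroups and look for
// the rule in the result".
type describeFunc func() error

// waitForRule mimics the reported read-back logic: a missing group is fatal
// on the first occurrence, while a missing rule is retried until the timeout.
func waitForRule(describe describeFunc, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := describe()
		switch {
		case err == nil:
			return nil
		case errors.Is(err, errGroupNotFound):
			// No retry here: this is the branch that fails right after
			// the security group has been created.
			return fmt.Errorf("fatal: %w", err)
		case errors.Is(err, errRuleNotFound):
			if time.Now().After(deadline) {
				return fmt.Errorf("timed out waiting for rule: %w", err)
			}
			time.Sleep(interval)
		default:
			return err
		}
	}
}
```

The asymmetry is the point: the rule lookup tolerates propagation delay, but the group lookup does not.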

What's really strange and counterintuitive is that the first DescribeSecurityGroups call returns the security group details while the second DescribeSecurityGroups call reports that the group is not found.

This occurs right after security group creation. When I investigated this with AWS support, they told me that the API calls are only eventually consistent and that this kind of glitch is to be expected until everything in AWS becomes aware of the security group's existence:

I understand that you are getting an error "InvalidGroup.NotFound" for DescribeSecurityGroups after a successful CreateSecurityGroup Go SDK API call.

On reviewing the CloudTrail logs, I see that there was a successful CreateSecurityGroup API call for sg-0559f279 on 2018-07-20, 11:05:12 UTC, and from the logs, I see that on 2018-07-20, 11:05:15 UTC you made a DescribeSecurityGroups API call. This call was made 3 seconds after the CreateSecurityGroup API call. The security group that was created had not yet propagated throughout the system, due to the distributed nature of the system supporting the API.

The Amazon EC2 API follows an eventual consistency model, due to the distributed nature of the system supporting the API. This means that the result of an API command you run that affects your Amazon EC2 resources might not be immediately visible to all subsequent commands you run. [1]

Eventual consistency can affect the way you manage your resources. For example, if you run a command to create a resource, it will eventually be visible to other commands. This means that if you run a command to modify or describe the resource that you just created, its ID might not have propagated throughout the system, and you will get an error responding that the resource does not exist. You may get the following errors: [1]

  1. InvalidInstanceID.NotFound
  2. InvalidGroup.NotFound
  3. InstanceLimitExceeded

[1] Eventual Consistency: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html#eventual-consistency
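
For illustration, the three error codes quoted above could be recognized in Go roughly like this. It is only a sketch: `codedError`, `eventualConsistencyCodes`, `isEventualConsistencyError`, and `fakeAWSError` are hypothetical names. The local interface assumes the error exposes a `Code() string` method, as aws-sdk-go's `awserr.Error` does, so the example stays standard-library only.

```go
package main

import "errors"

// codedError matches any error that exposes an AWS-style error code.
// Assumption: the SDK error has a Code() method (awserr.Error does).
type codedError interface {
	error
	Code() string
}

// eventualConsistencyCodes holds the codes listed in the support reply above.
var eventualConsistencyCodes = map[string]bool{
	"InvalidInstanceID.NotFound": true,
	"InvalidGroup.NotFound":      true,
	"InstanceLimitExceeded":      true,
}

// isEventualConsistencyError reports whether err (or an error it wraps)
// carries one of the codes above.
func isEventualConsistencyError(err error) bool {
	var ce codedError
	return errors.As(err, &ce) && eventualConsistencyCodes[ce.Code()]
}

// fakeAWSError is a hypothetical stand-in for an SDK error, for demonstration.
type fakeAWSError struct{ code string }

func (e fakeAWSError) Error() string { return e.code }
func (e fakeAWSError) Code() string  { return e.code }
```

A match only says the error might be a propagation delay; the same code is also returned when the resource really does not exist.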

I believe there is no easy way to guard against this specific problem, but it is a bug nonetheless. I don't believe it's possible to retry on every NotFoundError, because we would then hit timeouts when the resources really do not exist. Perhaps we could record a timestamp of when a specific resource was created and allow retrying on NotFoundErrors for something like 10s after that...
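
To make the timestamp idea concrete, here is a rough sketch of what it could look like. Nothing in it exists in the provider: `creationTracker`, `markCreated`, `describe`, and `errNotFound` are hypothetical names, and the grace period is a parameter (the 10s above is only a guess).

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// errNotFound is a hypothetical stand-in for an InvalidGroup.NotFound error.
var errNotFound = errors.New("InvalidGroup.NotFound")

// creationTracker remembers when resources were created in this run, so
// NotFound can be retried only for resources that are still "fresh".
type creationTracker struct {
	mu      sync.Mutex
	grace   time.Duration
	created map[string]time.Time
}

func newCreationTracker(grace time.Duration) *creationTracker {
	return &creationTracker{grace: grace, created: make(map[string]time.Time)}
}

// markCreated records the creation time of the resource with the given ID.
func (t *creationTracker) markCreated(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.created[id] = time.Now()
}

// withinGrace reports whether id was created less than the grace period ago.
func (t *creationTracker) withinGrace(id string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	at, ok := t.created[id]
	return ok && time.Since(at) < t.grace
}

// describe runs call and retries it on errNotFound while id is within its
// grace period. Any other result, or NotFound for a resource that is unknown
// or too old, is returned immediately.
func (t *creationTracker) describe(id string, interval time.Duration, call func() error) error {
	for {
		err := call()
		if err == nil || !errors.Is(err, errNotFound) || !t.withinGrace(id) {
			return err
		}
		time.Sleep(interval)
	}
}
```

With this shape, a resource that really does not exist costs at most one grace period of retries, and a resource that was not created in the current run fails immediately as it does today.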

@radeksimko radeksimko added bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service. labels Aug 13, 2018

github-actions bot commented Aug 2, 2020

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

@github-actions github-actions bot added the stale Old or inactive issues managed by automation, if no further action taken these will get closed. label Aug 2, 2020
@github-actions github-actions bot closed this as completed Sep 2, 2020

ghost commented Oct 3, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Oct 3, 2020