aws_security_group_rule.*: Error finding matching Security Group Rule #5529

mildred · 2018-08-13T09:53:40Z

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

terraform version 0.11.7
aws provider version 1.27.0, customized with the following PRs merged in:
- aws_lb_listener: wait for listener creation #5167
- Allow configurable timeout when reading security group rule #3911
- Add retry mechanism and timeout when reading ecr_repository #3910

Affected Resource(s)

aws_security_group_rule

(but the same problem can probably occur with many other resources)

Important Factoids

The Terraform code is nothing special: just a security group and a security group rule. Looking at the debug log, I found that the following sequence of events occurred within the resourceAwsSecurityGroupRuleCreate function in resource_aws_security_group_rule.go:

1. DescribeSecurityGroups
2. AuthorizeSecurityGroupIngress
3. Compute the security group rule ID (sgrule-*)
4. DescribeSecurityGroups again:
- if the security group is not found, fail with a fatal error (this is what happened here)
- if the security group exists but the rule does not exist, retry until the read timeout

What is really strange and counterintuitive is that the first DescribeSecurityGroups call returns the security group details, while the second DescribeSecurityGroups call reports that the group is not found.

This occurs right after the security group is created. When I investigated this with AWS support, they told me that the API is only eventually consistent and that this kind of glitch is to be expected until all of AWS becomes aware of the security group's existence:

I understand that you are getting an "InvalidGroup.NotFound" error for DescribeSecurityGroups after a successful CreateSecurityGroup Go SDK API call.

On reviewing the CloudTrail logs, I see that there was a successful CreateSecurityGroup API call for sg-0559f279 on 2018-07-20, 11:05:12 UTC, and that on 2018-07-20, 11:05:15 UTC you made a DescribeSecurityGroups API call, 3 seconds after the CreateSecurityGroup call. The new security group had not yet propagated throughout the system, due to the distributed nature of the system supporting the API.

The Amazon EC2 API follows an eventual consistency model, due to the distributed nature of the system supporting the API. This means that the result of an API command you run that affects your Amazon EC2 resources might not be immediately visible to all subsequent commands you run. [1]

Eventual consistency can affect the way you manage your resources. For example, if you run a command to create a resource, it will eventually be visible to other commands. This means that if you run a command to modify or describe the resource that you just created, its ID might not have propagated throughout the system, and you will get an error responding that the resource does not exist. You may get the following errors: [1]

InvalidInstanceID.NotFound

InvalidGroup.NotFound

InstanceLimitExceeded

[1] Eventual Consistency: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html#eventual-consistency

I believe there is no easy way to guard against this specific case, but it is a bug nonetheless. I don't think it's possible to retry on every NotFound error, because then we would wait until the timeout whenever a resource really does not exist. Perhaps we could record a timestamp of when a specific resource was created and keep retrying on NotFound errors for something like 10s after that...


github-actions · 2020-08-02T17:42:57Z

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

ghost · 2020-10-03T17:12:40Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

radeksimko added bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service. labels Aug 13, 2018

github-actions bot added the stale Old or inactive issues managed by automation, if no further action taken these will get closed. label Aug 2, 2020

github-actions bot closed this as completed Sep 2, 2020

ghost locked as resolved and limited conversation to collaborators Oct 3, 2020
