route53: aws-sdk-go-v2 broke IAM instance role #2033

Closed
3 tasks done
nickjmv opened this issue Oct 12, 2023 · 6 comments


nickjmv commented Oct 12, 2023

Welcome

  • Yes, I'm using a binary release within 2 latest releases.
  • Yes, I've searched similar issues on GitHub and didn't find any.
  • Yes, I've included all information below (version, config, etc).

What did you expect to see?

A certificate is generated by using the AWS EC2 instance profile role.

What did you see instead?

An error message about the AWS EC2 IMDS.

How do you use lego?

Docker image

Reproduction steps

Renew an existing certificate with the Docker image, letting it use the instance profile of the AWS EC2 machine.

It works when using role assumption, by passing a profile other than 'default' to the Docker image. With the role from the attached instance profile, however, the error below is generated.
Another workaround is to use lego v4.13.2, which still uses the old AWS SDK.

Version of lego

v4.14.2

Logs

2023/10/12 09:43:05 [INFO] [xxx.sub.domain.com] acme: Trying renewal with -6 hours remaining
2023/10/12 09:43:05 [INFO] renewal: random delay of 1m31.195523715s
2023/10/12 09:44:36 [INFO] [xxx.sub.domain.com] acme: Obtaining bundled SAN certificate
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/273116278446
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] acme: Could not find solver for: tls-alpn-01
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] acme: Could not find solver for: http-01
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] acme: use dns-01 solver
2023/10/12 09:44:37 [INFO] [xxx.sub.domain.com] acme: Preparing to solve DNS-01
2023/10/12 09:44:42 [INFO] [xxx.sub.domain.com] acme: Cleaning DNS-01 challenge
2023/10/12 09:44:43 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/273116278446
2023/10/12 09:44:43 error: one or more domains had a problem:
[xxx.sub.domain.com] [xxx.sub.domain.com] acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded

Go environment (if applicable)

$ go version && go env
# paste output here
nickjmv added the bug label Oct 12, 2023
ldez changed the title from "aws-sdk-go-v2 broke IAM instance role" to "route53: aws-sdk-go-v2 broke IAM instance role" Oct 12, 2023
Member

ldez commented Oct 12, 2023

Hello,

I think this is an internal change in the SDK.

acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded

The error comes from here.

I'm not a specialist in AWS, and the SDK migration guide is really weak.

I don't know if it's expected behavior for the new SDK, a bug in the SDK, or something else.

Author

nickjmv commented Oct 16, 2023

I read in the AWS documentation that IMDSv1 and IMDSv2 should both work, so I'm puzzled about why we are receiving this error.

Will you do some extra testing on this? Or what actions do you see next? I assume there are multiple users that encounter this.

Member

ldez commented Oct 16, 2023

> I assume there are multiple users that encounter this.

As you can see, it seems you are alone with this problem (no thumbs up, no other report).

> what actions do you see next?

I don't know because based on the code I have no idea of the real root of the problem.

@triplepoint

This comment was marked as duplicate.

Member

ldez commented Dec 7, 2023

#2067 (comment)


triplepoint commented Dec 31, 2024

If anyone else comes along this path, I had a related issue masking this fix, and just figured it out.

Specifically:

  • I'm trying to have Traefik v2.11.16 use Let's Encrypt to maintain TLS certs for my various dockerized services
  • I did not want to pass my AWS key ID/secret pair to Traefik, and instead wanted it to source credentials on the EC2 instance from the AWS Instance Metadata Service (IMDS), the internal service hosted at http://169.254.169.254/ on an EC2 instance
  • This was working with older versions of Traefik, and broke around the time the Go AWS SDK update made AWS_REGION a required value

I was seeing that supplying the AWS_REGION env var in my traefik instance's docker-compose file had no effect.

It wasn't until I changed the EC2 instance's IMDS metadata options to allow 2 hops instead of 1 that the dockerized Traefik instance was able to reach the EC2 instance's IMDS service.

I tried it with the hop limit bumped from 1 to 2, but without the AWS_REGION env var set, and got a different error than I was seeing before:

acme: error presenting token: route53: failed to determine hosted zone ID: operation error Route 53: ListHostedZonesByName, failed to resolve service endpoint, endpoint rule error, Invalid Configuration: Missing Region

So both these changes are necessary: providing the AWS_REGION (in my case, us-west-2) environment variable to traefik, and also bumping the instance's metadata configuration to set "Metadata response hop limit" to 2 instead of 1.

See these references:

In my case, my EC2 instance was managed by Terraform, so I needed to add a metadata options section to my "aws_instance" definition:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/instance#metadata-options

  metadata_options {
    # So docker can access ec2 metadata
    # see https://github.com/aws/aws-sdk-go/issues/2972
    http_put_response_hop_limit = 2
  }
