Figure out docker hub caching #176

Closed
ehuss opened this issue Oct 26, 2024 · 14 comments

@ehuss

ehuss commented Oct 26, 2024

In GitHub Actions we periodically hit the Docker Hub rate limit, which was introduced in November 2020 (the error is "429 Too Many Requests"). This affects any repo that uses Docker (such as rust-lang/rust and rust-lang/cargo).

The anonymous Docker Hub rate limit is 100 pulls / 6 hours / IP (source).

There have been a few solutions proposed:

  1. Authenticate with Docker Hub. This changes the rate limit to 200 pulls / 6 hours / account. I do not know if that is sufficient for our needs across the org. I get the impression that infra team members do not like this option.
  2. Mirror in GitHub Container Registry (docs). From what I can tell, there doesn't seem to be a read limit. There is a 10GB/layer limit, which I think is fine. The main drawback is that it requires manually updating the images (like when new Ubuntu images are released).
    • I do not think we update the base images very often. I do not know if the infra team has a pre-existing mechanism for uploading, or how difficult that is to do manually.
  3. Use Amazon ECR. I think the ECR Public Gallery has many of the images we typically use. I am uncertain, but I think the unauthenticated rate limit is 500GB/month (per IP?) (source). We could authenticate using OIDC, which raises the cap to 5 TB / month.
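To put the quoted limits into comparable units, here is a rough back-of-the-envelope sketch. It assumes a hypothetical 30 MB compressed image, a 30-day month, and decimal units (1 GB = 1000 MB); real image sizes vary a lot, and each limit applies per IP or per account, not across the whole org.

```python
# Back-of-the-envelope comparison of the limits quoted in this issue.
# ASSUMPTIONS (hypothetical, not measured): a 30 MB compressed image,
# a 30-day month, and decimal units (1 GB = 1000 MB).

HOURS_PER_MONTH = 30 * 24
IMAGE_MB = 30


def dockerhub_pulls_per_month(pulls_per_window: int, window_hours: int) -> int:
    """Pulls allowed per month by a fixed 'N pulls per window' limit."""
    return pulls_per_window * HOURS_PER_MONTH // window_hours


def ecr_pulls_per_month(quota_gb: int, image_mb: int = IMAGE_MB) -> int:
    """Whole-image pulls that fit in a monthly data-transfer allowance."""
    return quota_gb * 1000 // image_mb


print("Docker Hub, anonymous (per IP):          ", dockerhub_pulls_per_month(100, 6))
print("Docker Hub, authenticated (per account): ", dockerhub_pulls_per_month(200, 6))
print("ECR, unauthenticated (per IP):           ", ecr_pulls_per_month(500))
print("ECR, authenticated 5 TB:                 ", ecr_pulls_per_month(5000))
```

Under these assumptions it prints 12000, 24000, 16666 and 166666 pulls per month. The per-account figure is the one that matters for option 1, since every repo authenticating with the same account would share it.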

I do not know how ghcr and Amazon ECR compare in performance and reliability.

Would the infra team have a preference here? I prefer whatever is easiest 😜. ghcr seems appealing to me if the infra team is ok with handling uploading new images.

@Mark-Simulacrum
Member

> Authenticate with Docker Hub. This changes the rate limit to 200 pulls / 6 hours / account. I do not know if that is sufficient for our needs across the org. I get the impression that infra team members do not like this option.

Yeah, I'd prefer to avoid ~personal/team accounts on Docker Hub; that seems like unnecessary hassle, and 200 pulls / 6 hours doesn't feel high enough to fully solve the problem.

> Use Amazon ECR. I think the ECR Public Gallery has many of the images we typically use. I am uncertain, but I think the unauthenticated rate limit is 500GB/month (per IP?) source. We could authenticate using OIDC, which raises the cap to 5 TB / month.

It is per IP: "Data transferred out from public repositories is limited by source IP when an AWS account is not used." (https://aws.amazon.com/ecr/pricing/)

5 TB isn't a cap; it's just the free tier. Past that we start paying, but I'd expect that in practice we wouldn't use much beyond 5 TB (if at all; that's a pretty large amount of data).

For ECR, if an image is not in the existing public gallery, we could probably configure pull-through caching (https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache.html), though it sounds like that would require authentication. I'm much more comfortable with this than with ending up needing multiple Docker Hub accounts to distribute load (which seems likely if e.g. rust-lang/rust uses this).

> I do not think we update the base images very often. I do not know if the infra team has a pre-existing mechanism for uploading, or how difficult that is to do manually.

I think base images get updated pretty regularly? At least I'd expect that e.g. ubuntu:22.04 is getting updates constantly; it was updated just 8 days ago (long after its initial release): https://hub.docker.com/layers/library/ubuntu/22.04/images/sha256-3d1556a8a18cf5307b121e0a98e93f1ddf1f3f8e092f1fddfd941254785b95d7?context=explore
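Because a tag such as `ubuntu:22.04` is a mutable pointer, a mirror keyed only by tag silently goes stale whenever upstream republishes it. One way to detect that is to compare content digests: in the OCI/Docker registry model a manifest's digest is the SHA-256 of its exact bytes, written `sha256:<hex>` (like the `sha256-…` in the link above). The minimal sketch below assumes a mirroring job can already obtain the raw manifest bytes from both registries (fetching is left out); the helper names and manifest contents are hypothetical placeholders.

```python
import hashlib
from typing import Optional


def manifest_digest(manifest_bytes: bytes) -> str:
    """Content digest of a manifest: SHA-256 over its exact bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def mirror_is_stale(upstream_manifest: bytes, mirrored_manifest: Optional[bytes]) -> bool:
    """True if the mirror lacks the tag or points at different content."""
    if mirrored_manifest is None:
        return True
    return manifest_digest(upstream_manifest) != manifest_digest(mirrored_manifest)


# Placeholder manifests standing in for two builds of the same tag.
old_build = b'{"schemaVersion": 2, "layers": ["old"]}'
new_build = b'{"schemaVersion": 2, "layers": ["new"]}'

print(mirror_is_stale(new_build, old_build))  # True: upstream moved the tag
print(mirror_is_stale(new_build, new_build))  # False: mirror is up to date
```

This assumes the mirroring step preserves manifests byte for byte; if it re-serializes them or keeps only one platform, comparing against the upstream digest recorded at mirroring time is the safer check.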

@marcoieni marcoieni self-assigned this Dec 10, 2024
@marcoieni
Member

This is on my todo list now 👍
I will switch from Docker Hub to AWS ECR as agreed here.

@marcoieni
Member

ECR issues

I'm summarizing some of the discussions we had in Zulip.

The ECR unauthenticated rate limit of 500GB/month per IP is an issue for the following reasons:

  • For contributors, authenticating to AWS ECR is more complicated than authenticating to Docker Hub.
    I suspect people would rather edit the base image in the Dockerfile than install and configure the AWS CLI. This isn't a great developer experience.
  • In PRs we can't authenticate to AWS, so we are subject to the unauthenticated rate limit.
    This is because PR jobs from forks can't read GitHub secrets (see here), so they can't authenticate with AWS.

One option is hardcoding the AWS key used to authenticate to ECR, but
this would allow anyone to use the key to download from the ECR Public Gallery through our account, so we would pay for their traffic.
If they used the key in a private repo, we wouldn't notice.

GHCR

I suggest we use GHCR instead and forget about rate limits.

I found two solutions to periodically mirror the images from DockerHub to GHCR:

@Mark-Simulacrum
Member

FYI: https://github.com/orgs/community/discussions/45253#discussioncomment-4769997 suggests there are some hidden, internal rate limits, at least for pushing to GHCR, though I somewhat doubt we'll hit them in practice. Generally speaking, using GHCR seems pretty reasonable if we can get the mirroring at least semi-automated; that will take a bit of work but probably isn't too hard.
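As a rough illustration of what one semi-automated mirroring step has to do, the sketch below builds (but does not run) the Docker CLI commands that copy one official Docker Hub image into GHCR. The `ghcr.io/rust-lang/<image>:<tag>` target layout follows the `ghcr.io/rust-lang/ubuntu` path that appears later in this thread; the function name and its `namespace` parameter are hypothetical. Official images live under the implicit `library/` namespace on Docker Hub.

```python
import shlex


def mirror_commands(image: str, tag: str, namespace: str = "rust-lang") -> list[list[str]]:
    """Docker CLI commands copying docker.io/library/<image>:<tag> to GHCR.

    Hypothetical helper: `namespace` is the GHCR owner path to push under.
    """
    src = f"docker.io/library/{image}:{tag}"
    dst = f"ghcr.io/{namespace}/{image}:{tag}"
    return [
        ["docker", "pull", src],       # fetch from Docker Hub
        ["docker", "tag", src, dst],   # add the GHCR name locally
        ["docker", "push", dst],       # upload to GHCR
    ]


for cmd in mirror_commands("ubuntu", "22.04"):
    print(shlex.join(cmd))
```

Passing `namespace="rust-lang/rust"` would target a repository-scoped path instead. Note that `docker pull` fetches only the runner's platform; mirroring every architecture of a multi-platform image needs a manifest-list-aware copy such as `skopeo copy --all` or `crane copy`, and pushing requires logging in to `ghcr.io` first.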

@marcoieni
Member

marcoieni commented Jan 13, 2025

It was very easy to set up:

@marcoieni
Member

marcoieni commented Jan 17, 2025

I created rust-lang/rust#135574 but it didn't work:

Error: PUT https://ghcr.io/v2/rust-lang/ubuntu/manifests/22.04: DENIED: permission_denied: write_package

Logs here.

I will investigate.
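For context on where the push stopped: in the registry HTTP API (the OCI distribution spec), a client uploads layer blobs first and then PUTs the manifest to `/v2/<name>/manifests/<reference>`; that final PUT is what creates or moves the tag. The small sketch below maps an image reference to that endpoint and reproduces the URL in the error, which is consistent with the later observation in this thread that something was pushed but the tags are missing. The helper is hypothetical and handles only explicit `registry/name:tag` references.

```python
def manifest_url(image_ref: str) -> str:
    """Manifest endpoint for a 'registry/name:tag' image reference.

    Hypothetical helper: digest references (name@sha256:...) and
    references without an explicit registry or tag are rejected.
    """
    registry, sep, rest = image_ref.partition("/")
    name, _, tag = rest.rpartition(":")
    if not sep or not name or not tag or "@" in rest:
        raise ValueError(f"unsupported image reference: {image_ref!r}")
    return f"https://{registry}/v2/{name}/manifests/{tag}"


print(manifest_url("ghcr.io/rust-lang/ubuntu:22.04"))
# https://ghcr.io/v2/rust-lang/ubuntu/manifests/22.04
```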

@Kobzol

Kobzol commented Jan 17, 2025

I think the issue might be that we're trying to push to the organization's image hub rather than the repository's; I think more permissions are needed for that. But I suppose we could just push to rust-lang/rust/image instead; that should be fine.

@marcoieni
Member

Something was pushed: https://github.com/rust-lang/rust/pkgs/container/ubuntu

It just failed to write the manifest, which means the tags aren't there 🤔

@marcoieni
Member

So weird that the repository is listed twice. This is probably a GitHub bug.


@Kobzol

Kobzol commented Jan 17, 2025

Isn't it r-l/rust and r-l-ci/rust?

@marcoieni
Member

marcoieni commented Jan 17, 2025

No 🙈
It's r-l/r.

@marcoieni
Member

I reported this bug here

@marcoieni
Member

marcoieni commented Jan 17, 2025

  1. I tried to reproduce the issue in a test org I have. https://github.com/marco-test-org/dockerhub-mirror The job worked fine.
  2. I thought this could be a flaky error, so I deleted the rust-lang/ubuntu package and triggered the workflow again.
  3. It worked.

The image now lists the rust repository only once.


@marcoieni
Member

We now have a workflow to mirror images from dockerhub to ghcr.

I'll close this issue. If you see a Docker Hub rate limit error, let t-infra know and we'll evaluate what to do, e.g. whether it's worth mirroring the image, or whether to accept the one-time flakiness because it's very rare and just retry later 👍
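If a rare rate-limit failure is going to be handled by retrying later, one option is to wrap the pull in a retry with exponential backoff. The sketch below is generic and hypothetical: `RateLimited` stands in for however a caller detects a "429 Too Many Requests" response, and the delays are arbitrary. Since Docker Hub counts pulls over a 6-hour window, a short in-job backoff may not be enough; re-running the job later is the coarser form of the same idea.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error signalling an HTTP 429 from the registry."""


def retry_on_rate_limit(
    pull: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `pull`, retrying with exponential backoff on RateLimited."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return pull()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the rate-limit error
            sleep(base_delay * 2 ** attempt)  # 60 s, 120 s, ...
    raise AssertionError("unreachable")


# Usage with a fake pull that is rate limited twice, then succeeds.
delays: list[float] = []
outcomes = iter([RateLimited(), RateLimited(), "pulled"])


def fake_pull() -> str:
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


print(retry_on_rate_limit(fake_pull, sleep=delays.append))  # pulled
print(delays)  # [60.0, 120.0]
```

Injecting `sleep` keeps the example instant and makes the backoff schedule easy to inspect.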
