Automatically clean up docker images in the registry without a tag pointing to them #21673

Open
kolaente opened this issue Nov 3, 2022 · 21 comments
Assignees
Labels
proposal/accepted We have reviewed the proposal and agree that it should be implemented like that/at all. topic/packages type/feature Completely new functionality. Can only be merged if feature freeze is not active.

Comments

@kolaente
Member

kolaente commented Nov 3, 2022

When pushing a new docker image for an existing tag, the old image still exists and uses up storage on the server. While you can use images just by pointing to their sha, I've yet to find someone who actively does that. For my own registry (portus) I have a cron job that automatically removes everything that does not have a tag pointing to it. Docker even has a command for this.

Having a cleanup job like that would make it possible to keep old versions while still solving the storage space problem.

@KN4CK3R in #21658 (comment):

No, only if it's "older than" or not included in the "keep pattern". But it should be no problem to add a special logic here because there is already the custom Version == "latest" for containers.

Gitlab has an automatic garbage collection process for this: https://docs.gitlab.com/ee/administration/packages/container_registry.html#removing-untagged-manifests-and-unreferenced-layers

I think it's best to discuss this before implementing, mostly regarding these open questions:

  1. Should this be enabled automatically?
  2. Should this be a repo/org setting or a global config one?
@kolaente kolaente added type/feature Completely new functionality. Can only be merged if feature freeze is not active. topic/packages labels Nov 3, 2022
@KN4CK3R
Member

KN4CK3R commented Nov 3, 2022

Just for clarification, a repo has no impact on packages:

Should this be a user/org setting or a global config one?

I checked again how I implemented this and currently there are no untagged images in the container registry! (Exception: If you upload a multiarch image, the different arches are untagged images) If you tag and push an image you can later pull that image with the tag and its hash. If a tag gets pushed again the old tag/version gets removed and that deletes the hash reference too. So after that operation there is no untagged image available anymore.

    if pv, err = packages_model.GetOrInsertVersion(ctx, _pv); err != nil {
        if err == packages_model.ErrDuplicatePackageVersion {
            if err := packages_service.DeletePackageVersionAndReferences(ctx, pv); err != nil {
                return nil, err

So at the moment the cleanup does not need to remove untagged images because there are none. The first question should be "Should Gitea keep untagged versions?"
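The overwrite behavior described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the `registry` type and its `push` and `pullable` methods are hypothetical names invented for illustration, not Gitea's data model or API. It shows that once pushing an existing tag deletes the old version first, the old digest is no longer reachable.

```go
package main

import "fmt"

// registry is a toy in-memory model, NOT Gitea's data model.
// versions maps a version name (a tag) to the manifest digest it points at.
type registry struct {
	versions map[string]string // tag -> digest
}

func newRegistry() *registry {
	return &registry{versions: make(map[string]string)}
}

// push stores digest under tag. If the tag already exists with a different
// digest, the old version is deleted first, which also drops the old digest
// reference. It returns the digest that was removed, or "" if none was.
func (r *registry) push(tag, digest string) (removed string) {
	if old, ok := r.versions[tag]; ok && old != digest {
		delete(r.versions, tag) // stands in for "delete the version and its references"
		removed = old
	}
	r.versions[tag] = digest
	return removed
}

// pullable reports whether digest is still reachable through some tag.
func (r *registry) pullable(digest string) bool {
	for _, d := range r.versions {
		if d == digest {
			return true
		}
	}
	return false
}

func main() {
	r := newRegistry()
	r.push("latest", "sha256:aaa")
	removed := r.push("latest", "sha256:bbb")
	// The first image is gone after the second push to the same tag.
	fmt.Println(removed, r.pullable("sha256:aaa"), r.pullable("sha256:bbb"))
}
```

In this model no untagged image can exist, which matches the claim above; the multiarch exception is not modeled here.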

@silverwind
Member

Use case sounds pretty similar to git gc which we already automatically run as a cron IIRC.

Should this be enabled automatically?

If it's stable, I'd say so.

Should this be a repo/org setting or a global config one?

I think global is sufficient. Ideally it should just be another cron job that cleans up orphaned images, like we already do for orphaned git commits via git gc.

@theodiem

theodiem commented Jan 5, 2023

I came across this issue after experiencing the same effect.
Building multiarch images where only the manifest is tagged left me with lots of leftover "packages" that have only a digest (the manifest itself had only one copy since it was tagged).

Tagging each arch so it gets overwritten makes the "details" tab a bit impractical when you have too many different architectures and versions (for matrix builds).

Should this be a repo/org setting or a global config one?

In my case, I would be happy with the exact same global mechanism described (similar to the cron that runs git gc)

@salasrod

salasrod commented Feb 3, 2023

I am also looking for a similar feature; going out of my way to manually prune images is painful.

@lunny
Member

lunny commented Feb 3, 2023

Didn't #21658 resolve the issue?

@kolaente
Member Author

kolaente commented Feb 3, 2023

@lunny I didn't test it, but I don't think so. That PR allows configuring rules for removing tags; I just want to remove every image layer not associated with a tag.

@peiwenxu

Is this still happening?

@jum

jum commented Sep 20, 2023 via email

@silverwind silverwind added the proposal/accepted We have reviewed the proposal and agree that it should be implemented like that/at all. label Sep 21, 2023
@silverwind
Member

silverwind commented Sep 21, 2023

No one has implemented this yet, but it's definitely a vital feature to conserve disk space.

Maybe it should be disabled by default to support pulling images by hash, which is a rare but valid use case.

@c521wy

c521wy commented Jan 27, 2024

Has anyone tried this cleanup rule?

image

@kolaente
Member Author

Has anyone tried this cleanup rule?
image

Using that rule and then checking with the preview yields no results, so it does not look like it's working.

@kolaente
Member Author

kolaente commented Jan 30, 2024

It looks like the official docker registry implementation uses this function to find and remove all untagged layers, as described here.

@KN4CK3R As far as I understood from glancing over the code, Gitea does not just "embed" the official registry package, so it's not as easy as just copying or calling that function?

@mhkarimi1383

I'm facing the same issue with the latest version of Gitea.

@ViRb3

ViRb3 commented Jul 27, 2024

Has anyone tried this cleanup rule?

image

This rule seems to work perfectly! It deletes all images that do not have a tag associated with them. I would just suggest using ^sha256:.+ instead, as you could otherwise match a tag that for some reason has sha256 in the middle.
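The difference between an anchored and an unanchored pattern can be checked with Go's standard `regexp` package. This is a minimal sketch: the unanchored pattern `sha256` is an assumption about what the screenshot's rule contained, the version names are made up, and how Gitea itself compiles and applies a cleanup rule's pattern is not shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// loosePattern is an assumed stand-in for a rule that only looks for "sha256".
	loosePattern = regexp.MustCompile(`sha256`)
	// strictPattern is the anchored pattern suggested above.
	strictPattern = regexp.MustCompile(`^sha256:.+`)
)

func main() {
	// Hypothetical version names: one digest-only version and two tags.
	versions := []string{
		"sha256:0123abcd",   // untagged, known only by its digest
		"v1.2.3",            // ordinary tag
		"build-sha256-test", // tag that merely contains "sha256"
	}
	for _, v := range versions {
		fmt.Printf("%s loose=%t strict=%t\n",
			v, loosePattern.MatchString(v), strictPattern.MatchString(v))
	}
}
```

Only the loose pattern matches the tag `build-sha256-test`, which is the accidental deletion the anchored form avoids.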

@KimonHoffmann

Be careful with this approach when using multi-platform images!
In that case the individual platform images might be untagged but still referenced (by the multi-platform manifest, that is). I'm currently trying to deal with this problem myself and have not yet found a way that does not require deeper insight into the relationships between the images involved.

If someone has something to suggest that'd be very welcome!
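One way to picture the relationship problem is a mark-and-sweep pass: start from every tagged manifest, follow its references, and treat only what was never reached as an orphan. The sketch below assumes a hypothetical, simplified `manifest` type with made-up digests; it is not Gitea's schema and not a tested cleanup tool.

```go
package main

import (
	"fmt"
	"sort"
)

// manifest is a hypothetical, simplified view of a stored image or index.
type manifest struct {
	digest   string
	tags     []string // empty when nothing tags this manifest directly
	children []string // digests referenced by a multi-platform index
}

// orphans returns the digests that are neither tagged themselves nor
// reachable from any tagged manifest.
func orphans(all []manifest) []string {
	byDigest := make(map[string]manifest, len(all))
	for _, m := range all {
		byDigest[m.digest] = m
	}

	// Mark phase: walk references starting from every tagged manifest.
	reachable := make(map[string]bool)
	var mark func(d string)
	mark = func(d string) {
		if reachable[d] {
			return
		}
		reachable[d] = true
		for _, c := range byDigest[d].children {
			mark(c)
		}
	}
	for _, m := range all {
		if len(m.tags) > 0 {
			mark(m.digest)
		}
	}

	// Sweep phase: everything unmarked is a deletion candidate.
	var out []string
	for _, m := range all {
		if !reachable[m.digest] {
			out = append(out, m.digest)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	store := []manifest{
		{digest: "sha256:index", tags: []string{"latest"}, children: []string{"sha256:amd64", "sha256:arm64"}},
		{digest: "sha256:amd64"}, // untagged, but referenced by the index
		{digest: "sha256:arm64"}, // untagged, but referenced by the index
		{digest: "sha256:old"},   // untagged and unreferenced
	}
	fmt.Println(orphans(store))
}
```

A rule that matches on the version name alone would also select `sha256:amd64` and `sha256:arm64` here, which is the failure described above.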

@gjung56

gjung56 commented Sep 5, 2024

Yes, the cleanup rule deletes platform-variant images.

Until we can find an integrated solution, I ended up with an external cron job that prunes old images on my self-hosted instance.

I queried the registry API and used the Gitea Go SDK. It's hacky, but it works.

gitea_registry_prune.go.txt
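The attached script is not reproduced in this thread, so the following is not that script. It is only a sketch of one step such an external job would plausibly need: reading an OCI image index (the JSON document behind a multi-platform tag) and collecting the per-platform digests it references, so that they can be excluded from pruning. The `imageIndex` type, the `referencedDigests` helper, and the sample JSON are assumptions for illustration; fetching the document from a registry and deleting anything are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// imageIndex holds only the fields of an OCI image index needed here.
type imageIndex struct {
	MediaType string `json:"mediaType"`
	Manifests []struct {
		Digest string `json:"digest"`
	} `json:"manifests"`
}

// referencedDigests extracts the digests listed in an image index body.
func referencedDigests(body []byte) ([]string, error) {
	var idx imageIndex
	if err := json.Unmarshal(body, &idx); err != nil {
		return nil, fmt.Errorf("decode index: %w", err)
	}
	digests := make([]string, 0, len(idx.Manifests))
	for _, m := range idx.Manifests {
		digests = append(digests, m.Digest)
	}
	return digests, nil
}

// sampleIndex is made-up data in the shape of an OCI image index.
const sampleIndex = `{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:amd64", "size": 1234,
     "platform": {"architecture": "amd64", "os": "linux"}},
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:arm64", "size": 1234,
     "platform": {"architecture": "arm64", "os": "linux"}}
  ]
}`

func main() {
	digests, err := referencedDigests([]byte(sampleIndex))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// These digests must be kept even though no tag points at them directly.
	fmt.Println(digests)
}
```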

@stuzer05
Contributor

Yes, the cleanup rule deletes platform-variant images.

Until we can find an integrated solution, I ended up with an external cron job that prunes old images on my self-hosted instance.

I queried the registry API and used the Gitea Go SDK. It's hacky, but it works.

gitea_registry_prune.go.txt

Thank you, that works perfectly as needed!

@lunny will there be a solution to clean up orphaned registry images? Storage use keeps growing on the server, so this manual hack is currently the only way to free space.

@lunny
Member

lunny commented Dec 13, 2024

I don't think so. I will take a look at this problem.

@lunny lunny self-assigned this Dec 13, 2024
@stuzer05
Contributor

I don't think so. I will take a look at this problem.

Orphaned images occur when pushing with the same tag (e.g. latest) to the container registry. Those left-"behind" images are the problem.

@philkunz

philkunz commented Feb 7, 2025

Any updates on this one? We just ran into this problem, pushing a lot of latest images for internal build tools on our code.foss.global instance.

@stuzer05
Contributor

stuzer05 commented Feb 7, 2025

Any updates on this one? We just ran into this problem, pushing a lot of latest images for internal build tools on our code.foss.global instance.

Try this for now https://gitea.stuzer.link/stuzer05/gitea-docker-registry-prune
