[infra] Hail should automatically and periodically cleanup unnecessary images #13441

Closed
danking opened this issue Aug 15, 2023 · 2 comments · Fixed by #13489

danking commented Aug 15, 2023

What happened?

I have a PR that proposes something roughly along these lines (https://github.com/hail-is/hail/pull/13057/files), but it has some problems:
1. It should use a cron job.
2. The last time I ran this code, I'm pretty sure I deleted things I shouldn't have, so we should audit the find-expired-images.py code again.
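One way to make the audit in item 2 concrete is a guard that refuses any deletion plan touching an image that is still referenced, for example by a running pod. This is a hypothetical sketch: `unsafe_deletions` and `assert_safe` are illustrative names, not part of find-expired-images.py. It compares full image reference strings, which is a simplification; images referenced by digest would need to be resolved first.

```python
from typing import Iterable, List


def unsafe_deletions(planned: Iterable[str], in_use: Iterable[str]) -> List[str]:
    """Return the planned deletions that are still referenced, sorted for stable output."""
    live = set(in_use)
    return sorted({ref for ref in planned if ref in live})


def assert_safe(planned: Iterable[str], in_use: Iterable[str]) -> None:
    """Raise rather than proceed if the plan would remove an image that is in use."""
    conflicts = unsafe_deletions(planned, in_use)
    if conflicts:
        raise ValueError(f"refusing to delete images in use: {conflicts}")


# Example: the first entry is running in production, the second tag is hypothetical.
in_use = ["us-docker.pkg.dev/hail-vdc/hail/batch:deploy-kpd6nqk4t25o"]
plan = [
    "us-docker.pkg.dev/hail-vdc/hail/batch:deploy-kpd6nqk4t25o",
    "us-docker.pkg.dev/hail-vdc/hail/batch:pr-example",
]
print(unsafe_deletions(plan, in_use))
```

Running `assert_safe` immediately before any delete call means a bad plan fails loudly instead of silently removing a live image.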

We should probably just use Artifact Registry cleanup policies instead: https://cloud.google.com/artifact-registry/docs/repositories/cleanup-policy

Version

0.2.120

Relevant log output

No response

@danking danking added the bug label Aug 15, 2023

danking commented Aug 15, 2023

Our production images are mostly tagged with a deploy- prefix, but the images listed in third-party/images.txt need to be handled differently.

```
(base) dking@wm28c-761 gar-cleaner % k get pods -o json | jq -r '.items[].spec.containers[].image' | sort -u
ghost:3.0-alpine
prom/prometheus:v2.34.0
us-docker.pkg.dev/hail-vdc/hail/admin-pod:deploy-qd833uw7kcyn
us-docker.pkg.dev/hail-vdc/hail/auth:deploy-crsithjyoxfg
us-docker.pkg.dev/hail-vdc/hail/batch:deploy-kpd6nqk4t25o
us-docker.pkg.dev/hail-vdc/hail/batch:deploy-v1yv8cgd1003
us-docker.pkg.dev/hail-vdc/hail/blog_nginx:deploy-wnrqjf4h6qto
us-docker.pkg.dev/hail-vdc/hail/ci:deploy-du68h4bouvp9
us-docker.pkg.dev/hail-vdc/hail/envoyproxy/envoy:v1.22.3
us-docker.pkg.dev/hail-vdc/hail/grafana/grafana:9.1.4
us-docker.pkg.dev/hail-vdc/hail/monitoring:deploy-ljz4mgjf132m
us-docker.pkg.dev/hail-vdc/hail/notebook:deploy-gmftvyf0op87
us-docker.pkg.dev/hail-vdc/hail/notebook_nginx:deploy-n9uipfhjn3jg
us-docker.pkg.dev/hail-vdc/hail/website:deploy-gb1372nuge4g
```
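The listing above can be sorted into three groups: deploy- images in the hail repository, other images in the hail repository (presumably the third-party mirrors such as envoyproxy/envoy and grafana/grafana), and images pulled from elsewhere. A minimal sketch, assuming plain `repo:tag` references; `split_image`, `classify`, and the group names are hypothetical, and digest references (`@sha256:...`) are not handled.

```python
from typing import Dict, List, Tuple

HAIL_REPO = "us-docker.pkg.dev/hail-vdc/hail/"


def split_image(ref: str) -> Tuple[str, str]:
    """Split 'repo[:tag]' into (repo, tag); the tag is '' if absent.

    Only the last path component can carry a tag, so a registry port such as
    'localhost:5000/img' is not mistaken for one.
    """
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > slash:
        return ref[:colon], ref[colon + 1:]
    return ref, ""


def classify(refs: List[str]) -> Dict[str, List[str]]:
    """Group image references the way the comment above distinguishes them."""
    groups: Dict[str, List[str]] = {"deploy": [], "non_deploy": [], "external": []}
    for ref in refs:
        repo, tag = split_image(ref)
        if not repo.startswith(HAIL_REPO):
            groups["external"].append(ref)  # e.g. ghost, prom/prometheus
        elif tag.startswith("deploy-"):
            groups["deploy"].append(ref)
        else:
            groups["non_deploy"].append(ref)  # e.g. mirrored envoy, grafana
    return groups
```

Feeding it the `jq` output (for example `classify(sys.stdin.read().split())` in a small wrapper) shows which in-use images a deploy- rule alone would not protect.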


danking commented Aug 15, 2023

Keeping the 10 most recent deploy-, pr-, and dev- images seems reasonable. I think we also use the cache- prefix; maybe we should keep 10 of those as well?

Anything that's untagged is absolutely good to delete.
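The retention rule proposed above (delete untagged versions, keep the 10 newest per managed prefix) can be sketched as a pure function over a listing of image versions. Everything here is hypothetical: the `Version` record and `versions_to_delete` are illustrative, and fetching the listing from Artifact Registry is not shown. As a conservative choice given the earlier over-deletion, any version carrying a tag outside the managed prefixes (such as a third-party `v1.22.3`) is always kept.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Sequence, Set, Tuple

# Prefixes from the discussion above; 10 mirrors "the most recent 10".
MANAGED_PREFIXES = ("deploy-", "pr-", "dev-", "cache-")
KEEP_COUNT = 10


@dataclass(frozen=True)
class Version:
    """Hypothetical record for one image version in the registry."""
    package: str            # e.g. "batch"
    digest: str             # identifies the version within its package
    tags: Tuple[str, ...]   # empty for untagged versions
    created: datetime


def versions_to_delete(versions: Sequence[Version], keep_count: int = KEEP_COUNT) -> List[Version]:
    keep: Set[Tuple[str, str]] = set()
    # One bucket per (package, prefix) so each prefix keeps its own newest N.
    buckets: Dict[Tuple[str, str], List[Version]] = {}
    for v in versions:
        if not v.tags:
            continue  # untagged: never kept
        if any(not t.startswith(MANAGED_PREFIXES) for t in v.tags):
            keep.add((v.package, v.digest))  # an unmanaged tag protects the version
        for prefix in MANAGED_PREFIXES:
            if any(t.startswith(prefix) for t in v.tags):
                buckets.setdefault((v.package, prefix), []).append(v)
    for bucket in buckets.values():
        bucket.sort(key=lambda ver: ver.created, reverse=True)
        keep.update((ver.package, ver.digest) for ver in bucket[:keep_count])
    return [v for v in versions if (v.package, v.digest) not in keep]
```

A version with tags under two prefixes lands in both buckets and survives if either bucket keeps it, which errs toward keeping rather than deleting.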

@jigold jigold self-assigned this Aug 24, 2023
danking pushed a commit that referenced this issue Aug 29, 2023
Fixes #13441 

Usage: `python3 devbin/generate_gcp_ar_cleanup_policy.py > my_policy_file.txt`
Reference:
https://cloud.google.com/artifact-registry/docs/repositories/cleanup-policy
Note: we can define at most 10 policies.
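For readers unfamiliar with the policy file, here is a minimal sketch of what such a generator could emit. It is not the contents of devbin/generate_gcp_ar_cleanup_policy.py; it assumes the JSON format described on the cleanup-policy page (a list of objects with `name`, `action.type`, and either `condition` or `mostRecentVersions`). The 30-day threshold and the package list (taken from the pod listing earlier in this thread) are placeholders. As far as we know, "keep the 10 most recent per tag prefix" does not map one-to-one onto these fields, so the sketch substitutes an age-based delete for pr-/dev-/cache- tags plus a per-package keep count; deploy- tags are left out of the delete rule so that a long-running production image is never removed.

```python
import json
from typing import Any, Dict, List

MAX_POLICIES = 10  # limit noted in the commit message

# Placeholder values for illustration; not taken from the real script.
SHORT_LIVED_PREFIXES = ["pr-", "dev-", "cache-"]
MAX_AGE = "30d"
KEEP_COUNT = 10
PACKAGES = ["admin-pod", "auth", "batch", "ci", "monitoring", "notebook", "website"]


def build_policies() -> List[Dict[str, Any]]:
    policies: List[Dict[str, Any]] = [
        # Untagged versions are always safe to delete.
        {
            "name": "delete-untagged",
            "action": {"type": "Delete"},
            "condition": {"tagState": "untagged"},
        },
        # Age-based delete for short-lived tag prefixes; deploy- is deliberately absent.
        {
            "name": "delete-old-short-lived",
            "action": {"type": "Delete"},
            "condition": {
                "tagState": "tagged",
                "tagPrefixes": SHORT_LIVED_PREFIXES,
                "olderThan": MAX_AGE,
            },
        },
        # Keep the newest versions of each matching package.
        {
            "name": "keep-recent",
            "action": {"type": "Keep"},
            "mostRecentVersions": {
                "packageNamePrefixes": PACKAGES,
                "keepCount": KEEP_COUNT,
            },
        },
    ]
    if len(policies) > MAX_POLICIES:
        raise ValueError(f"{len(policies)} policies exceeds the limit of {MAX_POLICIES}")
    return policies


if __name__ == "__main__":
    print(json.dumps(build_policies(), indent=2))
```

According to the cleanup-policy documentation, a version matching both a Keep and a Delete policy is kept, which is what makes the `keep-recent` rule a safety net. The output file should then be applicable with something like `gcloud artifacts repositories set-cleanup-policies hail --project=hail-vdc --location=us --policy=my_policy_file.txt --dry-run`; check the flags against the documentation before running without `--dry-run`.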

2 participants