Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #2219 - adding a subcommand to enable deleting parts of a repo in the local cache #2221

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sealad886
Copy link
Contributor

Initial thoughts:

Going with repo-cache, use case would be something like:

> huggingface-cli repo-cache delete --all repo_id     # same functionality as delete-cache but quick to delete just the one repo
> huggingface-cli repo-cache delete --include (glob) repo_id.  # use same glob format as download --include flag; mark files to __include__ for delete action
> huggingface-cli repo-cache delete --exclude [glob] repo_id. # same as --include, but opposite
> huggingface-cli repo-cache delete --file-list (list of files to remove) repo_id

Also, a starting point is that the code references a frozen class DeletesCacheStrategy. I will do more digging, but there may be a framework that needs a little polish to make this very straightforward. Dangerous words in this business, I tell ya.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin
Copy link
Contributor

Wauplin commented Apr 18, 2024

Hi @sealad886, thanks for the proposal and sorry for the long delay! So I've looked back to your snippet (and suggestions in #2219 (comment)) and I think we could modify it a bit. What do you it of:

> huggingface-cli cache delete-repo repo_id
> huggingface-cli cache delete-repo repo_id --repo-type=dataset
> huggingface-cli cache delete-repo repo_id --include (glob)
> huggingface-cli cache delete-repo repo_id --exclude (glob)
> huggingface-cli cache delete-repo repo_id --files=file1,file2,file3,file4

?

Let's not focus too much on what's existing in huggingface-cli delete-cache BTW. I would prefer to have a unified experience to delete things from the cache but that's not really possible while maintaining an intuitive API. At least we can start by having a subcommands in a generic huggingface-cli cache part.

@sealad886
Copy link
Contributor Author

@Wauplin those look like excellent starts. I'll probably also includes a --dirs (or if there's another naming convention used elsewhere in the repo, I'll rename it) to be explicit about deleting whole directories.

I had been looking at some of the git -i stuff only recently, and it looks a lot like a utility that I've been developing for Ollama CLI (also centering on cache management). (Ollamaultil)[https://github.com/sealad886/ollama_util] is my take on a interface-based cache management utility (different focus: I use it to move cache files between on-disk and external storage--and back--since I'm on a laptop); it's not really adapted for components to be called at the command line, but it could be. Just wanted to throw that out there.

In terms of design, I think I'm going to start by actually enhancing the DeleteCacheStrategy class as the helper tool it is, and then working up from there. I've also noticed that the directories can sometimes fail to be deleted if (laughs with this one) '.DS_Store' files exist in the directory that's being deleted, and the delete process overall fails mid-way through. I'll fix this too.

Also...verbosity.

So, to summarize my clearly scatterbrained thoughts:

  1. CLI command delete-repo and associated options as noted above.
  2. Fix .DS_Store or other hidden files preventing the final deletion of a directory when the whole repo is being deleted.
  3. Add verbosity, for now only to give some concept of progress when deleting large amounts of cached data.
  4. Possibility of some kind of utility in the future?

I have time this week to actually dig into this more, so I'll get crackin'.

@Wauplin
Copy link
Contributor

Wauplin commented Apr 23, 2024

Thanks for writing up your thoughts @sealad886! This looks like a good plan to me. Please let me know if yo u need any assistance :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants