-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add CommitOperationCopy #1495
add CommitOperationCopy #1495
Conversation
The documentation is not available anymore as the PR was closed or merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @lhoestq thanks for opening a PR on this topic! I've made a few comments on your work. The 2 main things I would change in the logic:
- fail early in case of missing file on the repo or non-LFS file
- handle
list_files_info
differently (returned value is a bit misleading but basically you can't assume the output to be aligned 1-1 with the input) 😕
Also listed a few things to complete before being ready to merge. Could you:
- Add in the existing test a second copy from same source but different destination (basically make a commit to copy twice the same file to 2 different location) => we should ensure it works (not the case currently)
- Update
operations
argument increate_commit
docstring - Add small section + example in the upload guide
- Add
CommitOperationCopy
to package reference
Thanks a lot in advance! 🤗
Note: I made some changes to update _multi_commits.py
(no need to care about that one) accordingly + make CommitOperationCopy
a first-class citizen in huggingface_hub.__init__.py
. You should pull from the branch before making changes :)
for src_revision, operations in groupby(copies, key=lambda op: op.src_revision): | ||
operations = list(operations) # type: ignore | ||
paths = [op.src_path_in_repo for op in operations] | ||
src_repo_files = hf_api.list_files_info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list_files_info
can be a bit misleading but basically you cannot assume that the returned files will be in the same order as the input paths + you cannot even assume that the list will be the same size. In particular:
- if a path is duplicated (i.e. user is copying a file multiple times), the server will return only one RepoFile instance
- if a path is missing on the repo, the server will silently ignore it
So what I would advice to do is:
- take as input a
List[CommitOperationCopy]
instead ofIterable
- do a first pass with
for src_revision, operations in groupby(copies, key=lambda op: op.src_revision):
list_files_info
- if a returned RepoFile has
.lfs = None
=> raise an issue - add to
files_to_copy
usingRepoFile.rfilename
,src_revision
andRepoFile.lfs["sha256"]
- do a second pass on
copies
- if
op.src_path_in_repo
/op.src_revision
pair is not found in the sha256 dictionary => raise the EntryNotFound error
- if
(disclaimer: I haven't tested the above logic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good ! why switch to List instead of Iterable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can iterate on a list twice. On an iterable it's not guaranteed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I only iterate on it once using groupby so we're good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect :)
Took all your comments into account :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for making the changes :)
I'll quickly update _multi_commits.py
and we'll be ready to merge.
feel free to merge once it's good for you ;) |
close #1083