New endpoint: create_commits_on_pr #1375

Merged: 18 commits, Apr 17, 2023
39 changes: 39 additions & 0 deletions docs/source/guides/upload.mdx
@@ -108,6 +108,45 @@ but before that, all previous logs on the repo on deleted. All of this in a sing
... )
```

### Upload a folder by chunks

[`upload_folder`] makes it easy to upload an entire folder to the Hub. However, for large folders (thousands of files or
hundreds of GB), the upload can still be challenging. If you have a folder with a lot of files, you might want to upload
it in several commits. That way, if you hit an error or a connection issue during the upload, you do not have to restart
the process from the beginning.

To upload a folder in multiple commits, pass `multi_commits=True` as an argument. Under the hood, `huggingface_hub`
lists the files to upload or delete and splits them into several commits. The "strategy" (i.e. how the commits are split)
is based on the number and size of the files to upload. A PR is opened on the Hub to push all the commits. Once the PR is
ready, the commits are squashed into a single commit. If the process is interrupted before completing, you can rerun
your script to resume the upload. The existing PR is detected automatically and the upload resumes from where
it stopped. It is recommended to pass `multi_commits_verbose=True` to get a better understanding of the upload and its
progress.
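
To make the idea of a commit "strategy" more concrete, here is a minimal sketch of splitting files by count and size. It is an illustration only: the function name `split_into_commits` and both limits are hypothetical and are not taken from `huggingface_hub`, whose actual strategy may differ.

```python
from typing import List, Tuple

# Hypothetical limits for illustration; NOT the values used by huggingface_hub.
MAX_FILES_PER_COMMIT = 50
MAX_BYTES_PER_COMMIT = 2 * 1024**3  # 2 GiB


def split_into_commits(
    files: List[Tuple[str, int]],
    max_files: int = MAX_FILES_PER_COMMIT,
    max_bytes: int = MAX_BYTES_PER_COMMIT,
) -> List[List[str]]:
    """Greedily group (path, size) pairs into commits bounded by count and total size."""
    commits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        # Start a new commit when adding this file would exceed either limit.
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            commits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        commits.append(current)
    return commits
```

In this sketch a file larger than `max_bytes` simply ends up alone in its own commit. Small commits keep the cost of a retry low, which is the property the resumable upload relies on.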

The example below uploads the checkpoints folder to a dataset in multiple commits. A PR is created on the Hub
and merged automatically once the upload is complete. If you prefer the PR to stay open so you can review it manually,
pass `create_pr=True`.

```py
>>> upload_folder(
...     folder_path="local/checkpoints",
...     repo_id="username/my-dataset",
...     repo_type="dataset",
...     multi_commits=True,
...     multi_commits_verbose=True,
... )
```

If you want finer control over the upload strategy (i.e. the commits that are created), have a look at the
low-level [`plan_multi_commits`] and [`create_commits_on_pr`] methods.

<Tip warning={true}>

`multi_commits` is still an experimental feature. Its API and behavior are subject to change in the future without prior
notice.

</Tip>


### create_commit

2 changes: 2 additions & 0 deletions docs/source/package_reference/hf_api.mdx
@@ -27,6 +27,8 @@ models = hf_api.list_models()

[[autodoc]] HfApi

[[autodoc]] plan_multi_commits

## API Dataclasses

### CommitInfo
10 changes: 10 additions & 0 deletions src/huggingface_hub/__init__.py
@@ -58,6 +58,10 @@
        "logout",
        "notebook_login",
    ],
    "_multi_commits": [
        "MultiCommitException",
        "plan_multi_commits",
    ],
    "_snapshot_download": [
        "snapshot_download",
    ],
@@ -134,6 +138,7 @@
        "comment_discussion",
        "create_branch",
        "create_commit",
        "create_commits_on_pr",
        "create_discussion",
        "create_pull_request",
        "create_repo",
@@ -339,6 +344,10 @@ def __dir__():
        logout, # noqa: F401
        notebook_login, # noqa: F401
    )
    from ._multi_commits import (
        MultiCommitException, # noqa: F401
        plan_multi_commits, # noqa: F401
    )
    from ._snapshot_download import snapshot_download # noqa: F401
    from ._space_api import (
        SpaceHardware, # noqa: F401
@@ -413,6 +422,7 @@ def __dir__():
        comment_discussion, # noqa: F401
        create_branch, # noqa: F401
        create_commit, # noqa: F401
        create_commits_on_pr, # noqa: F401
        create_discussion, # noqa: F401
        create_pull_request, # noqa: F401
        create_repo, # noqa: F401
6 changes: 0 additions & 6 deletions src/huggingface_hub/_commit_api.py
@@ -439,17 +439,11 @@ def fetch_upload_modes(
        preupload_info = _validate_preupload_info(resp.json())
        upload_modes.update(**{file["path"]: file["uploadMode"] for file in preupload_info["files"]})

    # If a file is empty, it is most likely a mistake.
    # => a warning message is triggered to warn the user.
    # => except if `.gitkeep` as it is a legit use case for an empty file.
    #
    # Empty files cannot be uploaded as LFS (S3 would fail with a 501 Not Implemented)
    # => empty files are uploaded as "regular" to still allow users to commit them.
    for addition in additions:
        if addition.upload_info.size == 0:
            path = addition.path_in_repo
            if not path.endswith(".gitkeep"):
                warnings.warn(f"About to commit an empty file: '{path}'. Are you sure this is intended?")
            upload_modes[path] = "regular"

    return upload_modes
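
The empty-file rule in this hunk can be exercised in isolation. The sketch below is a simplified stand-in, not the library code: the `Addition` dataclass and the `force_regular_for_empty` helper are hypothetical names, and only the rule stated in the comments (empty files are uploaded as "regular" rather than LFS) is reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Addition:
    # Hypothetical stand-in for the library's addition operation.
    path_in_repo: str
    size: int


def force_regular_for_empty(additions: List[Addition], upload_modes: Dict[str, str]) -> Dict[str, str]:
    """Override the upload mode of empty files so they are committed as regular files."""
    for addition in additions:
        if addition.size == 0:
            # Empty files cannot go through LFS, so fall back to a regular upload.
            upload_modes[addition.path_in_repo] = "regular"
    return upload_modes
```

Non-empty files keep whatever mode the server returned; only zero-byte files are overridden.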