Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New endpoint: create_commits_on_pr #1375

Merged
merged 18 commits into from
Apr 17, 2023
Merged

New endpoint: create_commits_on_pr #1375

merged 18 commits into from
Apr 17, 2023

Conversation

Wauplin
Copy link
Contributor

@Wauplin Wauplin commented Mar 3, 2023

Step 2 of #1352 (see #1352 (comment)).

Here is my test repo with an example of PR where I pushed all the huggingface_hub source code to the repo. This is a toy example but it should work the same no matter the number and size of the files.


EDIT: I refactored the implementation / the description.

High-level example:

from huggingface_hub import upload_folder

upload_folder(
    repo_id=repo_id,
    folder_path="path/to/folder",
    multi_commits=True,
    multi_commits_verbose=True,
    ignore_patterns="*__pycache__*",
    delete_patterns="*",
)

=> existing files are deleting and local folder is uploaded. Happens in multiple commits, pushed to a PR. PR is either merged (if create_pr=False) or not.

Low-level example:

# Define low level operations
operations = [...] # list of CommitOperationAdd and CommitOperationDelete

# Split operations in commits
# Can be done manually or use a predefined strategy (based on max nb files/max upload size per commit) 
addition_commits, deletion_commits = plan_multi_commits(operations)
# Now we have 2 lists of commits. Additions and deletions are separated (deletions are pushed first).

# Create commits => will return the URL to the PR
api = HfApi()
api.create_commits_on_pr(
    repo_id=...,
    commit_message=...,
    addition_commits=addition_commits,
    deletion_commits=deletion_commits,
    merge_pr=False,
)

How it works internally:

  1. Each commit contains only additions or only deletions. Deletions are pushed first and additions later. Order of operations is not guaranteed but shouldn't be a problem.
  2. Each commit has a deterministic ID => the multi-commit has also a deterministic ID => persistent across re-runs
  3. Create a PR with multi-commit ID in the name or retrieve an existing one.
  4. List commits already pushed.
  5. Push remaining commits. Each commit contains the commit ID to be recognized later.
  6. In the meantime, PR description gets updated with the progress.
  7. At the end, PR is set as "open for reviews" and merged automatically if merge_pr=True.

TODO:

  • add unit tests
  • add objects to the package reference
  • adapt the existing upload guide

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Mar 3, 2023

The documentation is not available anymore as the PR was closed or merged.

@Wauplin Wauplin changed the title [WIP] New endpoint: commit_by_chunks [WIP] New endpoint: commit_in_chunks Mar 3, 2023
@codecov
Copy link

codecov bot commented Mar 3, 2023

Codecov Report

Patch coverage: 26.77% and project coverage change: -2.27 ⚠️

Comparison is base (25e20ef) 84.34% compared to head (167d320) 82.08%.

❗ Current head 167d320 differs from pull request most recent head ef093a0. Consider uploading reports for the commit ef093a0 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1375      +/-   ##
==========================================
- Coverage   84.34%   82.08%   -2.27%     
==========================================
  Files          52       49       -3     
  Lines        5430     5168     -262     
==========================================
- Hits         4580     4242     -338     
- Misses        850      926      +76     
Impacted Files Coverage Δ
src/huggingface_hub/__init__.py 75.75% <ø> (ø)
src/huggingface_hub/_commit_api.py 92.72% <ø> (-0.09%) ⬇️
src/huggingface_hub/hf_api.py 83.09% <13.75%> (-5.81%) ⬇️
src/huggingface_hub/_multi_commits.py 35.41% <35.41%> (ø)
src/huggingface_hub/community.py 88.63% <57.14%> (-2.73%) ⬇️

... and 11 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) I think you could say "commit" instead of "step" or "chunk" no ? And call this a "multi-commit operation" ? This way you don't introduce new terms.

@Wauplin Wauplin changed the title [WIP] New endpoint: commit_in_chunks New endpoint: create_commits_on_pr Mar 15, 2023
@Wauplin
Copy link
Contributor Author

Wauplin commented Mar 15, 2023

@lhoestq I have updated the PR to introduce less terminology

@Wauplin Wauplin marked this pull request as ready for review March 15, 2023 17:24
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it !

@LysandreJik
Copy link
Member

The code looks good to me @Wauplin, but I'll let @julien-c approve/reject it as he might have opinions about opening PRs

Copy link
Member

@julien-c julien-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me from quick glance!

@Wauplin
Copy link
Contributor Author

Wauplin commented Apr 17, 2023

As a first start for this feature, I'll release it as "experimental" meaning it triggers a warning when used ("we do not guarantee long-term support"). I'm doing so to ensure we have flexibility if we want to re-think this process in the future.

@Wauplin Wauplin merged commit 622075b into main Apr 17, 2023
@Wauplin Wauplin deleted the 1352-create-chunked-commits branch April 17, 2023 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants