-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non blocking git push #315
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Thanks a lot for working on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks for the PR!
I mostly have some questions for my own learning/understanding :)
@@ -344,6 +404,10 @@ def __init__( | |||
if revision is not None: | |||
self.git_checkout(revision, create_branch_ok=True) | |||
|
|||
# This ensures that all commands exit before exiting the Python runtime. | |||
# This will ensure all pushes register on the hub, even if other errors happen in subsequent operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if user tries to kill the process (for example ctrl + d) while things are getting pushed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user tries to kill the process, it will try to continue pushing:
>>> repo.command_queue
[[push command, status code: running, in progress. PID: 95581]]
>>> (CTRL+C)
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 95581]].
If you try to exit again:
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/lysandre/Workspaces/Python/huggingface_hub/src/huggingface_hub/repository.py", line 1141, in wait_for_commands
time.sleep(1)
KeyboardInterrupt
and then you're out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's perfect!
src/huggingface_hub/repository.py
Outdated
def clone_from(self, repo_url: str, use_auth_token: Union[bool, str, None] = None): | ||
def clone_from( | ||
self, repo_url: str, use_auth_token: Union[bool, str, None] = None | ||
) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but where/why are we returning a bool here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice catch, that's a mistake from my side
tests/test_repository.py
Outdated
try: | ||
url, result = repo.git_push(blocking=False) | ||
except subprocess.CalledProcessError as exc: | ||
print(exc.stderr) | ||
raise exc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we having a try/except here? This seems off for a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I suspect it was in an old test which got copy-pasted around. Fixed that in the last commit.
Add test & typings Implement status code visualization context manager has access to commands Better warning Traditional push should also contribute to queue Add push_to-hub Docs and tests Update src/huggingface_hub/README.md Co-authored-by: Omar Sanseviero <[email protected]> Apply Omar's review
6ae8928
to
af19581
Compare
This PR introduces a non-blocking git push as asked in #169 and proposed by @cccntu in #171.
It is currently painful to have to wait until a
git push
is over before continuing to something else. Since this uses nothing else than bandwidth, it is fine for it to happen behind the scenes.This implementation:
blocking
parameter togit_push
and thecommit
context manager; these variables default toTrue
.blocking=False
, outputs a variable that can be monitored to follow the status of thegit push
commandblocking=False
, outputs a variable that can be monitored to ensure thegit push
command has terminatedcommand_queue
to be managed. More information about this can be found below.atexit
official lib to ensure that all processes have completed before exiting the Python runtime.blocking=False
parameter.It is recommended to monitor the progress of the current push before attempting a second one; however, several pushes may be launched concurrently. See below for the expected behaviors:
What happens when two git pushes are done simultaneously?
If two identical git pushes are done, then no error in the workflow will happen. The remote correctly registers the updated version.
Two times more bandwidth is consumed as both pushes will upload all files.
What happens when two git pushes are done simultaneously, with the second git push having one or several additional commits?
All commits are correctly registered on the remote. More bandwidth is consumed as both pushes will upload all files.
What happens if one of the pushes errors out?
Given that the pushes are independent of each other as the remote isn't updated until a push is complete, the workflow should complete.
The
command_queue
This implementation implements a
command_queue
so that users may be aware of the current status of all commands launched.It can be accessed as follows: