-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Push to hub save #15327
Push to hub save #15327
Conversation
The documentation is not available anymore as the PR was closed or merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works for me! Something to keep in mind here (might be a bit complex), is that git is aware of what commits have been pushed to the remote only when it has done pushing that commit.
It means the following:
repo.git_commit(...)
repo.git_push(blocking=False)
...
repo.git_commit(...)
repo.git_push(blocking=False) <--- if the previous push was not finished, it will push the two commits instead of the last one, resulting in a 2x bigger (and slower) push
This means that if the pushes are very frequent, or with very big models, then it wouldn't be surprising to have each git_push
push the previous commit(s) alongside it, resulting in each following invocation pushing bigger and bigger pushes.
One way to prevent that would be to ensure that all previous pushes have been finished before starting a new one. I'm not sure there is a perfect solution to this.
I think you're missing the fact that, apart from a push asked explicitly with |
Then that's perfect :) |
`output_dir` exists, it needs to be a local clone of the repository to which the [`Trainer`] will be | ||
Whether or not to push the model to the Hub every time the model is saved. If this is activated, | ||
`output_dir` will begin a git directory synced with the the repo (determined by `hub_model_id`) and the | ||
content will be pushed each time a save is triggered (depneding on your `save_strategy`). Calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
content will be pushed each time a save is triggered (depneding on your `save_strategy`). Calling | |
content will be pushed each time a save is triggered (depending on your `save_strategy`). Calling |
Whether or not to push the model to the Hub every time the model is saved. If this is activated, | ||
`output_dir` will begin a git directory synced with the the repo (determined by `hub_model_id`) and the | ||
content will be pushed each time a save is triggered (depneding on your `save_strategy`). Calling | ||
[`~Trainer.save_model`] will also trigger a push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
[`~Trainer.save_model`] will also trigger a push | |
[`~Trainer.save_model`] will also trigger a push. |
@@ -2051,6 +2051,10 @@ def save_model(self, output_dir: Optional[str] = None): | |||
elif self.args.should_save: | |||
self._save(output_dir) | |||
|
|||
# Push to the Hub when `save_model` is called by the user. | |||
if self.args.push_to_hub and not _internal_call: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand this. Don't we also want to push to the Hub automatically during training (which is an internal call)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
push_to_hub
calls save_model
which calls push_to_hub
which calls save_model
which calls push_to_hub
which calls save_model
which calls push_to_hub
which calls save_model
which calls push_to_hub
which calls save_model
which calls push_to_hub
which calls save_model
which calls push_to_hub
which calls save_model
which calls push_to_hub
which calls save_model
which calls push_to_hub
which calls save_model
...
This internal argument is here to avoid that ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining and for the PR!
Unfortunately I'm still getting the behaviour in which the model is getting saved during training but it's not getting pushed. https://colab.research.google.com/drive/1GAXf3egH2GDbk7M0btdKWbLerBqLoJPi?usp=sharing. I have a commit saving at step 500 but no push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Omar! I just checked your colab, and ran it. You have to wait for a solid 15 minutes after the fact to see the weights on your repo as Colab is uploading at an excruciatingly slow speed (those pushes during training are asynchronous to avoid slowing down training).
trainer.push_in_progress
give you the job that is pushing, you can check its stdout
attribute to see the progress it makes, its is_done
attribute to see if it's finished or not and its stderr
attribute to check if there was an error or not.
You can see on this repo that I eventually got my weights pushed with your code :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! Thanks @sgugger for the investigation. I was not aware about trainer.push_in_progress
, this is something I'll definitively use next time. Thanks once again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sgugger for the PR and clarifications on the pushing behaviour 🚀 🥳
Ugh, sorry @osanseviero I was convinced I had accepted your suggestions before merging (must have skipped one click), will make a new commit with you as Co-author... |
What does this PR do?
This PR changes the documentation of the
push_to_hub
training argument to be in sync with the actual behavior, and also makes sure that whenpush_to_hub=True
, a push is done every time the model is saved.Fixes #15313