Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Push to hub save #15327

Merged
merged 2 commits into from
Jan 27, 2022
Merged

Push to hub save #15327

merged 2 commits into from
Jan 27, 2022

Conversation

sgugger
Copy link
Collaborator

@sgugger sgugger commented Jan 25, 2022

What does this PR do?

This PR changes the documentation of the push_to_hub training argument to be in sync with the actual behavior, and also makes sure that when push_to_hub=True, a push is done every time the model is saved.

Fixes #15313

@HuggingFaceDocBuilder
Copy link

HuggingFaceDocBuilder commented Jan 25, 2022

The documentation is not available anymore as the PR was closed or merged.

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for me! Something to keep in mind here (might be a bit complex), is that git is aware of what commits have been pushed to the remote only when it has done pushing that commit.

It means the following:

repo.git_commit(...)
repo.git_push(blocking=False)
...
repo.git_commit(...)
repo.git_push(blocking=False) <--- if the previous push was not finished, it will push the two commits instead of the last one, resulting in a 2x bigger (and slower) push

This means that if the pushes are very frequent, or with very big models, then it wouldn't be surprising to have each git_push push the previous commit(s) alongside it, resulting in each following invocation pushing bigger and bigger pushes.

One way to prevent that would be to ensure that all previous pushes have been finished before starting a new one. I'm not sure there is a perfect solution to this.

@sgugger
Copy link
Collaborator Author

sgugger commented Jan 25, 2022

I think you're missing the fact that, apart from a push asked explicitly with trainer.push_to_hub() (or now trainer.save_model()) all other pushes are only done if the previous one is completely finished (see here). So, there are no frequent pushes (except if the user asks for it by putting trainer.save_model() in a loop but I don't think that will happen).

@LysandreJik
Copy link
Member

Then that's perfect :)

`output_dir` exists, it needs to be a local clone of the repository to which the [`Trainer`] will be
Whether or not to push the model to the Hub every time the model is saved. If this is activated,
`output_dir` will begin a git directory synced with the the repo (determined by `hub_model_id`) and the
content will be pushed each time a save is triggered (depneding on your `save_strategy`). Calling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
content will be pushed each time a save is triggered (depneding on your `save_strategy`). Calling
content will be pushed each time a save is triggered (depending on your `save_strategy`). Calling

Whether or not to push the model to the Hub every time the model is saved. If this is activated,
`output_dir` will begin a git directory synced with the the repo (determined by `hub_model_id`) and the
content will be pushed each time a save is triggered (depneding on your `save_strategy`). Calling
[`~Trainer.save_model`] will also trigger a push
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
[`~Trainer.save_model`] will also trigger a push
[`~Trainer.save_model`] will also trigger a push.

@@ -2051,6 +2051,10 @@ def save_model(self, output_dir: Optional[str] = None):
elif self.args.should_save:
self._save(output_dir)

# Push to the Hub when `save_model` is called by the user.
if self.args.push_to_hub and not _internal_call:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this. Don't we also want to push to the Hub automatically during training (which is an internal call)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

push_to_hub calls save_model which calls push_to_hub which calls save_model which calls push_to_hub which calls save_model which calls push_to_hub which calls save_model which calls push_to_hub which calls save_model which calls push_to_hub which calls save_model which calls push_to_hub which calls save_model which calls push_to_hub which calls save_model which calls push_to_hub which calls save_model ...

This internal argument is here to avoid that ;-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining and for the PR!

Unfortunately I'm still getting the behaviour in which the model is getting saved during training but it's not getting pushed. https://colab.research.google.com/drive/1GAXf3egH2GDbk7M0btdKWbLerBqLoJPi?usp=sharing. I have a commit saving at step 500 but no push

Copy link
Collaborator Author

@sgugger sgugger Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Omar! I just checked your colab, and ran it. You have to wait for a solid 15 minutes after the fact to see the weights on your repo as Colab is uploading at an excruciatingly slow speed (those pushes during training are asynchronous to avoid slowing down training).

trainer.push_in_progress give you the job that is pushing, you can check its stdout attribute to see the progress it makes, its is_done attribute to see if it's finished or not and its stderr attribute to check if there was an error or not.

You can see on this repo that I eventually got my weights pushed with your code :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! Thanks @sgugger for the investigation. I was not aware about trainer.push_in_progress, this is something I'll definitively use next time. Thanks once again.

Copy link
Contributor

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sgugger for the PR and clarifications on the pushing behaviour 🚀 🥳

@sgugger sgugger merged commit da5ef25 into master Jan 27, 2022
@sgugger sgugger deleted the push_to_hub_save branch January 27, 2022 14:00
@sgugger
Copy link
Collaborator Author

sgugger commented Jan 27, 2022

Ugh, sorry @osanseviero I was convinced I had accepted your suggestions before merging (must have skipped one click), will make a new commit with you as Co-author...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Push to hub training argument not pushing
4 participants