Push to hub save #15327
Merged
@@ -365,9 +365,18 @@ class TrainingArguments:
  Whether to skip adding of memory profiler reports to metrics. This is skipped by default because it slows
  down the training and evaluation speed.
  push_to_hub (`bool`, *optional*, defaults to `False`):
- Whether or not to upload the trained model to the hub after training. If this is activated, and
- `output_dir` exists, it needs to be a local clone of the repository to which the [`Trainer`] will be
+ Whether or not to push the model to the Hub every time the model is saved. If this is activated,
+ `output_dir` will begin a git directory synced with the repo (determined by `hub_model_id`) and the
+ content will be pushed each time a save is triggered (depending on your `save_strategy`). Calling
Review comment: Nit (with a suggested change)
+ [`~Trainer.save_model`] will also trigger a push.
Review comment: Nit (with a suggested change)
+
+ <Tip warning={true}>
+
+ If `output_dir` exists, it needs to be a local clone of the repository to which the [`Trainer`] will be
  pushed.
+
+ </Tip>
+
  resume_from_checkpoint (`str`, *optional*):
  The path to a folder with a valid checkpoint for your model. This argument is not directly used by
  [`Trainer`], it's intended to be used by your training/evaluation scripts instead. See the [example
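As an illustration of the updated `push_to_hub` docstring, the arguments it mentions could be combined roughly as follows. This is only a sketch: the repository id `my-user/my-model`, the output directory and the `save_steps` value are placeholders, and `build_training_arguments` is a hypothetical helper that needs `transformers` installed to be called.

```python
# Keyword arguments for `TrainingArguments`; every value is an illustrative placeholder.
HUB_KWARGS = {
    "output_dir": "my-model",            # git directory synced with the Hub repo
    "push_to_hub": True,                 # push every time the model is saved
    "hub_model_id": "my-user/my-model",  # placeholder repo id (assumption)
    "save_strategy": "steps",            # a save, and therefore a push, is triggered...
    "save_steps": 500,                   # ...every 500 update steps
}


def build_training_arguments():
    """Hypothetical helper: build the arguments (requires `transformers`)."""
    from transformers import TrainingArguments

    return TrainingArguments(**HUB_KWARGS)
```

With these settings, each checkpoint save would also start a push, and an explicit call to `Trainer.save_model` would trigger one as well.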
@@ -384,7 +393,7 @@ class TrainingArguments:
  Defines the scope of what is pushed to the Hub and when. Possible values are:

  - `"end"`: push the model, its configuration, the tokenizer (if passed along to the [`Trainer`]) and a
-   draft of a model card at the end of training.
+   draft of a model card when the [`~Trainer.save_model`] method is called.
  - `"every_save"`: push the model, its configuration, the tokenizer (if passed along to the [`Trainer`]) and
    a draft of a model card each time there is a model save. The pushes are asynchronous to not block
    training, and in case the saves are very frequent, a new push is only attempted if the previous one is
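The `"every_save"` behaviour described in this hunk (asynchronous pushes, with a new push attempted only when no earlier one is still running) can be pictured with a small standard-library model. This is a toy sketch and not the `Trainer` implementation: `ToyAsyncPusher`, its attributes and the simulated upload time are all hypothetical.

```python
import threading
import time


class ToyAsyncPusher:
    """Toy stand-in for the "every_save" behaviour; not the Trainer's code."""

    def __init__(self, upload_seconds=0.2):
        self.upload_seconds = upload_seconds
        self.pushed_steps = []   # steps for which a push was started
        self.skipped_steps = []  # steps skipped because a push was still running
        self._job = None         # background thread of the push in progress

    def _upload(self, step):
        time.sleep(self.upload_seconds)  # simulate a slow upload

    def on_save(self, step):
        # Only attempt a new push if no earlier push is still running.
        if self._job is not None and self._job.is_alive():
            self.skipped_steps.append(step)
            return False
        self._job = threading.Thread(target=self._upload, args=(step,))
        self._job.start()  # returns immediately, so training is not blocked
        self.pushed_steps.append(step)
        return True

    def wait(self):
        # Block until the push in progress (if any) has finished.
        if self._job is not None:
            self._job.join()


pusher = ToyAsyncPusher()
pusher.on_save(500)   # starts a background push
pusher.on_save(1000)  # the previous push is still running, so this one is skipped
pusher.wait()
pusher.on_save(1500)  # the previous push has finished, so a new one starts
pusher.wait()
```

A skipped save is not lost in practice, since a later push uploads the then-current state of the directory; the toy model only tracks which saves started a push.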
I don't fully understand this. Don't we also want to push to the Hub automatically during training (which is an internal call)?
`push_to_hub` calls `save_model` which calls `push_to_hub` which calls `save_model` which calls `push_to_hub` which calls `save_model` which calls `push_to_hub` which calls `save_model` which calls `push_to_hub` which calls `save_model` which calls `push_to_hub` which calls `save_model` which calls `push_to_hub` which calls `save_model` which calls `push_to_hub` which calls `save_model` which calls `push_to_hub` which calls `save_model` ...
This internal argument is here to avoid that ;-)
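The guard described in this reply can be sketched with a toy class. The sketch assumes the internal argument is a boolean flag, named `_internal_call` here for illustration; `ToyTrainer` and its `calls` log are hypothetical and only model the call pattern, not the real `Trainer`.

```python
class ToyTrainer:
    """Toy model of the save/push call pattern; not the real Trainer."""

    def __init__(self, push_to_hub=True):
        self.push_to_hub_enabled = push_to_hub
        self.calls = []  # records the order in which the methods run

    def save_model(self, _internal_call=False):
        self.calls.append("save_model")
        # A user-facing save triggers a push; a save made from inside
        # push_to_hub does not, which is what breaks the cycle.
        if self.push_to_hub_enabled and not _internal_call:
            self.push_to_hub()

    def push_to_hub(self):
        self.calls.append("push_to_hub")
        # Save first so the pushed files are up to date, flagged as internal.
        self.save_model(_internal_call=True)


trainer = ToyTrainer()
trainer.save_model()
```

Without the flag, the two methods would keep calling each other until Python raised a `RecursionError`.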
Thanks for explaining and for the PR!
Unfortunately I'm still getting the behaviour in which the model is getting saved during training but it's not getting pushed. https://colab.research.google.com/drive/1GAXf3egH2GDbk7M0btdKWbLerBqLoJPi?usp=sharing. I have a commit saving at step 500 but no push
Hi Omar! I just checked your colab, and ran it. You have to wait for a solid 15 minutes after the fact to see the weights on your repo as Colab is uploading at an excruciatingly slow speed (those pushes during training are asynchronous to avoid slowing down training).
`trainer.push_in_progress` gives you the job that is pushing: you can check its `stdout` attribute to see the progress it makes, its `is_done` attribute to see if it's finished or not, and its `stderr` attribute to check if there was an error or not.
You can see on this repo that I eventually got my weights pushed with your code :-)
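A small helper along these lines could summarize the state of such a job. It relies only on the `is_done`, `stdout` and `stderr` attributes mentioned above; `describe_push` and `fake_job` are hypothetical names, and treating a value of `None` as "no push in progress" is an assumption.

```python
from types import SimpleNamespace


def describe_push(job):
    """Summarize a push job exposing `is_done`, `stdout` and `stderr`."""
    if job is None:  # assumption: no push has been started
        return "no push in progress"
    if not job.is_done:
        return f"still pushing: {job.stdout.strip()}"
    if job.stderr.strip():
        # Report stderr as-is; whether it signals a real error is left to the reader.
        return f"finished, stderr: {job.stderr.strip()}"
    return "push finished"


# Stand-in object for illustration; in a real run one would pass
# `trainer.push_in_progress` instead.
fake_job = SimpleNamespace(is_done=False, stdout="Uploading 42%\n", stderr="")
print(describe_push(fake_job))
```

In a notebook such as Colab, calling the helper every few minutes would show whether a slow upload is still making progress.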
Aha! Thanks @sgugger for the investigation. I was not aware of `trainer.push_in_progress`; this is something I'll definitely use next time. Thanks once again.