Saving of checkpoint after every epoch using ModelCheckpoint if no metric is monitored #596
Comments
Yup. I guess we either return the epoch number as the thing to monitor, or we modify it to add this option.
Hmm. I guess monitoring the epoch number could work, but I think some modifications should be made to handle the cases where there's no validation loop initialized. What do you think?
This would also be super important for me. I had a quite complicated experiment running on an older version, relying on save_best_only=False to save every epoch without a validation step. I lost quite a bit of training before I realized it was not saving checkpoints anymore. @williamFalcon is there a workaround? Like putting in an empty validation step?
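For what it's worth, a minimal sketch of that kind of workaround against the PyTorch Lightning 1.x API might look like the following; the metric name `epoch_num` and the use of `self.log` are my own assumptions for illustration, not code from this thread.

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    # ... training_step, configure_optimizers, dataloaders, etc. ...

    def validation_step(self, batch, batch_idx):
        # Dummy validation step: log the epoch number so a checkpoint callback
        # always has a monotonically increasing quantity to "monitor".
        self.log("epoch_num", float(self.current_epoch))
```

A `ModelCheckpoint(monitor="epoch_num", mode="max", save_top_k=-1)` would then keep a checkpoint for every epoch, at the cost of running a (possibly trivial) validation loop.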
I also need such functionality. @simonjaq did you find a workaround for this problem?
Hi. I made a custom checkpoint callback. I copied all the code but changed the following; this works quite well. Before starting the trainer I do this:
In my train loop I call the checkpoint:
My training block looks like this:
This works for me. Note that I work in a Jupyter notebook and just insert the modified callback somewhere at the beginning of the notebook. This should also work by importing your modified callback.
Hopefully this is an improvement, @williamFalcon (but it still doesn't allow saving all models independently of validation).
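The code blocks from that comment did not survive here, but a rough sketch of a callback in the same spirit, written against the 1.x `Callback` API, might look like this (the class name, `dirpath` argument, and hook are my own choices, and the exact hook name and signature vary between Lightning versions):

```python
import os

import pytorch_lightning as pl


class CheckpointEveryEpoch(pl.Callback):
    """Save a full checkpoint at the end of every training epoch,
    regardless of whether any metric is monitored."""

    def __init__(self, dirpath):
        self.dirpath = dirpath
        os.makedirs(dirpath, exist_ok=True)  # make sure the target directory exists

    def on_train_epoch_end(self, trainer, pl_module):
        # trainer.save_checkpoint writes model weights, optimizer state, etc.
        filename = f"epoch={trainer.current_epoch}.ckpt"
        trainer.save_checkpoint(os.path.join(self.dirpath, filename))
```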
Just for anyone else: I couldn't get the above to work. The pl versions are different, and it seemed to get messy putting the trainer into the model. I'm now saving every epoch, while still validating every n > 1 epochs, using this custom callback. It doesn't require adjusting callbacks/model_checkpoint.py. It's fairly hacky and redoes filenames, but it works.
which is called like:
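As a usage sketch (again illustrative, reusing the hypothetical `CheckpointEveryEpoch` from above rather than the poster's actual code), saving every epoch while only validating every few epochs could be wired up like this:

```python
# `model` is assumed to be your LightningModule instance
trainer = pl.Trainer(
    max_epochs=100,
    check_val_every_n_epoch=5,  # run the validation loop only every 5 epochs
    callbacks=[CheckpointEveryEpoch("checkpoints")],  # but checkpoint every epoch
)
trainer.fit(model)
```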
What I did was
And add this callback to the trainer too, and set the
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Please reopen this issue.
I would appreciate this feature too.
I agree - this feature will be helpful!
This is supported today inside the
I see - that's great |
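For readers landing here later: on recent PyTorch Lightning releases, a ModelCheckpoint that saves every epoch without monitoring any metric can be configured roughly as below. Argument names have changed across versions (e.g. `every_n_epochs` is newer), so treat this as an approximate sketch rather than a definitive recipe.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# monitor=None: no metric is tracked; save_top_k=-1: keep every checkpoint
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints",
    filename="{epoch}",
    monitor=None,
    save_top_k=-1,
    every_n_epochs=1,
)
trainer = pl.Trainer(callbacks=[checkpoint_cb])
```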
I may have missed something, but it seems that ModelCheckpoint does not allow this based on the docs and code?