
Checkpointing of auxiliary objects #635

Closed · 2 tasks done
ottonemo opened this issue May 13, 2020 · 4 comments
ottonemo (Member) commented May 13, 2020

As discussed in PR #621, in some circumstances the criterion might change during training (e.g. when it has learnable parameters), which makes it desirable to include it in checkpointing.

Additionally, #597 makes it easier to add further modules, optimizers, and criteria that do not adhere to the standard naming, which leaves checkpointing of such objects broken as well.

  • We need to save the criterion as well (possibly optional)
  • We need to be able to deal with multiple optimizers / models / criteria when checkpointing

I can see the following options:

  1. introduce f_criterion to the checkpoint callback and, instead of saving only the single optimizer/model/criterion, save all of them to their respective files; we can find those using the prefix and the registered parameters
  2. re-group files into a) models and b) training artifacts (history, optimizer, criterion) and save them independently

With (1) I'm a little worried that yet another parameter will need saving in the future, resulting in an ever-expanding parameter list for the checkpoint callback. Maybe this worry is irrational; I'm interested in other opinions.
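
To make option (1) a bit more concrete, here is a minimal sketch of what the callback could look like with the proposed argument. f_criterion is hypothetical here and only illustrates the proposal; the other arguments already exist on the Checkpoint callback.

from skorch.callbacks import Checkpoint

# Sketch of option (1): an explicit f_criterion argument next to the
# existing f_params / f_optimizer / f_history arguments.
# f_criterion is the proposed addition, not an existing parameter.
cp = Checkpoint(
    f_params='model.pt',         # module_ state dict
    f_optimizer='optimizer.pt',  # optimizer_ state dict
    f_criterion='criterion.pt',  # proposed: criterion_ state dict
    f_history='history.json',    # training history
)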

BenjaminBossan (Collaborator) commented:

> 2. re-group files into a) models and b) training artifacts (history, optimizer, criterion) and save them independently

What I would like to avoid is a custom skorch storing format, even if it's just {'module': ..., 'criterion': ...}. Maybe that doesn't follow from your suggestion; I just want to throw this in.

If we look at the code for save_params and load_params, we see a certain regularity:

skorch/skorch/net.py, lines 1638 to 1652 in 1f6b542:

if f_params is not None:
    msg = (
        "Cannot save parameters of an un-initialized model. "
        "Please initialize first by calling .initialize() "
        "or by fitting the model with .fit(...).")
    self.check_is_fitted(msg=msg)
    torch.save(self.module_.state_dict(), f_params)
if f_optimizer is not None:
    msg = (
        "Cannot save state of an un-initialized optimizer. "
        "Please initialize first by calling .initialize() "
        "or by fitting the model with .fit(...).")
    self.check_is_fitted(attributes=['optimizer_'], msg=msg)
    torch.save(self.optimizer_.state_dict(), f_optimizer)

I wonder if we can build on that and allow passing arbitrary f_* kwargs. Thus, if a user wants to add net.myoptimizer_ as an attribute, they can pass f_myoptimizer to save_params. I haven't thought this through, but it could be worth investigating.
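
A minimal sketch of that idea, assuming save_params were extended to accept arbitrary keyword arguments and map each f_<name> to the attribute <name>_ (the signature and names here are illustrative, not the actual skorch API):

import torch

def save_params(self, f_params=None, f_optimizer=None, f_history=None, **kwargs):
    # Illustrative sketch only: every extra keyword of the form f_<name>
    # is mapped to the attribute self.<name>_ and its state_dict is saved,
    # e.g. f_myoptimizer -> self.myoptimizer_.
    for key, f_name in kwargs.items():
        if not key.startswith('f_') or f_name is None:
            continue
        attr_name = key[len('f_'):] + '_'
        msg = (
            "Cannot save state of an un-initialized attribute '{}'. "
            "Please initialize first by calling .initialize() "
            "or by fitting the model with .fit(...).".format(attr_name))
        self.check_is_fitted(attributes=[attr_name], msg=msg)
        torch.save(getattr(self, attr_name).state_dict(), f_name)

A user could then call net.save_params(f_myoptimizer='myoptimizer.pt') for a custom net.myoptimizer_ attribute.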

thomasjpfan (Member) commented:

> What I would like to avoid is a custom skorch storing format

I agree, although this use case of saving custom objects has come up a few times.

What was the historical purpose of save_params? When I was working with it, I was assuming it was used for checkpointing.

BenjaminBossan (Collaborator) commented:

> What was the historical purpose of save_params?

I think that historically, the idea was to give a more general, pickle- and skorch-independent way of storing model parameters. That's why we tried to stick to what PyTorch recommends instead of providing our own storing format, even if that would be more convenient sometimes. At least for me, checkpoints were not even the first concern.

BenjaminBossan (Collaborator) commented:

This has been solved via #652
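
Assuming the merged change follows the f_* pattern discussed above (the exact keyword names have not been checked against #652 here), usage on an initialized net might look roughly like this:

# Hedged usage sketch; the exact keyword names depend on what #652 merged.
net.save_params(
    f_params='model.pt',
    f_optimizer='optimizer.pt',
    f_criterion='criterion.pt',
    f_history='history.json',
)
net.load_params(
    f_params='model.pt',
    f_optimizer='optimizer.pt',
    f_criterion='criterion.pt',
    f_history='history.json',
)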
