
Checkpointing of auxiliary objects #635

Closed · 2 tasks done
ottonemo opened this issue May 13, 2020 · 4 comments
ottonemo (Member) commented May 13, 2020

As discussed in PR #621, in some circumstances the criterion might change during training (e.g. when it has learnable parameters), which makes it desirable to include it in checkpointing.

Additionally, #597 makes it easier to add further modules, optimizers, and criteria that do not adhere to the standard naming, which leaves checkpointing of such objects broken as well.

  • We need to save the criterion as well (possibly optional)
  • We need to be able to deal with multiple optimizers / models / criteria when checkpointing

I can see the following options:

  1. introduce f_criterion to the checkpoint callback and, instead of saving only the single optimizer/model/criterion, save all of them to their respective files; we can find those using the prefix and the registered parameters
  2. re-group files into a) models and b) training artifacts (history, optimizer, criterion) and save them independently

With (1) I'm a little worried that yet another parameter will need saving in the future, resulting in an ever-expanding parameter list for the checkpoint callback. Maybe this worry is irrational; I'm interested in other opinions.
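
To make option (1) a bit more concrete, here is a minimal sketch of what the callback could look like with the proposed argument. f_criterion is hypothetical here and only illustrates the proposal; the other arguments already exist on the Checkpoint callback.

from skorch.callbacks import Checkpoint

# Sketch of option (1): an explicit f_criterion argument next to the
# existing f_params / f_optimizer / f_history arguments.
# f_criterion is the proposed addition, not an existing parameter.
cp = Checkpoint(
    f_params='model.pt',         # module_ state dict
    f_optimizer='optimizer.pt',  # optimizer_ state dict
    f_criterion='criterion.pt',  # proposed: criterion_ state dict
    f_history='history.json',    # training history
)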

BenjaminBossan (Collaborator) commented:

> 2. re-group files into a) models and b) training artifacts (history, optimizer, criterion) and save them independently

What I would like to avoid is a custom skorch storing format, even if it's just {'module': ..., 'criterion': ...}. Maybe that doesn't follow from your suggestion; I just want to throw this in.

If we look at the code for save_params and load_params, we see a certain regularity:

skorch/skorch/net.py, lines 1638 to 1652 in 1f6b542:

if f_params is not None:
    msg = (
        "Cannot save parameters of an un-initialized model. "
        "Please initialize first by calling .initialize() "
        "or by fitting the model with .fit(...).")
    self.check_is_fitted(msg=msg)
    torch.save(self.module_.state_dict(), f_params)
if f_optimizer is not None:
    msg = (
        "Cannot save state of an un-initialized optimizer. "
        "Please initialize first by calling .initialize() "
        "or by fitting the model with .fit(...).")
    self.check_is_fitted(attributes=['optimizer_'], msg=msg)
    torch.save(self.optimizer_.state_dict(), f_optimizer)

I wonder if we can build on that and allow passing arbitrary f_* kwargs. Thus, if a user wants to add net.myoptimizer_ as an attribute, they can pass f_myoptimizer to save_params. I haven't thought this through, but it could be worth investigating.
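
A minimal sketch of that idea, assuming save_params were extended to accept arbitrary keyword arguments and map each f_<name> to the attribute <name>_ (the signature and names here are illustrative, not the actual skorch API):

import torch

def save_params(self, f_params=None, f_optimizer=None, f_history=None, **kwargs):
    # Illustrative sketch only: every extra keyword of the form f_<name>
    # is mapped to the attribute self.<name>_ and its state_dict is saved,
    # e.g. f_myoptimizer -> self.myoptimizer_.
    for key, f_name in kwargs.items():
        if not key.startswith('f_') or f_name is None:
            continue
        attr_name = key[len('f_'):] + '_'
        msg = (
            "Cannot save state of an un-initialized attribute '{}'. "
            "Please initialize first by calling .initialize() "
            "or by fitting the model with .fit(...).".format(attr_name))
        self.check_is_fitted(attributes=[attr_name], msg=msg)
        torch.save(getattr(self, attr_name).state_dict(), f_name)

A user could then call net.save_params(f_myoptimizer='myoptimizer.pt') for a custom net.myoptimizer_ attribute.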

thomasjpfan (Member) commented:

> What I would like to avoid is a custom skorch storing format

I agree, although this use case of saving custom objects has come up a few times.

What was the historical purpose of save_params? When I was working with it, I was assuming it was used for checkpointing.

BenjaminBossan (Collaborator) commented:

> What was the historical purpose of save_params?

I think that historically, the idea was to give a more general, pickle- and skorch-independent way of storing model parameters. That's why we tried to stick to what PyTorch recommends instead of providing our own storing format, even if that would be more convenient sometimes. At least for me, checkpoints were not even the first concern.

BenjaminBossan (Collaborator) commented:

This has been solved via #652
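
Assuming the merged change follows the f_* pattern discussed above (the exact keyword names have not been checked against #652 here), usage on an initialized net might look roughly like this:

# Hedged usage sketch; the exact keyword names depend on what #652 merged.
net.save_params(
    f_params='model.pt',
    f_optimizer='optimizer.pt',
    f_criterion='criterion.pt',
    f_history='history.json',
)
net.load_params(
    f_params='model.pt',
    f_optimizer='optimizer.pt',
    f_criterion='criterion.pt',
    f_history='history.json',
)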
