
Facilitate adding custom modules, criteria, or optimizers #597

Merged

Conversation

BenjaminBossan
Collaborator

This is the third part of the change discussed in #593. For the docs,
it assumes that the changes in #596 have been merged.

Changes

Now, when creating a custom net that has additional modules, criteria,
or optimizers, skorch will take care of registering them inside
prefixes_ and cuda_dependent_attributes_. If this non-obvious
change is not done, it can result in errors down the line.

Additionally, during set_params, there is now re-initialization of
those components if any of the special keys (i.e. those that start
with one of the prefixes_) has "module", "criterion", or "optimizer"
as a substring.

This last part is not a perfect solution, but I couldn't find a better
way to determine whether a component needs to be reinitialized. Say we
add a new module called "mymodule" to the custom net. How can we know
if calling net.set_params(mymodule__foo=123) should result in module
re-initialization? We can't really, except by trying to guess from the
name. Other ideas that I thought of but rejected:

  1. Check if self.module is a nn.Module subclass. But what if it's
    not a module subclass but a function that returns an instantiated
    module?

  2. Instantiate self.mymodule() and see if we get a module back. But
    what if the module requires arguments in order to be called
    successfully? Okay, we could find those arguments and then
    initialize. But do we really want to make a "test initialization"? I
    feel like this could have unintended side-effects. Also, this would not
    handle any custom code a user might add to initialize_module.

If anybody can propose a better idea, that would be welcome. My
personal belief is that the substring matching is not that bad in the
sense that it's very unlikely to trigger a false positive. However, it
might not be easy for a user to discover (I added some documentation
and plan on adding the DCGAN example that shows this in practice).
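
For illustration, here is a rough sketch of the pattern this enables (the names are made up, and `get_params_for` is the helper from #596 mentioned in the TODO below):

```python
from torch import nn
from skorch import NeuralNet


class MyModule(nn.Module):
    def __init__(self, foo=1):
        super().__init__()
        self.lin = nn.Linear(foo, foo)

    def forward(self, X):
        return self.lin(X)


class MyNet(NeuralNet):
    def __init__(self, *args, mymodule=MyModule, **kwargs):
        # the attribute name contains "module", so skorch registers it in
        # prefixes_ and cuda_dependent_attributes_ automatically
        self.mymodule = mymodule
        super().__init__(*args, **kwargs)

    def initialize_module(self):
        super().initialize_module()
        self.mymodule_ = self.mymodule(**self.get_params_for('mymodule'))
        return self


net = MyNet(MyModule, criterion=nn.MSELoss).initialize()
# "mymodule" contains the substring "module", so this triggers re-initialization
net.set_params(mymodule__foo=123)
```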

Additional changes

  1. Refactor tests that used patched score method

The score method used to be patched to return fixed values for a net
fixture used in various tests. This was so that certain scores
could be asserted. However, this had the side effect that some tests
that actually want to calculate the real score had mysterious return
values from score().

Instead of fixing those tests, it's better not to patch the score
method in the fixture and to instead patch it in those tests where
it's necessary. This helps make those tests more explicit (which is
better than implicit).

I stumbled upon this accidentally when I tried to run some tests
in isolation.

  2. Fix typo: explicitely -> explicitly

TODO

In one of the tests, I use self._get_params_for, which should be
changed to self.get_params_for as soon as #596 has been merged.

BenjaminBossan added 4 commits February 23, 2020 23:31
In the previous commit, it was based on instance type or subclass, but
this was inconsistent with how re-initialization was handled. Also,
this new approach has the advantage that it covers, e.g., a `module`
argument that is actually a function that returns an initialized module.

retrieve the arguments for the constructor.
3. The attribute name should contain the substring ``"module"`` if
it's a module, ``"criterion"`` if a criterion, and ``"optimizer"``
if an optimizer. This so so that skorch knows if a change in
Member

Suggested change
if an optimizer. This so so that skorch knows if a change in
if an optimizer. This way skorch knows if a change in

Collaborator Author

done

skorch/net.py Outdated
# just setattr as usual
is_known = name.endswith('_') or (name in self.prefixes_)
is_special_param = '__' in name
is_torch_mod_or_opt = any(c in name for c in PYTORCH_COMPONENTS)
Member

Regardless of the conclusion to the discussion around PYTORCH_COMPONENTS we should document the reason why we chose this method over the others outlined in the conversation of this PR here.

Collaborator Author
@BenjaminBossan, May 12, 2020

I linked to this PR

skorch/net.py Outdated
Comment on lines 1609 to 1614
if is_known or is_special_param or not is_torch_mod_or_opt:
super().__setattr__(name, attr)
return

self._register_attribute(name)
super().__setattr__(name, attr)
Member
@ottonemo, May 12, 2020

Wouldn't it be more straightforward to reduce this to

  1. register if necessary
  2. set attribute

instead of having (check, set) and (register, set)?

Suggestion:

Suggested change
if is_known or is_special_param or not is_torch_mod_or_opt:
super().__setattr__(name, attr)
return
self._register_attribute(name)
super().__setattr__(name, attr)
if is_torch_mod_or_opt or not (is_known and is_special_param):
self._register_attribute(name)
super().__setattr__(name, attr)

Collaborator Author

Streamlined the code (though a bit different than proposed)

cuda_dependent_attributes=True,
):
"""Add attribute name to prefixes_ and
cuda_dependent_attributes_.
Member

This should include the purpose of registering an attribute.

Collaborator Author

done

cuda_dependent_attributes=True,
):
"""Remove attribute name from prefixes_ and
cuda_dependent_attributes_.
Member

This should include the consequence of un-registering an attribute.

Collaborator Author

done

BenjaminBossan added 2 commits May 12, 2020 22:00
* Fix typo in docs
* Better explain the (un)registering logic
* Streamline code
* Add link to this discussion for future reference
Member
@ottonemo left a comment

LGTM

Member
@thomasjpfan left a comment

Overall, this is a nice improvement.

def _register_attribute(
self,
name,
prefixes=True,
Member

Do we call _register_attribute with prefixes=False?

We also always call this with cuda_dependent_attributes=True. I guess there are no torch components that are not also cuda_dependent_attributes?

Collaborator Author

Yes, it could be that this is one of these cases of YAGNI. My idea was to provide a more general "setter" for prefixes and cuda dependent attributes that could be used in other places as well, where it might not make sense to change both at the same time.
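
For reference, registering roughly amounts to the following (a simplified sketch inferred from the signature and docstring quoted in this thread, not the exact merged code):

```python
def _register_attribute(self, name, prefixes=True, cuda_dependent_attributes=True):
    # rebind rather than mutate, so that class-level defaults stay untouched
    if prefixes:
        self.prefixes_ = self.prefixes_[:] + [name]
    if cuda_dependent_attributes:
        self.cuda_dependent_attributes_ = (
            self.cuda_dependent_attributes_[:] + [name + '_'])
```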

Make `PYTORCH_COMPONENTS` "private" for now.

Co-authored-by: Thomas J Fan <[email protected]>

if cuda_dependent_attributes:
self.cuda_dependent_attributes_ = [
a for a in self.cuda_dependent_attributes_ if a != name + '_']
Member

hm

Suggested change
a for a in self.cuda_dependent_attributes_ if a != name + '_']
a for a in self.cuda_dependent_attributes_[:] if a != name + '_']

Collaborator Author

I don't think we need to copy here since we create a new list anyway, right?
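
A tiny illustration of the point (throwaway values): the comprehension builds a fresh list before the attribute is rebound, so the original is never mutated while it is being iterated.

```python
attrs = ['module_', 'mymodule_', 'optimizer_']
filtered = [a for a in attrs if a != 'mymodule' + '_']
assert filtered == ['module_', 'optimizer_']
assert attrs == ['module_', 'mymodule_', 'optimizer_']  # source list unchanged
```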

Member

Yea it's okay here.


Parameters
----------
prefixes : bool (default=True)
Member

I do not see a use case where one would set prefixes=False or cuda_dependent_attributes=False for _unregister_attribute.

Even if _register_attribute was called with prefixes=False, _unregister_attribute would do the right thing with prefixes set either way.

Collaborator Author

The only use would be a microscopic gain in speed I guess. Overall, I believe that we should keep it consistent between register and unregister, so either both have the option or neither. I don't have a strong opinion on that.

Member

Okay let's keep it


BenjaminBossan merged commit 6a9cf4f into master May 23, 2020
BenjaminBossan deleted the feature/automatic-handling-when-setting-torch-attributes branch June 2, 2020 20:44
BenjaminBossan added a commit that referenced this pull request Jun 13, 2021
…es etc. (#751)

* Don't reinitialize uninitialized net bc set_params

Previously, when a parameter on, say, the module was changed via
set_params (e.g. net.set_params(module__hidden_units=123)), set_params
would always trigger (re-)initialization of the module. However, when
the net was not initialized in the first place, this is unnecessary. It
is sufficient to set the new attribute and wait for the net to be
initialized later.

Fortunately, this change doesn't seem to have any further impact, i.e.
we didn't implicitly rely on this behavior anywhere. The only exceptions
are 2 tests in test_cli.py, but those can easily be adjusted and this
shouldn't have any user impact.

* Simplify initialize_* methods

These methods started to become complicated because they did the
following:

1. Check if there is anything to initialize at all
2. Print message about reason for potential re-initialization
3. Move to device

That made it quite difficult to override them without forgetting about
some aspect. With this change, there are now corresponding _initialize_*
methods that are called by net.initialize() and net.set_params. These
new methods now take care of the points above and call the initialize_*
methods inside.

Now, we can more easily make sure that the user can override
initialize_* without anything important being forgotten.
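
A hedged sketch of that division of labour (simplified, with illustrative message text; not the exact skorch code):

```python
def _initialize_module(self):
    # called by initialize() and set_params: handles the bookkeeping so that
    # user overrides of initialize_module can stay small
    if self.initialized_ and self.verbose:
        print("Re-initializing module.")  # illustrative message
    self.initialize_module()
    self.module_ = self.module_.to(self.device)
    return self

def initialize_module(self):
    # the part a user is expected to override; assumes self.module is a
    # class/callable (the pre-initialized case is discussed further below)
    self.module_ = self.module(**self.get_params_for('module'))
    return self
```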

* Further clean up of set_params re-initialization

Removed code for states that could not be reached because of virtual
params. This simplifies the logic considerably.

* Rework logic of creating custom modules/optimizers

So far, the logic for creating custom modules or optimizers was separate
from the logic that created the default module, criterion and optimizer.
E.g., the "prefixes_" attribute was prefilled with 'module_',
'criterion_' and 'optimizer_'. This makes dealing with custom
modules/optimizers (e.g. creating a second module called 'mymodule_')
more difficult, because the logic for treating those was completely
disjoint from the logic of how the default modules/optimizer were
treated.

This change actually removes most of the "special status" of
module/criterion/optimizer. Therefore, the logic to treat those is now
the same as for any custom module. So for instance, they are no longer
pre-registered but instead are only registered later during their
initialize_* methods.

This is implemented by moving the registration to the respective
initialize_* methods. This is because during __init__, we don't actually
know if we deal with a module or optimizer yet (passed argument for
'module' can, for instance, be a function, so we cannot type check). But
during 'initialize', when the actual instances are created, we can check
if we deal with a nn.Module or optim.Optimizer. If we do, we register
them.

So overall, the logic and role of 'initialize' have changed. Users will
be expected to set custom modules/optimizers inside their respective
'initialize_*' methods from now on (stricter checks and doc updates will
be added). This means we no longer have to rely on the name to infer the
role (remember that previously, a custom module needed to contain
the substring 'module', which was an ugly restriction).

As more of a side effect to these changes, the '_check_kwargs' call was
moved to 'initialize' as well, since we cannot really check for faulty
kwargs as long as we don't know what modules and optimizers will be
registered.
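
Roughly, the registration described above could look like this (an assumed, simplified sketch, not the merged code):

```python
from torch import nn, optim

def register_if_component(net, name, value):
    # Only during initialize do we know what the passed argument actually
    # produced, so the type check happens on the concrete instance.
    if isinstance(value, (nn.Module, optim.Optimizer)):
        net._register_attribute(name)
```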

* Add battery of tests for custom modules/optimizers

Right now, there is a big hole in the treatment of custom
modules/optimizers that distinguishes them from the assumed
ones ('module', 'criterion', 'optimizer'). This battery of unit tests
covers behaviors that will fail but really shouldn't:

- custom module parameters should be passed to the optimizer
- set_params on a custom module should trigger re-initialization of
  criterion and optimizer
- set_params on a custom criterion should trigger re-initialization of
  optimizer
- custom modules and criteria are not automatically moved to cuda
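
A hedged sketch of what one of these checks boils down to (illustrative names and fixture, not the actual test code):

```python
def test_custom_module_params_reach_optimizer(net_with_custom_module):
    # hypothetical fixture: a net whose initialize_module also sets `mymodule_`
    net = net_with_custom_module.initialize()
    optimizer_params = {
        id(p) for group in net.optimizer_.param_groups for p in group['params']}
    assert all(id(p) in optimizer_params for p in net.mymodule_.parameters())
```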

* Remove _PYTORCH_COMPONENTS global

Since custom components are no longer matched by name, this became
obsolete.

* All optimizers perform updates automatically

Before this, only the default "optimizer_" was used and all others were
being ignored. With this change, "zero_grad" and "step" are called on
all optimizers automatically.
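
Schematically (a simplified sketch that ignores closure-based optimizers such as LBFGS and assumes optimizers_ holds the names of all registered optimizers):

```python
def step_all_optimizers(net, loss):
    optimizers = [getattr(net, name + '_') for name in net.optimizers_]
    for opt in optimizers:
        opt.zero_grad()
    loss.backward()
    for opt in optimizers:
        opt.step()
```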

* Fix corner case with pre-initialized modules

This case had yet to be covered: When the module/criterion is already
initialized and none of its parameters changed,
initialize_module/criterion was not called. However, what if a custom
module/criterion does need to be initialized? In that case, not calling
initialize_module/criterion is bad.

With this fix, this bad behavior no longer occurs. Tests were added to
cover this.

In order to achieve this change, we unfortunately had to push the check of
whether module/criterion is already initialized down from
_initialize_module/criterion to initialize_module/criterion. There was
no other way of checking this, since at first, we cannot know which
attributes are modules/criteria.

For the user, this means a little more work if they want to implement
initialize_module/criterion absolutely correctly. However, that's not so
bad because it is only important if the user wants to work with
pre-initialized modules/criteria and with custom modules/criteria, which
should happen very rarely in practice. And even if it does, the user can
just copy the default skorch code and will end up with a correct
implementation.
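
Roughly what such a user-side implementation then looks like (an assumed sketch with a hypothetical custom module, not copied from skorch):

```python
import torch
from skorch import NeuralNet


class MyNet(NeuralNet):
    def initialize_module(self):
        kwargs = self.get_params_for('module')
        module = self.module
        # only (re-)instantiate if we got a class/callable or if params changed;
        # a pre-initialized instance with unchanged params is used as-is
        if kwargs or not isinstance(module, torch.nn.Module):
            module = module(**kwargs)
        self.module_ = module
        # the hypothetical custom module must be set up regardless
        self.mymodule_ = self.mymodule(**self.get_params_for('mymodule'))
        return self
```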

* Custom modules are set to train/eval mode

Until now, only module_ and criterion_ were automatically set into
training/evaluation mode; now custom modules are also set automatically.
This was implemented through a new method, net._set_training. It is
private for now, maybe consider adding a public one. Also, the name
could be changed to "train" as in PyTorch, but that name could be
confusing.
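
A minimal sketch of what such a method presumably does (assumed; modules_ and criteria_ are taken to be lists of attribute names, which may not match the final implementation):

```python
import torch


def _set_training(self, training=True):
    # flip every registered module and criterion into train or eval mode
    for name in list(self.modules_) + list(self.criteria_):
        component = getattr(self, name + '_', None)
        if isinstance(component, torch.nn.Module):
            component.train(training)
```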

* Reviewer comment: Consider virtual params

I did not correctly handle virtual params with custom optimizers. This
has been fixed now. The ambiguous 'lr' parameter is only associated with
the default 'optimizer', not with any custom optimizer, which needs to be
addressed by 'myoptimizer__lr'.
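
In usage terms (illustrative values; assumes an initialized net with a custom optimizer registered under the name 'myoptimizer'):

```python
net.set_params(lr=0.01)               # only affects the default optimizer_
net.set_params(myoptimizer__lr=0.05)  # the custom optimizer is addressed explicitly
```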

* For completeness, the text from the PR is copied below:

Motivation

The initial reason why I wanted to work on this is that I'm currently
implementing a gpytorch integration (see this branch). A big part of this is
adding a new custom module called "likelihood". Doing this correctly was
actually not trivial and required a lot of more or less duplicated code. Putting
such a burden on a user with less experience with skorch would not be possible.

The main reason for this difficulty is that module, criterion and optimizer have
been treated as "special" so far. We assume that they are already there and build
everything else around this. If a custom module is added, the user needs to be
aware of all the places where this is relevant, which is too error-prone.

Previous work

Some changes to facilitate adding custom modules were already implemented in
#597. However, they don't go far enough.

Main changes

With this PR, we remove the special status of module, criterion and optimizer.
Instead, all the work that needs to be done when adding any of them to the net
is now implemented in a generic manner. This way, custom modules etc. can re-use
the same functionality and can therefore expect to be treated the same as these
"first class" components.

Here is a list of changes that were made to that effect:

* Until now, custom module parameters were not trained by the optimizer, now
  they are
* Until now, custom modules/criteria were not automatically moved to the
  indicated device, now they are
* Until now, custom modules/criteria were not automatically set into train/eval
  mode, now they are
* Simplified implementation of initialize_module et al. - they contained a lot
  of stuff that was irrelevant for the user, like messaging about why something
  was re-initialized; now this stuff is done inside the newly added methods
  _initialize_module etc., which are called by initialize() and shouldn't be a
  bother to the user
* Adding a custom module no longer requires the attribute name to contain the
  substring "module" (which was really not a nice solution), same for criterion
  and optimizer
* Re-initialization logic was changed: When any module is changed (via
  set_params), this triggers re-initialization of all modules, criteria and
  optimizers; when any criterion is changed, this triggers re-initialization of
  all optimizers (but not modules); this is a bit defensive since it could
  trigger unnecessary inits but it's better than missing any inits

Additions

* There is now a get_learnable_params method on the net to retrieve all
  learnable parameters (instead of just those of module_); it is meant to be
  overridable by the user, e.g. when they have two optimizers for two modules
  (see the sketch after this list)
* Added attributes modules_, criteria_ and optimizers_ to the net to keep track
  of those; first started as OrderedDicts to mirror nn.Modules, but that was
  flaky, as the values in the dict would often get out of sync with the
  attributes on the net
* If the criterion/criteria have learnable params, those are now passed to the
  optimizer as well (think GANs)
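
A hedged sketch of such an override (assumed signature and illustrative names; check the final docs for the exact contract):

```python
from skorch import NeuralNet


class MyGAN(NeuralNet):
    def get_learnable_params(self):
        # restrict the default optimizer to the generator's parameters; a
        # second, custom optimizer (not shown) would train the discriminator
        return self.generator_.parameters()
```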

Minor changes

* net.set_params(...) no longer initializes the net if it's not yet initialized
  - this was simply unnecessary and could lead to some unexpected behavior
* custom module instances now need to be set inside initialize_module (and the
  name must end on an underscore), else the user will get an appropriate error
  message; same logic for criterion and optimizer
* added a bunch of unit tests for the custom modules etc. that cover the cases
  not covered so far
* checking of kwargs is now done during initialize, not during __init__ anymore,
  since at that point, we don't know yet what custom modules could exist
* run a _check_kwargs during set_params - previously, this was a loophole that
  allowed users to set params with typos etc.
* two unconditional print statements are now conditioned on verbosity level

Notes

I took extra effort to write the code as clearly as possible and add lots of
comments, since this touches some complicated parts of the code base. But if
something is not obvious, please tell me so that I can improve the code while
it's still fresh in my mind.

You will see that a few of the existing tests have been changed to now call
initialize on the net when previously they didn't. The reason is that some work
like checking kwargs is now moved to initialize.

Also, you will see that some tests now use mocked modules to check for device
calls. I found this preferable to actually moving to 'cuda' since this will also
work without a cuda device (e.g. during CI).