- Extend the implementation
  - (Bernoulli Dropout) need 1d (exists), 2d and 3d (see the sketch below this item)
  - (Convolutions) implement 3d convolutions and 3d Variational Dropout convolutions, both real and complex
  - (Transposed Convolutions) figure out the math and implement variational dropout for transposed convolutions
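One possible starting point for the 2d Bernoulli case, assuming the complex layer should drop whole complex entries, i.e. share a single Bernoulli mask between the real and imaginary parts (an assumption for illustration, not a statement about the existing 1d layer):

```python
import torch
import torch.nn.functional as F


def cplx_dropout2d(real, imag, p=0.5, training=True):
    """Sketch: zero out whole complex channels, one mask for both parts.

    `real` and `imag` stand for the two components of a Cplx-like tensor
    of shape (N, C, H, W); the function is illustrative, not the final layer.
    """
    if not training or p == 0.0:
        return real, imag
    # build the channel mask once on a tensor of ones, then reuse it, so the
    # real and imaginary parts are dropped (and rescaled by 1 / (1 - p)) together
    mask = F.dropout2d(torch.ones_like(real), p=p, training=True)
    return real * mask, imag * mask
```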
- update the complex layer initialization from Kaiming to independent by default (check Trabelsi et al. 2018)
  - this may break older experiments in third-party repos, so need to issue a warning and a patch
- deal with the `torch.nonzero(..., as_tuple=True)` deprecation warning in `utils.spectrum` (a possible workaround is sketched below)
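If the warning keeps firing at the `as_tuple=True` call site, one possible workaround (an assumption about what `utils.spectrum` actually needs, shown only as a sketch) is the documented equivalence with single-argument `torch.where`:

```python
import torch

x = torch.tensor([[0.0, 1.5], [2.0, 0.0]])

# torch.where(condition) is documented to be identical to
# torch.nonzero(condition, as_tuple=True), but sidesteps the `as_tuple`
# keyword entirely on the affected torch versions
rows, cols = torch.where(x != 0)
assert torch.equal(rows, torch.tensor([0, 1]))
assert torch.equal(cols, torch.tensor([1, 0]))
```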
- figure out the issues with ONNX support
- make `.load_state_dict` respect the components of `CplxParameter` and allow promoting real tensors to complex tensors, provided the state dict has no `.real` or `.imag` keys, but a correct key referring to the parameter (a sketch of the promotion logic is given below)
- fix the incorrect naming of Bayesian methods in `nn.relevance`
  - rename `*ARD` named layers in `.real` and `.complex` to `*VD` layers, since they use a log-uniform prior and thus are in fact Variational Dropout layers
  - start deprecating importing `*ARD` named layers from `.real` and `.complex`
  - fix aliases of imported layers in `.extensions`
  - expose all base VD/ARD layers in `__init__.py` and require importing modifications from `.extensions`
  - fix the text in `nn/relevance/README.md`
- fix the names for the `L0` regularized layer, which in fact performs probabilistic sparsification and is not related to variational inference
- check if `setup.py` has correct requirements and specify them explicitly
  - `requires` is not a keyword, use `install_requires` and `tests_require` (see the reference snippet below)
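For reference, a minimal `setup.py` shape using the setuptools keywords mentioned above; the metadata and pins below are placeholders, not the project's actual requirements:

```python
from setuptools import setup, find_packages

setup(
    name="cplxmodule",            # placeholder metadata
    version="0.0.0",
    packages=find_packages(),
    # `install_requires` is what pip actually resolves at install time;
    # the legacy `requires` keyword is ignored for dependency resolution
    install_requires=[
        "torch",                  # placeholder pins, set the real minimum versions
        "numpy",
    ],
    tests_require=["pytest"],     # newer setuptools prefers extras_require={"test": [...]}
)
```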
- investigate reordering the base classes in `LinearMasked(MaskedWeightMixin, Linear, _BaseRealMixin)` and similar in `nn.masked`
  - could moving it further into the bases result in a slower property lookup? It seems not:
    - from the Python descriptor docs: "The implementation works through a precedence chain that gives data descriptors priority over instance variables, instance variables priority over non-data descriptors, and assigns lowest priority to `__getattr__`"
    - the lookup order in `__getattribute__` is thus: data descriptors (e.g. `@property`), the instance `__dict__`, class attributes found through the MRO (non-data descriptors included), and lastly `__getattr__` (see the illustration below this item)
  - moved `MaskedWeightMixin` into `_BaseMixin`
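A toy illustration of that precedence, unrelated to the actual `nn.masked` classes: a data descriptor keeps winning no matter how deep in the MRO it sits, so pushing a property-bearing mixin further into the bases does not change what an attribute access returns.

```python
class DeepMixin:
    @property
    def weight(self):                       # data descriptor on a distant base
        return "from DeepMixin property"


class Middle(DeepMixin):
    pass


class Leaf(Middle):
    pass


obj = Leaf()
obj.__dict__["weight"] = "from instance dict"

# the property is found through the MRO and, being a data descriptor,
# takes precedence over the entry in the instance __dict__
assert obj.weight == "from DeepMixin property"
assert Leaf.__mro__.index(DeepMixin) == 2   # two classes away from Leaf
```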
- get rid of `torch_module` from `.utils` and declare `activations` explicitly
- clean up the `nn` module itself
  - remove crap from `.sequential`: `CplxResidualBottleneck`, `CplxResidualSequential` and `CplxBusResidualSequential` must go, and move `CplxSequential` to base layers
  - split `.layers`, `.activation`, and `.sequential` into
    - `.modules.base`: base classes (`CplxToCplx`, `BaseRealToCplx`, `BaseCplxToReal`) and the parameter type (`CplxParameter`, `CplxParameterAccessor`)
    - `.modules.casting`: converting real tensors in various formats to and from `Cplx` (`InterleavedRealToCplx`, `ConcatenatedRealToCplx`, `CplxToInterleavedReal`, `CplxToConcatenatedReal`, `AsTypeCplx`)
    - `.modules.linear`: `Linear`, `Bilinear`, `Identity`, `PhaseShift`
    - `.modules.conv`: everything convolutional
    - `.modules.activation`: activations (`CplxModReLU`, `CplxAdaptiveModReLU`, `CplxModulus`, `CplxAngle`) and layers (`CplxReal`, `CplxImag`)
    - `.modules.container`: `CplxSequential`
    - `.modules.extra`: `Dropout`, `AvgPool1d`
  - move `.batchnorm` to modules, keep `.init` in `.nn`
  - fix imports from adjacent modules: `nn.masked` and `nn.relevance`
- in `nn.relevance.complex`: drop `Cplx(*map(torch.randn_like, (s2, s2)))` and write `Cplx(torch.randn_like(s2), torch.randn_like(s2))` explicitly
  - implemented `cplx.randn` and `cplx.randn_like` (a sketch of such a helper follows below)
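For context, a helper like `cplx.randn_like` presumably just pairs two independent standard normal draws into a `Cplx`; the import path and actual signature are assumptions, so treat this as a sketch:

```python
import torch

from cplxmodule import Cplx  # assumed import path for the Cplx pair type


def randn_like_sketch(input):
    """Sketch of a `randn_like` for Cplx inputs: draw the real and imaginary
    parts independently, matching the input's shape, dtype and device."""
    return Cplx(torch.randn_like(input.real), torch.randn_like(input.imag))
```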
- residual clean up in the `nn` module
  - `.activation`: `CplxActivation` is the same as `CplxToCplx[...]`
    - `CplxActivation` promotes classic (real) torch functions to split activations, so yes
    - see if it is possible to implement function promotion through `CplxToCplx[...]`
      - it is possible: just reuse `CplxActivation`
  - currently `CplxToCplx` promotes layers and real functions to independently applied layers/functions (split); a sketch of this split application follows after this item
    - how should we proceed with `Cplx` trig. functions? a wrapper, or hardcoded activations?
      - the latter seems more natural, as the trig. functions are vendored by this module
      - since torch is the base, and implements a great number of univariate tensor functions and could potentially be extended, it is more natural to use a wrapper (the rationale behind `CplxToCplx[...]`)
  - `.modules.extra`: this needs thorough cleaning
    - drop `CplxResidualBottleneck`, `CplxResidualSequential` and `CplxBusResidualSequential`
    - abandon `torch_module` and code the trig. activations by hand
    - remove the alias `CplxDropout1d`: use `torch.nn` names as much as possible
    - deprecate `CplxAvgPool1d`: it can be created at runtime with `CplxToCplx[torch.nn.AvgPool1d]`
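As a reminder of what "split" promotion means here, a minimal sketch (not the actual `CplxToCplx` machinery; the `Cplx` import path is assumed): a real univariate function is applied independently to the real and imaginary parts.

```python
import torch

from cplxmodule import Cplx  # assumed import path for the Cplx pair type


def split_apply(fn, z):
    """Apply a real univariate function to each part of a Cplx independently.

    This mirrors the idea of promoting `torch.sin`, `torch.nn.AvgPool1d`, etc.
    to complex layers; the real `CplxToCplx[...]` wrapper builds a proper
    Module instead of a bare function.
    """
    return Cplx(fn(z.real), fn(z.imag))


z = Cplx(torch.randn(2, 8), torch.randn(2, 8))
w = split_apply(torch.relu, z)              # a split ReLU, applied part-wise
```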
- documentation for Bayesian and maskable layers
  - in `nn.relevance.base`, making it like in `nn.masked`
  - classes in `nn.relevance.real` and `.complex` should also be documented properly; the same goes for `.extensions`
- restructure the extensions and non-Bayesian layers
  - new folder structure
    - take ARD-related declarations and move them to `relevance/ard.py`, everything else to a submodule
    - the `.extensions` submodule: `complex` for `Cplx`-specific extended layers (bogus penalties, approximations and other stuff not directly related to Variational Dropout or automatic relevance determination), `real` for supplementary real-valued layers
  - decide the fate of the `lasso` class in `nn.relevance`:
    - it is irrelevant to Bayesian methods: move it to `extensions/real`
- documentation
  - go through the README-s in each submodule to make sure that the info there is correct and typical use cases are described
  - `nn.init`: document the initializations according to Trabelsi et al. (2018)
    - seems to be automatically documented using `functools.wraps` from the original `torch.nn.init` procedures (see the note below this item)
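A quick illustration of why `functools.wraps` gives those docstrings for free (generic Python, not the actual `nn.init` code; the wrapper below is hypothetical):

```python
import functools

import torch.nn.init


def promote_init(init_fn):
    """Hypothetical pattern: apply a real torch.nn.init routine to both parts
    of a complex weight while keeping the original documentation."""
    @functools.wraps(init_fn)               # copies __doc__, __name__, ...
    def wrapper(cplx_weight, *args, **kwargs):
        init_fn(cplx_weight.real, *args, **kwargs)
        init_fn(cplx_weight.imag, *args, **kwargs)
        return cplx_weight
    return wrapper


cplx_xavier_ = promote_init(torch.nn.init.xavier_uniform_)
assert cplx_xavier_.__doc__ == torch.nn.init.xavier_uniform_.__doc__
```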
- add missing tests to the unit test suite
  - tests for `*state_dict` API compliance of `nn.masked` and `nn.base.CplxParameter` (a sample round-trip test is sketched below)
    - implementing these tests helped figure out and fix edge cases, so yay for TDD!
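One shape such a test might take; the fixture below uses `LinearMasked` from `nn.masked` as mentioned above, but the import path, constructor arguments and suite layout are assumptions:

```python
import torch

from cplxmodule.nn.masked import LinearMasked  # assumed import path


def test_own_state_dict_roundtrip():
    """Loading a module's own state dict back should be an exact no-op."""
    module = LinearMasked(4, 3)
    before = {k: v.clone() for k, v in module.state_dict().items()}

    module.load_state_dict(module.state_dict(), strict=True)

    after = module.state_dict()
    assert before.keys() == after.keys()
    assert all(torch.equal(before[k], after[k]) for k in before)
```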
The following ideas were planned out at some point, but ultimately were not pursued for various reasons.
- begin migration to `complex` tensors in `pytorch>=1.6`
  - #20220608 consider the discussion in issue #21
    - for C -> R real-valued loss functions, `grad.conj()` gives a descent direction (see the note below this item)
    - complex autograd
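For reference, the Wirtinger-calculus fact behind that remark (standard math, not specific to this repo): for a real-valued loss $L\colon \mathbb{C} \to \mathbb{R}$,

$$
dL = 2\,\operatorname{Re}\!\left[\frac{\partial L}{\partial z}\, dz\right],
\qquad
\Delta z_{\text{steepest descent}} \propto -\overline{\left(\frac{\partial L}{\partial z}\right)} = -\frac{\partial L}{\partial \bar z},
$$

so if `grad` stores the Wirtinger derivative $\partial L / \partial z$, stepping along `-grad.conj()` decreases the loss.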
- Consider replacing `Real` with `Tensor` in format-conversion layers, like `RealToCplx`, `CplxToReal`
  - the term `Real` has connotations with real numbers, making it very unintuitive to convert between `Cplx`, which is perceived as a complex number, and a torch `Tensor`, which serves merely as a storage format
  - need a deprecation cycle for these and related functions
    - in `cplx`: `from_interleaved_real`, `from_concatenated_real`, `to_interleaved_real`, `to_concatenated_real`, and the aliases `from_real` and `to_real` (affects `__init__.py`)
    - in `nn.modules.casting`: `InterleavedRealToCplx`, `ConcatenatedRealToCplx`, `CplxToInterleavedReal`, `CplxToConcatenatedReal`, also the base classes `BaseRealToCplx` and `BaseCplxToReal`
  - three basic types?
    - Tensor -- aka Storage
    - Real -- real-valued tensor
    - Cplx -- complex-valued tensor
- Implement scheduled mag-pruning of Zhu and Gupta (2017) or thresholded pruning of Wu et al. (2019); a sketch is given below this list.
  - use `nn.masked` as a backend -- this will automatically support real and `Cplx` layers!!!!
  - implement as either a wrapper around the optimizer (bad), or as a separate entity (better)
    - settings of the target sparsity per eligible layer (`dict`)
    - a method `.step()` which updates the masks according to the schedule and the current sorted magnitudes of the parameters
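A rough sketch of such a standalone pruner, using the cubic sparsity schedule from Zhu and Gupta (2017); the mask handling is deliberately generic and assumes each eligible layer exposes a `weight` tensor and a same-shaped `mask` attribute, which may not match the actual `nn.masked` interface:

```python
import torch


def zhu_gupta_sparsity(step, s_final, s_init=0.0, begin=0, end=1000):
    """Cubic sparsity ramp from Zhu & Gupta (2017), clamped to [begin, end]."""
    t = min(max(step, begin), end)
    frac = (t - begin) / max(end - begin, 1)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3


class MagnitudePruner:
    """Standalone entity: per-layer target sparsities plus a .step() that
    recomputes magnitude-based masks from the current schedule value."""

    def __init__(self, targets, begin=0, end=1000):
        self.targets = targets              # dict: module -> final sparsity in [0, 1)
        self.begin, self.end = begin, end
        self.t = 0

    @torch.no_grad()
    def step(self):
        self.t += 1
        for module, s_final in self.targets.items():
            sparsity = zhu_gupta_sparsity(self.t, s_final, 0.0, self.begin, self.end)
            magnitude = module.weight.abs().flatten()
            k = int(sparsity * magnitude.numel())
            if k == 0:
                module.mask = torch.ones_like(module.weight)
                continue
            # prune everything at or below the k-th smallest magnitude
            threshold = magnitude.kthvalue(k).values
            module.mask = (module.weight.abs() > threshold).to(module.weight)
```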