Conversation
python/mxnet/gluon/nn/activations.py
Outdated
Outputs:
    - **out**: output tensor with the same shape as `data`.
"""
def __init__(self, alpha_initializer='zeros', *args):
Is 0 initialization standard?
TensorFlow/Keras uses zeros; PyTorch uses 0.25.
Do we have a constant initializer to achieve the latter?
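For reference, a minimal sketch of passing a constant initializer, assuming the block under review exposes the `alpha_initializer` argument shown in the diff (`mx.init.Constant` is an existing MXNet initializer):

```python
import mxnet as mx
from mxnet.gluon import nn

# Sketch: match PyTorch's 0.25 default via a constant initializer.
# Assumes the alpha_initializer argument from the diff above.
prelu = nn.PReLU(alpha_initializer=mx.init.Constant(0.25))
prelu.initialize()
x = mx.nd.random.uniform(-1, 1, shape=(1, 3, 2))
y = prelu(x)  # negative entries are scaled by 0.25 after initialization
```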
python/mxnet/gluon/nn/activations.py
Outdated
"""
def __init__(self, **kwargs):
    super(SELU, self).__init__(**kwargs)
    self.scale = 1.0507009873554804934193349852946
self._scale
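For context, a minimal sketch of the SELU block using the suggested private-attribute naming (the constants are the standard values from Klambauer et al., 2017; the exact MXNet implementation may differ):

```python
from mxnet.gluon import HybridBlock

class SELU(HybridBlock):
    """Scaled Exponential Linear Unit -- sketch with private attributes."""
    def __init__(self, **kwargs):
        super(SELU, self).__init__(**kwargs)
        self._scale = 1.0507009873554805
        self._alpha = 1.6732632423543772

    def hybrid_forward(self, F, x):
        # scale * (x if x > 0 else alpha * (exp(x) - 1))
        return self._scale * F.where(x > 0, x, self._alpha * (F.exp(x) - 1))
```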
python/mxnet/gluon/nn/activations.py
Outdated
def __init__(self, beta=1.0, **kwargs):
    super(Swish, self).__init__(**kwargs)
    self.beta = beta
_beta
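Similarly, a sketch of the Swish block with the suggested `_beta` naming (Swish(x) = x * sigmoid(beta * x)):

```python
from mxnet.gluon import HybridBlock

class Swish(HybridBlock):
    """Swish activation: x * sigmoid(beta * x) -- sketch with _beta."""
    def __init__(self, beta=1.0, **kwargs):
        super(Swish, self).__init__(**kwargs)
        self._beta = beta

    def hybrid_forward(self, F, x):
        return x * F.sigmoid(self._beta * x)
```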
src/operator/leaky_relu-inl.h
Outdated
grad_weight = sumall_except_dim<1>(F<prelu_grad>(data) * grad);
gdata = F<mshadow_op::xelu_grad>(data, mshadow::expr::broadcast<1>(weight, data.shape_))
        * grad;
if (weight.shape_[0] == 1) {
Could you explain? I thought PReLU was already supported.
Since you are adding those, I would suggest also adding a learnable ISRLU:
https://arxiv.org/abs/1710.09967
Existing PReLU had two problems:
- the gamma parameter was never documented and can't be passed in using kwargs.
- it doesn't support scalar broadcast, as was attempted in "add Gluon PReLU activation layer" #8912.
If you meant to test for scalar, then you should use weight.shape_.Size().
It's equivalent because weight is Tensor<xpu, 1>. I think Size() is a safer choice in case weight changes definition.
Infershape sets it to data.shape[1].
When would weight's shape be (1,)?
If the weight parameter is shared across all axes, then only one scalar value is shared everywhere, in which case the weight shape should be (1,).
But infershape doesn't allow this
@bradcar sorry that I missed your comment earlier, and thanks for sharing your work. In this PR I'd like to first focus on wrapping up the previous two PRs for activations. Since you wrote the paper, would you like to implement that in mxnet?
There are two options: either writing it in Gluon by defining hybrid_forward in Python, or extending the leaky relu operator in C++ for better performance.
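To illustrate the first option, a hypothetical Gluon sketch of a learnable ISRLU (f(x) = x / sqrt(1 + alpha * x^2) for x < 0, x otherwise). The parameter handling mirrors the PReLU block in this PR; the class and argument names here are illustrative only:

```python
from mxnet.gluon import HybridBlock

class ISRLU(HybridBlock):
    """Inverse square root linear unit (arXiv:1710.09967) -- Gluon sketch."""
    def __init__(self, alpha_initializer='ones', **kwargs):
        super(ISRLU, self).__init__(**kwargs)
        with self.name_scope():
            # a single learnable alpha; a per-channel shape would also work
            self.alpha = self.params.get('alpha', shape=(1,),
                                         init=alpha_initializer)

    def hybrid_forward(self, F, x, alpha):
        neg = F.broadcast_div(x, F.sqrt(1.0 + F.broadcast_mul(alpha, x * x)))
        return F.where(x >= 0, x, neg)
```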
Commits 38e1e0e to f149553
@piiswrong addressed the infer shape issue. Let me know if you have more comments.
python/mxnet/gluon/nn/activations.py
Outdated
    return F.LeakyReLU(x, gamma=alpha, act_type='prelu', name='fwd')

def __repr__(self):
    s = '{name}'
Can we handle these trivial cases in base class?
Yes
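For illustration, one way such trivial cases could be centralized (a hypothetical shared base class, not necessarily how it was ultimately resolved):

```python
from mxnet.gluon import HybridBlock

class _Activation(HybridBlock):
    """Hypothetical common base owning the trivial __repr__ case."""
    def __repr__(self):
        return '{name}'.format(name=self.__class__.__name__)
```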
tests/python/unittest/test_gluon.py
Outdated
prelu = mx.gluon.nn.PReLU()
prelu.initialize()
x = point_to_validate.reshape((1, 1, 2))
Use a different input shape that can catch the infer shape problem.
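A sketch of what the adjusted test might look like with a channel dimension larger than one (the shape chosen here is only illustrative):

```python
import mxnet as mx

def test_prelu_infershape():
    # channel dim > 1, so a wrong per-channel shape inference would surface
    prelu = mx.gluon.nn.PReLU()
    prelu.initialize()
    x = mx.nd.random.uniform(-1, 1, shape=(2, 3, 4, 5))
    y = prelu(x)
    assert y.shape == x.shape
```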
src/operator/leaky_relu-inl.h
Outdated
@@ -225,7 +242,11 @@ class LeakyReLUProp : public OperatorProperty {
  const TShape &dshape = in_shape->at(leakyrelu::kData);
  if (dshape.ndim() == 0) return false;
  if (param_.act_type == leakyrelu::kPReLU) {
    in_shape->at(leakyrelu::kGamma) = TShape(Shape1(dshape[1]));
    const TShape &gshape = in_shape->at(leakyrelu::kGamma);
    if (gshape.Size() != 1)
gshape could be empty, in which case Size() is undefined.
Also, if gshape.Size() is 1, it could be (1,1), which is invalid.
So, I should check for both ndim and shape_[0] then. How do I check whether it’s undefined?
If gshape is empty, gshape.ndim() would be 0.
@piiswrong, I addressed the latest comments. Let me know if any further change is needed.
ping @piiswrong
Hello, when I run mlp.cpp there is an error like this:
@szha I just ran the mlp.cpp file; when it runs this line:
@piiswrong @szha what is the status of #9662, and is PReLU working? When I naively put PReLU into a hybrid block (mxnet 1.2.0) and look at the source (activations.py), it seems that PReLU only has one learnable alpha per layer. Shouldn't each 'neuron' have its own learnable alpha?
@bradcar the leaky relu operator in 'prelu' mode supports any broadcastable alpha shapes. Since it's impossible to infer the shape of the parameter until it sees the first input, we chose to put the simplest case in the constructor. For your use case, when you need more than one alpha parameter, you can simply use the operator.
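For example, a sketch of calling the operator directly with one alpha per channel (here gamma is a plain array just to show the shape; in a real model it would be a learnable parameter):

```python
import mxnet as mx

data = mx.nd.random.uniform(-1, 1, shape=(2, 4, 8, 8))
gamma = mx.nd.full((4,), 0.25)  # one slope per channel
out = mx.nd.LeakyReLU(data=data, gamma=gamma, act_type='prelu')
assert out.shape == data.shape
```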
* prelu, elu, selu, swish * update * fix infer shape * update infer shape * update
Description
Picking up #8912 (@joeddav), #9111 (@anjishnu)
Checklist
Essentials
- Passed code style checking (make lint)
Changes
- Fix LeakyReLU(act_type='prelu')
- Add nn.PReLU and test
- Add nn.ELU, nn.SELU, nn.Swish and tests
Comments
- Fixed LeakyReLU(act_type='prelu'), since it didn't have the proper parameter or logic.