[MXNET-978] Higher Order Gradient Support arcsin, arccos. #15515
Conversation
Barring that hacky way of testing (which makes sense but I'll wait for committers to approve it), LGTM! Thanks for your contribution!
Thank you.
@mxnet-label-bot add [Operator, pr-awaiting-review]
@apeforest @larroy @sxjscience Gentle ping for review. :)
Hi kshitij12345, thanks a lot for your contribution. In general it looks good. One question regarding the first output of the second gradient.
auto grad_grad_x = op.mul(dydx_mul_grad_x, grad_x_square_mul_x);

std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(op.mul(ograds[0], grad_x));
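(Side note, not part of the diff: for y = arcsin(x) the relevant derivatives are

\frac{dy}{dx} = (1 - x^2)^{-1/2}, \qquad \frac{d^2y}{dx^2} = x\,(1 - x^2)^{-3/2},

so the second-order backward has to produce a gradient with respect to the incoming head gradient, a term proportional to dy/dx, and a gradient with respect to x, a term proportional to d^2y/dx^2. How the variable names above map onto these terms is exactly what the question below is about.)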
If the first input is y_grad, i.e. dL/dy, shouldn't this gradient be dL/(dy*dx)?
Didn't we have the convention of x_grad instead of grad_x?
Sorry for the late reply.

"if the first input is y_grad or dL/dy, this gradient should be dL/(dy*dx)?"
I am not sure about the dL part, since we don't really use it in computing the loss function.

"didn't we have the convention of x_grad instead of grad_x?"
Oops, thanks. It was an old PR. I will update the names.
Sure, I guess calling it x_grad_y_grad is fine. Sorry, CI is flaky right now.
Regarding x_grad_y_grad: we are not naming that particular variable anywhere. Or am I confusing something?
@larroy Can you elaborate on what you meant by x_grad_y_grad? I am slightly confused. Thanks.
@apeforest @larroy @sxjscience
LGTM!
LGTM.
One idea that has circulated for enhancing higher order gradients is to add an "FGradientSymbolic" function that gets triggered when the higher order gradient is not available and changes the graph so that the forward pass is expressed in terms of differentiable primitives. We can talk more if you are interested.
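(A small illustration of why that kind of rewrite would help, as I read the idea; this is not code or text from the PR. Once an op's gradient is expressed as a subgraph of differentiable primitives such as square, subtract and power, the autograd machinery can differentiate that subgraph again, so higher orders come essentially for free. For arcsin,

\frac{d}{dx}\arcsin x = (1 - x^2)^{-1/2}

is such a composition, and differentiating it with the ordinary first-order rules already yields the second derivative x\,(1 - x^2)^{-3/2} without a hand-written second-order FGradient.)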
@larroy, I am quite interested in the idea. What would be a good place to talk? Slack?
@larroy @kshitij12345 @apeforest I think we can use Slack. What do you think?
Slack sounds good to me.
@kshitij12345 I have sent a slack invite to [email protected]. Please accept. Thanks!
Slack or a mailing list is fine for me.
Description
This PR adds support for higher order gradients for arcsin, arccos.

Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Add higher order gradient support for arcsin, arccos.
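For reference (my addition, not part of the original change list): the arccos derivatives are just the negatives of the arcsin ones,

\frac{d}{dx}\arccos x = -(1 - x^2)^{-1/2}, \qquad \frac{d^2}{dx^2}\arccos x = -x\,(1 - x^2)^{-3/2},

so the second-order backward for arccos mirrors the arcsin one with the signs flipped.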