AD support via chainrules #177
Codecov Report — Base: 99.58% // Head: 99.58%

@@            Coverage Diff            @@
##           master     #177   +/-   ##
=======================================
  Coverage   99.58%   99.58%
=======================================
  Files          17       18     +1
  Lines        1448     1460    +12
=======================================
+ Hits         1442     1454    +12
  Misses          6        6

View full report at Codecov.
Thanks for the clear contribution. I just have a few style remarks. As this is unfamiliar ground for me: is there any chance to have special differentiation rules for `FunctionMap`s that would "expose" the code to Flux? I have no idea what people need or use, but I imagine that would be awesome. Our usual toy example is `cumsum`, which in terms of a matrix is just `LowerTriangular(ones(m,m))`. That doesn't have parameters in it, so the derivative should be zero. But one could imagine a "weighted cumulative sum" with some weight vector as input. Since for any weight we obtain a linear map, taking derivatives shouldn't lead us out of the linear-map realm, naively speaking. Or, the other way around: forgetting about LinearMaps.jl and just considering "weighted cumsum" as a piece of code, that should be accessible to AD, shouldn't it? Anyway, I'm just thinking out loud without any concrete idea.
Thanks for the quick review. Concerning …
What are the problems with AD not handling this automatically (I'm not sure what "this" refers to)? I am currently experimenting with AD in the context of KrylovKit.jl, for the solution of linear problems or eigenvalue problems. In KrylovKit, the linear operator is generically specified as a function (not a …
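For context, a minimal illustration (not from the PR) of the two styles being contrasted — passing a bare function, KrylovKit-style, versus wrapping it as a LinearMap; the function `f` here is hypothetical:

```julia
using LinearMaps

# A linear action given only as a function:
f(x) = 2 .* x

# KrylovKit-style APIs consume `f` directly; LinearMaps.jl instead wraps
# it as a matrix-free FunctionMap with an explicit size:
A = LinearMap(f, 4; issymmetric=true)

@assert A * ones(4) == fill(2.0, 4)
```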
Thanks for chiming in, @Jutho. And indeed, the provided tests almost work; they fail with

Can't differentiate gc_preserve_end expression
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] Pullback
   @ ~/Documents/julia/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/blas.jl:644 [inlined]

from within BLAS. But with … the failure becomes

Mutating arrays is not supported -- called copyto!(::SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, _...)

So maybe this PR is necessary to cover up internal "issues"?
There are a lot of subtleties to it, yes. In this case, most of the rules for standard functions are already defined in … And that first … For the example …
(Sorry for the second message.) Finally, defining the rule internally that …
Since this is a new feature, please bump the minor version, and please write a short "news" entry in the documentation (`history.md`); then we're ready to go.
For reasons of performance and functionality (handling of in-place operations etc.), this seems like a necessary step. Shall we merge this? @Jutho, were you afraid that this PR may prevent getting derivatives of the linear operator working later? Perhaps, if we have concrete use cases, we may add more specific rules, like directing …
Sorry for the delay; I am on parental leave and will get back to it ASAP. IMO there are two main cases …
Happy to iterate on the PR to get it to its most usable point.
As I am no longer actively using this package, I don't think my opinion should be valued highly. That being said, I would indeed like to see use cases where this rule really helps, i.e. where there is no custom rule higher up stating that you need to do … Assuming that, just because a function has additional parameters, it would be a nonlinear function of …
Sorry for the delay; I will get to it next week.
I'd like to add my general voice of support for building autodiff support in here.
Bump? I think there will be increasing interest in automating AD for linear maps: …
Out of curiosity: is there a reason to use …
I am quite late on my TODOs; I'm really sorry this is taking a while. Just habit, no particular reason, and I'm fine with switching to Zygote.
This is the assumption for a linear operator, yes, when used as part of "backpropagated" code. If the adjoint is not defined, then an error should occur, but I am not sure what the cleanest way to do so is while avoiding having it mixed into the middle of Zygote error messages.
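The adjoint-based pullback being discussed can be sketched as a ChainRules reverse rule. This is a hypothetical sketch; the rule actually added by the PR may be shaped differently:

```julia
using LinearMaps, ChainRulesCore

# Sketch: the pullback of y = A * x with respect to x is x̄ = A' * ȳ,
# i.e. apply the adjoint map to the incoming cotangent. The map A itself
# is treated as non-differentiable (NoTangent).
function ChainRulesCore.rrule(::typeof(*), A::LinearMap, x::AbstractVector)
    y = A * x
    function mul_pullback(ȳ)
        # If adjoint(A) is undefined for this map, this is the line that
        # errors -- a natural place to raise a clearer message than what
        # Zygote would otherwise surface.
        return NoTangent(), NoTangent(), A' * unthunk(ȳ)
    end
    return y, mul_pullback
end
```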
IIUC, we could technically add differentiation w.r.t. the operator later, right? We would need to replace the … @mloubout, I think all is set up right now, so maybe you could write a little announcement in the documentation, perhaps with a little two-liner as an example. You could, for instance, use one of your earlier test examples that I have replaced with a plain …
Yes, I think it would go the … I'll work on the tiny doc update, hopefully by the end of the week at the latest. Thanks for the patience.
Thank you for the patience; I think all comments are answered now.
Co-authored-by: Jeff Fessler <[email protected]>
Supports AD via ChainRules for the multiplication; closes #176. Currently, `mul!` does not support AD because I'm not sure how to define a perturbation on the LinearMap itself for the forward rule. Added a test as well.
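The forward-rule difficulty mentioned above can be made concrete: one way out is to treat the map as a constant, sidestepping the question of perturbing the LinearMap itself. Again a hypothetical sketch, not the PR's code:

```julia
using LinearMaps, ChainRulesCore

# Sketch of a forward rule for A * x that ignores any perturbation of the
# map A (assumes NoTangent for it), so only the perturbation of x
# propagates through the multiplication:
function ChainRulesCore.frule((_, _ΔA, Δx), ::typeof(*),
                              A::LinearMap, x::AbstractVector)
    return A * x, A * unthunk(Δx)
end
```

A perturbation of the map itself would require a tangent type for `LinearMap`, which is exactly the open question the description refers to.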