always unthunk results #79
Conversation
This is intended to only affect the outermost level, the thing returned to the user. Should the same be done to gradient?
This should be done to gradient. Diffractor would actually prefer that thunks didn't exist in the first place, so this is just a fallback to prevent the user from seeing them. Our ideal scenario would be a way to get derivatives from ChainRules that don't have thunks in the first place, but that is a lot harder than this PR.
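For what it's worth, a minimal sketch of the idea (not Diffractor's actual code; the helper name unthunk_results is made up for illustration): force any top-level thunks just before handing the result back, relying on ChainRulesCore.unthunk being the identity on anything that isn't a thunk.

```julia
using ChainRulesCore

# Hypothetical helper illustrating "always unthunk results": force thunks at the
# outermost level only, just before returning to the user.
unthunk_results(res) = unthunk(res)
unthunk_results(res::Tuple) = map(unthunk, res)

t = @thunk 2.0f0
unthunk_results((t,))   # (2.0f0,) rather than (Thunk(...),)
```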
Some way of not computing things which will be discarded does seem desirable. With the example of JuliaDiff/ChainRulesCore.jl#558 (a rough reconstruction of the toy setup is sketched after the Yota/Zygote/ReverseDiff comparison below):

```julia
julia> Diffractor.PrimeDerivativeBack(x -> f(x, b) + 10f(2x, b))(a)
rrule is called
rrule is called
∇a is called
∇a is called
42.0f0  # with this PR, seems ideal? ∇b never run despite accumulation

julia> gradient(x -> f(x, b), a)  # should unthunk answer, but not run ∇b
rrule is called
(Thunk(var"#28#31"{Float32, Float32, Float32}(1.0f0, 1.0f0, 2.0f0)),)
```

Surely a sufficiently smart compiler could notice and eliminate the unused computation?
The goal for Diffractor is to use escape analysis to remove the computation entirely, which is made easier with simpler types. We aren't there yet, but once stage 2 is integrated, we'll be close-ish.
This would be great. If it works, it's possible all thunks could be stripped out of ChainRules, since (IIRC) nothing else uses them anyway? Although, besides delayed calculation, they are intended one day to save memory too... xref #69 I guess.
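As a quick illustration of the delayed-calculation point (plain ChainRulesCore, nothing specific to this PR): the body of a @thunk doesn't run until something calls unthunk on it, which is exactly why a derivative that ends up discarded need never be computed.

```julia
using ChainRulesCore

t = @thunk begin
    println("computing a gradient piece")   # not printed until the thunk is forced
    sum(abs2, rand(Float32, 3))
end

t isa ChainRulesCore.AbstractThunk   # true; nothing has run yet
unthunk(t)                           # now the body executes and the value comes back
```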
As I understand it, ReverseDiff.jl likes thunks, but I'm not sure.
Yes, I suppose, although you have to opt in. More directly, I forgot that Yota doesn't un-thunk internally, only the final result:

```julia
julia> using Yota

julia> grad(x -> f(x, b), a)  # unthunks final result
rrule is called
∇a is called
(2.0f0, (ZeroTangent(), 2.0f0))

julia> using Zygote

julia> Zygote.gradient(x -> f(x, b), a)  # unthunks all
rrule is called
∇a is called
∇b is called
(2.0f0,)

julia> using ReverseDiff

julia> ReverseDiff.@grad_from_chainrules f(x::TrackedReal, y::Real)

julia> ReverseDiff.gradient(x -> f(x[1], b), [a])  # maybe this relies on thunks inside?
rrule is called
∇a is called
1-element Vector{Float32}:
 2.0
```
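For reference, the REPL snippets in this thread picture something like the toy definitions below. This is a rough reconstruction (the actual example in JuliaDiff/ChainRulesCore.jl#558 may differ in detail); the point is that the printed lines show exactly which thunks get forced.

```julia
using ChainRulesCore

f(x, y) = x * y   # toy function whose rrule announces when each piece runs

function ChainRulesCore.rrule(::typeof(f), x, y)
    println("rrule is called")
    function f_pullback(dz)
        dx = @thunk begin
            println("∇a is called")   # derivative w.r.t. the first argument
            dz * y
        end
        dy = @thunk begin
            println("∇b is called")   # derivative w.r.t. the second argument
            dz * x
        end
        return (NoTangent(), dx, dy)
    end
    return f(x, y), f_pullback
end

a, b = 1.0f0, 2.0f0
```

With these definitions, whether "∇b is called" appears is a direct check of whether a tangent that will be discarded was actually computed.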
Codecov Report

```diff
@@            Coverage Diff             @@
##             main      #79      +/-   ##
==========================================
- Coverage   52.62%   51.41%    -1.21%
==========================================
  Files          21       21
  Lines        2172     2118       -54
==========================================
- Hits         1143     1089       -54
  Misses       1029     1029
```

Continue to review full report at Codecov.
I think this is good to merge. (Jacobians have been extricated.)
Can the unthunk in this branch apply to gradient too, and perhaps within Tangent? Examples are above.
I think this should be merged as is. I haven't found a great way to
Seems ok by me. Would be nice if tests passed, though...
For disabling thunks, xref JuliaDiff/ChainRulesCore.jl#568 (inspired by difficulties with Zygote over Zygote).
Master has the same errors now, on Julia nightly:
In theory this shouldn't be necessary, but it is a good fallback to make sure we don't return thunks to the user. Also, it's zero overhead as long as the preceding calculation is correctly inferred.
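To spell out the zero-overhead claim (an aside, not part of this diff): unthunk has a trivial fallback for non-thunks, so once inference proves the value reaching the fallback is already a plain number or array, the extra call is the identity and compiles away; a cost only shows up when inference fails and the call has to dispatch at runtime.

```julia
using ChainRulesCore, InteractiveUtils

t = @thunk 1.0f0

# unthunk(::AbstractThunk) forces the stored computation; for anything else it
# simply returns its argument, which is what makes the fallback free when the
# preceding calculation is well inferred.
@code_typed unthunk(1.0f0)   # trivial body, just returns the argument
@code_typed unthunk(t)       # calls through to the stored closure
```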