-
-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getindex(::Fill, idx) #863
Comments
Hmm, the output seems consistent since you have gradients defined for some elements and zeros for others. Do we want to say DNE for the remainder of the elements? Either way this would materialize the array iiuc |
The point is not that the output is inconsistent (I agree that the output is consistent with what you would get if you provided an (Part) of a solution is to stop getting in the way of Zygote automatically deriving the correct adjoint by restricting this rrule to something less general than edit: only part of a solution because IIRC when I tried this I encountered some type stability issues with |
I guess cases like this are the motivation for treating julia> gradient(y -> sum(sqrt, gradient(x -> sum(x.^2), y)[1]), reshape(1:6,2,3))[1]
2×3 Matrix{Float64}:
0.707107 0.408248 0.316228
0.5 0.353553 0.288675 The gradient of the inner But what seems a little tricky is that, if the returned gradient escapes from Zygote, the returning the one-number version seems like it violates the fact that julia> gradient(x -> sum(cumsum(x, dims=2)), Fill(pi,2,7))[1]
2×7 Matrix{Float64}:
7.0 6.0 5.0 4.0 3.0 2.0 1.0
7.0 6.0 5.0 4.0 3.0 2.0 1.0 I suspect no-one literally types this, except when making up examples. But there might be other contexts where code outside of Zygote may do this, and expect this optimisation not to change the answer. Can these two contexts be clearly distinguished somehow, so that Zygote knows when it's OK to "pre-accumulate" the gradient? In the first case, I think the outer Zygote ought to be able to see the constructor which is making the Fill, while in the second it obviously cannot. That isn't visible to the Notice also that my first example uses a |
I see where you're coming from, but I think we're stuck with it as the raw output of Zygote for performance reasons if no other reason (same for all structured arrays really).
I've thought about this a bit before. It should be possible to implement a separate piece of tooling to convert the output appropriately by comparing it with the input. i.e. if you know the input was a
Sorry, I'm not sure what you mean by this. Could you elaborate? |
Maybe the short version is that I'd like more examples of where
No, that's not what I meant. The complaint in my second example is that replacing
What I'm trying to say is that a decision to treat (say) Here's another example: gradient(x -> range(1,x,length=10)[5], 13) # Need an adjoint for constructor StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}. Gradient is of type Vector{Float64}
gradient(x -> x[5], range(1,13,length=10)) # ([0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],)
gradient(x -> x[5], collect(range(1,13,length=10))) # the same Similar arguments apply here -- when the range is constructed within the scope of Zygote, it would I believe always be safe to make it structurally rigid, i.e. to store only I think any structural constraint means that |
One place that I use
I think these points are the crux of where our views differ, and I think it comes down to differing views on Firstly, my rough (current) personal definition of a structured array type is that an array My view is that the correct thing to do is to treat The first reason I think it's a good idea to treat many structured arrays as You can construct a hand-wavy inductive argument that a standard reverse-mode AD system will satisfy property this provided that all of its If we accept the above, it follows that the gradients of any structured array which uses asymptotically less memory than an My second reason for wanting to treat structured arrays as being structured is that I actually believe this to be a desirable property -- I would very much like to be able to express that a particular array is always going to be diagonal, and I would like my AD tooling to respect that. If one adopts a structural perspective this can be achieved very naturally by utilising an appropriate structured array. Conversely, it's less clear to me how to achieve this cleanly if all So all of this is to say that while I agree that it is certainly surprising at first to find that x == y doesn't imply f'(x) == f'(y), I believe it to be something we should resign ourselves to in the context of AD. |
Perhaps it's useful to back up from AD, and ask simpler questions of FillArrays. I claim that operations which could in principle be O(1) will often end up O(length), and that this is almost the entire point of the type's existence. That is, the main reason to use That's not precisely the only reason. The struct also contains the size, which perhaps it's convenient to pass along together. And perhaps in The situation with I guess that's a long way of saying that I like the second argument more:
More basically, I've argued forever that Returning to arrays, this preservation does still seem much more natural for say Maybe this is a sharper version of my major concern: If someone has replaced If someone passes a |
I think the discussion has generally moved on since I opened this, but it's worth pointing out that quite literally no subtype of Conversely, if we restrict the definition of using Zygote
using StaticArrays
julia> Zygote.gradient(x -> SVector{2}(x)[1], (1.0, 2.0))
ERROR: Need an adjoint for constructor SVector{2, Float64}. Gradient is of type Zygote.OneElement{Float64, 1, Tuple{Int64}, Tuple{SOneTo{2}}}
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] (::Zygote.Jnew{SVector{2, Float64}, Nothing, false})(Δ::Zygote.OneElement{Float64, 1, Tuple{Int64}, Tuple{SOneTo{2}}})
@ Zygote ~/.julia/dev/Zygote/src/lib/lib.jl:323
[3] (::Zygote.var"#1788#back#231"{Zygote.Jnew{SVector{2, Float64}, Nothing, false}})(Δ::Zygote.OneElement{Float64, 1, Tuple{Int64}, Tuple{SOneTo{2}}})
@ Zygote ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59
[4] Pullback
@ ~/.julia/packages/StaticArrays/vxjOO/src/SArray.jl:23 [inlined]
[5] (::typeof(∂(SVector{2, Float64})))(Δ::Zygote.OneElement{Float64, 1, Tuple{Int64}, Tuple{SOneTo{2}}})
@ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
[6] Pullback
@ ~/.julia/packages/StaticArrays/vxjOO/src/SVector.jl:20 [inlined]
[7] Pullback
@ ./REPL[7]:1 [inlined]
[8] (::typeof(∂(#3)))(Δ::Float64)
@ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
[9] (::Zygote.var"#50#51"{typeof(∂(#3))})(Δ::Float64)
@ Zygote ~/.julia/dev/Zygote/src/compiler/interface.jl:41
[10] gradient(f::Function, args::Tuple{Float64, Float64})
@ Zygote ~/.julia/dev/Zygote/src/compiler/interface.jl:76
[11] top-level scope
@ REPL[7]:1 vs restricting the applicability of the julia> Zygote.gradient(x -> SVector{2}(x)[1], (1.0, 2.0))
((1.0, nothing),) which is the correct result. edit: on a separate note, the current error message is misleading. It wrongly suggests that the need for an adjoint for the constructor is the only possible explanation for the error, whereas it could easily be that a rule needs modification. |
Moreover, it's really hard to fix and let AD run on however |
As pointed out here by @mcabbott , the following happens:
The type is innappropriate -- the answer should be
or at the very least another
Fill
.The text was updated successfully, but these errors were encountered: