Support scalar numbers #33
Conversation
@@ -13,11 +13,11 @@ struct Descent{T}
end
Descent() = Descent(1f-1)

-init(o::Descent, x::AbstractArray) = nothing
+init(o::Descent, x::Union{AbstractArray,Number}) = nothing
This may be too broad a type. An alternative would be Union{AbstractArray,AbstractFloat,Complex{<:AbstractFloat}}, with type aliases as appropriate.
I think init is called unconditionally by setup, so there's no need to filter out integers at this stage; that has already happened or not. It's just to throw an error if you really can't handle something.

Zygote et al. have Numeric{T<:Number} = Union{T, AbstractArray{T}}, and then you can do Numeric{<:Real} etc.?
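For concreteness, here is a sketch of that alias approach (the Numeric definition follows the one quoted above; the init signatures are illustrative, not this PR's diff):

const Numeric{T<:Number} = Union{T, AbstractArray{T}}

# Broad signature: any number or array of numbers
init(o::Descent, x::Numeric) = nothing

# Narrower where needed, e.g. reals only for rules that rely on an ordering
init(o::AdaMax, x::Numeric{<:Real}) = (zero(x), zero(x), o.beta)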
@@ -213,7 +213,7 @@ struct AdaMax{T}
end
AdaMax(η = 1f-3, β = (9f-1, 9.99f-1), ϵ = eps(typeof(η))) = AdaMax{typeof(η)}(η, β, ϵ)

-init(o::AdaMax, x::AbstractArray) = (zero(x), zero(x), o.beta)
+init(o::AdaMax, x::Union{AbstractArray,Number}) = (zero(x), zero(x), o.beta)
Also on types: this could be narrower, since max does not support complex numbers. Should it be handled on a case-by-case basis?
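For reference, a quick REPL check of the max point (error message approximate):

julia> max(1.0 + 2.0im, 3.0 + 4.0im)
ERROR: MethodError: no method matching isless(::ComplexF64, ::ComplexF64)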
There's a limit to how fancy your optimiser can be in one dimension, so perhaps not that many of them extend? If you have arrays and scalars, should they use the same optimiser or different ones? The other question is how you specify which numbers are parameters. Are you suggesting that any scalar float in a Flux model should by default be so? Or are we going to beef up …
I don't see a major difference between a scalar and a 1-element array (of arbitrary dimensionality) for most of the optimizers we have. The question of defaults is a trickier one, so take this PR as an RFC on that part of the design.
Right, a 1-element array is how you simulate a scalar in today's Flux. But my question is more whether there's any point at all in running ADAM on scalars, and also whether it's a good default to use the same optimiser on both the scalars and the arrays. After #30 you could easily allow, say, … called on all …
I don't think there's a clear enough rule here to warrant being opinionated. ADAM doesn't share any state across elements, so whether you apply it to a single param or to thousands, the procedure is the same. We can wait for a real-world use case before adding more knobs to …
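To illustrate the "same procedure" point, here is a simplified, purely elementwise Adam step (bias correction omitted; names and defaults are illustrative, not Optimisers.jl's implementation):

function adam_step(x, g, m, v; η = 1f-3, β1 = 9f-1, β2 = 9.99f-1, ϵ = eps(Float32))
    m = β1 * m + (1 - β1) * g          # first-moment estimate
    v = β2 * v + (1 - β2) * g^2        # second-moment estimate
    x - η * m / (sqrt(v) + ϵ), m, v    # updated param and new state
end

adam_step(0.5f0, 0.1f0, 0.0f0, 0.0f0)  # a single scalar param updates exactly like one array entry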
Sure, keeping setup simple for now is probably sensible. Must every optimiser accept scalars, or can some choose not to? If they don't have to, then …
isnumeric(x::AbstractFloat) = true
isnumeric(x::Complex{<:AbstractFloat}) = true
Maybe it's cleaner to say what's excluded:
-isnumeric(x::AbstractFloat) = true
-isnumeric(x::Complex{<:AbstractFloat}) = true
+isnumeric(x::Number) = true
+isnumeric(x::Integer) = false
Then when someone wants Unitful numbers, they may just work.
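A self-contained sketch of how that broader definition would classify things (assuming Unitful.jl is available; the fallback method is added here just to make the example complete):

isnumeric(x::Number) = true
isnumeric(x::Integer) = false
isnumeric(_) = false

using Unitful

isnumeric(1.5u"m")   # true:  Unitful.Quantity <: Number and is not an Integer
isnumeric(3)         # false: plain Int is excluded
isnumeric(true)      # false: Bool <: Integer
isnumeric("hi")      # false: not a Number at all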
If this is going to be applied everywhere, not as an opt-in, then I think it ought to exclude integers, even though AD mostly draws the line only at Bool. Perhaps arrays of integers should similarly be excluded from isnumeric? Then all cases can just be:
isnumeric(x::Numeric) = isleaf(x)
isnumeric(x::Numeric{<:Integer}) = false
isnumeric(_) = false
Maybe, though if in some distant future Flux is actually capable of quantization-aware training, you might well see Int8/Int4 arrays. RE Unitful quantities, is there any reason to consider them any "safer" to optimize than integers?
I think the graphics people do some things with low-bit numbers, but wrap them up to disguise the integer inside; presumably it would make sense to do something like that before trying to AD with Int8?
julia> g = Gray{ColorTypes.N0f8}(128/255)
Gray{N0f8}(0.502)

julia> 0 < g < 1
true

julia> g isa Integer
false

julia> dump(g)
Gray{FixedPointNumbers.N0f8}
  val: FixedPointNumbers.N0f8
    i: UInt8 0x80
> RE Unitful quantities, is there any reason to consider them any "safer" to optimize than integers?
My thinking is that if you add float(zero(x)) to them, which is what some trivial gradient could produce, nothing much will happen. Whereas if you do this to a size or an index, it'll break.
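A hedged REPL sketch of that distinction (assuming Unitful.jl; output approximate):

julia> using Unitful

julia> x = 2.5u"m";

julia> x + float(zero(x))       # a "trivial gradient" update on a unitful quantity: harmless
2.5 m

julia> n = 3; v = [10, 20, 30];

julia> v[n + float(zero(n))]    # the same update applied to an index: breaks
ERROR: ArgumentError: invalid index: 3.0 of type Float64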
Here's a thought: if the concern is about indices and bools, can we just exclude Int64, Int32 and Bool? Maybe the UInts down to 8 and Int16 as well. I've not seen anyone using those types as model params, and it would eliminate 99% of the confusion with indices + true/false.
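An illustrative sketch of that exclusion list (the MachineInts name is made up here; this is not the PR's code):

const MachineInts = Union{Bool, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64}

isnumeric(x::Number) = true
isnumeric(x::MachineInts) = false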
After playing around for a bit with Unitful, I'm still not comfortable with giving carte blanche to all non-integer number types. Even with floats, it's not clear that a scalar param should be trainable by default.
This is what I was asking with the comment about a case-by-case consideration. The easiest way to do things would be strictly constraining the type of …
As with many things Optimisers, this turned out to be a deeper rabbit hole than expected. My current inclination is that trainable scalars should be an opt-in thing, perhaps via some wrapper type (is there an existing one that would fit the bill?). At least with Optimisers.jl one can opt to use a stack-allocated array type for this as a stopgap.
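For what it's worth, a hedged sketch of that stopgap using StaticArrays.jl and Functors.jl (the Scale type is hypothetical, purely for illustration):

using StaticArrays, Functors

struct Scale{T}
    s::MVector{1,T}            # stack-allocated 1-element array standing in for a scalar
end
Scale(s::Real) = Scale(MVector{1}(float(s)))

@functor Scale                 # let Functors/Optimisers walk into the s field

(m::Scale)(x) = m.s[1] .* x    # use the wrapped value like a scalar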
@ToucheSir pointed me to this PR because I'm trying to use … For my case, I have a physical simulation over a … These differentiable parameters are tagged as fields for … I would like to be able to …

As an illustrative example, here's what I was hoping for, given that we tag these parameters as functor fields:

struct Foo{X}
    x::X
end

@functor Foo (x,)

julia> first(destructure(Foo(1.1)))
1-element Vector{Float64}:
 1.1

I don't need …
This is something we've discussed before as a selling point of the new optimizers. It's also one of the most requested features of Flux's current optimizers.