Support scalar numbers #33

Closed
wants to merge 1 commit into from
Conversation

@ToucheSir
Member

This is something we've discussed before as a selling point of the new optimizers. It's also one of the most requested features of Flux's current optimizers.

@@ -13,11 +13,11 @@ struct Descent{T}
 end
 Descent() = Descent(1f-1)

-init(o::Descent, x::AbstractArray) = nothing
+init(o::Descent, x::Union{AbstractArray,Number}) = nothing
ToucheSir (Member Author)

This may be too broad of a type. An alternative would be Union{AbstractArray,AbstractFloat,Complex{<:AbstractFloat}}, with type aliases as appropriate.
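For concreteness, a sketch of what that alternative might look like (the alias name is purely hypothetical and not part of this PR):

const TrainableNumber = Union{AbstractFloat,Complex{<:AbstractFloat}}

init(o::Descent, x::Union{AbstractArray,TrainableNumber}) = nothing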

mcabbott (Member)

I think init is called unconditionally by setup, so there's no need to filter out integers at this stage; that filtering has already happened (or not) by then. Its only job here is to throw an error if you really can't handle something.

Zygote et al. have Numeric{T<:Number} = Union{T, AbstractArray{T}}, and then you can do Numeric{<:Real} etc.?
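For reference, a sketch of that approach (not part of this PR; the alias is spelled as in Zygote, the two methods are just illustrative applications):

const Numeric{T<:Number} = Union{T, AbstractArray{T}}

init(o::Descent, x::Numeric) = nothing                              # any numeric scalar or array
init(o::AdaMax,  x::Numeric{<:Real}) = (zero(x), zero(x), o.beta)   # narrowed to Real where needed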

@@ -213,7 +213,7 @@ struct AdaMax{T}
 end
 AdaMax(η = 1f-3, β = (9f-1, 9.99f-1), ϵ = eps(typeof(η))) = AdaMax{typeof(η)}(η, β, ϵ)

-init(o::AdaMax, x::AbstractArray) = (zero(x), zero(x), o.beta)
+init(o::AdaMax, x::Union{AbstractArray,Number}) = (zero(x), zero(x), o.beta)
ToucheSir (Member Author)

Also on types: this could be narrower since max does not support complex numbers. Should it be handled on a case-by-case basis?

@mcabbott
Member

There's a limit to how fancy your optimiser can be, in one dimension. So perhaps not that many of them extend?

If you have arrays and scalars, should they use the same optimiser or different ones?

The other question is how you specify which numbers are parameters. Are you suggesting that any scalar float in a Flux model should by default be so? Or are we going to beef up trainable to opt some in?
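For the sake of discussion, an opt-in through trainable might look roughly like this (type and field names invented; whether the overload belongs to Flux or Optimisers is exactly the open question):

using Functors, Optimisers

# Hypothetical layer with one scalar parameter, purely for illustration.
struct Scale{T}
    factor::T   # a scalar we would like to train
    dims::Int   # plain metadata, never a parameter
end
@functor Scale

# Opt the scalar in explicitly, leaving dims out:
Optimisers.trainable(s::Scale) = (; factor = s.factor)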

@ToucheSir
Member Author

I don't see a major difference between a scalar and a 1-element array (of arbitrary dimensionality) for most of the optimizers we have. The question of defaults is a trickier one, so take this PR as an RFC on that part of the design.

@mcabbott
Member

Right, a 1-element array is how you simulate a scalar in today's Flux. But my question is more whether there's any point at all running ADAM on scalars, and also, whether it's a good default to use the same optimiser on both the scalars and the arrays.

After #30 you could easily allow say setup(model; scalar=Descent(), array=ADAM()). Or perhaps it looks like

Optimisers.setup(model) do x
  x isa Number ? Opt0() : ndims(x) == 1 ? Opt1() : Opt2()
end

called on all isnumeric leaves?

@ToucheSir
Member Author

I don't think there's a clear enough rule here to warrant being opinionated. ADAM doesn't share any state across elements, so whether you apply it to a single param or thousands the procedure is the same. We can wait for a real-world use case before adding more knobs to setup too. This PR will likely have to sit for a while until the Flux side of things is figured out.

@mcabbott
Member

Sure, keeping setup simple for now is probably sensible.

Must every optimiser accept scalars, or can some choose not to? If they don't have to, then setup(ArrayOpt(), model) will throw if model contains (say) Dropout, with what looks like a parameter. The optimiser should never be called, since the gradient will be nothing, but init will be called. That could be solved by init(::Any, ::Number) = nothing. But in other cases, the optimiser may be called, and you will get a confusing error. What should happen?
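To make that fallback concrete (hypothetical, not part of the PR):

init(::Any, ::Number) = nothing   # rules that don't opt in simply ignore scalar leaves at setup

# The remaining gap is the other path: a rule that does get called on a scalar it never
# supported will surface whatever MethodError its own update code happens to hit.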

Comment on lines +42 to +43
isnumeric(x::AbstractFloat) = true
isnumeric(x::Complex{<:AbstractFloat}) = true
mcabbott (Member), Jan 28, 2022

Maybe it's cleaner to say what's excluded:

Suggested change
-isnumeric(x::AbstractFloat) = true
-isnumeric(x::Complex{<:AbstractFloat}) = true
+isnumeric(x::Number) = true
+isnumeric(x::Integer) = false

Then when someone wants Unitful numbers, they may just work.

If this is going to be applied everywhere, not as an opt-in, then I think it ought to exclude integers. Even though AD mostly draws the line only at Bool. It's possible that arrays of integers should similarly be excluded from isnumeric? Then all cases can just be:

isnumeric(x::Numeric) = isleaf(x)
isnumeric(x::Numeric{<:Integer}) = false
isnumeric(_) = false
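To make the intent concrete, under that second sketch one would expect roughly the following (illustrative only; the u"m" line assumes Unitful is loaded, and Numeric/isleaf are as above):

isnumeric(1.0)           # true:  floats are trainable
isnumeric(1.0u"m")       # true:  Unitful scalars would just work
isnumeric(3)             # false: sizes and indices stay out
isnumeric(rand(Int, 2))  # false: integer arrays are excluded as well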

ToucheSir (Member Author)

Maybe, though if in some distant future Flux is actually capable of quantization-aware training you might well see int8/int4 arrays. RE Unitful quantities, is there any reason to consider them any "safer" to optimize than integers?

mcabbott (Member)

I think the graphics people do some things with low-bit numbers, but wrap them up to disguise the integer inside; presumably it would make sense to do something like that before trying to AD with Int8?

julia> g = Gray{ColorTypes.N0f8}(128/255)
Gray{N0f8}(0.502)

julia> 0 < g < 1
true

julia> g isa Integer
false

julia> dump(g)
Gray{FixedPointNumbers.N0f8}
  val: FixedPointNumbers.N0f8
    i: UInt8 0x80

> RE Unitful quantities, is there any reason to consider them any "safer" to optimize than integers?

My thinking is that if you add float(zero(x)) to them, which is what some trivial gradient could produce, nothing much will happen. Whereas if you do this to a size or an index, it'll break.
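A quick illustration of that point (assuming Unitful; output abbreviated):

julia> using Unitful

julia> w = 1.5u"m"; w + float(zero(w))   # a do-nothing update leaves a Unitful parameter intact
1.5 m

julia> n = 3; n + float(zero(n))         # the same operation silently turns a size or index into a Float64
3.0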

ToucheSir (Member Author), Feb 8, 2022

Here's a thought: if the concern is about indices and bools, can we just exclude Int64, Int32 and Bool? Maybe the UInts down to 8 and Int16 as well. I've not seen anyone using those types as model params, and it would eliminate 99% of the confusion with indices + true/false.
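Spelled out, that blocklist might look like this (union name invented, purely a sketch of the idea above):

const IndexLike = Union{Bool,Int16,Int32,Int64,UInt8,UInt16,UInt32,UInt64}

isnumeric(::IndexLike) = false
isnumeric(::Number)    = true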

ToucheSir (Member Author)

After playing around for a bit with Unitful, I'm still not comfortable with giving carte blanche to all non-integer number types. Even with floats, it's not clear that a scalar param should be trainable by default.

@ToucheSir
Member Author

> Must every optimiser accept scalars, or can some choose not to?

This is what I was asking with the comment about a case-by-case consideration. The easiest way to do things would be strictly constraining the type of x in init and loosening it as needed for each optimizer. Then you get an early and relatively clear MethodError during setup. That of course requires us to figure out #35 first.
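Concretely, the strict-by-default version might read (hypothetical; which rules opt in is the case-by-case part):

init(o::Descent, x::Union{AbstractArray,Number}) = nothing                      # opted in to scalars
init(o::AdaMax,  x::AbstractArray)               = (zero(x), zero(x), o.beta)   # arrays only

# setup(AdaMax(), model) on a model containing a scalar parameter then fails up front
# with a MethodError pointing at init, rather than somewhere deeper in the update step.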

@mcabbott mentioned this pull request Feb 8, 2022
@ToucheSir
Member Author

As with many things Optimisers, this turned out to be a deeper rabbit hole than expected. My current inclination is that trainable scalars should be an opt-in thing, perhaps via some wrapper type (is there an existing one that would fit the bill?). At least with Optimisers.jl one can opt to use a stack-allocated array type for this as a stopgap.
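For reference, one shape that stopgap could take (assuming StaticArrays; type and field names invented, a sketch rather than a recommendation):

using Functors, StaticArrays

struct Scale{T}
    factor::SVector{1,T}   # scalar parameter carried as a 1-element, stack-allocated array
end
@functor Scale
Scale(x::Number) = Scale(SVector(x))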

@ToucheSir closed this Feb 8, 2022
@SBuercklin

@ToucheSir pointed me to this PR because I'm trying to use destructure with a type where the parameters are scalars stored in custom types.

For my case, I have a physical simulation over a Model, where a Model contains many independent types with parameters that contribute to the simulation. These parameters are stored as scalars in types because they correspond to physical dimensions. As an example, we might have a bunch of rectangles in our simulation, so it's natural to express the problem over a collection of rectangles in the model rather than a collection of widths.

These differentiable parameters are tagged as fields for Functors.jl, but which fields are parameters varies from instance to instance, so we instead store the specific fields on each instance and use @flexiblefunctor (i.e. sometimes the width is a parameter, sometimes the width is fixed).

I would like to be able to destructure(model) and get out a vector of the parameters, along with the accompanying Restructure object, so we can view the simulation as a mapping from R^n -> R with the first stage being restructuring.

As an illustrative example, here's what I was hoping for given that we tag these parameters as functor fields:

struct Foo{X}
    x::X
end

@functor Foo (x,)

julia> first(destructure(Foo(1.1)))
1-element Vector{Float64}:
 1.1 

I don't need destructure to give me a collection of independent scalars, a vector of the functor fields is fine.

@ToucheSir mentioned this pull request Jun 28, 2022