Support scalar numbers #33

Closed
wants to merge 1 commit into from
Conversation

@ToucheSir
Member

This is something we've discussed before as a selling point of the new optimizers. It's also one of the most requested features of Flux's current optimizers.

@@ -13,11 +13,11 @@ struct Descent{T}
 end
 Descent() = Descent(1f-1)

-init(o::Descent, x::AbstractArray) = nothing
+init(o::Descent, x::Union{AbstractArray,Number}) = nothing
ToucheSir (Member Author)

This may be too broad of a type. An alternative would be Union{AbstractArray,AbstractFloat,Complex{<:AbstractFloat}}, with type aliases as appropriate.
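For concreteness, a sketch of what that alternative might look like (the alias name is purely hypothetical and not part of this PR):

const TrainableNumber = Union{AbstractFloat,Complex{<:AbstractFloat}}

init(o::Descent, x::Union{AbstractArray,TrainableNumber}) = nothing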

mcabbott (Member)

I think init is called unconditionally by setup, so there's no need to filter out integers at this stage; that filtering has already happened (or not) by then. Its only job here is to throw an error if you really can't handle something.

Zygote et al. have Numeric{T<:Number} = Union{T, AbstractArray{T}}, and then you can do Numeric{<:Real} etc.?
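For reference, a sketch of that approach (not part of this PR; the alias is spelled as in Zygote, the two methods are just illustrative applications):

const Numeric{T<:Number} = Union{T, AbstractArray{T}}

init(o::Descent, x::Numeric) = nothing                              # any numeric scalar or array
init(o::AdaMax,  x::Numeric{<:Real}) = (zero(x), zero(x), o.beta)   # narrowed to Real where needed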

@@ -213,7 +213,7 @@ struct AdaMax{T}
 end
 AdaMax(η = 1f-3, β = (9f-1, 9.99f-1), ϵ = eps(typeof(η))) = AdaMax{typeof(η)}(η, β, ϵ)

-init(o::AdaMax, x::AbstractArray) = (zero(x), zero(x), o.beta)
+init(o::AdaMax, x::Union{AbstractArray,Number}) = (zero(x), zero(x), o.beta)
ToucheSir (Member Author)

Also on types: this could be narrower since max does not support complex numbers. Should it be handled on a case-by-case basis?

@mcabbott
Member

There's a limit to how fancy your optimiser can be, in one dimension. So perhaps not that many of them extend?

If you have arrays and scalars, should they use the same optimiser or different ones?

The other question is how you specify which numbers are parameters. Are you suggesting that any scalar float in a Flux model should by default be so? Or are we going to beef up trainable to opt some in?
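For the sake of discussion, an opt-in through trainable might look roughly like this (type and field names invented; whether the overload belongs to Flux or Optimisers is exactly the open question):

using Functors, Optimisers

# Hypothetical layer with one scalar parameter, purely for illustration.
struct Scale{T}
    factor::T   # a scalar we would like to train
    dims::Int   # plain metadata, never a parameter
end
@functor Scale

# Opt the scalar in explicitly, leaving dims out:
Optimisers.trainable(s::Scale) = (; factor = s.factor)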

@ToucheSir
Member Author

I don't see a major difference between a scalar and a 1-element array (of arbitrary dimensionality) for most of the optimizers we have. The question of defaults is a trickier one, so take this PR as an RFC on that part of the design.

@mcabbott
Member

Right, a 1-element array is how you simulate a scalar in today's Flux. But my question is more whether there's any point at all running ADAM on scalars, and also, whether it's a good default to use the same optimiser on both the scalars and the arrays.

After #30 you could easily allow say setup(model; scalar=Descent(), array=ADAM()). Or perhaps it looks like

Optimisers.setup(model) do x
  x isa Number ? Opt0() : ndims(x) == 1 ? Opt1() : Opt2()
end

called on all isnumeric leaves?

@ToucheSir
Member Author

I don't think there's a clear enough rule here to warrant being opinionated. ADAM doesn't share any state across elements, so whether you apply it to a single param or thousands the procedure is the same. We can wait for a real-world use case before adding more knobs to setup too. This PR will likely have to sit for a while until the Flux side of things is figured out.

@mcabbott
Member

Sure, keeping setup simple for now is probably sensible.

Must every optimiser accept scalars, or can some choose not to? If they don't have to, then setup(ArrayOpt(), model) will throw if model contains (say) Dropout, with what looks like a parameter. The optimiser should never be called, since the gradient will be nothing, but init will be called. That could be solved by init(::Any, ::Number) = nothing. But in other cases, the optimiser may be called, and you will get a confusing error. What should happen?
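To make that fallback concrete (hypothetical, not part of the PR):

init(::Any, ::Number) = nothing   # rules that don't opt in simply ignore scalar leaves at setup

# The remaining gap is the other path: a rule that does get called on a scalar it never
# supported will surface whatever MethodError its own update code happens to hit.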

Comment on lines +42 to +43
isnumeric(x::AbstractFloat) = true
isnumeric(x::Complex{<:AbstractFloat}) = true
mcabbott (Member), Jan 28, 2022

Maybe it's cleaner to say what's excluded:

Suggested change
-isnumeric(x::AbstractFloat) = true
-isnumeric(x::Complex{<:AbstractFloat}) = true
+isnumeric(x::Number) = true
+isnumeric(x::Integer) = false

Then when someone wants Unitful numbers, they may just work.

If this is going to be applied everywhere, not as an opt-in, then I think it ought to exclude integers. Even though AD mostly draws the line only at Bool. It's possible that arrays of integers should similarly be excluded from isnumeric? Then all cases can just be:

isnumeric(x::Numeric) = isleaf(x)
isnumeric(x::Numeric{<:Integer}) = false
isnumeric(_) = false
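To make the intent concrete, under that second sketch one would expect roughly the following (illustrative only; the u"m" line assumes Unitful is loaded, and Numeric/isleaf are as above):

isnumeric(1.0)           # true:  floats are trainable
isnumeric(1.0u"m")       # true:  Unitful scalars would just work
isnumeric(3)             # false: sizes and indices stay out
isnumeric(rand(Int, 2))  # false: integer arrays are excluded as well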

ToucheSir (Member Author)

Maybe, though if in some distant future Flux is actually capable of quantization-aware training you might well see int8/int4 arrays. RE Unitful quantities, is there any reason to consider them any "safer" to optimize than integers?

mcabbott (Member)

I think the graphics people do some things with low-bit numbers, but wrap them up to disguise the integer inside; presumably it would make sense to do something like that before trying to AD with Int8?

julia> g = Gray{ColorTypes.N0f8}(128/255)
Gray{N0f8}(0.502)

julia> 0 < g < 1
true

julia> g isa Integer
false

julia> dump(g)
Gray{FixedPointNumbers.N0f8}
  val: FixedPointNumbers.N0f8
    i: UInt8 0x80

> RE Unitful quantities, is there any reason to consider them any "safer" to optimize than integers?

My thinking is that if you add float(zero(x)) to them, which is what some trivial gradient could produce, nothing much will happen. Whereas if you do this to a size or an index, it'll break.
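A quick illustration of that point (assuming Unitful; output abbreviated):

julia> using Unitful

julia> w = 1.5u"m"; w + float(zero(w))   # a do-nothing update leaves a Unitful parameter intact
1.5 m

julia> n = 3; n + float(zero(n))         # the same operation silently turns a size or index into a Float64
3.0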

ToucheSir (Member Author), Feb 8, 2022

Here's a thought: if the concern is about indices and bools, can we just exclude Int64, Int32 and Bool? Maybe the UInts down to 8 and Int16 as well. I've not seen anyone using those types as model params, and it would eliminate 99% of the confusion with indices + true/false.
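Spelled out, that blocklist might look like this (union name invented, purely a sketch of the idea above):

const IndexLike = Union{Bool,Int16,Int32,Int64,UInt8,UInt16,UInt32,UInt64}

isnumeric(::IndexLike) = false
isnumeric(::Number)    = true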

ToucheSir (Member Author)

After playing around for a bit with Unitful, I'm still not comfortable with giving carte blanche to all non-integer number types. Even with floats, it's not clear that a scalar param should be trainable by default.

@ToucheSir
Member Author

> Must every optimiser accept scalars, or can some choose not to?

This is what I was asking with the comment about a case-by-case consideration. The easiest way to do things would be strictly constraining the type of x in init and loosening it as needed for each optimizer. Then you get an early and relatively clear MethodError during setup. That of course requires us to figure out #35 first.
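Concretely, the strict-by-default version might read (hypothetical; which rules opt in is the case-by-case part):

init(o::Descent, x::Union{AbstractArray,Number}) = nothing                      # opted in to scalars
init(o::AdaMax,  x::AbstractArray)               = (zero(x), zero(x), o.beta)   # arrays only

# setup(AdaMax(), model) on a model containing a scalar parameter then fails up front
# with a MethodError pointing at init, rather than somewhere deeper in the update step.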

@mcabbott mentioned this pull request Feb 8, 2022
@ToucheSir
Member Author

As with many things Optimisers, this turned out to be a deeper rabbit hole than expected. My current inclination is that trainable scalars should be an opt-in thing, perhaps via some wrapper type (is there an existing one that would fit the bill?). At least with Optimisers.jl one can opt to use a stack-allocated array type for this as a stopgap.
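For reference, one shape that stopgap could take (assuming StaticArrays; type and field names invented, a sketch rather than a recommendation):

using Functors, StaticArrays

struct Scale{T}
    factor::SVector{1,T}   # scalar parameter carried as a 1-element, stack-allocated array
end
@functor Scale
Scale(x::Number) = Scale(SVector(x))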

@ToucheSir closed this Feb 8, 2022
@SBuercklin

@ToucheSir pointed me to this PR because I'm trying to use destructure with a type where the parameters are scalars stored in custom types.

For my case, I have a physical simulation over a Model, where a Model contains many independent types with parameters that contribute to the simulation. These parameters are stored as scalars in types because they correspond to physical dimensions. As an example, we might have a bunch of rectangles in our simulation, so it's natural to express the problem over a collection of rectangles in the model rather than a collection of widths.

These differentiable parameters are tagged as fields for Functors.jl, but which fields are parameters varies from instance to instance, so we instead store the specific fields on each instance and use @flexiblefunctor (i.e. sometimes the width is a parameter, sometimes the width is fixed).

I would like to be able to destructure(model) and get out a vector of the parameters, along with the accompanying Restructure object, so we can view the simulation as a mapping from R^n -> R with the first stage being restructuring.

As an illustrative example, here's what I was hoping for given that we tag these parameters as functor fields:

struct Foo{X}
    x::X
end

@functor Foo (x,)

julia> first(destructure(Foo(1.1)))
1-element Vector{Float64}:
 1.1 

I don't need destructure to give me a collection of independent scalars, a vector of the functor fields is fine.

@ToucheSir mentioned this pull request Jun 28, 2022