Optimise a subset of parameters #35
Comments
It should work fine with … But anyway, the scope here is narrower. I think you are agreeing that this …
You're right, I completely misread the issue. We definitely agree on what's wrong here. Why not just have …?
That's an option, though I'd suggest we don't introduce state we can get for free. What I mean by this is that if …
We could do this. Flux's rules can be updated to do that without breaking anything there. I am picturing that, in a not-too-distant version of Flux, it should depend on Optimisers and provide methods which work here, while still being usable in its old way. If user code returns a `Tuple`, we can convert it (using `objectid`) and print a warning.
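A minimal sketch of that conversion (the helper name is made up, not an existing Optimisers.jl function): match the returned values back to field names via `objectid` and warn.

```julia
# Hypothetical helper: given a layer and the Tuple its old-style trainable
# returned, recover the corresponding field names by objectid.
function trainable_symbols_from_tuple(layer, tup::Tuple)
    ids = map(objectid, tup)
    syms = filter(fieldnames(typeof(layer))) do f
        objectid(getfield(layer, f)) in ids
    end
    @warn "trainable(::$(typeof(layer))) returned values; matched fields $syms by objectid"
    syms
end
```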
You can write … I guess that they start to differ more once we handle tied parameters. I picture the tree of states also having (at its root) some lenses or something telling us what transformations to perform before starting; these things are figured out once during `setup`. They could also be done every time, but I think that would need another pass over the model before the update one.
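One hypothetical reading of the lenses-at-the-root idea, using Accessors.jl purely for illustration; the `StateTree` layout and `retie` are inventions, not something proposed verbatim in the thread.

```julia
using Accessors  # for @optic and set; the comment only says "lenses or something"

# Possible shape: the root of the state tree keeps, alongside per-leaf states,
# a tuple of lenses recording tied parameters, figured out once during setup.
struct StateTree{S,L}
    states::S                 # per-parameter optimiser states
    ties::L                   # e.g. (@optic(_.dec.W) => @optic(_.enc.W),)
end

# Re-apply the ties after an update, copying each source into its tied slot,
# instead of rediscovering the ties on every step.
retie(model, tree::StateTree) =
    foldl(tree.ties; init = model) do m, (dst, src)
        set(m, dst, src(m))
    end
```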
But …
Flux's `trainable` works like this: […] This doesn't seem great: it relies on `objectid` to know which parameters those really are. So this: […] will not work correctly for, say, `β === SA[0.0, 0.0] === μ`. How should it work?
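A self-contained toy of the failure mode (`TinyNorm` and its `trainable` method are stand-ins, not Flux's source): when two fields hold `===`-equal values, `objectid` cannot tell them apart.

```julia
using StaticArrays  # only for the SA[...] literal mentioned above

struct TinyNorm{V}        # toy stand-in for BatchNorm
    β::V; γ::V            # affine parameters: should be trained
    μ::V; σ²::V           # running statistics: should not be trained
end

# Flux-style: return the values themselves; downstream code must work out
# which fields they were by objectid.
trainable(m::TinyNorm) = (m.β, m.γ)

m = TinyNorm(SA[0.0, 0.0], SA[1.0, 1.0], SA[0.0, 0.0], SA[1.0, 1.0])
m.β === m.μ                      # true: equal isbits StaticArrays are egal
objectid(m.β) == objectid(m.μ)   # also true, so β and μ are indistinguishable here
```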
One idea would be to clone the `@functor` macro, to have `@trainable BatchNorm (β, γ)`? In fact this case is even worse: it checks a value here, but we could probably move `affine` into the type.
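For concreteness, a minimal sketch of what such a macro could expand to; `@trainable` is hypothetical, and here it simply lowers to the symbol-returning method discussed next.

```julia
# Hypothetical macro, not defined by Flux or Optimisers.jl: record which
# *fields* are trainable by their names, rather than by their current values.
macro trainable(T, fields)
    syms = map(QuoteNode, fields.args)        # (β, γ)  ->  :β, :γ
    esc(:(trainable(::$T) = ($(syms...),)))
end

# @trainable BatchNorm (β, γ)
# expands to:   trainable(::BatchNorm) = (:β, :γ)
```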
Another idea would be just to have `trainable(::BatchNorm) = (:β, :γ)`, the symbols. That's much easier to write and perhaps less mysterious. Might be slower; do we care? Or might not be, if the symbols are known from the type. It would be easy here to allow Flux-style tuples as a fallback, detecting `NTuple{N,Symbol}` etc., making it easier to have both old- and new-style at once.
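A sketch of that fallback (names are mine): dispatch on what `trainable` returned, passing symbol tuples through and converting old-style value tuples with the helper sketched earlier in the thread.

```julia
# Sketch only: normalise whatever trainable(layer) returns into field names.
trainable_fields(layer) = _fields(layer, trainable(layer))

_fields(layer, syms::Tuple{Vararg{Symbol}}) = syms   # new style: already names
_fields(layer, tup::Tuple) =                         # old Flux style: values
    trainable_symbols_from_tuple(layer, tup)         # objectid match + warning
```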
This would be used during `setup`, just one pass. After that, the tree of optimiser states should tell you whether or not to update a given array, so `update` need never call this.
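A toy version of that single pass (not the real `setup`), reusing the symbol-style lookup above: frozen fields simply get no state, so a later `update` can decide from the state tree alone.

```julia
init(rule, x::AbstractArray) = zero(x)     # stand-in for a rule's real state

# One pass over the fields; trainable_fields is consulted here and only here.
function setup_sketch(rule, layer)
    train = trainable_fields(layer)
    names = fieldnames(typeof(layer))
    states = map(names) do f
        x = getfield(layer, f)
        (f in train && x isa AbstractArray) ? init(rule, x) : nothing
    end
    NamedTuple{names}(states)   # e.g. (β = ..., γ = ..., μ = nothing, σ² = nothing)
end
```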
What might call it more often is `destructure`, which I think we want to walk only the trainable parameters, and which will sometimes be called in a loop.
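For comparison, a toy single-layer flatten over just the trainable fields (my names, not the real `destructure`); since this may run every iteration, a cheap trainable lookup matters more here than in the one-off `setup`.

```julia
# Toy sketch: gather only the trainable arrays of one layer into a flat vector.
flat_trainable(layer) = reduce(vcat,
    (vec(getfield(layer, f)) for f in trainable_fields(layer));
    init = Float64[])
```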