Basic implementation of Anderson acceleration #101
Conversation
nice - I look forward to trying this out.
Thanks for getting started on this! The code is clean and easy to read 💯 Have you tried running more of the example problems on this solver? Perhaps those in the minpack.jl test suite? I'd be curious to see how it holds up compared to newton and trust region. I also think there will be some "easy" performance improvements on the table, but I'll let the algorithm settle and be well-tested before diving into them.
Anderson acceleration is not really a general-purpose routine for nonlinear equations; it's mostly useful for accelerating fixed-point iterations. NLsolve is a good place to put it because that allows for an easy comparison with other methods of this kind (mostly Broyden, when that's implemented). It's harder to test because it's not expected to work on generic nonlinear systems. OTOH in the linear regime it's basically GMRES, so when started near a solution and with a large enough history length it converges, which is what I tested for.
fwiw I think we should have a separate fixedpoint interface.
Not sure about that: fixed-point iterations and nonlinear solvers are very similar, and having a unified interface allows changing solvers at will.
They are indeed very similar, and finding a fixed point can be (and often is) framed as solving a set of nonlinear equations. Solving nonlinear equations is not the same as finding a fixed point in general though, so I would prefer a separate fixedpoint entry point. Then of course, you can choose amongst all the nonlinear solvers in your fixedpoint call and it will automatically subtract x:

```julia
fixedpoint(F, x0, Newton())
# is equivalent to
solve(x -> F(x) - x, x0, Newton())
```
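A minimal sketch of how such a wrapper could look (the `fixedpoint` and `solve` names are just the ones from the example above, not an existing NLsolve API):

```julia
# Hypothetical one-line wrapper: find x with F(x) == x by handing the
# residual x -> F(x) - x to any existing nonlinear solver.
fixedpoint(F, x0, method) = solve(x -> F(x) - x, x0, method)
```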
To clarify: I'm not asking you to do that now, just saying that I think we should do it later.
```julia
# fixed-point iteration
df.f!(xs[:,1], fx)
f_calls += 1
gs[:,1] .= xs[:,1] .+ β.*fx
```
Shouldn't views be used to cut down allocations?
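For context, a slice like `xs[:,1]` allocates a copy in Julia, while `view` (or the `@views` macro) reuses the underlying storage. A quick standalone illustration (array names are made up):

```julia
xs = rand(100, 5)
col_copy = xs[:, 1]        # allocates a fresh vector
col_view = view(xs, :, 1)  # no copy; reads and writes go through to xs
col_view .= 0.0            # zeroes the first column of xs in place
```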
src/anderson.jl
Outdated
```julia
m_eff = min(n-1, m)
new_x = gs[:,1]
if m_eff > 0
    mat = (gs[:,2:m_eff+1] .- xs[:,2:m_eff+1]) .- (gs[:,1] - xs[:,1])
```
Pre-allocate to get rid of the temporaries here and at 62?
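The suggestion is the usual Julia pattern: allocate the work matrix once outside the loop and fill it in place with broadcast assignment. A hedged sketch with made-up sizes and names:

```julia
N, m = 4, 3
gs, xs = rand(N, m + 1), rand(N, m + 1)
m_eff = m
mat = zeros(N, m)   # allocated once, outside the iteration loop

# Inside the loop: fill columns in place instead of creating temporaries.
for k in 1:m_eff
    @views mat[:, k] .= (gs[:, k+1] .- xs[:, k+1]) .- (gs[:, 1] .- xs[:, 1])
end
```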
I think that for completeness it would be nice to have a simple Picard version as well, if only for benchmarking and possibly identifying issues.
Pre-allocation and views done, thanks! For simple Picard, just set m=0.
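For reference, the plain Picard iteration that m=0 reduces to is just the damped fixed-point update. A self-contained sketch (names and defaults are illustrative, not the PR's code):

```julia
using LinearAlgebra

# Damped Picard iteration x <- x + beta*f(x); Anderson with m = 0 is this.
function picard(f, x0; beta = 1.0, iters = 1000, tol = 1e-10)
    x = copy(x0)
    for _ in 1:iters
        fx = f(x)
        norm(fx) < tol && break
        x .+= beta .* fx
    end
    return x
end

picard(x -> cos.(x) .- x, [0.5])   # converges to x ≈ 0.739, where x = cos(x)
```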
Tests seem to fail.
Indeed, fixed. I ran into JuliaLang/LinearAlgebra.jl#440 while trying to find a good in-place op. I'm happy to live with the current out-of-place implementation (it is only temporary until Julia gets QR updates anyway).
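In the meantime, the out-of-place least-squares step is just a backslash, which factorizes from scratch each time; a sketch with stand-in data (`mat` and `delta` are placeholders for the history quantities, not the PR's variables):

```julia
# `\` on a tall matrix performs a least-squares solve via QR.
# QR *updating* (JuliaLang/LinearAlgebra.jl#407) would avoid refactorizing
# mat at every iteration when only one column changes.
mat = rand(6, 3)
delta = rand(6)
alpha = mat \ delta
```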
Great! I'll have a closer look as soon as I find the time (hopefully tomorrow) so this doesn't end up sitting here over the summer.
I'm sort of confused by the test; it doesn't seem to be a fixed-point problem?
src/anderson.jl
Outdated
```julia
tracing = store_trace || show_trace || extended_trace
old_x = xs[:,1]
x_converged, f_converged, converged = false, false, false
mat = zeros(T, N, m) # pre-allocation
```
pre-allocation of what? could you simply state what it's used for?
done, thanks
src/anderson.jl
Outdated
```julia
store_trace::Bool,
show_trace::Bool,
extended_trace::Bool,
hist_size::Integer,
```
Why is this called `hist_size`? Should be `m`. Same in all similar places below.
This is the user-facing method, and `hist_size` is more readable than `m` (but in the algorithm `m` is shorter and closer to the notation of Walker & Ni). Can change if you like.
I think `m` is fine; that's what we use in LBFGS over at Optim, and I've never heard complaints. I guess it just has to be mentioned. Speaking of mentioned, you should probably add a note somewhere in the readme about this new method.
```julia
# Notations from Walker & Ni, "Anderson acceleration for fixed-point iterations", SINUM 2011
# Attempts to accelerate the iteration xn+1 = xn + β f(xn)

@views function anderson_{T}(df::AbstractDifferentiableMultivariateFunction,
```
Underscores for internal functions are usually placed in front
I just did the same as the other functions, e.g. `newton_`.
Alright, weird convention in NLsolve I guess. 👍 then
src/anderson.jl
Outdated
```julia
end

return SolverResults("Anderson m=$m β=$β",
# returning gs[:,1] would be slightly better here, but the fnorm is not guaranteed
```
What do you mean by this comment?
Usually when solving a nonlinear problem with Anderson acceleration, you expect that the base iteration xn+1 = xn + beta*f(xn) is convergent, but slowly, so returning gs[:,1], which is xs[:,1] + beta*f(xs[:,1]), should be slightly better than xs[:,1]. The problem is that you don't know f at that point, so you cannot guarantee that it satisfies the convergence criterion.
f(x) = 0 <=> x = x + beta*f(x), so every root-finding problem is a fixed-point problem. Of course, the fixed-point iteration might not converge and it might not make sense. However, if you start it close to a zero of f and with a small beta, f is almost affine, f(x) ≈ Ax - b, and Anderson acceleration is basically GMRES on Ax = b, so it should converge (with a large enough history size). This test checks this.
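The equivalence is easy to check numerically: for an affine f(x) = b − Ax (sign chosen so the plain damped iteration converges), the root of f, the fixed point of x ↦ x + beta*f(x), and the solution of Ax = b all coincide. A small standalone sketch with made-up A, b, and beta, independent of NLsolve:

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]
b = [1.0, 2.0]
f(x) = b .- A * x          # affine residual; root solves A*x = b
beta = 0.1

x = zeros(2)
for _ in 1:10_000
    x .+= beta .* f(x)     # the slow damped iteration Anderson accelerates
end
@assert norm(A * x .- b) < 1e-8   # the fixed point solves A*x = b
```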
Right right, sorry. So to solve f(x) = 0 you treat it as the fixed-point problem x = x + beta*f(x). Personally I'd like to see a readme entry and hist_size -> m, and then I think we're good to go!
test/2by2.jl
Outdated
```julia
# test local convergence of Anderson: close to a fixed-point and with
# a small beta, f should be almost affine, in which case Anderson is
# equivalent to GMRES and should converge
r = nlsolve(df, [ 0.01; .99], method = :anderson, hist_size = 10, beta = .01)
```
missed a `hist_size`
That'll teach me to push without testing... thanks!
Looks like NLsolve is ready to accept a new solver for the first time in quite some time! Thanks for playing along @antoine-levitt!
Sweet!
Great, thanks for the thorough review! (on this and other PRs)
This is in response to #88. It is a very basic implementation of Anderson acceleration, but it works well enough for well-conditioned problems.
A better implementation would use updates to QR factors, which are not implemented in Julia yet (JuliaLang/LinearAlgebra.jl#407). This would enable better scaling with the history size and more stability (through condition-number monitoring).
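For readers who want the gist without reading the diff: the Anderson step keeps a short history of iterates, solves a small least-squares problem for mixing coefficients, and combines the history. A self-contained sketch of the unconstrained Walker & Ni formulation (illustrative only; no conditioning safeguards, not the PR's code):

```julia
using LinearAlgebra

# Minimal Anderson acceleration for a fixed-point map g (solves x = g(x)).
function anderson_sketch(g, x0; m = 5, iters = 50, tol = 1e-10)
    x = copy(x0)
    Gs = Vector{typeof(x)}()   # history of g(x_k)
    Rs = Vector{typeof(x)}()   # history of residuals r_k = g(x_k) - x_k
    for _ in 1:iters
        gx = g(x)
        r = gx - x
        norm(r) < tol && return gx
        push!(Gs, gx); push!(Rs, r)
        length(Gs) > m + 1 && (popfirst!(Gs); popfirst!(Rs))
        k = length(Rs)
        if k == 1
            x = gx                     # plain fixed-point (Picard) step
        else
            dR = hcat((Rs[i+1] - Rs[i] for i in 1:k-1)...)  # residual differences
            dG = hcat((Gs[i+1] - Gs[i] for i in 1:k-1)...)  # map-value differences
            gamma = dR \ r             # least-squares mixing coefficients
            x = gx - dG * gamma        # Anderson update
        end
    end
    return x
end

anderson_sketch(x -> cos.(x), [1.0])   # x = cos(x), converges in a few steps
```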