
Basic implementation of Anderson acceleration #101

Merged 8 commits into JuliaNLSolvers:master on Jul 1, 2017

Conversation

@antoine-levitt (Contributor) commented May 22, 2017

This is in response to #88. It is a very basic implementation of Anderson acceleration, but it works well enough for well-conditioned problems.

A better implementation would use updates to the QR factors, which are not implemented in Julia yet (JuliaLang/LinearAlgebra.jl#407). That would give better scaling with the history size, and more stability (through condition-number monitoring).
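For context, a minimal sketch of the scheme in the Walker & Ni formulation might look like the following. This is illustrative only, not this PR's code: `anderson_sketch` and its keyword names are made up, and it uses a dense least-squares solve at every step rather than the QR updates discussed above.

```julia
using LinearAlgebra

# Sketch of Anderson acceleration for f(x) = 0 via the fixed-point map
# g(x) = x + beta*f(x), keeping a history of at most m+1 iterates.
function anderson_sketch(f, x0; m = 5, beta = 1.0, maxiters = 100, tol = 1e-10)
    x = copy(x0)
    X = Vector{typeof(x)}()  # iterate history
    G = Vector{typeof(x)}()  # g(x) history
    F = Vector{typeof(x)}()  # residual history, g(x) - x = beta*f(x)
    for _ in 1:maxiters
        fx = f(x)
        norm(fx) < tol && return x
        g = x .+ beta .* fx
        push!(X, copy(x)); push!(G, g); push!(F, g .- x)
        if length(X) > m + 1
            popfirst!(X); popfirst!(G); popfirst!(F)
        end
        n = length(X)
        if n == 1
            x = g                                   # plain damped step
        else
            dF = hcat((F[i+1] .- F[i] for i in 1:n-1)...)
            dG = hcat((G[i+1] .- G[i] for i in 1:n-1)...)
            gamma = dF \ F[end]                     # least-squares mixing weights
            x = G[end] .- dG * gamma                # accelerated step
        end
    end
    return x
end
```

On an affine problem such as f(x) = b - x this recovers the solution in a couple of iterations, consistent with the GMRES equivalence mentioned later in the thread.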

@cortner commented May 22, 2017

Nice, I look forward to trying this out.

@sglyon (Collaborator) commented May 23, 2017

Thanks for getting started on this!

The code is clean and easy to read 💯

Have you tried running more of the example problems on this solver? Perhaps those in the minpack.jl test suite?

I'd be curious to see how it holds up compared to Newton and trust region.

I also think there will be some "easy" performance improvements on the table, but I'll let the algorithm settle and be well tested before diving into them.

@antoine-levitt (Contributor, Author)

Anderson acceleration is not really a general-purpose routine for nonlinear equations; it's mostly useful when:
1. the function to solve for is expensive,
2. no Jacobian is available,
3. there's already a "good" underlying fixed-point iteration.

NLsolve is a good place to put it because that would allow an easy comparison with other methods of this kind (mostly Broyden, once that's implemented). It's harder to test because it's not expected to work on generic nonlinear systems. On the other hand, in the linear regime it's basically GMRES, so when started near a solution and with a large enough history length it converges, which is what I tested for.

@pkofod (Member) commented May 30, 2017

FWIW, I think we should have a separate fixedpoint function for fixed-point finders. Not necessarily here, and possibly in a separate package, but...

@antoine-levitt (Contributor, Author)

Not sure about that: fixed-point iterations and nonlinear solvers are very similar, and having a unified interface allows changing solvers at will.

@pkofod (Member) commented May 30, 2017

> Not sure about that: fixed-point iterations and nonlinear solvers are very similar, and having a unified interface allows changing solvers at will.

They are indeed very similar, and finding a fixed point can be (and often is) framed as solving a set of nonlinear equations. Solving nonlinear equations is not the same as finding a fixed point in general, though, so I would prefer:

# F(x) = x is solved by
fixedpoint(F, x0, ...)
# G(x) = 0 is solved by
solve(G, x0, ...)

Then, of course, you can choose amongst all the nonlinear solvers in your fixedpoint call and it will automatically subtract x:

fixedpoint(F, x0, Newton())
# is equivalent to
solve(x -> F(x) - x, x0, Newton())
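A sketch of such a wrapper could be as simple as the following. Everything here is hypothetical: `solve` is a toy damped-iteration stand-in, not NLsolve's actual API, and `fixedpoint` is the proposed (not yet existing) function.

```julia
# Toy stand-in for a generic root-finder solve(G, x0): damped Picard steps.
function solve(G, x0; iters = 1000, alpha = 0.5)
    x = copy(x0)
    for _ in 1:iters
        x = x .+ alpha .* G(x)   # damped step toward G(x) = 0
    end
    return x
end

# The proposed wrapper: F(x) = x becomes the root problem F(x) - x = 0.
fixedpoint(F, x0; kwargs...) = solve(x -> F(x) .- x, x0; kwargs...)
```

For example, `fixedpoint(x -> cos.(x), [0.5])` converges to the fixed point of cosine near 0.739.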

@pkofod (Member) commented May 30, 2017

To clarify: I'm not asking you to do that now, just saying that I think we should do it later.

# fixed-point iteration
df.f!(xs[:,1], fx)
f_calls += 1
gs[:,1] .= xs[:,1] .+ β.*fx
Contributor:

Shouldn't views be used to cut down allocations?
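For illustration, the slices in the quoted lines could be replaced by views roughly as below. This is a standalone toy with stand-in shapes and a dummy in-place residual function, not the PR's code.

```julia
n = 4
xs = zeros(n, 3); gs = zeros(n, 3)        # stand-ins for the iterate/g histories
fx = zeros(n); β = 0.5
f!(x, fx) = (fx .= 1.0 .- x)              # toy in-place residual

x1 = @view xs[:, 1]                       # no copy: reads and writes go to xs
f!(x1, fx)
@views gs[:, 1] .= xs[:, 1] .+ β .* fx    # fused in-place broadcast, no temporaries
```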

src/anderson.jl Outdated
m_eff = min(n-1,m)
new_x = gs[:,1]
if m_eff > 0
mat = (gs[:,2:m_eff+1] .- xs[:,2:m_eff+1]) .- (gs[:,1] - xs[:,1])
Contributor:

Pre-allocate to get rid of the temporaries here and at 62?

@ChrisRackauckas (Contributor)

I think that for completeness it would be nice to have a simple Picard version as well, if not just for benchmarking and possibly identifying issues.

@antoine-levitt (Contributor, Author)

Pre-allocation and views done, thanks! For simple Picard, just set m=0.

@pkofod (Member) commented Jun 26, 2017

Tests seem to fail

@antoine-levitt (Contributor, Author)

Indeed; fixed. I ran into JuliaLang/LinearAlgebra.jl#440 while trying to find a good in-place op. I'm happy to live with the current out-of-place implementation (it is only temporary until Julia gets QR updates anyway).

@pkofod (Member) commented Jun 26, 2017

Great! I'll have a closer look as soon as I find the time (hopefully tomorrow) so this doesn't end up sitting here over the summer.

@pkofod (Member) left a comment

I'm sort of confused by the test, it doesn't seem to be a fixed point problem?

src/anderson.jl Outdated
tracing = store_trace || show_trace || extended_trace
old_x = xs[:,1]
x_converged, f_converged, converged = false, false, false
mat = zeros(T, N, m) #pre-allocation
Member:

pre-allocation of what? could you simply state what it's used for?

Contributor Author:

done, thanks

src/anderson.jl Outdated
store_trace::Bool,
show_trace::Bool,
extended_trace::Bool,
hist_size::Integer,
Member:

Why is this called hist_size? It should be m. Same in all similar places below.

Contributor Author:

This is the user-facing method; hist_size is more readable than m (but in the algorithm m is shorter and closer to the notation of Walker & Ni). I can change it if you like.

Member:

I think m is fine; that's what we use in LBFGS over at Optim, and I've never heard complaints. I guess it just has to be mentioned. Speaking of mentioned, you should probably add a note somewhere in the readme about this new method.

# Notations from Walker & Ni, "Anderson acceleration for fixed-point iterations", SINUM 2011
# Attempts to accelerate the iteration xn+1 = xn + β f(xn)

@views function anderson_{T}(df::AbstractDifferentiableMultivariateFunction,
Member:

Underscores for internal functions are usually placed in front

Contributor Author:

I just did the same as the other functions, e.g. newton_.

Member:

Alright, weird convention in NLsolve I guess. 👍 then

src/anderson.jl Outdated
end

return SolverResults("Anderson m=$m β=$β",
# returning gs[:,1] would be slightly better here, but the fnorm is not guaranteed
Member:

What do you mean by this comment?

@antoine-levitt (Contributor, Author) commented Jun 29, 2017

Usually when solving a nonlinear problem with Anderson acceleration, you expect that the base iteration xn+1 = xn + beta*f(xn) is convergent, but slow, so returning gs[:,1], which is xs[:,1] + beta*f(xs[:,1]), should be slightly better than xs[:,1]. The problem is that you don't know f() at that point, so you cannot guarantee that it satisfies the convergence criterion.

@antoine-levitt (Contributor, Author)

f(x) = 0 <=> x = x + beta*f(x), so everything is a fixed-point problem. Of course, the fixed-point iteration might not converge, and it might not make sense. However, if you start close to a zero of f and with a small beta, f(x) is almost affine, f(x) ≈ Ax - b, and Anderson acceleration is then basically GMRES on Ax = b, so it should converge (with a large enough history size). This test checks this.
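A tiny numeric check of the equivalence: a root of f is a fixed point of g(x) = x + beta*f(x), and the plain damped iteration converges locally when beta is small and the sign of f makes the map contractive. The specific f and beta here are made up for illustration.

```julia
f(x) = [2.0 - x[1]^2]     # root at sqrt(2); sign chosen so the iteration contracts
β = 0.1
x = [1.0]                 # start near the root
for _ in 1:500
    x .= x .+ β .* f(x)   # base iteration x <- x + β f(x); its fixed point is the root
end
```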

@pkofod (Member) commented Jun 29, 2017

> f(x) = 0 <=> x = x + beta*f(x)

Right, right, sorry. So to solve F(x) = x, people should provide f(x) = F(x) - x. This would be nice to make 100% clear in the readme entry.

Personally I'd like to see a readme entry and hist_size -> m and then I think we're good to go!

test/2by2.jl Outdated
# test local convergence of Anderson: close to a fixed-point and with
# a small beta, f should be almost affine, in which case Anderson is
# equivalent to GMRES and should converge
r = nlsolve(df, [ 0.01; .99], method = :anderson, hist_size = 10, beta=.01)
Member:

Missed a hist_size.

Contributor Author:

That'll teach me to push without testing... thanks!

@pkofod (Member) commented Jul 1, 2017

Looks like NLsolve is ready to accept a new solver for the first time in quite some time! Thanks for playing along @antoine-levitt !

@pkofod pkofod merged commit fff21cf into JuliaNLSolvers:master Jul 1, 2017
@ChrisRackauckas (Contributor)

Sweet!

@antoine-levitt (Contributor, Author)

Great, thanks for the thorough review! (on this and other PRs)
