More robust and modular way of detecting divergence #16

Closed · yebai opened this issue Mar 5, 2019 · 13 comments · Fixed by #49

yebai commented Mar 5, 2019

Approximation error in leapfrog integration (i.e. accumulated Hamiltonian energy error) can sometimes explode, for example when the curvature of the current region is very high. This type of approximation error is sometimes called divergence [1], since it drives the leapfrog simulation away from the correct solution.

In Turing, this type of error is currently caught by a relatively ad hoc function called is_valid:

function is_valid(v::AbstractVector{<:Real})

is_valid catches cases where one or more elements of the parameter vector are NaN or Inf. This has several drawbacks:

  • it's not general enough to catch all leapfrog approximation errors, e.g. when the parameter vector is valid but the Hamiltonian energy is not;
  • detection may be delayed, because numerical errors can show up in the Hamiltonian energy before they show up in the parameter vector;
  • it's hard to propagate the exception around, i.e. at the moment we use a heuristic to find the last valid point before the approximation/numerical error happens.

Therefore, we might want to refactor the current code a bit into a more robust mechanism for handling leapfrog approximation errors. Perhaps we can learn from the DynamicHMC implementation:

https://github.com/tpapp/DynamicHMC.jl/blob/master/src/buildingblocks.jl#L168
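
For concreteness, here is a minimal sketch of a more general check that flags a divergence from the accumulated energy error rather than from the parameter vector alone; `is_divergent`, `max_ΔH`, and the threshold value are illustrative, not existing API:

```julia
# A sketch only: flag a divergent transition from the energy error of a
# leapfrog step instead of scanning the parameter vector for NaN/Inf.
# `is_divergent` and `max_ΔH` are hypothetical names, not existing API.
is_valid(v::AbstractVector{<:Real}) = all(isfinite, v)

function is_divergent(H0::Real, H′::Real; max_ΔH::Real = 1_000.0)
    ΔH = H′ - H0                     # accumulated Hamiltonian energy error
    return !isfinite(ΔH) || ΔH > max_ΔH
end
```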

[1]: Betancourt, M. (2017). A Conceptual Introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434.

yebai commented Mar 8, 2019

Update:

The current NUTS implementation rejects only leaf nodes that contain numerical errors (by setting the leaf node's energy to Inf, see below). This leads to biased samples in general. We need to reject the whole trajectory increment that contains the error (e.g. a numerical error or divergent Hamiltonian). This increment is the part of the trajectory that is added to the current trajectory during a doubling step (i.e. the subtree being created).

_H′ = _is_valid ? hamiltonian_energy(h, θ′, r′) : Inf
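
A minimal sketch of what increment-level rejection could look like, with the U-turn check omitted and all names (`build_subtree`, `leapfrog`, `energy`, `max_ΔH`) purely illustrative:

```julia
# Sketch: propagate a divergence flag up the doubling recursion so the whole
# subtree built in one doubling step is discarded, not just the leaf node.
function build_subtree(leapfrog, energy, θ, r, H0, depth; max_ΔH = 1_000.0)
    if depth == 0
        θ′, r′ = leapfrog(θ, r)                     # single leapfrog step
        diverged = !(energy(θ′, r′) - H0 < max_ΔH)  # also catches NaN/Inf
        return θ′, r′, diverged
    end
    θ′, r′, diverged = build_subtree(leapfrog, energy, θ, r, H0, depth - 1)
    diverged && return θ′, r′, true                 # reject the whole increment
    return build_subtree(leapfrog, energy, θ′, r′, H0, depth - 1)
end
```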

yebai commented Mar 8, 2019

Update:

The following two lines in find_good_eps are problematic: (position, momentum) shouldn't be refreshed during the initial search for the step size (see Alg. 4 of Hoffman & Gelman, 2011).

θ = θ′

r = rand_momentum(rng, h)
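
For reference, a minimal sketch of the corrected search following Alg. 4 of Hoffman & Gelman (2011), where (θ, r) is held fixed and only ϵ changes; `energy` and `leapfrog` are stand-ins for the package's Hamiltonian and integrator:

```julia
# Sketch of Alg. 4: only the step size ϵ changes between iterations;
# the starting phase point (θ, r) is never refreshed.
function find_good_eps(energy, leapfrog, θ, r; ϵ = 1.0)
    H0 = energy(θ, r)
    θ′, r′ = leapfrog(θ, r, ϵ)
    # a = +1 keeps doubling ϵ; a = -1 keeps halving it
    a = exp(H0 - energy(θ′, r′)) > 0.5 ? 1 : -1
    while exp(a * (H0 - energy(θ′, r′))) > 2.0^(-a)
        ϵ *= 2.0^a
        θ′, r′ = leapfrog(θ, r, ϵ)   # always restart from the same (θ, r)
    end
    return ϵ
end
```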

yebai commented Mar 12, 2019

Update: The following function should be removed. Taking the last point of a NoUTurnTrajectory is not a valid approach to HMC, since time-reversibility is no longer guaranteed; that is, p(θ → θ′) = p(θ′ → θ) no longer holds. It is possible to adjust the MH ratio to correct for this, but it would reject almost all proposals, since Hamiltonian proposals are deterministic: p(θ → θ′) is either 0 or 1.

function lastpoint(rng::AbstractRNG, nt::NoUTurnTrajectory, h::Hamiltonian, θ::AbstractVector{T}, r::AbstractVector{T}; j_max::Int=10) where {T<:Real}

xukai92 commented Mar 15, 2019

Regarding #16 (comment), does it mean we should reject the whole sub-tree?

Regarding #16 (comment), I guess I named the function wrong. It is actually doing slice sampling from the trajectory.

yebai commented Mar 15, 2019

> Regarding #16 (comment), does it mean we should reject the whole sub-tree?

Yes, we should reject each trajectory increment that causes a numerical error / divergent Hamiltonian.

xukai92 commented Mar 22, 2019

@yebai I think we are tackling this correctly in Turing; see https://github.com/TuringLang/Turing.jl/blob/master/src/inference/nuts.jl#L91-L93. A numerical error causes H′ to be Inf, which makes s′ equal to 0, and that is the termination condition of the doubling tree in https://github.com/TuringLang/Turing.jl/blob/master/src/inference/nuts.jl#L153.

I guess DynamicHMC follows a different abstraction from the NUTS paper, so we didn't realise that before. What I can do is add this divergence abstraction to the NUTS code to make it more readable, if that makes sense.

yebai commented Mar 22, 2019

> @yebai I think we are tackling this correctly in Turing; see https://github.com/TuringLang/Turing.jl/blob/master/src/inference/nuts.jl#L91-L93. A numerical error causes H′ to be Inf, which makes s′ equal to 0, and that is the termination condition of the doubling tree in https://github.com/TuringLang/Turing.jl/blob/master/src/inference/nuts.jl#L153.
>
> I guess DynamicHMC follows a different abstraction from the NUTS paper, so we didn't realise that before. What I can do is add this divergence abstraction to the NUTS code to make it more readable, if that makes sense.

I see - that makes sense. Introducing some abstractions for

  • termination (e.g. divergence, numerical error, U-turn),
  • phase point, i.e. (position, momentum),
  • the currently selected proposal candidate in a dynamic trajectory,
  • and other runtime information, e.g. tree depth, MH ratio, step size and mass matrix during adaptation and sampling

sounds like a good idea (see the sketch below).
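
A hypothetical sketch of what those abstractions could look like (all names illustrative, not a final API):

```julia
# Sketch of the proposed abstractions; names are illustrative only.
struct PhasePoint{T<:AbstractVector{<:Real}}
    θ::T   # position
    r::T   # momentum
end

struct Termination
    dynamic::Bool     # U-turn criterion met
    numerical::Bool   # divergence / numerical error
end

isterminated(t::Termination) = t.dynamic || t.numerical

struct TreeStats
    depth::Int    # current tree depth
    α::Float64    # average MH acceptance ratio, fed to step-size adaptation
end
```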

xukai92 commented Mar 23, 2019

Good plan!

yebai commented Apr 4, 2019

Update:

The following line seems problematic, since the leapfrog integration should depend on the value of v (the sampled direction v ∈ {−1, +1} in NUTS). As written, the code always simulates the trajectory forward.

θ′, r′, _is_valid = step(nt.integrator, h, θ, r)

xukai92 commented Apr 5, 2019

Nice catch! I'm wondering what's the best way to fix it, since we lost our interface for passing the step size.

  1. Let nt have forward_integrator and backward_integrator fields?
  2. Pass forward=true in step? (see the sketch below)
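
A sketch of option 2, with `Leapfrog`, `∂H∂θ`, and `∂H∂r` as stand-ins for the package's types: negating the step size lets a single integrator simulate in both directions:

```julia
# Sketch of option 2: thread the direction through `step` by negating ϵ,
# so one integrator can simulate both forwards and backwards in time.
struct Leapfrog{T<:Real}
    ϵ::T
end

function step(lf::Leapfrog, ∂H∂θ, ∂H∂r, θ, r; forward::Bool = true)
    ϵ = forward ? lf.ϵ : -lf.ϵ
    r_mid = r - ϵ / 2 * ∂H∂θ(θ)      # half-step momentum update
    θ′ = θ + ϵ * ∂H∂r(r_mid)         # full-step position update
    r′ = r_mid - ϵ / 2 * ∂H∂θ(θ′)    # half-step momentum update
    return θ′, r′
end
```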

yebai commented Apr 5, 2019

Update:

The following lines seem problematic too:

θ_new = lf_position(lf.ϵ, h, θ, r)
r_new, _is_valid = lf_momentum(i == n_steps ? lf.ϵ / 2 : lf.ϵ, h, θ, r)

Shouldn't θ be θ_new in the second call? That is:

θ_new = lf_position(lf.ϵ, h, θ, r)
r_new, _is_valid = lf_momentum(i == n_steps ? lf.ϵ / 2 : lf.ϵ, h, θ_new, r)
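
In loop context, a minimal sketch of the corrected n-step update (signature simplified; `lf_position`/`lf_momentum` as in the snippet above), where each momentum update uses the freshly updated position:

```julia
# Sketch: the momentum update at step i must use the position θ produced by
# that step's position update; the last step takes a half-step in momentum.
function lf_steps(lf_position, lf_momentum, ϵ, h, θ, r, n_steps)
    for i in 1:n_steps
        θ = lf_position(ϵ, h, θ, r)
        r, _is_valid = lf_momentum(i == n_steps ? ϵ / 2 : ϵ, h, θ, r)
        _is_valid || break   # bail out on numerical error
    end
    return θ, r
end
```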

yebai commented Apr 5, 2019

> Pass forward=true in step?

I'll add a fwd flag in #49

xukai92 commented Jul 9, 2019

So what do we really want here? @yebai

xukai92 added this to the Release v0.2 milestone Jul 9, 2019
yebai mentioned this issue Jul 10, 2019
xukai92 mentioned this issue Jul 11, 2019
xukai92 closed this as completed Jul 23, 2019