More robust and modular way of detecting divergence #16

Closed · yebai opened this issue Mar 5, 2019 · 13 comments · Fixed by #49

yebai commented Mar 5, 2019

Approximation error in leapfrog integration (i.e. accumulated Hamiltonian energy error) can sometimes explode, for example when the curvature of the current region is very high. This type of approximation error is sometimes called divergence [1], since it drives the leapfrog simulation away from the correct solution.

In Turing, this type of error is currently caught by a relatively ad hoc function called is_valid:

function is_valid(v::AbstractVector{<:Real})

is_valid catches cases where one or more elements of the parameter vector are NaN or Inf. This has several drawbacks:

  • it's not general enough to catch all leapfrog approximation errors, e.g. when the parameter vector is valid but the Hamiltonian energy is not;
  • detection may be delayed, because numerical errors can show up in the Hamiltonian energy before they show up in the parameter vector;
  • it's hard to propagate the exception around, i.e. at the moment we use a heuristic to find the last valid point before the approximation/numerical error happens.

Therefore, we might want to refactor the current code a bit into a more robust mechanism for handling leapfrog approximation errors. Perhaps we can learn from the DynamicHMC implementation:

https://github.com/tpapp/DynamicHMC.jl/blob/master/src/buildingblocks.jl#L168
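
For concreteness, here is a minimal sketch of a more general check that flags a divergence from the accumulated energy error rather than from the parameter vector alone; `is_divergent`, `max_ΔH`, and the threshold value are illustrative, not existing API:

```julia
# A sketch only: flag a divergent transition from the energy error of a
# leapfrog step instead of scanning the parameter vector for NaN/Inf.
# `is_divergent` and `max_ΔH` are hypothetical names, not existing API.
is_valid(v::AbstractVector{<:Real}) = all(isfinite, v)

function is_divergent(H0::Real, H′::Real; max_ΔH::Real = 1_000.0)
    ΔH = H′ - H0                     # accumulated Hamiltonian energy error
    return !isfinite(ΔH) || ΔH > max_ΔH
end
```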

[1]: Betancourt, M. (2017). A Conceptual Introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434.

yebai commented Mar 8, 2019

Update:

The current NUTS implementation rejects only leaf nodes that contain numerical errors (by setting the leaf node's energy to Inf, see below). This leads to biased samples in general. We need to reject the whole trajectory increment that contains the error (e.g. a numerical error or divergent Hamiltonian). This increment is the part of the trajectory that is added to the current trajectory during a doubling step (i.e. the subtree being created).

_H′ = _is_valid ? hamiltonian_energy(h, θ′, r′) : Inf
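
A minimal sketch of what increment-level rejection could look like, with the U-turn check omitted and all names (`build_subtree`, `leapfrog`, `energy`, `max_ΔH`) purely illustrative:

```julia
# Sketch: propagate a divergence flag up the doubling recursion so the whole
# subtree built in one doubling step is discarded, not just the leaf node.
function build_subtree(leapfrog, energy, θ, r, H0, depth; max_ΔH = 1_000.0)
    if depth == 0
        θ′, r′ = leapfrog(θ, r)                     # single leapfrog step
        diverged = !(energy(θ′, r′) - H0 < max_ΔH)  # also catches NaN/Inf
        return θ′, r′, diverged
    end
    θ′, r′, diverged = build_subtree(leapfrog, energy, θ, r, H0, depth - 1)
    diverged && return θ′, r′, true                 # reject the whole increment
    return build_subtree(leapfrog, energy, θ′, r′, H0, depth - 1)
end
```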

yebai commented Mar 8, 2019

Update:

The following two lines in find_good_eps are problematic: (position, momentum) shouldn't be refreshed during the initial search for the step size (see Alg. 4 of Hoffman & Gelman, 2011).

θ = θ′

r = rand_momentum(rng, h)
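
For reference, a minimal sketch of the corrected search following Alg. 4 of Hoffman & Gelman (2011), where (θ, r) is held fixed and only ϵ changes; `energy` and `leapfrog` are stand-ins for the package's Hamiltonian and integrator:

```julia
# Sketch of Alg. 4: only the step size ϵ changes between iterations;
# the starting phase point (θ, r) is never refreshed.
function find_good_eps(energy, leapfrog, θ, r; ϵ = 1.0)
    H0 = energy(θ, r)
    θ′, r′ = leapfrog(θ, r, ϵ)
    # a = +1 keeps doubling ϵ; a = -1 keeps halving it
    a = exp(H0 - energy(θ′, r′)) > 0.5 ? 1 : -1
    while exp(a * (H0 - energy(θ′, r′))) > 2.0^(-a)
        ϵ *= 2.0^a
        θ′, r′ = leapfrog(θ, r, ϵ)   # always restart from the same (θ, r)
    end
    return ϵ
end
```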

yebai commented Mar 12, 2019

Update: The following function should be removed. Taking the last point of a NoUTurnTrajectory is not a valid approach to HMC, since time-reversibility is no longer guaranteed; that is, p(θ → θ′) = p(θ′ → θ) no longer holds. It is possible to adjust the MH ratio to correct for this, but it would reject almost all proposals, since Hamiltonian proposals are deterministic: p(θ → θ′) is either 0 or 1.

function lastpoint(rng::AbstractRNG, nt::NoUTurnTrajectory, h::Hamiltonian, θ::AbstractVector{T}, r::AbstractVector{T}; j_max::Int=10) where {T<:Real}

xukai92 commented Mar 15, 2019

Regarding #16 (comment), does it mean we should reject the whole sub-tree?

Regarding #16 (comment), I guess I named the function wrong. It is actually doing slice sampling from the trajectory.

yebai commented Mar 15, 2019

> Regarding #16 (comment), does it mean we should reject the whole sub-tree?

Yes, we should reject each trajectory increment that causes a numerical error / divergent Hamiltonian.

xukai92 commented Mar 22, 2019

@yebai I think we are tackling this correctly in Turing; see https://github.com/TuringLang/Turing.jl/blob/master/src/inference/nuts.jl#L91-L93. A numerical error causes H′ to be Inf, which makes s′ equal to 0, and that is the termination condition of the doubling tree in https://github.com/TuringLang/Turing.jl/blob/master/src/inference/nuts.jl#L153.

I guess DynamicHMC follows a different abstraction from the NUTS paper, so we didn't realise that before. What I can do is add this divergence abstraction to the NUTS code to make it more readable, if that makes sense.

yebai commented Mar 22, 2019

> @yebai I think we are tackling this correctly in Turing; see https://github.com/TuringLang/Turing.jl/blob/master/src/inference/nuts.jl#L91-L93. A numerical error causes H′ to be Inf, which makes s′ equal to 0, and that is the termination condition of the doubling tree in https://github.com/TuringLang/Turing.jl/blob/master/src/inference/nuts.jl#L153.
>
> I guess DynamicHMC follows a different abstraction from the NUTS paper, so we didn't realise that before. What I can do is add this divergence abstraction to the NUTS code to make it more readable, if that makes sense.

I see - that makes sense. Introducing some abstractions for

  • termination (e.g. divergence, numerical error, U-turn),
  • phase point, i.e. (position, momentum),
  • the currently selected proposal candidate in a dynamic trajectory,
  • and other runtime information, e.g. tree depth, MH ratio, step size and mass matrix during adaptation and sampling

sounds like a good idea (see the sketch below).
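
A hypothetical sketch of what those abstractions could look like (all names illustrative, not a final API):

```julia
# Sketch of the proposed abstractions; names are illustrative only.
struct PhasePoint{T<:AbstractVector{<:Real}}
    θ::T   # position
    r::T   # momentum
end

struct Termination
    dynamic::Bool     # U-turn criterion met
    numerical::Bool   # divergence / numerical error
end

isterminated(t::Termination) = t.dynamic || t.numerical

struct TreeStats
    depth::Int    # current tree depth
    α::Float64    # average MH acceptance ratio, fed to step-size adaptation
end
```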

xukai92 commented Mar 23, 2019

Good plan!

yebai commented Apr 4, 2019

Update:

The following line seems problematic, since the leapfrog integration should depend on the value of v (the sampled direction v ∈ {−1, +1} in NUTS). As written, the code always simulates the trajectory forward.

θ′, r′, _is_valid = step(nt.integrator, h, θ, r)

xukai92 commented Apr 5, 2019

Nice catch! I'm wondering what's the best way to fix it, since we lost our interface for passing the step size.

  1. Let nt have forward_integrator and backward_integrator fields?
  2. Pass forward=true in step? (see the sketch below)
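
A sketch of option 2, with `Leapfrog`, `∂H∂θ`, and `∂H∂r` as stand-ins for the package's types: negating the step size lets a single integrator simulate in both directions:

```julia
# Sketch of option 2: thread the direction through `step` by negating ϵ,
# so one integrator can simulate both forwards and backwards in time.
struct Leapfrog{T<:Real}
    ϵ::T
end

function step(lf::Leapfrog, ∂H∂θ, ∂H∂r, θ, r; forward::Bool = true)
    ϵ = forward ? lf.ϵ : -lf.ϵ
    r_mid = r - ϵ / 2 * ∂H∂θ(θ)      # half-step momentum update
    θ′ = θ + ϵ * ∂H∂r(r_mid)         # full-step position update
    r′ = r_mid - ϵ / 2 * ∂H∂θ(θ′)    # half-step momentum update
    return θ′, r′
end
```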

yebai commented Apr 5, 2019

Update:

The following lines seem problematic too:

θ_new = lf_position(lf.ϵ, h, θ, r)
r_new, _is_valid = lf_momentum(i == n_steps ? lf.ϵ / 2 : lf.ϵ, h, θ, r)

Shouldn't θ be θ_new in the second call? That is:

θ_new = lf_position(lf.ϵ, h, θ, r)
r_new, _is_valid = lf_momentum(i == n_steps ? lf.ϵ / 2 : lf.ϵ, h, θ_new, r)
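
In loop context, a minimal sketch of the corrected n-step update (signature simplified; `lf_position`/`lf_momentum` as in the snippet above), where each momentum update uses the freshly updated position:

```julia
# Sketch: the momentum update at step i must use the position θ produced by
# that step's position update; the last step takes a half-step in momentum.
function lf_steps(lf_position, lf_momentum, ϵ, h, θ, r, n_steps)
    for i in 1:n_steps
        θ = lf_position(ϵ, h, θ, r)
        r, _is_valid = lf_momentum(i == n_steps ? ϵ / 2 : ϵ, h, θ, r)
        _is_valid || break   # bail out on numerical error
    end
    return θ, r
end
```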

yebai commented Apr 5, 2019

> Pass forward=true in step?

I'll add a fwd flag in #49

xukai92 commented Jul 9, 2019

So what do we really want here? @yebai

xukai92 added this to the Release v0.2 milestone Jul 9, 2019
yebai mentioned this issue Jul 10, 2019
xukai92 mentioned this issue Jul 11, 2019
xukai92 closed this as completed Jul 23, 2019