
Bug-fixing in dual averaging for HMC #709

Merged
merged 10 commits into from
Mar 12, 2019
Conversation

@yebai (Member) commented Mar 8, 2019

The step-size adaptation problem is probably resolved by this PR. So far I have been testing with the gdemo example and the m12_2 model from StatisticalRethinking. Please take a look and let me know if you have better models for testing.

(Figure attached: the effective sample size for m12_2)

I'm working on a more robust version of _find_good_eps. Once that is done, the following line will be changed back to ϵ = direction == 1 ? 2 * ϵ : 0.5 * ϵ

ϵ = direction == 1 ? 1.1 * ϵ : 0.5 * ϵ
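For context, the doubling/halving heuristic behind an initial step-size search like _find_good_eps (in the style of Hoffman & Gelman's NUTS paper) can be sketched as follows. This is an illustrative Python sketch, not Turing's implementation; `accept_ratio` is a hypothetical stand-in for the acceptance probability of one leapfrog step at a given step size:

```python
def find_good_eps(accept_ratio, eps=0.1, max_iters=100):
    """Crude initial step-size search: grow or shrink eps until the
    acceptance ratio crosses 0.5.

    accept_ratio: callable eps -> acceptance probability of one
    leapfrog step (hypothetical stand-in for the real simulation).
    """
    a = accept_ratio(eps)
    # direction == 1 means eps looks too small (acceptance is high),
    # direction == -1 means eps looks too large.
    direction = 1 if a > 0.5 else -1
    for _ in range(max_iters):
        eps = 2.0 * eps if direction == 1 else 0.5 * eps
        a = accept_ratio(eps)
        # Stop once the acceptance ratio crosses 0.5 from the side
        # we started on.
        if (direction == 1 and a <= 0.5) or (direction == -1 and a > 0.5):
            break
    return eps
```

The PR's temporary change above corresponds to growing by 1.1x instead of 2x in the `direction == 1` branch, which searches more conservatively at the cost of more iterations.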

Related: #628

@yebai (Member, Author) commented Mar 8, 2019

Models being used

gdemo

using Revise, Turing

@model gdemo(x, y) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ Normal(m, sqrt(s))
  y ~ Normal(m, sqrt(s))
end

c5 = sample(gdemo(1.5, 2), HMCDA(2000, 0.65, 1.0))

m12_2

using StatisticalRethinking
using Turing
using CSV  # needed for CSV.read below

Turing.setadbackend(:reverse_diff)

d = CSV.read(joinpath(dirname(Base.pathof(StatisticalRethinking)), "..", "data",
    "reedfrogs.csv"), delim=';')
size(d) # Should be 48x5

# Set number of tanks
d[:tank] = 1:size(d,1)

@model m12_2(density, tank, surv) = begin
    # Separate priors on α and σ for each tank
    σ ~ Truncated(Cauchy(0, 1), 0, Inf)
    α ~ Normal(0, 1)  # Check the Stan code: this might need to be in the for loop below?

    # Number of unique tanks in the data set
    N_tank = length(tank)

    # Set a vector for the per-tank priors/params
    # a_tank = TArray{Any}(undef, N_tank)
    a_tank = Vector{Real}(undef, N_tank)

    # For each tank [1,..,48] set a prior
    for i = 1:N_tank
        a_tank[i] ~ Normal(α, σ)
    end

    for i = 1:N_tank
        logitp = a_tank[tank[i]]
        surv[i] ~ BinomialLogit(density[i], logitp)
    end
end

posterior = sample(m12_2(d[:density], d[:tank], d[:surv]),
                   Turing.NUTS(4000, 500, 0.95),
                   # Turing.HMCDA(4000, 500, 0.95, 0.3)
                  )
describe(posterior)

LDA

# Simulate data
using Distributions

V = 5   # vocab
K = 3   # topic
M = 25  # doc

alpha = collect(ones(K) / K);
beta = collect(ones(V) / V);

theta = rand(Dirichlet(alpha), M)
phi = rand(Dirichlet(beta), K)
@info "true vals" phi theta sum(phi * theta, dims=1)

avg_doc_length = 50

doc_length = rand(Poisson(avg_doc_length), M)

N = sum(doc_length)

w = Vector{Int}(undef, N)
doc = Vector{Int}(undef, N)

n = 1
for m = 1:M, i=1:doc_length[m]
    global n
    z = rand(Categorical(theta[:,m]))
    w[n] = rand(Categorical(phi[:,z]))
    doc[n] = m
    n = n + 1
end

# LDA
using Turing

struct CategoricalReal <: Distributions.DiscreteUnivariateDistribution
    p
end

function Distributions.logpdf(d::CategoricalReal, k::Int64)
    return log(d.p[k])
end

@model lda_collapsed(K, V, M, N, w, doc, beta, alpha) = begin
    theta = Matrix{Real}(undef, K, M)
    for m = 1:M
        theta[:,m] ~ Dirichlet(alpha)
    end

    phi = Matrix{Real}(undef, V, K)
    for k = 1:K
        phi[:,k] ~ Dirichlet(beta)
    end

    word_dist = phi * theta

    for n = 1:N
        w[n] ~ CategoricalReal(word_dist[:,doc[n]])
    end
end

mf = lda_collapsed(K, V, M, N, w, doc, beta, alpha)

c3 = sample(mf, Turing.NUTS(2000, 0.65))

@yebai (Member, Author) commented Mar 10, 2019

Update:

Running the LDA example occasionally throws a log(0) error. I don't think this is directly linked to step-size adaptation; in fact, the step size seems well behaved with this PR, and I didn't observe any sharp fluctuations or extreme step-size values. This type of numerical error will always occur due to the nature of floating-point computation and unavoidable error in the Hamiltonian simulation.

At the moment, we don't handle this kind of error in Turing, and the default Julia behavior is to stop executing the erroring function and print the stack trace. A complete solution requires catching this type of exception in Turing and rejecting the current HMC proposal when it happens. This is likely to require some rework of the error-handling mechanism mentioned in TuringLang/AdvancedHMC.jl#16
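The proposed catch-and-reject behavior can be sketched as follows. This is an illustrative Python sketch, not Turing code; `leapfrog` and `log_accept_prob` are hypothetical stand-ins for the Hamiltonian simulation and the Metropolis-Hastings correction:

```python
import math

def hmc_step_with_rejection(theta, leapfrog, log_accept_prob, rng):
    """Treat numerical failures inside the Hamiltonian simulation as a
    rejected proposal instead of crashing the sampler.

    Returns (new_state, accepted).
    """
    try:
        theta_prime = leapfrog(theta)
        log_a = log_accept_prob(theta, theta_prime)
    except (ValueError, OverflowError, ZeroDivisionError):
        # A DomainError-style failure (e.g. log of a non-positive
        # number) signals a diverged trajectory: reject the proposal
        # and keep the current state.
        return theta, False
    # Standard Metropolis-Hastings accept/reject step.
    if math.log(rng.random()) < log_a:
        return theta_prime, True
    return theta, False
```

The key design point is that a divergence is treated exactly like an ordinary rejection, so the chain remains valid rather than aborting mid-run.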

julia> chain = sample(mf, HMCDA(2000, 0.65, 1.0))
[ Info: [Turing] looking for good initial eps...
[ Info: [Turing] ϵ = 0.2, accept ratio a = 6.58293101371248e-25
[ Info: [Turing] ϵ = 0.15000000000000002, accept ratio a = 1.0
[ Info: [Turing] ϵ = 0.17500000000000002, accept ratio a = 1.0
[ Info: [Turing] ϵ = 0.1875, accept ratio a = 1.0
[ Info: [Turing] ϵ = 0.19375, accept ratio a = 1.0
[ Info: [Turing] ϵ = 0.19687500000000002, accept ratio a = 1.6572427130567486e-9
[ Info: [Turing] ϵ = 0.1953125, accept ratio a = 0.04197809303831285
[ Info: [Turing] ϵ = 0.19453125, accept ratio a = 1.0
[ Info: [Turing] ϵ = 0.194921875, accept ratio a = 1.0
[ Info: [Turing] ϵ = 0.1951171875, accept ratio a = 0.3421968454472759
[ Info: [Turing] found initial ϵ: 0.1951171875
ERROR: DomainError with -2.220446049250313e-16:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).
Stacktrace:
 [1] throw_complex_domainerror(::Symbol, ::Float64) at ./math.jl:31
 [2] log(::Float64) at ./special/log.jl:285
 [3] log(::ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#26")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}},Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}},Real},Float64,10}) at /Users/hg344/.julia/packages/ForwardDiff/N0wMF/src/dual.jl:203
 [4] logpdf_with_trans(::Dirichlet{Float64}, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#26")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}},Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}},Real},Real,10},1}, ::Bool) at /Users/hg344/.julia/packages/Bijectors/vCpPQ/src/Bijectors.jl:292
 [5] assume(::Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}, ::Dirichlet{Float64}, ::Turing.Core.VarReplay.VarName, ::Turing.Core.VarReplay.VarInfo) at /Users/hg344/.julia/dev/Turing/src/inference/hmc.jl:277
 [6] macro expansion at /Users/hg344/.julia/dev/Turing/src/core/compiler.jl:102 [inlined]
 [7] macro expansion at ./REPL[27]:9 [inlined]
 [8] (::getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}})(::Turing.Core.VarReplay.VarInfo, ::Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}, ::Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}}) at /Users/hg344/.julia/dev/Turing/src/core/compiler.jl:389
 [9] #call#3 at /Users/hg344/.julia/dev/Turing/src/Turing.jl:62 [inlined]
 [10] Model at /Users/hg344/.julia/dev/Turing/src/Turing.jl:62 [inlined]
 [11] runmodel!(::Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}}, ::Turing.Core.VarReplay.VarInfo, ::Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}) at /Users/hg344/.julia/dev/Turing/src/core/VarReplay.jl:110
 [12] f at /Users/hg344/.julia/dev/Turing/src/core/ad.jl:109 [inlined]
 [13] chunk_mode_gradient!(::Array{Real,1}, ::getfield(Turing.Core, Symbol("#f#26")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}},Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}}, ::Array{Real,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#26")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}},Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}},Real},Real,10,Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#26")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}},Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}},Real},Real,10},1}}) at /Users/hg344/.julia/packages/ForwardDiff/N0wMF/src/gradient.jl:140
 [14] gradient! at /Users/hg344/.julia/packages/ForwardDiff/N0wMF/src/gradient.jl:37 [inlined]
 [15] gradient!(::Array{Real,1}, ::getfield(Turing.Core, Symbol("#f#26")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}},Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}}, ::Array{Real,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#26")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}},Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}},Real},Real,10,Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#26")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}},Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}},Real},Real,10},1}}) at /Users/hg344/.julia/packages/ForwardDiff/N0wMF/src/gradient.jl:33
 [16] gradient_logp_forward(::Array{Real,1}, ::Turing.Core.VarReplay.VarInfo, ::Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}}, ::Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}) at /Users/hg344/.julia/dev/Turing/src/core/ad.jl:116
 [17] gradient_logp at /Users/hg344/.julia/dev/Turing/src/core/ad.jl:80 [inlined]
 [18] #2 at /Users/hg344/.julia/dev/Turing/src/inference/support/hmc_core.jl:12 [inlined]
 [19] #_leapfrog#26(::getfield(Turing.Inference, Symbol("##6#7")){Turing.Core.VarReplay.VarInfo,Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}}, ::getfield(Turing.Inference, Symbol("##8#9")){Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}}, ::Function, ::Array{Real,1}, ::Array{Float64,1}, ::Int64, ::Float64, ::getfield(Turing.Inference, Symbol("##2#3")){Turing.Core.VarReplay.VarInfo,Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}},Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}}}) at /Users/hg344/.julia/dev/Turing/src/inference/support/hmc_core.jl:158
 [20] (::getfield(Turing.Inference, Symbol("#kw##_leapfrog")))(::NamedTuple{(:rev_func, :log_func),Tuple{getfield(Turing.Inference, Symbol("##6#7")){Turing.Core.VarReplay.VarInfo,Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}},getfield(Turing.Inference, Symbol("##8#9")){Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}}}}, ::typeof(Turing.Inference._leapfrog), ::Array{Real,1}, ::Array{Float64,1}, ::Int64, ::Float64, ::Function) at ./none:0
 [21] #_hmc_step#27(::Function, ::Function, ::Function, ::Array{Real,1}, ::Float64, ::getfield(Turing.Inference, Symbol("##4#5")){Turing.Core.VarReplay.VarInfo,Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}},Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}}}, ::Function, ::getfield(Turing.Inference, Symbol("##20#21")){Array{Float64,1}}, ::Int64, ::Float64, ::getfield(Turing.Inference, Symbol("##18#19")){Int64,Array{Float64,1}}) at /Users/hg344/.julia/dev/Turing/src/inference/support/hmc_core.jl:195
 [22] (::getfield(Turing.Inference, Symbol("#kw##_hmc_step")))(::NamedTuple{(:rev_func, :log_func),Tuple{getfield(Turing.Inference, Symbol("##6#7")){Turing.Core.VarReplay.VarInfo,Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}},getfield(Turing.Inference, Symbol("##8#9")){Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}}}}, ::typeof(Turing.Inference._hmc_step), ::Array{Real,1}, ::Float64, ::Function, ::Function, ::Function, ::Int64, ::Float64, ::getfield(Turing.Inference, Symbol("##18#19")){Int64,Array{Float64,1}}) at ./none:0
 [23] #_hmc_step#28 at /Users/hg344/.julia/dev/Turing/src/inference/support/hmc_core.jl:224 [inlined]
 [24] #_hmc_step at ./none:0 [inlined]
 [25] #hmc_step#30 at /Users/hg344/.julia/dev/Turing/src/inference/hmcda.jl:72 [inlined]
 [26] (::getfield(Turing.Inference, Symbol("#kw##hmc_step")))(::NamedTuple{(:rev_func, :log_func),Tuple{getfield(Turing.Inference, Symbol("##6#7")){Turing.Core.VarReplay.VarInfo,Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}},getfield(Turing.Inference, Symbol("##8#9")){Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}}}}, ::typeof(Turing.Inference.hmc_step), ::Array{Real,1}, ::Float64, ::Function, ::Function, ::Function, ::Float64, ::HMCDA{Turing.Core.ForwardDiffAD{40},Any}, ::getfield(Turing.Inference, Symbol("##18#19")){Int64,Array{Float64,1}}) at ./none:0
 [27] step(::Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}}, ::Turing.Sampler{HMCDA{Turing.Core.ForwardDiffAD{40},Any}}, ::Turing.Core.VarReplay.VarInfo, ::Val{false}) at /Users/hg344/.julia/dev/Turing/src/inference/hmc.jl:236
 [28] macro expansion at ./util.jl:213 [inlined]
 [29] #sample#37(::Bool, ::Nothing, ::Int64, ::Nothing, ::Function, ::Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}}, ::HMCDA{Turing.Core.ForwardDiffAD{40},Any}) at /Users/hg344/.julia/dev/Turing/src/inference/hmc.jl:153
 [30] sample(::Turing.Model{Tuple{:theta,:phi},Tuple{:w},getfield(Main, Symbol("###inner_function#413#22")){Int64,Int64,Int64,Int64,Array{Int64,1},Array{Float64,1},Array{Float64,1}},NamedTuple{(:w,),Tuple{Array{Int64,1}}},NamedTuple{(:w,),Tuple{Symbol}}}, ::HMCDA{Turing.Core.ForwardDiffAD{40},Any}) at /Users/hg344/.julia/dev/Turing/src/inference/hmc.jl:108
 [31] top-level scope at none:0

@yebai yebai changed the title WIP: Bug-fixing in dual averaging for HMC Bug-fixing in dual averaging for HMC Mar 10, 2019
@yebai yebai requested review from xukai92 and trappmartin March 10, 2019 17:24
@yebai (Member, Author) commented Mar 10, 2019

Ready for review now. The remaining issue with LDA is not step-size related and is better addressed in a separate PR.

while (iter_num <= max_num_iters)
θ = θ_prime
delta_H = H0 - h # logp(θ`) - logp(θ)
direction = delta_H > log(a_cross) ? 1 : -1
Review comment (Member) on the lines above:
Nice!
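For readers following along, the dual-averaging scheme that this PR fixes (Nesterov-style dual averaging for step-size adaptation, as in Hoffman & Gelman, 2014) can be sketched as follows. This is an illustrative Python sketch with the paper's standard default constants, not Turing's implementation; `stat_fn` is a hypothetical stand-in for running one HMC iteration and observing its acceptance statistic:

```python
import math

def dual_averaging(stat_fn, n_adapt, delta=0.65, eps0=1.0,
                   gamma=0.05, t0=10.0, kappa=0.75):
    """Nesterov dual averaging: adapt the step size so the average
    acceptance statistic approaches the target delta.

    stat_fn: callable eps -> acceptance statistic for one iteration.
    Returns the adapted step size.
    """
    mu = math.log(10.0 * eps0)  # shrinkage target, log(10 * eps0)
    log_eps = math.log(eps0)
    log_eps_bar = 0.0           # running average of log_eps
    h_bar = 0.0                 # running average of the error delta - a
    for m in range(1, n_adapt + 1):
        a = stat_fn(math.exp(log_eps))
        h_bar = (1 - 1 / (m + t0)) * h_bar + (delta - a) / (m + t0)
        # Shrink toward mu, scaled by the accumulated error.
        log_eps = mu - math.sqrt(m) / gamma * h_bar
        # Polyak-style averaging with decaying weight m^(-kappa).
        w = m ** (-kappa)
        log_eps_bar = w * log_eps + (1 - w) * log_eps_bar
    return math.exp(log_eps_bar)
```

The condition `delta_H > log(a_cross)` in the diff above plays the role of deciding, in log space, whether the observed acceptance statistic is above or below the crossing threshold.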

@xukai92 (Member) commented Mar 12, 2019

Looks great! Many thanks for catching the bug. I'm happy to merge this PR!

@yebai yebai merged commit 6c0f4b8 into master Mar 12, 2019
@yebai yebai deleted the hg/fix-dual-averaging branch March 12, 2019 09:34
@yebai yebai mentioned this pull request Mar 12, 2019