Log probability of an observation #817

Closed
trappmartin opened this issue Jun 21, 2019 · 6 comments

@trappmartin
Member

We are currently not able to compute the log probability p(x_i | theta) of each individual observation in Turing. Instead, we always accumulate the log joint log p(x, theta) = log p(theta) + sum_i log p(x_i | theta), which makes a lot of sense from an inference point of view. However, exposing the per-observation terms (illustrated in the sketch after the list) would allow:

  • Model selection (computing the model evidence)
  • Prediction using a Turing model (required for MLJ integration)
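
For concreteness, the per-observation quantity is just the log pdf that Distributions.jl already exposes outside of a model; the missing piece is getting at it from inside a Turing model. A minimal sketch (plain Distributions.jl, not Turing API):

using Distributions

x = randn(10)                    # observations
theta = (mu = 0.0, sigma = 1.0)  # some fixed parameter values
d = Normal(theta.mu, theta.sigma)

logp_i = [logpdf(d, xi) for xi in x]  # log p(x_i | theta) for each observation i
sum(logp_i)                           # the aggregated value Turing currently accumulates (plus prior terms)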

I started discussing this with @mohamed82008 and we had a few ideas on how to approach this, all of which seem rather hacky to me.

Here is an alternative proposal.

One of the issues (as far as I can see) is that we currently do not generate a VarName for observe statements. However, we could easily extend the compiler to generate one, allowing us to pass a VarName object to each observe statement:

function generate_observe(observation, dist, model_info)
    main_body_names = model_info[:main_body_names]
    vi = main_body_names[:vi]
    sampler = main_body_names[:sampler]

    varname = gensym(:varname)
    sym, idcs, csym = gensym(:sym), gensym(:idcs), gensym(:csym)
    csym_str, indexing, syms = gensym(:csym_str), gensym(:indexing), gensym(:syms)

    if observation isa Symbol
        # Left-hand side is a plain symbol, e.g. `x ~ Normal()`.
        varname_expr = quote
            $sym, $idcs, $csym = Turing.@VarName $observation
            $csym = Symbol($(QuoteNode(model_info[:name])), $csym)
            $syms = Symbol[$csym, $(QuoteNode(observation))]
            $varname = Turing.VarName($syms, "")
        end
    else
        # Left-hand side is an indexed expression, e.g. `x[i] ~ Normal()`;
        # the indexing is recorded as part of the VarName.
        varname_expr = quote
            $sym, $idcs, $csym = Turing.@VarName $observation
            $csym_str = string($(QuoteNode(model_info[:name])))*string($csym)
            $indexing = foldl(*, $idcs, init = "")
            $varname = Turing.VarName(Symbol($csym_str), $sym, $indexing)
        end
    end

    return quote
        $varname_expr
        isdist = if isa($dist, AbstractVector)
            # Check if the right-hand side is a vector of distributions.
            all(d -> isa(d, Distribution), $dist)
        else
            # Check if the right-hand side is a distribution.
            isa($dist, Distribution)
        end
        @assert isdist @error($(wrong_dist_errormsg(@__LINE__)))
        # Pass the generated VarName on to the observe call.
        Turing.observe($sampler, $dist, $observation, $varname, $vi)
    end
end

Further, we would

  1. need to change the way we manipulate the vi.logp field, as it could then be either a Float64 (aggregation) or a Vector{Float64} (one log probability value per observation), or

  2. store the logp values for each observation inside the VarInfo, i.e. similar to the parameter values, and treat logp the way we do it now.

The first option would additionally require us to write a tailored sampler that computes only the log pdf and not the log joint. This is easy but perhaps unnecessary overhead, and it would require re-evaluating the model for each iteration in the case of model selection.

If we go for option 2 (which is similar to what a user can do in Stan), we would keep the aggregated log joint in logp and additionally store the per-observation log pdf values in the VarInfo. This additional storage would be disabled by default and could be enabled via a kwarg. In contrast to option 1, this one would be memory intensive if we aim to compute the model evidence, which could be avoided by re-evaluating the model for each iteration (similar to option 1).
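
A rough sketch of what option 2 could look like (hypothetical names, not the actual VarInfo API): keep the aggregated log joint as today, and only optionally record the per-observation log pdf keyed by the observe VarName.

# Hypothetical accumulator illustrating option 2; none of these names exist in Turing.
struct PointwiseLogp
    logp::Base.RefValue{Float64}      # aggregated log joint, as today
    pointwise::Dict{String,Float64}   # per-observation log pdf, filled only if enabled
    store_pointwise::Bool             # the proposed kwarg
end

PointwiseLogp(store_pointwise::Bool = false) =
    PointwiseLogp(Ref(0.0), Dict{String,Float64}(), store_pointwise)

function acclogp!(acc::PointwiseLogp, vn::String, lp::Float64)
    acc.logp[] += lp                                 # always aggregate
    acc.store_pointwise && (acc.pointwise[vn] = lp)  # optionally keep the individual term
    return acc
end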

I think option 2 might be the more convenient option.

(cc'ing @yebai @xukai92 @willtebbutt @ablaom )

@torfjelde
Member

This would also be very useful for variational inference:

  • Amortized VI requires computing the log joint with a different parameter theta_i for each observation, i.e. we need p(x_i, theta_i) = p(x_i | theta_i) p(theta_i) for each i, with possibly theta_i ≠ theta_j.
  • For larger datasets, being able to do stochastic gradient descent, i.e. use a random subset of observations for the gradient estimate, is quite important (see the sketch below).
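
As a sketch of the second point (my illustration, assuming i.i.d. observations; this is not existing Turing functionality), an unbiased mini-batch estimate of the full-data log likelihood rescales the subset sum by N / B:

using Distributions, Random

# Unbiased mini-batch estimate of sum_i log p(x_i | theta), assuming i.i.d. data.
function minibatch_loglike(d::Distribution, x::AbstractVector, B::Int)
    N = length(x)
    idx = randperm(N)[1:B]                           # random subset of observations
    return (N / B) * sum(logpdf(d, xi) for xi in x[idx])  # rescale so the expectation matches the full sum
end

x = randn(1_000)
minibatch_loglike(Normal(0, 1), x, 100)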

@mohamed82008
Member

I think we need a "Taking VI seriously" kind of issue to discuss all the requirements VI places on the rest of Turing. Supporting SGD, this issue, and the Bijectors issue are all things that can and should be resolved with enough hours behind them. @torfjelde, when you feel that you have a sufficiently panoramic view of VI algorithms and how to support them fully in Turing, consider starting an issue discussing your thoughts.

@DoktorMike

Just giving my two cents here, but if VI is to be included in Turing, which I think is a really great idea, then I think it should be based on general f-divergences rather than only the ELBO (which corresponds to a single member of that family, the KL divergence). This way Turing would move ahead of much of the competition in the UPPL space.
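
For reference (my addition, not part of the comment), the f-divergence family and the KL special case that the standard ELBO objective minimizes:

D_f(q \,\|\, p) = \int p(x)\, f\!\left(\frac{q(x)}{p(x)}\right) \mathrm{d}x,
\qquad
\mathrm{KL}(q \,\|\, p) = \int q(x) \log\frac{q(x)}{p(x)}\, \mathrm{d}x \quad \text{(the case } f(t) = t \log t \text{)}.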

@trappmartin
Member Author

I think it would be interesting to have a way to plug in the divergence you like, e.g. the KL or beta divergence. But I don’t know how easy such a framework would be to realise.

@xukai92
Member

xukai92 commented Sep 15, 2019

I think we could potentially introduce two types of VarInfo: one that stores the log density for each ~ statement separately, and one with the current behaviour (aggregation). For the first type, we could define vi.logp to be the sum of all the individual log density values.
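
A minimal sketch of that idea (hypothetical types, not DynamicPPL/Turing code): two accumulators sharing the same interface, where the per-tilde variant defines the total log density as the sum of its entries.

# Hypothetical illustration of the two-VarInfo idea; none of these names exist in Turing.
struct AggregatedLogp
    logp::Base.RefValue{Float64}
end
AggregatedLogp() = AggregatedLogp(Ref(0.0))
acclogp!(vi::AggregatedLogp, lp::Real) = (vi.logp[] += lp; vi)
getlogp(vi::AggregatedLogp) = vi.logp[]

struct PerTildeLogp
    logps::Vector{Float64}   # one entry per ~ statement
end
PerTildeLogp() = PerTildeLogp(Float64[])
acclogp!(vi::PerTildeLogp, lp::Real) = (push!(vi.logps, lp); vi)
getlogp(vi::PerTildeLogp) = sum(vi.logps)   # "vi.logp" as the sum of the individual terms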

@mohamed82008
Member

Closed via #997.
