
Simple syntax for querying the model and posterior sample #989

Closed
mohamed82008 opened this issue Nov 22, 2019 · 30 comments

@mohamed82008
Member

mohamed82008 commented Nov 22, 2019

In this issue, I propose the creation of a new macro @logprob to query Turing models as well as the sampled posterior. Let's take the following model as an example using the syntax of #965:

@model demo(x, y) = begin
    a ~ Normal()
    b ~ Gamma()
    y .~ Normal.(a .* x, b)
end
model = demo(rand(100), rand(100))
chain = sample(model, NUTS(10, 0.65), 10000)

I propose the following syntax:

  • @logprob a = 1.0, b = 2.0 | model = model -> returns the log prior probability
  • @logprob a = 1.0, b = 2.0, x = rand(100), y = rand(100) | model = model -> returns the log joint probability
  • @logprob 0.2 <= a <= 0.3, 1.0 <= b <= 1.1 | model = model, chain = chain -> returns the log of the fraction of samples in chain satisfying 0.2 <= a <= 0.3 and 1.0 <= b <= 1.1. For discrete distributions, we can also use a = 2 or a == 2, for example. Passing model here can be optional.
  • @logprob x = rand(10), y = rand(10) | model = model, chain = chain -> returns the log likelihood of x = rand(10), y = rand(10) for each sample in chain
  • @logprob x = rand(10), y = rand(10) | model = model, a = 1.0, b = 2.0 -> returns the log likelihood of x = rand(10), y = rand(10) using a = 1.0, b = 2.0

Let me know if you have comments on the syntax or if I missed any use case.
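For concreteness, here is a rough sketch of what the first two queries above would compute for the demo model, written directly against Distributions.jl rather than through the proposed macro (the variable names are illustrative, not part of the proposal):

```julia
using Distributions

# stand-ins for the model's data and the queried parameter values
x, y = rand(100), rand(100)
a, b = 1.0, 2.0

# what `@logprob a = 1.0, b = 2.0 | model = model` would return:
# the log prior density of (a, b) under Normal() and Gamma()
logprior = logpdf(Normal(), a) + logpdf(Gamma(), b)

# what the joint query would return: prior plus the likelihood
# of y under the observation model y .~ Normal.(a .* x, b)
loglik = sum(logpdf.(Normal.(a .* x, b), y))
logjoint = logprior + loglik
```

The macro would assemble exactly these kinds of sums from the `~` statements in the model body.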

@mohamed82008
Member Author

I can also provide a @prob macro that returns the probability instead of the log probability.

@mohamed82008
Member Author

I suggest we also allow = and == to be used interchangeably in the examples above.

@cpfiffer
Member

I like it, I think it'd be a good tool to have. I have no helpful comments, so I guess you're welcome?

@mohamed82008
Member Author

I am working on it.

@xukai92
Member

xukai92 commented Nov 26, 2019

@logprob a = 1.0, b = 2.0, x = rand(100), y = rand(100) | model = model -> returns the log joint probability

model is created by demo(rand(100), rand(100)) in which x and y are already provided. It's unclear to me what the log joint is supposed to return.

@devmotion
Member

devmotion commented Nov 26, 2019

I agree, it feels a bit strange to me. I think it would be more natural to write

@logprob a = 1.0, b = 2.0 | model = demo

for the prior probabilities p(a = 1.0, b = 2.0) and

@logprob a = 1.0, b = 2.0, x = x, y = y | model = demo

for the joint probability p(a = 1.0, b = 2.0, x = x, y = y). I'm not even sure if it should be a macro; couldn't one just define a function logprob that can be evaluated as logprob(demo, (a = 1.0, b = 2.0)) or logprob(demo, (a = 1.0, b = 2.0, x = x, y = y))? Maybe for convenience one could even define demo.logprob, which would allow writing demo.logprob(a = 1.0, b = 2.0).

@mohamed82008
Member Author

mohamed82008 commented Nov 26, 2019

@xukai92 the log joint probability case will replace the data inside model and compute the log joint probability for the given data and parameters. If that's not intuitive enough, I can limit this macro to the likelihood and posterior only. Convenience functions can then be made to calculate the prior and joint probability for NamedTuple parameters, as @devmotion suggested. In the case of the joint probability, the data inside the model can be used.

Actually, for all of them I can just make a function under the hood that can be called instead of the macro. So I have a few questions for you all:

  1. Do you prefer a macro and a function API, a function only API or a macro only API?
  2. Should I include the prior and joint probability syntax in the macro with the model = demo suggestion?
  3. Is conveniently calculating the log joint probability for new data and new parameters useful in practice or not?

@cpfiffer
Member

I would actually prefer to have both. People who do a lot of automated model stuff would prefer a function API, whereas all the folks who are messing around with the REPL prefer the macro API. Might be that one ends up being better than the other, so we could deprecate one if it is just less useful.

@mohamed82008
Member Author

Sounds good.

@xukai92
Member

xukai92 commented Dec 2, 2019

Re 1: I agree with Cameron that we do need both.
Re 2: I feel model = demo is more intuitive than model = model.
Re 3: I think it's useful in general.

@jonasmac16
Contributor

Will this also allow returning the pointwise logp of each data point given a parameter set?

@mohamed82008
Member Author

Yes

@jonasmac16
Contributor

jonasmac16 commented Dec 10, 2019

That's pretty cool. Is this available once the referenced PR is accepted?

@mohamed82008
Member Author

Yup!

@yebai
Member

yebai commented Dec 17, 2019

@logprob 0.2 <= a <= 0.3, 1.0 <= b <= 1.1 | model = model, chain = chain -> returns the log of the fraction of samples in chain satisfying 0.2 <= a <= 0.3 and 1.0 <= b <= 1.1. For discrete distributions, we can also use a = 2 or a == 2, for example. Passing model here can be optional.

I am slightly confused by the above feature. It seems like this should be a filter-style function in the MCMCChains package. Also, what are the potential use cases?

@cpfiffer
Member

Hypothesis testing, for one. If you wanted to have some measure of confidence about whether a parameter is negative and you wanted a quick check:

@logprob a >= 0 | model=model, chain=chain

I think it's a little nicer to work with than a Chains object, and it's very quick to write.

@devmotion
Member

devmotion commented Dec 17, 2019

I am slightly confused by the above feature. It seems like this should be a filter-style function in the MCMCChains package.

Indeed, IMO that feature seems more like one of the other convenience functionalities provided by MCMCChains. It is also not clear to me why specifying model = model would be useful in that example.

The nice thing about using a function instead of a macro would be that one could just dispatch on Chains (or AbstractChains) and implement the chain-based evaluations in MCMCChains. Since there seems to be agreement that a macro is also needed, maybe one solution could be to provide a generic @logprob macro in Turing that parses the input and forwards NamedTuples (or something else?) to the functional variant. The macro feels slightly more restricted, however, since it seems to require some protected parameter names (such as model or chain).

I think it's a little nicer to work with than a Chains object, and it's very quick to write.

To me logprob(x -> x.a >= 0, chain) or

logprob(chain) do sample
    sample.a >= 0
end

doesn't feel too bad, and it is actually more powerful and flexible than a macro.
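A hypothetical sketch of this predicate-based form (the function name and sample representation are illustrative, not the actual MCMCChains API): the estimated probability of an event is simply the fraction of posterior samples satisfying the predicate, so its log is the log of that fraction.

```julia
# Estimate log P(predicate) from posterior samples as the log
# fraction of samples for which the predicate holds.
estimate_logprob(f, samples) = log(count(f, samples) / length(samples))

# samples as NamedTuples, standing in for rows of a chain
samples = [(a = 0.5,), (a = -0.2,), (a = 1.3,), (a = 0.1,)]
estimate_logprob(s -> s.a >= 0, samples)  # log(3/4)
```

The do-block form above would then be the same call with the predicate passed as the first argument.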

@cpfiffer
Member

To me logprob(x -> x.a >= 0, chain) or

logprob(chain) do sample
    sample.a >= 0
end

I like that! I think we could probably stick this on the MCMCChains side.

@mohamed82008
Member Author

I agree that this doesn't belong here. I actually left it out of #997.

@devmotion
Member

You're referring only to the filtering functionality, I guess? As far as I can see, #997 contains the chain-based feature

Query the likelihood given every sample in the chain using prob"y = rand(3) | chain = chain, model = demo"

@mohamed82008
Member Author

@devmotion yes, that feature is actually important; @trappmartin and @sethaxen have uses for it. IMO, this belongs in Turing (and in the future DynamicPPL) because the same VarNames we use in VarInfo are keys in the chain::Chains output by Turing.sample. I make use of that to feed chain[i], for example, into the VarInfo before running the model in the LikelihoodContext. This may not be possible for an arbitrary chain coming from other packages, and going from chain to a NamedTuple or another standard underlying representation is completely non-trivial for nested array types, e.g. when the parameter is an array of matrices. So model = demo, chain = chain makes sense for two reasons:

  1. Both demo and chain are used to compute the likelihoods, and
  2. chain was generated from demo to begin with, so the two are compatible, while other seemingly equivalent models may not work with chain and other seemingly equivalent chains may not work with demo.

@devmotion
Member

  2. chain was generated from demo to begin with, so the two are compatible, while other seemingly equivalent models may not work with chain and other seemingly equivalent chains may not work with demo.

To me that sounds like the syntax should be just y = rand(3) | chain = chain (without model = demo) to avoid any incompatibilities between chain and model, and just use the model saved in chain.info, which was the one used to generate chain.

This may not be possible for any arbitrary chain coming from other packages

Sure, but does this imply that this functionality can't be provided in, e.g., MCMCChains?

@mohamed82008
Member Author

Hmm using the model in chain sounds good, I had missed the fact that we can store these in chain. The only catch is that the chain.info field optionally stores the model in Turing, but that's easily fixable. Thanks!

@mohamed82008
Member Author

Sure, but does this imply that this functionality can't be provided in, e.g., MCMCChains?

Yes, we need access to the VarInfo, which is not seen by MCMCChains; maybe in a future PR we can consider moving things and interfacing.

@mohamed82008
Member Author

So here is a nice middle ground: if model is defined in chain, we use that; otherwise, we check if model was passed on the RHS. We do the same for varinfo. That way, even if the user forgot to save model and VarInfo in chain, we can still save the day. Sounds good?

@mohamed82008
Member Author

Hmm, never mind, that won't be necessary if chain always saves varinfo and model. Any reason not to save them by default in my PR, @cpfiffer?

@mohamed82008
Member Author

Even if it is saved by default, I will leave the backup there.

@cpfiffer
Member

No reason that I can think of. You could probably just set the keyword save_state=true and everything should be good.

@mohamed82008
Member Author

Cool, thanks!

@mohamed82008
Member Author

Closed via #997.
