Model syntax: provide automatically all random variables as argument? #792

scheidan · 2019-05-22T08:15:53Z

Following the discussion about sampling from the prior and missings (#786), I put some thoughts together about the model syntax. Maybe this is helpful for your refactoring plans.

Currently, when we define a model, we have to decide in advance which random variables (RVs) we want to condition on later. It would be nice to decouple these steps:

1. Define the model, i.e. the joint distribution.
2. Condition on any RV for which we have data.

This is already possible to a large degree if we manually list all RVs as arguments:

# -----------
# define model

# Note, x_det is deterministic data, not a RV!

# defines p(y, a, b, s ; x_det)
@model model1(x_det, y) = begin
    a ~ Normal(0, 10)
    b ~ Normal(0, 10)
    s ~ Exponential(1)

    yhat = a*x_det + b
    y ~ Normal(yhat, s)

end

# defines also p(y, a, b, s ; x_det)
# with the added advantage that we
# can also condition on a, b, and s, see examples below
@model model2(x_det, y, a, b, s) = begin
    a ~ Normal(0, 10)
    b ~ Normal(0, 10)
    s ~ Exponential(1)

    yhat = a*x_det + b
    y ~ Normal(yhat, s)

end

# -----------
# sample

# p(a,b,s | y; x_det)
chain1 = sample(model1(10.0, 20.0), sampler)

# same as above: p(a,b,s | y; x_det)
chain2 = sample(model2(10.0, 20.0), sampler)

# same as above: p(a,b,s | y; x_det)
chain3 = sample(model2(10.0, 20.0, missing, missing, missing), sampler)

# with model2 we can also condition on other RV:
# p(y, a | b, s; x_det)
chain4 = sample(model2(10.0, missing, missing, 2.2, 0.2), sampler)

# Conditioning on x_det is not meaningful, because it is not a RV
# p(x_det | y, a, b, s ) -> MethodError
chain5 = sample(model2(missing, 3.3, 1.1, 2.2, 0.2), sampler)

So model1 and model2 define the same joint distribution, but the latter is much more flexible. Therefore, I was wondering whether it would be a good idea to generate something like model2 automatically. I could imagine a syntax like this:

# The user only specifies the deterministic variables as argument
@model model(x_det) = begin
    # prior
    a ~ Normal(0, 10)
    b ~ Normal(0, 10)
    s ~ Exponential(1)

    yhat = a*x_det + b
    y ~ Normal(yhat, s)
end

# this would get translated to:
@model model(x_det; y=missing, a=missing, b=missing, s=missing) = begin
    ...
end

# Now we can condition on every RV we want.
# However, deterministic variables must always be given.
sample(model(1.1, y=3.3, s=0.1), sampler)

For vector-valued RVs, this would need some additional thought on how to pass the dimensions.

Conceptually this seems neat, but I cannot judge how difficult such an implementation would be.
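To make the intended semantics concrete, here is a minimal sketch in plain Julia, without Turing. It is only an illustration under stated assumptions: `linear_model`, `draw_or_keep`, and the two log-density helpers are hypothetical names, not part of any Turing API. Each RV is a keyword argument that defaults to `missing`; a missing RV is drawn from its distribution, while a given one is kept fixed and only contributes to the log joint density. This does not sample a posterior. It only shows how one model definition can treat any RV as either a parameter or an observation.

```julia
using Random

# Log-density of Normal(mu, sigma) at x.
normal_logpdf(x, mu, sigma) = -0.5 * ((x - mu) / sigma)^2 - log(sigma) - 0.5 * log(2 * pi)

# Log-density of an Exponential with scale theta at x.
exponential_logpdf(x, theta) = x < 0 ? -Inf : -x / theta - log(theta)

# Hypothetical helper: draw a value if it is `missing`, otherwise keep the given one.
draw_or_keep(value, sampler) = ismissing(value) ? sampler() : value

# Joint p(y, a, b, s; x_det). x_det is deterministic and must always be given;
# every RV is a keyword argument that defaults to `missing`.
function linear_model(rng, x_det; y=missing, a=missing, b=missing, s=missing)
    a = draw_or_keep(a, () -> 10 * randn(rng))        # a ~ Normal(0, 10)
    b = draw_or_keep(b, () -> 10 * randn(rng))        # b ~ Normal(0, 10)
    s = draw_or_keep(s, () -> randexp(rng))           # s ~ Exponential(1)
    yhat = a * x_det + b
    y = draw_or_keep(y, () -> yhat + s * randn(rng))  # y ~ Normal(yhat, s)

    # Log joint density evaluated at the given and drawn values.
    logp = normal_logpdf(a, 0, 10) + normal_logpdf(b, 0, 10) +
           exponential_logpdf(s, 1) + normal_logpdf(y, yhat, s)
    return (y=y, a=a, b=b, s=s, logp=logp)
end

rng = MersenneTwister(1)

# y and s are given; a and b are drawn.
result = linear_model(rng, 1.1; y=3.3, s=0.1)
```

A real implementation would hand the log joint to a sampler instead of drawing the missing RVs forward, but the calling convention would look the same as in the proposed `model(1.1, y=3.3, s=0.1)`.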

mohamed82008 · 2019-11-17T22:35:56Z

Thanks for your proposal @scheidan and sorry for the late response; this issue somehow slipped through the cracks. I like your proposal of making all random variables potentially observed variables. With #972, it may be possible to support a feature like this quite neatly using an additional model constructor method. Follow #965 for progress on this idea.

mohamed82008 · 2019-12-15T08:22:10Z

So I tried implementing this but I am not too happy about 3 things:

1. Random variables used on the LHS of .~ cannot be changed to observations easily. This is because the initialization of the variable on the LHS happens inline and we don't know where, so we cannot remove it.
2. Implementing this feature adds a level of complexity to the compiler that I am not comfortable with adding, especially after putting some effort into simplifying the compiler design in Compiler 3.0 #965.
3. The feature is almost already supported, as you said, if you have all the symbols in the model's arguments with default value missing and selectively define some to be data. In my opinion, this is the correct way of dealing with variables that can be either observations or random variables.

Given that the value added here is tiny compared to the inconvenience of automatically implementing an imperfect version of this feature, I don't think we should have it in Turing at all. I will close this issue for now. Later when we have Soss interop, perhaps we can borrow this feature from Soss.

This was referenced Jun 20, 2019

Completely eliminating Real from the model and pre-allocating #665

Closed

TuringLang Roadmap #774

Closed

yebai added compiler labels Oct 3, 2019

mohamed82008 mentioned this issue Nov 17, 2019

Unifying assume and observe and handling missing data and user-input variable names #972

Closed

mohamed82008 closed this as completed Dec 15, 2019
