Model syntax: provide automatically all random variables as argument? #792

Closed
scheidan opened this issue May 22, 2019 · 2 comments

Comments

@scheidan

Following the discussion about sampling from the prior and missings (#786), I put some thoughts together about the model syntax. Maybe this is helpful for your refactoring plans.

Currently, when we define a model, we have to decide in advance which random variables (RVs) we will later condition on. It would be nice to decouple these steps:

  1. define the model, i.e. the joint distribution
  2. condition on any RV for which we have data

This is already possible to a large degree if we manually list all RVs as arguments:

# -----------
# define model

# Note, x_det is deterministic data, not a RV!

# defines p(y, a, b, s ; x_det)
@model model1(x_det, y) = begin
    a ~ Normal(0, 10)
    b ~ Normal(0, 10)
    s ~ Exponential(1)

    yhat = a*x_det + b
    y ~ Normal(yhat, s)

end

# also defines p(y, a, b, s ; x_det),
# with the added advantage that we
# can also condition on a, b, and s (see the examples below)
@model model2(x_det, y, a, b, s) = begin
    a ~ Normal(0, 10)
    b ~ Normal(0, 10)
    s ~ Exponential(1)

    yhat = a*x_det + b
    y ~ Normal(yhat, s)

end

# -----------
# sample

# p(a,b,s | y; x_det)
chain1 = sample(model1(10.0, 20.0), sampler)

# same as above: p(a,b,s | y; x_det)
chain2 = sample(model2(10.0, 20.0), sampler)

# same as above: p(a,b,s | y; x_det)
chain3 = sample(model2(10.0, 20.0, missing, missing, missing), sampler)

# with model2 we can also condition on other RV:
# p(y, a | b, s; x_det)
chain4 = sample(model2(10.0, missing, missing, 2.2, 0.2), sampler)

# Conditioning on x_det is not meaningful, because it is not a RV
# p(x_det | y, a, b, s ) -> MethodError
chain5 = sample(model2(missing, 3.3, 1.1, 2.2, 0.2), sampler)

So model1 and model2 define the same joint distribution, but the latter is much more flexible. Therefore, I was wondering whether it would be a good idea to generate something like model2 automatically. I could imagine a syntax like this:

# The user only specifies the deterministic variables as argument
@model model(x_det) = begin
    # prior
    a ~ Normal(0, 10)
    b ~ Normal(0, 10)
    s ~ Exponential(1)

    yhat = a*x_det + b
    y ~ Normal(yhat, s)
end

# this would get translated to:
@model model(x_det; y=missing, a=missing, b=missing, s=missing) = begin
    ...
end

# Now we can condition on every RV we want. 
# However, deterministic variables must always be given.
sample(model(1.1, y=3.3, s=0.1), sampler)

For vector RVs, this would need some additional thought on how to pass the dimensions.

Conceptually this seems neat; however, I cannot judge how difficult such an implementation would be.
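To make the intended semantics concrete, here is a minimal plain-Julia sketch of how keyword arguments defaulting to missing separate observed RVs from latent ones. It does not use Turing at all; `split_observed` and `model_spec` are hypothetical names invented for this illustration, not existing API.

```julia
# Hypothetical helper (not part of Turing): split keyword arguments into
# observed values and the names of latent (missing) random variables.
function split_observed(; kwargs...)
    observed = Dict{Symbol,Any}()
    latent = Symbol[]
    for (name, value) in kwargs
        if ismissing(value)
            push!(latent, name)       # no data given: treat as a latent RV
        else
            observed[name] = value    # data given: condition on it
        end
    end
    return observed, latent
end

# Hypothetical stand-in for the generated constructor: the deterministic
# input x_det is a required positional argument, while every RV is an
# optional keyword argument that defaults to missing.
function model_spec(x_det; y=missing, a=missing, b=missing, s=missing)
    observed, latent = split_observed(; y=y, a=a, b=b, s=s)
    return (x_det=x_det, observed=observed, latent=latent)
end

spec = model_spec(1.1; y=3.3, s=0.1)
println(spec.latent)    # a and b remain latent
println(spec.observed)  # y and s are conditioned on
```

Calling `model_spec()` without `x_det` raises a MethodError, which matches the requirement that deterministic variables must always be given.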

@mohamed82008
Member

Thanks for your proposal @scheidan and sorry for the late response; this issue somehow slipped through the cracks. I like your proposal of making all random variables potentially observed variables. With #972, it may be possible to support a feature like this quite neatly using an additional model constructor method. Follow #965 for progress on this idea.

@mohamed82008
Member

So I tried implementing this, but I am not too happy about 3 things:

  1. Random variables used on the LHS of .~ cannot be changed to observations easily. This is because the initialization of the variable on the LHS happens inline and we don't know where, so we cannot remove it.
  2. Implementing this feature adds a level of complexity to the compiler that I am not comfortable with adding, especially after putting some effort into simplifying the compiler design in Compiler 3.0 #965.
  3. The feature is almost already supported as you said, if you have all the symbols in the model's arguments with default value missing and selectively define some to be data. In my opinion, this is the correct way of dealing with variables that can be either observations or random variables.

Given that the value added here is tiny compared to the inconvenience of automatically implementing an imperfect version of this feature, I don't think we should have it in Turing at all. I will close this issue for now. Later when we have Soss interop, perhaps we can borrow this feature from Soss.
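For reference, the manual approach from point 3 might look roughly like the sketch below. It assumes a more recent Turing release that accepts the `@model function` syntax with default argument values (the thread above uses the older `@model name(args) = begin` form); the name `linreg`, the NUTS sampler, and the sample count are illustrative choices, not something prescribed in this issue.

```julia
using Turing

# Sketch only: every random variable is a positional argument that defaults
# to `missing`; x_det is deterministic input and has no default.
@model function linreg(x_det, y = missing, a = missing, b = missing, s = missing)
    a ~ Normal(0, 10)
    b ~ Normal(0, 10)
    s ~ Exponential(1)

    yhat = a * x_det + b
    y ~ Normal(yhat, s)
end

# p(a, b, s | y; x_det): only y is supplied, so a, b, and s are sampled.
chain_post = sample(linreg(10.0, 20.0), NUTS(), 200)

# p(y, a | b, s; x_det): b and s are supplied, so y and a are sampled.
chain_cond = sample(linreg(10.0, missing, missing, 2.2, 0.2), NUTS(), 200)
```

Arguments left as missing are treated as random variables, while arguments with concrete values are treated as observations, so one model definition covers both cases.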
