Implement observe and do model transformations (#168)
Conversation
ricardoV94 commented on May 17, 2023 (edited)
Force-pushed from d1a5853 to 4ef10c9
This looks fine to me. I’d only add more expansions and examples to the docstrings.
By the way, what do you think about adding the inverse operations, something like unobserve? I don't think that do actually has a well-defined inverse; people could use do again to get the original model back, though.
Yeah I thought about it... but I am not sure what an unobserved variable should be. A Deterministic? A FreeRV?
Yeah, that can't have an inverse because it's just a constant. The user would have to tell us what RV to replace it with and whether it's free, observed, or even a deterministic. Given that flexibility, I think it should have its own name. I would focus on just the two transforms we have in this PR for now; we still have to see if the approach is even useful in real applications. If it is, we can come back and expand with the remaining subspace of model transformations.
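
For readers following along, here is a minimal sketch of the two transforms this PR adds. The import path and exact call signatures are my assumption about the PR's layout, and the toy model is illustrative only:

```python
import pymc as pm

# Assumed import path for the transforms introduced in this PR
from pymc_experimental.model_transform.conditioning import do, observe

with pm.Model() as m:
    x = pm.Normal("x")
    y = pm.Normal("y", mu=x, sigma=1)

# observe: turn the free RV y into an observed RV conditioned on data
m_observed = observe(m, {"y": 0.5})

# do: replace x with a constant, severing its dependence on any parents
# (the "mutilated" graph). The original prior on x is discarded, which is
# why do has no well-defined inverse.
m_do = do(m, {"x": 100.0})
```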
Once you get your new (mutilated) model back from pm.do, what's the next step? Calling pm.sample_posterior_predictive? What would happen if you called pm.sample_prior_predictive() instead; would you get the same result? It might be useful to include that step as an example in the docstring, and maybe also a test.
Yes, the most common use case would be to call pm.sample_posterior_predictive.
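
To make that workflow concrete, a sketch of intervening with do and then pushing the posterior through the mutilated graph (import path assumed, model illustrative):

```python
import pymc as pm

from pymc_experimental.model_transform.conditioning import do  # assumed path

with pm.Model() as m:
    x = pm.Normal("x")
    y = pm.Normal("y", mu=x, sigma=1)
    idata = pm.sample()

# Intervene on x, then push the posterior samples through the new graph
m_do = do(m, {"x": 100.0})
with m_do:
    idata_do = pm.sample_posterior_predictive(idata, var_names=["y"])

# pm.sample_prior_predictive() on m_do would instead resample y with x
# fixed at 100, so it would generally not match the posterior predictive.
```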
Force-pushed from 1b47247 to 0f56fe1
Force-pushed from 1a0eedf to ffd21e7
Tests are passing.
As far as I understand, at the moment the do operator only handles the situation where you want to replace a random variable with observed data. This is fine, but it is only one use case. Here, a user would implement the do operation with pm.do() and then pm.sample_posterior_predictive.

But if you take the potential-outcomes approach to confounder adjustment, or the SCM approach with do-calculus to compute an adjustment set (via the backdoor criterion), then in both cases you basically end up with a linear regression in which you enter the variables you decide to condition upon. In that situation, those variables are defined in the model as pm.MutableData, and a user should implement the do operation with pm.set_data() and then pm.sample_posterior_predictive.

From an implementation point of view, I can see that we might want different functions for these different things (i.e. replace observed with observed vs. replace RV with observed). But from a user-facing point of view, it could be frustrating to have to remember which one to use (pm.set_data or pm.do) when in both cases the user wants to "do".
My proposal would be along these lines (see the sketch after this list):

- pm.do checks whether the target node(s) are data or RVs.
- If they are data, the user either gets a friendly error message telling them to use pm.set_data, or (ideally) pm.do calls pm.set_data for them.
- If they are RVs, pm.do carries on and does the currently implemented graph manipulation.
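
For concreteness, a sketch of the two user-facing paths being contrasted above (import path assumed, model illustrative):

```python
import numpy as np
import pymc as pm

from pymc_experimental.model_transform.conditioning import do  # assumed path

with pm.Model() as m:
    x = pm.MutableData("x", np.zeros(5))
    beta = pm.Normal("beta")
    y = pm.Normal("y", mu=beta * x, sigma=1, observed=np.zeros(5))

# Path 1: the target is data, so today the user reaches for set_data,
# which mutates the shared variable in place
with m:
    pm.set_data({"x": np.ones(5)})

# Path 2: the target is an RV, so the user reaches for the new transform,
# which returns a new model with beta replaced by a constant
m_do = do(m, {"beta": 0.0})
```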
This is not the case. You can replace the variables with anything you want (as long as the replacement has the same type as the thing being replaced). Check the test where we replace two variables with an expression containing a shared variable that acts as a switch.

I just mentioned you didn't have to, but you can certainly replace constant data with other constant data if you want to use the same method.
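
Illustratively, a sketch in the spirit of that test (not the literal test code from the PR; import path assumed):

```python
import pytensor
import pymc as pm

from pymc_experimental.model_transform.conditioning import do  # assumed path

with pm.Model() as m:
    x = pm.Normal("x")
    y = pm.Normal("y")
    z = pm.Normal("z", mu=x + y)

# A shared variable acting as a switch: flipping its value toggles both
# interventions without rebuilding the transformed model
switch = pytensor.shared(1.0, name="switch")
m_do = do(m, {"x": switch * 1.0, "y": (1.0 - switch) * 2.0})

# Later, switch.set_value(0.0) changes both interventions in place
```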
That was actually supposed to work. Gonna try and fix it.
Looks like a bug. Slowly but surely we're getting there xD. Does it also change status when …
No. Only when …
Given that we've now tested this quite a bit, shouldn't we just put this into pymc proper?
I would say no. The underlying functionality (model -> fgraph) was changed like 20 times in the course of this PR, and it really helps to be able to break it and start from scratch without worrying about breaking user compatibility.
I think it's fixed now!
Yes, certainly for the examples I was looking at the mutilated graph with …
Let me know if there's anything else you want me to test. Otherwise I'm happy to approve.
If you think this covers all the use cases for the blog post, we can merge it (need to rebase once more first). Edit: already rebased.
I believe we now have the functionality needed for a blog post. Ideally we get a bit more road testing, and eyeballs from other people, to catch any issues. But moving into the pymc repo reasonably soon would be good.
I'll be honest, I don't want to do that super soon. Not because of the …

I don't get the rush either.
There's some anticipation because it's cool and would be good to get out there. But I agree, if it relies on stuff that is still experimental then there's no need to rush. Getting a blog post out there which calls on pymc-experimental would sate the desire, I think.
Tests should now pass again. Need a green review to merge. We can cut a release after.
We should make sure to add an example NB / case study and then promote.
Also, congrats @ricardoV94, this is majorly cool new functionality.