Implement observe and do model transformations #168

Merged (4 commits) on Jun 5, 2023

Conversation

ricardoV94
Member

@ricardoV94 ricardoV94 commented May 17, 2023

import arviz as az
import pymc as pm
from pymc_experimental.model_transform.conditioning import do

with pm.Model() as m:
    x = pm.Normal("x", 0, 1)
    y = pm.Normal("y", x, 1)
    z = pm.Normal("z", y + x, 1)

# Dummy posterior, standing in for the result of `pm.sample`
idata_m = az.from_dict({rv.name: [pm.draw(rv, draws=500)] for rv in [x, y, z]})

# Replace `y` by a constant `100.0`
m_do = do(m, {y: 100.0})
with m_do:
    # Resample only `z`, reusing the draws of `x` stored in `idata_m`
    idata_do = pm.sample_posterior_predictive(idata_m, var_names=["z"])

@ricardoV94 ricardoV94 marked this pull request as draft May 17, 2023 13:58
@ricardoV94 ricardoV94 added the enhancements New feature or request label May 17, 2023
@ricardoV94 ricardoV94 force-pushed the do_operation branch 2 times, most recently from d1a5853 to 4ef10c9 Compare May 17, 2023 14:15
Contributor

@lucianopaz lucianopaz left a comment

This looks fine to me. I’d only add more expansions and examples to the docstrings.

By the way, what do you think about adding the inverse operations? Something like unobserve? I don’t think that a do actually has a well-defined inverse. People could use do again to get the original model back, though.

@ricardoV94
Member Author

ricardoV94 commented May 22, 2023

Something like unobserve?

Yeah I thought about it... but I am not sure what an unobserved variable should be. A Deterministic? A FreeRV?

I don’t think that a do actually has a well-defined inverse. People could use do again to get the original model back, though.

Yeah, that can't have an inverse because the replacement is just a constant. The user would have to tell us which RV to replace it with and whether it's free, observed, or even a deterministic. Given the flexibility, I think it should have its own name? replace_by_var?

I think I would focus on just the two transforms we have in this PR for now. We still have to see if the approach is even useful in real applications. If it is, we can come back and expand with the remaining subspace of model transformations.

@drbenvincent drbenvincent left a comment

Once you get your new (mutilated) model back from pm.do, what's the next step? Calling pm.sample_posterior_predictive? What would happen if you call pm.sample_prior_predictive(), would you get the same result? It might be useful to include that step as an example in the docstring, and maybe also as a test?

@ricardoV94
Member Author

ricardoV94 commented May 26, 2023

Once you get your new (mutilated) model back from pm.do, what's the next step? Calling pm.sample_posterior_predictive? What would happen if you call pm.sample_prior_predictive(), would you get the same result? It might be useful to include that step as an example in the docstring, and maybe also as a test?

Yes, the most common use case would be to call sample_posterior_predictive afterwards.
If you call sample_prior_predictive instead, you ignore everything you learned about the parameters that are still in the model.
I'll add the example in the docstring.
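
The difference between the two sampling calls can be illustrated without PyMC. The sketch below is a plain-Python stand-in for the x, y, z model from the PR description, not the library's implementation: `posterior_x` is a made-up set of posterior draws for `x`, and both helper functions are hypothetical. With `y` fixed to 100.0, the posterior-predictive path reuses the learned draws of `x` when simulating `z`, while the prior-predictive path draws `x` afresh from its Normal(0, 1) prior.

```python
import random
import statistics

rng = random.Random(0)

# Assumption: made-up posterior draws for x, as if the data had pulled x towards 3.
posterior_x = [rng.gauss(3.0, 0.1) for _ in range(2000)]

Y_DO = 100.0  # intervention: y is replaced by a constant


def z_posterior_predictive(x_draws, y_value, rng):
    """Simulate z ~ Normal(y + x, 1), reusing the given draws of x."""
    return [rng.gauss(y_value + x, 1.0) for x in x_draws]


def z_prior_predictive(n, y_value, rng):
    """Simulate z ~ Normal(y + x, 1), with x drawn from its Normal(0, 1) prior."""
    return [rng.gauss(y_value + rng.gauss(0.0, 1.0), 1.0) for _ in range(n)]


z_post = z_posterior_predictive(posterior_x, Y_DO, rng)
z_prior = z_prior_predictive(2000, Y_DO, rng)

post_mean = statistics.fmean(z_post)    # near 103: keeps what was learned about x
prior_mean = statistics.fmean(z_prior)  # near 100: discards it
```

In this toy setting the two means differ by roughly the posterior mean of `x`, which is the information that the prior-predictive call throws away.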

@ricardoV94 ricardoV94 force-pushed the do_operation branch 2 times, most recently from 1b47247 to 0f56fe1 Compare May 29, 2023 13:21
@ricardoV94 ricardoV94 force-pushed the do_operation branch 3 times, most recently from 1a0eedf to ffd21e7 Compare May 30, 2023 11:10
@ricardoV94 ricardoV94 marked this pull request as ready for review May 30, 2023 11:38
@ricardoV94
Member Author

Tests are passing

@drbenvincent drbenvincent left a comment

As far as I understand, at the moment the do operator only operates in the situation where you want to replace a random variable with observed data. This is fine, but it is only one use case. In this case, a user would implement the do operator with pm.do() and then pm.sample_posterior_predictive.

But if you take the potential outcomes approach to confounder adjustment OR take the SCM approach with do-calculus to calculate an adjustment set (with the backdoor criterion), then in both cases you basically end up with a linear regression where you enter the variables you decide to condition upon. In this situation, these are defined in the model as pm.MutableData, so a user should implement the do operator with pm.set_data() and then pm.sample_posterior_predictive.

From an implementation point of view, I can see that we might want different functions to implement these different things (i.e. replace observed with observed vs. replace RV with observed). But from a user-facing point of view, it could be frustrating to have to remember which one to use (pm.set_data or pm.do) when in both cases they want to "do".

My proposal would be along these lines:

  • pm.do checks whether the target node(s) are data or RVs.
  • If they are data, then you could either get a friendly error message telling you to use pm.set_data, or (ideally) it would call pm.set_data
  • If they are RVs, then it carries on and does the currently implemented graph manipulation
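
The proposed dispatch could be sketched roughly as follows. This is a plain-Python illustration only: `DataNode`, `RandomVariable`, and `do_dispatch` are hypothetical stand-ins, not PyMC classes or functions, and the RV branch just records the action instead of manipulating a graph.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical stand-in for a data container such as pm.MutableData."""
    name: str
    value: object


@dataclass
class RandomVariable:
    """Hypothetical stand-in for a model random variable."""
    name: str


def do_dispatch(nodes, interventions):
    """Return a list of (action, name) pairs describing what would be done."""
    actions = []
    for name, new_value in interventions.items():
        node = nodes[name]
        if isinstance(node, DataNode):
            # Data node: swap the stored value, as a set_data-style call would.
            node.value = new_value
            actions.append(("set_data", name))
        elif isinstance(node, RandomVariable):
            # RV: this is where the graph manipulation from this PR would run.
            actions.append(("replace_rv", name))
        else:
            raise TypeError(f"Cannot intervene on {name!r}")
    return actions


nodes = {"treatment": DataNode("treatment", [0, 1, 0]), "y": RandomVariable("y")}
actions = do_dispatch(nodes, {"treatment": [1, 1, 1], "y": 100.0})
```

The user calls a single entry point either way; which branch runs depends only on the type of the target node.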

@drbenvincent

  • It could be better if the newly injected ConstantData node could inherit the dims from the RV that it replaces

  • I'd also vote for fully removing the parent nodes from any nodes that have been intervened on to make the graphviz simpler. See pics below

Example before
Screenshot 2023-06-01 at 12 09 31

Example after
Screenshot 2023-06-01 at 12 09 45
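
As a rough picture of the pruning being asked for, here is a toy sketch on a dict-based DAG. It is not how pymc-experimental does it: the `mutilate` helper, the graph, and the rule "keep only nodes that are still ancestors of an observed node" are all assumptions made for illustration.

```python
def mutilate(parents, intervened, observed):
    """Cut edges into intervened nodes, then keep only ancestors of observed nodes.

    parents: dict mapping each node name to a list of its parent names.
    """
    # Step 1: an intervened node no longer depends on its parents.
    cut = {n: ([] if n in intervened else list(ps)) for n, ps in parents.items()}

    # Step 2: walk upwards from the observed nodes and collect what is reachable.
    keep, stack = set(), list(observed)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(cut[node])

    return {n: ps for n, ps in cut.items() if n in keep}


# Toy graph: u -> z -> y, and x -> y. (Names are illustrative only.)
graph = {"u": [], "x": [], "z": ["u"], "y": ["z", "x"]}
pruned = mutilate(graph, intervened={"z"}, observed={"y"})
```

Here `u` only influenced `y` through `z`, so once `z` is intervened on, `u` drops out of the pruned graph while `x` stays.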

@ricardoV94
Member Author

ricardoV94 commented Jun 1, 2023

As far as I understand, at the moment the do operator only operates in the situation where you want to replace a random variable with observed data.

This is not the case. You can replace the variables by anything you want (as long as the variables have the same type as the thing that is being replaced). Check the test where we replace two variables by an expression with a shared variable that acts as a switch:

https://github.com/pymc-devs/pymc-experimental/blob/ff32a66d1e2eaf96493cc7c29e99da029320e9f0/pymc_experimental/tests/model_transform/test_conditioning.py#L108-L110

I just mentioned you didn't have to, but you can certainly replace constant data by other constant data if you want to use the same method

@ricardoV94
Member Author

ricardoV94 commented Jun 1, 2023

It could be better if the newly injected ConstantData node could inherit the dims from the RV that it replaces

That was actually supposed to work. Gonna try and fix it

https://github.com/pymc-devs/pymc-experimental/blob/ff32a66d1e2eaf96493cc7c29e99da029320e9f0/pymc_experimental/model_transform/conditioning.py#L176-L177

@drbenvincent

drbenvincent commented Jun 2, 2023

I tried it out.
BEFORE
Screenshot 2023-06-02 at 13 45 29
AFTER

model_control = do(model_scm, {"z": np.zeros(N, dtype='int32')}, prune_vars=True)

Screenshot 2023-06-02 at 13 46 03
Only thing that looks like it might be an issue is the status of y_data has changed.

@ricardoV94
Member Author

ricardoV94 commented Jun 2, 2023

Only thing that looks like it might be an issue is the status of y_data has changed.

Looks like a bug. Slowly but surely we're getting there xD

Does it also change status when prune_vars=False?

@drbenvincent

Does it also change status when prune_vars=False?

No. Only when prune_vars=True

@twiecki
Member

twiecki commented Jun 2, 2023

Given that we've now tested this quite a bit, shouldn't we just put this into pymc proper?

@ricardoV94
Member Author

ricardoV94 commented Jun 2, 2023

Given that we've now tested this quite a bit, shouldn't we just put this into pymc proper?

I would say no.

The underlying functionality (model->fgraph) was changed like 20x in the course of this PR, and it really helps to be able to break it and start from scratch without worries about breaking user compat.

@ricardoV94
Member Author

No. Only when prune_vars=True

I think it's fixed now!

@drbenvincent

I think it's fixed now!

Yes, certainly for the examples I was looking at, the mutilated graph with prune_vars=True looks good.

@drbenvincent

Let me know if there's anything else you want me to test. Otherwise I'm happy to approve

@ricardoV94
Member Author

ricardoV94 commented Jun 2, 2023

Let me know if there's anything else you want me to test. Otherwise I'm happy to approve

If you think this covers all the use cases for the blogpost we can merge it (need to rebase once more first).

Edit: Already rebased

@drbenvincent drbenvincent self-requested a review June 2, 2023 19:35
drbenvincent previously approved these changes Jun 2, 2023

@drbenvincent drbenvincent left a comment

I believe we now have the functionality needed for a blog post. Ideally we get a bit more road testing, and eyeballs from other people, to catch any issues. But moving into the pymc repo reasonably soon would be good.

@ricardoV94
Member Author

But moving into the pymc repo reasonably soon would be good.

I'll be honest, I don't want to do that super soon. Not because of the do which is pretty self-contained (although I wouldn't be surprised if we need more tweaks once people try this out in real cases), but because of the fgraph stuff.

I don't get the rush either

@drbenvincent

There's some anticipation because it's cool and would be good to get out there. But I agree, if it relies on stuff that is still experimental then there's no need to rush. Getting a blog post out there which calls on pymc-experimental would sate the desire I think.

@ricardoV94
Member Author

Tests should now pass again. Need a green review to merge. We can cut a release after

@twiecki
Member

twiecki commented Jun 5, 2023

We should make sure to add an example NB / case study and then promote.

@twiecki twiecki merged commit d640232 into pymc-devs:main Jun 5, 2023
@twiecki
Member

twiecki commented Jun 5, 2023

Also, congrats @ricardoV94, this is majorly cool new functionality.

@ricardoV94 ricardoV94 changed the title from "Implement observe and do transformations" to "Implement observe and do model transformations" Jun 5, 2023
@ricardoV94 ricardoV94 changed the title Implement observe and do model transformations Implement observe and do model transformations Jun 5, 2023
@ricardoV94 ricardoV94 deleted the do_operation branch July 25, 2023 07:56