Implement `observe` and `do` model transformations #168

ricardoV94 · 2023-05-17T13:57:05Z

import arviz as az
import pymc as pm
from pymc_experimental.model_transform.conditioning import do

with pm.Model() as m:
    x = pm.Normal("x", 0, 1)
    y = pm.Normal("y", x, 1)
    z = pm.Normal("z", y + x, 1)

# Dummy posterior, same as calling `pm.sample`
idata_m = az.from_dict({rv.name: [pm.draw(rv, draws=500)] for rv in [x, y, z]})

# Replace `y` by a constant `100.0`
m_do = do(m, {y: 100.0})
with m_do:
    idata_do = pm.sample_posterior_predictive(idata_m, var_names=["z"])
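
To make the intervention concrete, here is a minimal stdlib-only sketch of what `do(m, {y: 100.0})` means for the three-variable model above. It does not use PyMC, and the helper name `draw_z` and its `do_y` argument are made up for illustration. Under the intervention `y` is no longer drawn from `Normal(x, 1)` but fixed, so `z` ends up centred on `100 + x`.

```python
import random
import statistics


def draw_z(rng, do_y=None):
    """Draw one z from x -> y -> z, optionally fixing y (hypothetical helper)."""
    x = rng.gauss(0.0, 1.0)
    # Without an intervention y depends on x; with one, the x -> y edge is cut.
    y = rng.gauss(x, 1.0) if do_y is None else do_y
    return rng.gauss(y + x, 1.0)


rng = random.Random(0)
z_original = [draw_z(rng) for _ in range(5000)]
z_do = [draw_z(rng, do_y=100.0) for _ in range(5000)]

mean_original = statistics.fmean(z_original)  # close to 0
mean_do = statistics.fmean(z_do)              # close to 100
```

Here `x` is drawn from its prior, which mirrors the dummy posterior used in the example above; with a real posterior the draws of `x` would come from the trace instead.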

pymc_experimental/model_transform/conditioning.py

pymc_experimental/tests/model_transform/test_conditioning.py

lucianopaz

This looks fine to me. I’d only add more expansions and examples to the docstrings.

By the way, what do you think about adding the inverse operations? Something like unobserve? I don’t think that a do actually has a well defined inverse. People could use do again to get the original model back though

pymc_experimental/model_transform/conditioning.py

ricardoV94 · 2023-05-22T13:27:36Z

> Something like unobserve?

Yeah I thought about it... but I am not sure what an unobserved variable should be. A Deterministic? A FreeRV?

> I don’t think that a do actually has a well defined inverse. People could use do again to get the original model back though

Yeah that can't have an inverse because it's just a constant. The user would have to tell us what RV to replace it with and whether it's free, observed, or even a deterministic. Given the flexibility I think it should have its own name? `replace_by_var`?

I think I would focus on just the two transforms we have in this PR for now. We still have to see if the approach is even useful in real applications. If it is, we can come back and expand with the remaining subspace of model transformations.

drbenvincent

Once you get your new (mutilated) model back from `pm.do`, what's the next step? Calling `pm.sample_posterior_predictive`? What would happen if you call `pm.sample_prior_predictive()`, would you get the same result? It might be useful to include that step as an example in the docstring, and maybe also as a test?

ricardoV94 · 2023-05-26T16:02:18Z

> Once you get your new (mutilated) model back from pm.do, what's the next step? Calling pm.sample_posterior_predictive? What would happen if you call pm.sample_prior_predictive(), would you get the same result? It might be useful to include that step as an example in the docstring maybe also a test?

Yes, the most common use case would be to call `sample_posterior_predictive` afterwards.
If you call `sample_prior_predictive` instead, you ignore everything you learned about the parameters that are still in the model.
I'll add the example in the docstring.
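
A minimal stdlib-only sketch of that difference, assuming the same `x -> y -> z` model as in the PR description and a made-up set of posterior draws for `x` centred on 2.0 (the numbers are invented for illustration, not results from a real model). Reusing the posterior draws plays the role of `sample_posterior_predictive`; redrawing `x` from its prior plays the role of `sample_prior_predictive`.

```python
import random
import statistics

rng = random.Random(1)
DO_Y = 100.0  # value that y is fixed to by the intervention

# Invented "posterior" draws of x: pretend the data moved x from 0 to about 2.
posterior_x = [rng.gauss(2.0, 0.1) for _ in range(4000)]

# Posterior-predictive style: keep what was learned about x.
z_posterior = [rng.gauss(DO_Y + x, 1.0) for x in posterior_x]

# Prior-predictive style: x is redrawn from Normal(0, 1), ignoring the posterior.
z_prior = [rng.gauss(DO_Y + rng.gauss(0.0, 1.0), 1.0) for _ in range(4000)]

mean_posterior = statistics.fmean(z_posterior)  # close to 102
mean_prior = statistics.fmean(z_prior)          # close to 100
```

The two means differ by roughly the amount the (invented) posterior shifted `x`, which is exactly the information the prior predictive throws away.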

pymc_experimental/model_transform/conditioning.py

pymc_experimental/tests/model_transform/test_conditioning.py

ricardoV94 · 2023-05-30T11:38:28Z

Tests are passing

drbenvincent

As far as I understand, at the moment the do operator only operates in the situation where you want to replace a random variable with observed data. This is fine, but this is only one use-case. In this case, a user would implement the do operator with `pm.do()` and then `pm.sample_posterior_predictive`.

But if you take the potential outcomes approach to confounder adjustment OR take the SCM approach with do-calculus to calculate an adjustment set (with the backdoor criterion), then in both cases you basically end up with a linear regression where you enter the variables you decide to condition upon. In this situation, these are defined in the model as `pm.MutableData`. In this case, a user should implement the do operator with `pm.set_data()` and then `pm.sample_posterior_predictive`.

From an implementation point of view, I can see that we might want different functions to implement these different things (i.e. replace observed with observed vs. replace RV with observed). But from a user-facing point of view, it could be frustrating to have to remember which one to use (`pm.set_data` or `pm.do`) when in both cases they want to "do".

My proposal would be along these lines:

- `pm.do` checks whether the target node(s) are data or RVs.
- If they are data, you either get a friendly error message telling you to use `pm.set_data` or (ideally) `pm.do` calls `pm.set_data` for you.
- If they are RVs, it carries on with the currently implemented graph manipulation.
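
The dispatch proposed above can be sketched as follows. This is a toy stand-in, not PyMC code: `ToyModel` and `toy_do` are hypothetical names, and the "model" is just a dict of data containers plus a set of random-variable names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Stand-in for a model: named data containers and named random variables."""
    data: dict = field(default_factory=dict)
    rvs: set = field(default_factory=set)


def toy_do(model, interventions):
    """Return a new ToyModel with the interventions applied (hypothetical)."""
    new = ToyModel(data=dict(model.data), rvs=set(model.rvs))
    for name, value in interventions.items():
        if name in new.data:
            # Target is already data: equivalent to a set_data call.
            new.data[name] = value
        elif name in new.rvs:
            # Target is an RV: replace it with a constant data node.
            new.rvs.remove(name)
            new.data[name] = value
        else:
            raise KeyError(f"{name!r} is neither data nor a random variable")
    return new


model = ToyModel(data={"treatment": [0, 1, 0]}, rvs={"x", "y"})
intervened = toy_do(model, {"treatment": [1, 1, 1], "y": 100.0})
```

With a single entry point the caller does not need to know whether `treatment` or `y` was declared as data or as a random variable; the original `model` is left untouched.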

drbenvincent · 2023-06-01T11:13:09Z

- It could be better if the newly injected ConstantData node could inherit the dims from the RV that it replaces.
- I'd also vote for fully removing the parent nodes from any nodes that have been intervened on, to make the graphviz simpler. See pics below.

Example before

Example after
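
The pruning idea can be sketched on a toy graph. This is only an illustration of the suggestion, written with plain dicts; the rule used here (keep observed nodes and their ancestors) is an assumption and may not match what the PR ends up implementing. The names `intervene` and `prune` are hypothetical.

```python
def intervene(parents, targets):
    """Cut all incoming edges of the intervened nodes (graph surgery)."""
    return {node: ([] if node in targets else list(ps)) for node, ps in parents.items()}


def prune(parents, observed):
    """Keep only the observed nodes and their ancestors."""
    keep, stack = set(), list(observed)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])
    return {node: ps for node, ps in parents.items() if node in keep}


# Toy graph stored as child -> parents: w -> z, c -> z, c -> y, z -> y.
graph = {"w": [], "c": [], "z": ["w", "c"], "y": ["c", "z"]}
mutilated = intervene(graph, {"z"})
pruned = prune(mutilated, observed={"y"})
```

After intervening on `z`, the node `w` no longer influences anything observed, so it disappears from the pruned graph, while `c` stays because it is still a parent of `y`.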

ricardoV94 · 2023-06-01T11:32:33Z

> As far as I understand, at the moment the do operator only operates in the situation where you want to replace a random variable with observed data.

This is not the case. You can replace the variables by anything you want (as long as the variables have the same type as the thing that is being replaced). Check the test where we replace two variables by an expression with a shared variable that acts as a switch:

https://github.com/pymc-devs/pymc-experimental/blob/ff32a66d1e2eaf96493cc7c29e99da029320e9f0/pymc_experimental/tests/model_transform/test_conditioning.py#L108-L110

I just mentioned you didn't have to, but you can certainly replace constant data by other constant data if you want to use the same method

ricardoV94 · 2023-06-01T11:35:59Z

> It could be better if the newly injected ConstantData node could inherit the dims from the RV that it replaces

That was actually supposed to work. Gonna try and fix it

https://github.com/pymc-devs/pymc-experimental/blob/ff32a66d1e2eaf96493cc7c29e99da029320e9f0/pymc_experimental/model_transform/conditioning.py#L176-L177

drbenvincent · 2023-06-02T12:46:54Z

I tried it out.
BEFORE

AFTER

model_control = do(model_scm, {"z": np.zeros(N, dtype='int32')}, prune_vars=True)

Only thing that looks like it might be an issue is the status of y_data has changed.

ricardoV94 · 2023-06-02T12:50:55Z

> Only thing that looks like it might be an issue is the status of y_data has changed.

Looks like a bug. Slowly but surely we're getting there xD

Does it also change status when prune_vars=False?

drbenvincent · 2023-06-02T12:52:46Z

> Does it also change status when prune_vars=False?

No. Only when prune_vars=True

twiecki · 2023-06-02T15:29:03Z

Given that we've now tested this quite a bit, shouldn't we just put this into pymc proper?

ricardoV94 · 2023-06-02T17:36:20Z

> Given that we've now tested this quite a bit, shouldn't we just put this into pymc proper?

I would say no.

The underlying functionality (model->fgraph) was changed like 20x in the course of this PR, and it really helps to be able to break it and start from scratch without worries about breaking user compat.

ricardoV94 · 2023-06-02T17:37:25Z

> No. Only when prune_vars=True

I think it's fixed now!

drbenvincent · 2023-06-02T18:25:30Z

> I think it's fixed now!

Yes. Certainly for the examples I was looking at, the mutilated graph with `prune_vars=True` looks good.

drbenvincent · 2023-06-02T19:06:06Z

Let me know if there's anything else you want me to test. Otherwise I'm happy to approve

ricardoV94 · 2023-06-02T19:09:12Z

> Let me know if there's anything else you want me to test. Otherwise I'm happy to approve

If you think this covers all the use cases for the blogpost we can merge it (need to rebase once more first).

Edit: Already rebased

drbenvincent

I believe we now have the functionality needed for a blog post. Ideally we get a bit more road testing, and eyeballs from other people, to catch any issues. But moving into the pymc repo reasonably soon would be good.

ricardoV94 · 2023-06-02T20:30:39Z

> But moving into the pymc repo reasonably soon would be good.

I'll be honest, I don't want to do that super soon. Not because of the do which is pretty self-contained (although I wouldn't be surprised if we need more tweaks once people try this out in real cases), but because of the fgraph stuff.

I don't get the rush either

drbenvincent · 2023-06-02T21:01:45Z

There's some anticipation because it's cool and would be good to get out there. But I agree, if it relies on stuff that is still experimental then there's no need to rush. Getting a blog post out there which calls on pymc-experimental would sate the desire I think.

ricardoV94 · 2023-06-05T08:04:27Z

Tests should now pass again. Need a green review to merge. We can cut a release after

twiecki · 2023-06-05T09:03:01Z

We should make sure to add an example NB / case study and then promote.

twiecki · 2023-06-05T09:03:25Z

Also, congrats @ricardoV94, this is majorly cool new functionality.

ricardoV94 mentioned this pull request May 17, 2023

Rename _replace_rvs_in_graphs and fix bug when replacing input pymc-devs/pymc#6720

Merged

ricardoV94 marked this pull request as draft May 17, 2023 13:58

ricardoV94 requested review from lucianopaz and drbenvincent May 17, 2023 13:58

ricardoV94 added the enhancements label May 17, 2023

ricardoV94 force-pushed the do_operation branch 2 times, most recently from d1a5853 to 4ef10c9 Compare May 17, 2023 14:15

twiecki reviewed May 18, 2023

pymc_experimental/model_transform/conditioning.py (outdated)

twiecki reviewed May 18, 2023

pymc_experimental/tests/model_transform/test_conditioning.py

lucianopaz reviewed May 22, 2023

pymc_experimental/model_transform/conditioning.py (outdated)

pymc_experimental/model_transform/conditioning.py (outdated)

ricardoV94 force-pushed the do_operation branch from 4ef10c9 to 2f7931b Compare May 22, 2023 13:54

drbenvincent reviewed May 25, 2023

drbenvincent reviewed May 28, 2023

pymc_experimental/model_transform/conditioning.py

pymc_experimental/model_transform/conditioning.py (outdated)

pymc_experimental/tests/model_transform/test_conditioning.py

ricardoV94 force-pushed the do_operation branch 2 times, most recently from 1b47247 to 0f56fe1 Compare May 29, 2023 13:21

ricardoV94 mentioned this pull request May 29, 2023

Ignore named variables that are not traceable in get_vars_in_point_list pymc-devs/pymc#6741

Merged

ricardoV94 force-pushed the do_operation branch 3 times, most recently from 1a0eedf to ffd21e7 Compare May 30, 2023 11:10

ricardoV94 marked this pull request as ready for review May 30, 2023 11:38

ricardoV94 requested review from drbenvincent and lucianopaz May 30, 2023 11:38

ricardoV94 force-pushed the do_operation branch from ffd21e7 to ff32a66 Compare May 30, 2023 16:36

drbenvincent reviewed Jun 1, 2023

ricardoV94 force-pushed the do_operation branch from eda61e7 to 9bf74b0 Compare June 2, 2023 17:33

ricardoV94 force-pushed the do_operation branch from 9bf74b0 to 6b9f87b Compare June 2, 2023 19:18

drbenvincent self-requested a review June 2, 2023 19:35

drbenvincent previously approved these changes Jun 2, 2023

ricardoV94 dismissed drbenvincent’s stale review via 14d37ec June 2, 2023 20:36

ricardoV94 force-pushed the do_operation branch from 6b9f87b to 14d37ec Compare June 2, 2023 20:37

ricardoV94 added 4 commits June 5, 2023 10:02

Make fgraph Deterministic conversion logic more robust

1262a44

Allow inlining of Deterministics and Data in fgraph IR

5304406

Implement observe and do model transformations

52706e8

Add option to prune variables after do intervention

4deb91a

ricardoV94 force-pushed the do_operation branch from 14d37ec to 4deb91a Compare June 5, 2023 08:03

twiecki approved these changes Jun 5, 2023

twiecki merged commit d640232 into pymc-devs:main Jun 5, 2023

ricardoV94 changed the title ~~Implement observe and do transformations~~ Implement observe and do model transformations Jun 5, 2023

ricardoV94 changed the title ~~Implement observe and do model transformations~~ Implement observe and do model transformations Jun 5, 2023

ricardoV94 deleted the do_operation branch July 25, 2023 07:56
