LKJ priors influence sampling of unconnected RVs #3641
Comments
That's because our transformation for LKJ is incorrect (#3473 exactly). In pure forward sampling this gives invalid correlation matrices, which interact with the other RVs in unexpected ways (recall that all RVs are sampled in the same augmented auxiliary space; if some RVs live in a geometry that is difficult or incorrect, NUTS terminates too early and the correctness of the other RVs suffers).
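For context, an "invalid correlation matrix" here is one whose entries may each lie in [-1, 1] but which is not positive definite. A dependency-free sketch of such a check (hypothetical helper, not part of PyMC3's API):

```python
import math

def is_valid_correlation_matrix(m, tol=1e-10):
    """Check symmetry, unit diagonal, and positive definiteness
    via a plain Cholesky attempt (pure Python, illustration only)."""
    n = len(m)
    for i in range(n):
        if abs(m[i][i] - 1.0) > tol:
            return False
        for j in range(i):
            if abs(m[i][j] - m[j][i]) > tol:
                return False
    # Cholesky factorization fails iff the matrix is not positive definite.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = m[i][i] - s
                if d <= tol:
                    return False
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return True
```

Note that a matrix can have every entry in [-1, 1] and still fail this check, e.g. three variables that are pairwise strongly correlated in mutually inconsistent directions.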
LKJCholeskyCov should give you correct results, if you are not using a half-Cauchy in this case.
What is the issue with the half-Cauchy? The same thing happens with other choices of
How can you tell there is a problem? There are no divergent samples.
Note that eyeballing the trace is usually not a good way to diagnose problems.
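A quantitative convergence check such as R-hat is more reliable than eyeballing a trace. A simplified sketch of the classic potential-scale-reduction statistic (not PyMC3's rank-normalized implementation):

```python
import random
import statistics

def gelman_rubin(chains):
    """Rough R-hat across equal-length chains; values well above
    ~1.01 suggest the chains have not mixed.  Simplified sketch."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    variances = [statistics.variance(c) for c in chains]
    W = statistics.fmean(variances)       # within-chain variance
    B = n * statistics.variance(means)    # between-chain variance
    var_hat = (n - 1) / n * W + B / n     # pooled posterior-variance estimate
    return (var_hat / W) ** 0.5

# Four well-mixed chains from the same distribution: R-hat near 1.
random.seed(0)
chains = [[random.gauss(0, 1) for _ in range(2000)] for _ in range(4)]
print(gelman_rubin(chains))
```

A chain stuck in a different region (as in the pathologies shown above) inflates the between-chain variance B and pushes R-hat well above 1 even when no divergences are reported.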
Perhaps not in general, but there is clearly a pathology at ~1500 samples and ~2 in the posterior density of
Just ran it for a second time:
And a third:
There is not always a divergence, but there is always a problem. There are usually more of them with smaller chains and less tuning. Here is the figure from the third run; again, clearly, a massive pathology.
Usually the prior distribution spans a much larger range, which makes the tail area much more pathological. For this reason we usually recommend using prior_sample for sampling from a model with no observed data.
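The point about forward sampling: with no observed data, each RV can be drawn directly from its prior in topological order, so unconnected variables cannot influence one another, unlike MCMC, where all free RVs share one augmented space. A minimal stdlib sketch (hypothetical model, not the issue's code; in current PyMC versions this role is played by `sample_prior_predictive`):

```python
import random
import statistics

random.seed(1)

def forward_sample(n_draws):
    """Ancestral sampling: draw each variable straight from its prior.
    `a` and `M` share no edge, so a's draws are untouched by
    whatever the prior on M does."""
    draws = []
    for _ in range(n_draws):
        a = random.gauss(0.0, 1.0)       # a ~ Normal(0, 1)
        M = random.uniform(-1.0, 1.0)    # stand-in for a correlation prior
        draws.append((a, M))
    return draws

draws = forward_sample(50_000)
a_vals = [a for a, _ in draws]
print(statistics.fmean(a_vals), statistics.stdev(a_vals))
```

By construction, the marginal of `a` here matches its prior exactly, which is why forward sampling is the safer way to inspect a prior than running NUTS on an observation-free model.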
Closing this for now but feel free to keep discussing it here. |
Both `LKJCorr` and `LKJCholeskyCov` influence the sampling of random variables they are not connected to. This is not expected behavior, right? The following code produces several figures to illustrate this:
Here they are, in order of creation:
What is causing the obvious pathologies when sampling `a` in the latter two models if `a` and `M` are not connected in any way? Also, why is the LKJ prior on the correlation matrix in test_1.png not flat? Is it related to https://github.com/pymc-devs/pymc3/issues/3473#issue-442041528?
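On the non-flat marginal: this part is expected behavior for dimension d > 2, independent of any transformation bug. From the LKJ construction (Lewandowski, Kurowicka & Joe, 2009), each off-diagonal correlation of a d-by-d matrix drawn from LKJ(eta) has the marginal

```latex
r_{ij} \sim \mathrm{Beta}\!\left(\eta + \tfrac{d-2}{2},\; \eta + \tfrac{d-2}{2}\right)
\quad \text{rescaled from } (0,1) \text{ to } (-1,1),
```

so with eta = 1 the marginal is uniform only when d = 2; for larger d each individual correlation concentrates around 0 even though the density over the space of valid correlation matrices is uniform.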