Add alternative default priors #360

tomicapretto · 2021-06-21T12:51:51Z

This PR adds alternative default priors, which will be the defaults for models where we lack statsmodels support. These new priors do not replace the existing defaults (until we have evidence they're equivalent). I will keep updating the following list of changes as I commit to this PR.

Changes

Removed Model._match_derived_terms(). It was unused because a categorical group-specific term is now a single term in the model, rather than several terms (one per dummy in the encoding of the categorical variable) as it used to be.
Constant terms (categoricals with one level and numerics with a unique value) are flagged more appropriately.
Split PriorFactory._get_prior() into several methods with clearer names and goals. Also modified config.json as proposed in "Use objects instead of arrays in the config.json of the priors" #361.
Prior._auto_scale is now Prior.auto_scale. It was annoying to add pylint exceptions all the time.
Nuisance parameters of the response distribution are scaled with the prior scaler and not when the prior term is added.
Added alternative automatic priors (inspired by rstanarm priors). See PriorScaler2.
Our tests should run faster because I removed unnecessary Model.build() calls.
Family names and priors are checked slightly differently, which makes the code simpler.
Removed methods and attributes that were used when we had multiple backends. Now that we only have PyMC3, it does not make sense to keep asking which backend is being used.
Added more tests.
Model has a new argument, priors_cor. It accepts dictionaries where keys are the names of the groups, and values are the eta parameter in the LKJ distribution for correlation matrices. If such a dictionary is present, priors for group-specific terms are a multivariate normal distribution with a non-zero correlation.
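To give a feel for what the eta values in priors_cor control, here is a minimal sketch that does not use Bambi or PyMC3. It relies only on the fact that the LKJ density is proportional to det(R)^(eta - 1); the function name and the example values are assumptions made for illustration.

```python
def lkj_unnormalized_density_2x2(r, eta):
    """Unnormalized LKJ density of the 2x2 correlation matrix [[1, r], [r, 1]]."""
    det = 1.0 - r ** 2  # determinant of the 2x2 correlation matrix
    return det ** (eta - 1.0)


def relative_weight(r, eta):
    """Density at correlation r relative to the density at zero correlation."""
    return lkj_unnormalized_density_2x2(r, eta) / lkj_unnormalized_density_2x2(0.0, eta)


# Hypothetical eta values, of the kind one might pass as priors_cor={"group": eta}
for eta in (1.0, 2.0, 10.0):
    print(eta, relative_weight(0.8, eta))
```

With eta = 1 every correlation is equally likely a priori; larger values put progressively less weight on strong correlations.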

codecov-commenter · 2021-06-21T13:30:49Z

Codecov Report

Merging #360 (f77d302) into master (c0ca107) will increase coverage by 1.33%.
The diff coverage is 83.24%.

@@            Coverage Diff             @@
##           master     #360      +/-   ##
==========================================
+ Coverage   88.87%   90.20%   +1.33%     
==========================================
  Files          16       17       +1     
  Lines        1411     1613     +202     
==========================================
+ Hits         1254     1455     +201     
- Misses        157      158       +1

Impacted Files	Coverage Δ
bambi/backends/pymc.py	`79.78% <56.00%> (-15.15%)`	⬇️
bambi/models.py	`84.01% <64.63%> (+12.53%)`	⬆️
bambi/priors/scaler_mle.py	`75.33% <92.30%> (ø)`
bambi/priors/scaler_default.py	`95.16% <95.16%> (ø)`
bambi/priors/__init__.py	`100.00% <100.00%> (ø)`
bambi/priors/priors.py	`92.85% <100.00%> (+5.47%)`	⬆️
bambi/terms.py	`100.00% <100.00%> (+4.47%)`	⬆️
bambi/tests/test_built_models.py	`90.86% <100.00%> (+0.71%)`	⬆️
bambi/tests/test_model_construction.py	`100.00% <100.00%> (+5.76%)`	⬆️
bambi/tests/test_priors.py	`100.00% <100.00%> (ø)`
... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

tomicapretto · 2021-06-28T14:59:41Z

I think this is ready for a review @aloctavodia, @canyon289, @twiecki. I know there's a lot going on in this PR and many things may not be that clear. Please ask as many questions as you want.

The TL;DR of this PR would be

We can now use an LKJ prior for the correlation matrix of the prior of the group-specific terms.
We have an alternative method to compute default priors, inspired by rstanarm priors. It is going to be used in the coming implementations of the t family, beta family, etc.

aloctavodia

A few small comments; overall it seems good. There are many small details that improve readability, and also some changes that I need to read with more care to understand. I will try to keep reading tomorrow.

aloctavodia · 2021-07-01T10:47:59Z

bambi/backends/pymc.py

+    sigma = pm.HalfNormal.dist(sigma=sigma, shape=rows)
+
+    # Obtain Cholesky factor for the covariance
+    lkj_decomp, corr, sigma = pm.LKJCholeskyCov(  # pylint: disable=unused-variable


You are not using corr, right? Then do:

lkj_decomp, _, sigma

Also, can we use a different name for the returned sigma and the input sigma?

I would like to use corr in the near future. I've been thinking we should report it even when independent priors are used. That's why it's there.

But in the meantime, I have no problem if you think the underscore is more appropriate.

And sigma... Well, they represent the same random variable in the model. The problem is that the first one is a .dist, so I have to recover the one returned by LKJCholeskyCov and add it to the trace.

I don't know if there are plans in PyMC3 to allow a random variable in LKJCholeskyCov; that would be the best solution, I think.

I think this is done in this PR

aloctavodia · 2021-07-01T10:57:28Z

bambi/models.py

+    automatic_priors: str
+        An optional specification to compute/scale automatic priors. ``"default"`` means to use
+        Bambi's default method. ``"rstanarm"`` means to use default priors from the R rstanarm
+        library. The latter are available in more scenarios because they don't depend on MLE.


Can we detect when default Bambi priors fail and switch to rstanarm's priors?

What is the advantage of continuing to use Bambi defaults if they are more restricted?

I'm going to change the defaults to the rstanarm-inspired priors. This will make it simpler to implement the t and beta families.

canyon289 · 2021-07-04T04:04:36Z

bambi/backends/pymc.py

+
+
+def add_lkj(terms, eta=1):
+    # Parameters


I would make this a full docstring in NumPy style; that way it shows up under add_lkj.__doc__

Makes sense!
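A minimal sketch of the difference the reviewer is pointing at. Both functions below are stubs written for illustration (the parameter descriptions are assumptions, not the wording used in Bambi); the point is only that a NumPy-style docstring is available at runtime while a leading comment is not.

```python
def add_lkj(terms, eta=1):
    """Add correlated priors for group-specific terms (illustrative stub).

    Parameters
    ----------
    terms : list
        Group-specific terms that share a grouping factor.
    eta : float, optional
        Shape parameter of the LKJ distribution. Defaults to 1.

    Returns
    -------
    None
        This stub does nothing; only the docstring matters here.
    """
    return None


def add_lkj_commented(terms, eta=1):
    # Parameters
    # terms: list of group-specific terms
    # eta: shape parameter of the LKJ distribution
    return None


# The docstring is reachable via __doc__ (and help()); the comment is not.
print(add_lkj.__doc__ is not None)
print(add_lkj_commented.__doc__ is None)
```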

canyon289 · 2021-07-04T04:06:25Z

bambi/priors/scaler_mle.py

+            if self.model.family.name == "gaussian":
+                sigma = np.std(self.model.response.data)
+                self.model.response.prior.update(sigma=Prior("HalfStudentT", nu=4, sigma=sigma))
+            # Add cases for other families


What's this comment for?

For example, the Gamma family has an auxiliary parameter alpha whose prior is HalfCauchy(beta=1). I left the comment to remember that it might be good to consider updating beta to another value based on the data, as with the sigma in the HalfStudentT prior above.
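The data-based scaling mentioned here can be sketched without Bambi. The helper name and the dictionary layout are hypothetical; only the choice of HalfStudentT with nu=4 and sigma equal to the standard deviation of the response comes from the diff above.

```python
import statistics


def gaussian_sigma_prior(response):
    """Build a data-scaled prior spec for sigma (hypothetical dict layout)."""
    # statistics.pstdev is the population standard deviation,
    # which matches np.std with its default ddof=0.
    sigma = statistics.pstdev(response)
    return {"name": "HalfStudentT", "nu": 4, "sigma": sigma}


spec = gaussian_sigma_prior([1.0, 2.0, 3.0, 4.0])
print(spec)
```

A similar helper for other families would compute a different data summary and return the corresponding prior specification.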

canyon289 · 2021-07-04T04:07:05Z

bambi/priors/scaler_mle.py

            # Convert scale names to floats
            if isinstance(term.prior.scale, str):
                term.prior.scale = self.names[term.prior.scale]

+            if self.mle is None:
+                self.fit_mle()
+
            # Scale it


Expand it to the full name; it makes the comment easier to read in isolation.

What do you mean by full name? Something like self.fit_maximum_likelihood_estimator? Or is it about the comment below?

canyon289 · 2021-07-04T04:08:56Z

bambi/utils.py

    elif isinstance(family, Family):
        # Only work if there are nuisance parameters in the family, and if any of these nuisance
        # parameters is present in 'priors' dictionary.
        nuisance_params = [k for k in family.prior.args if k not in ["observed", family.parent]]
        if set(nuisance_params).intersection(set(priors)):
-            return {k: priors[k] for k in nuisance_params if k in priors}
+            return {k: priors.pop(k) for k in nuisance_params if k in priors}
    return None


What happens downstream if None is returned?

Nothing. When None is returned it means the user didn't pass any prior for any parameter in the response distribution and there's no need to update them.
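A minimal sketch of the behaviour discussed in this thread, using plain dictionaries. The function name and the example priors are hypothetical stand-ins for the helper in bambi/utils.py; the sketch only illustrates the effect of switching from priors[k] to priors.pop(k) and how a caller can treat None.

```python
def extract_nuisance_priors(priors, nuisance_params):
    """Pull priors for nuisance parameters out of a user-supplied dict."""
    if set(nuisance_params).intersection(priors):
        # pop() removes the entries, so what remains in `priors`
        # only refers to model terms.
        return {k: priors.pop(k) for k in nuisance_params if k in priors}
    return None


user_priors = {"sigma": "HalfNormal", "x": "Normal"}
found = extract_nuisance_priors(user_priors, ["sigma"])

no_nuisance = {"x": "Normal"}
missing = extract_nuisance_priors(no_nuisance, ["sigma"])

# A caller can simply skip the update when nothing was passed.
if missing is None:
    print("no priors passed for the response distribution")
```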

canyon289

Short review. I will do a more in-depth review in the next 24 hours.

canyon289 · 2021-07-04T04:11:33Z

I'm sorry for missing this @mention. I get so many GitHub emails that things get lost. If I'm not responding to a PR fast enough, message me on Slack, or even message me proactively. That tremendously reduces the chances that I'll miss a PR.

tomicapretto · 2021-07-05T12:59:58Z

Thanks for the feedback Ravin. I'll message on Slack next time then! :D

canyon289 · 2021-07-05T13:28:43Z

I also missed doing the more in-depth review, but I will get to it this week :(

tomicapretto · 2021-07-05T22:24:51Z

I've just realized this PR will close #320 since we're changing default priors

tomicapretto added 2 commits June 21, 2021 09:44

Remove match_derived_terms

c3b5023

Improve how we flag constant terms

165b214

tomicapretto added 17 commits June 21, 2021 12:30

black

815dc01

Refactor PriorFactory and config.json

88d2772

Add tests to increase coverage

33a2c02

Add alternative priors. Working for common terms

3d19983

Fix prior preparation

76af2d2

Fixes and speed up some tests

5835fd9

Minor improvements on how we set family priors and links

df20807

Add several tests to increase coverage

7f9b1ed

more tests

1a5c9c8

fix test

2615005

Try to install graphviz in workflow

cd85100

Add test for categorical interactions in group specific terms

e9ac19a

Terms are added to PyMC3 model slightly differently

ba562f5

Alternative scaler now scales group-specific terms too. Also started with machinery to use correlated priors

6a22b46

Correlated priors workinggit status At least for one test case :')

66cc991

Priors for correlation matrices of group-specific terms

ccf6675

Prior for correlation matrices are also working with categorical expressions in group-specific terms

32b817e

tomicapretto changed the title ~~[WIP] Add alternative default priors~~ Add alternative default priors Jun 28, 2021

Enrich model print method

fdb8dfa

aloctavodia approved these changes Jul 1, 2021

tomicapretto added 5 commits July 2, 2021 11:49

sigma prior in gaussian models is halftstudent and not exponential

0c33286

Change default priors

90cb24c

little bug fix and update test

846f06f

pylint?

a0b8bd0

fix a test.. temporary

a22050d

canyon289 reviewed Jul 4, 2021

Docs

f77d302

tomicapretto mentioned this pull request Jul 5, 2021

Rethink our Prior class and how to work with priors in general #365

Open

tomicapretto merged commit edb3fe8 into bambinos:master Jul 12, 2021

tomicapretto deleted the priors branch July 13, 2021 16:23
