Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Putting prior, prior predictive, posterior into one inferencedata for hierarchical sbc #12

Open
hyunjimoon opened this issue Nov 4, 2022 · 4 comments

Comments

@hyunjimoon
Copy link
Contributor

hyunjimoon commented Nov 4, 2022

Two large tasks, discussed with @Dashadower.

draws_data_mapping wise

To implement this, first apply this arviz sbc_cmdstanpy by @OriolAbril to stanify for S3R1 case. From commit 3b65e77, I marked three parts for code injection. Then try S3R2 case.

With current implementation (before arviz wrapper code injection), S3R1 runs, creating one generator.nc and three estimator.nc files.

However, S3R2 doesn't run with the following error which requires in-depth exploration for size of each data.

Traceback (most recent call last):
  File "/Users/hyunjimoon/Dropbox/stanify/prey_predator.py", line 35, in <module>
    draws2data2draws('vensim_models/prey_predator/prey_predator.mdl', setting, numeric, prior, S, M, N, R)
  File "/Users/hyunjimoon/Dropbox/stanify/stanify/calibrator/draws_data_mapper.py", line 168, in draws2data2draws
    obs_dict = {k: v.values.flatten().reshape((model.n_t, R)) for (k, v) in obs_s_xr[setting_obs].items()} #xr.concat([ posterior_lst])
  File "/Users/hyunjimoon/Dropbox/stanify/stanify/calibrator/draws_data_mapper.py", line 168, in <dictcomp>
    obs_dict = {k: v.values.flatten().reshape((model.n_t, R)) for (k, v) in obs_s_xr[setting_obs].items()} #xr.concat([ posterior_lst])
ValueError: cannot reshape array of size 200 into shape (200,2)

builder (auto stan file generator) wise

update for three parts:

  1. draws2data gq block
    from
real pred_birth_frac = normal_rng(0.05, 0.005);

to

 for (region in 1:r)
        pred_birth_frac[region] = normal_rng(0.05, 0.005);
  1. data2draws parameters block
    from
real<lower=0> pred_birth_frac;

to

array[region] real<lower=0> pred_birth_frac;
  1. data2draws model block
    from
pred_birth_frac ~ normal(0.05, 0.005);

to

    for (region in 1:r)
        pred_birth_frac[region] ~ normal(0.05, 0.005);
@hyunjimoon
Copy link
Contributor Author

hyunjimoon commented Nov 4, 2022

Niko's monster model with adaptive precision, this line uses ode_bdf where we can learn the output format for integrated result.

I also wonder the reason for using ragged array from ragged.stan file which I questioned in discourse.

@hyunjimoon
Copy link
Contributor Author

hyunjimoon commented Nov 9, 2022

Without hierarchy, integrated_result was able to be declared and initialized at the same time. With hierarchy as for loop for region is coming in which forces it to be pre-declared than value is assigned from ode function.

Before I made the change, I faced identifier integrate_result not in scope error which is not informative error message.

    array[n_t, r] vector[3] integrated_result;
    for (region in 1:r){
        integrated_result[:, region, :]  = ode_rk45(vensim_ode_func, initial_outcome, initial_time, times, prey_birth_frac, pred_birth_frac[region], time_step, process_noise_scale);
    }
    array[n_t] vector [r] prey = integrated_result[1];
    array[n_t] vector [r] process_noise = integrated_result[ 2];
    array[n_t] vector [r] predator = integrated_result[3];

@hyunjimoon
Copy link
Contributor Author

hyunjimoon commented Nov 15, 2022

Treatment of hierarchical and SBC multi S should be the same.

Questions from sbc-cmdstanpy

  • cause of scale parameter 0; need for a zero avoiding prior?
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/home/oriol/Public/arviz/doc/source/user_guide/wrappers/linreg_model.stan', line 27, column 2 to column 64)
  • benefit of write_stan_json? Faster? (I am using dict, and "Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format." from cmdstanpy).
  • Rdump?

prior_fit = prior_model.sample(data="linreg_prior.data.json", chains=2, iter_sampling=100)

idata_kwargs = dict(
    prior_predictive=["slack_comments_hat","github_commits_hat"],
    posterior_predictive=["slack_comments_hat","github_commits_hat"],
    log_likelihood={
        "slack_comments": "log_likelihood_slack_comments",
        "github_commits": "log_likelihood_github_commits"
    },
    coords={"developer": names},
    dims={
        "slack_comments": ["developer"],
        "github_commits" : ["developer"],
        "slack_comments_hat": ["developer"],
        "github_commits_hat": ["developer"],
        "time_since_joined": ["developer"],
    }
)
idata_orig = az.from_cmdstanpy(
    prior=prior_fit, **idata_kwargs
).stack(prior_draw=["chain", "draw"], groups="prior_groups")
  • why MultiIndex? (multi means what and what?)
prior
Dimensions:      (prior_draw: 200)
Coordinates:
  * prior_draw   (prior_draw) object MultiIndex
  * chain        (prior_draw) int64 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1
  * draw         (prior_draw) int64 0 1 2 3 4 5 6 7 ... 92 93 94 95 96 97 98 99
Data variables:
    b0           (prior_draw) float64 104.4 -239.1 -249.9 ... 222.8 -46.17
    b1           (prior_draw) float64 116.8 -175.4 -71.92 ... -162.4 7.715 274.3
    log_b_sigma  (prior_draw) float64 6.163 6.122 5.205 ... 3.633 6.172 6.076
    b_sigma      (prior_draw) float64 474.6 455.6 182.3 ... 37.84 479.3 435.3
    c0           (prior_draw) float64 -2.87 -6.738 11.82 ... -10.68 -11.76
    c1           (prior_draw) float64 18.08 -7.793 4.934 ... -2.836 -14.52
    log_c_sigma  (prior_draw) float64 6.163 6.122 5.205 ... 3.633 6.172 6.076
    c_sigma      (prior_draw) float64 9.529 6.93 6.605 ... 1.642 4.487 3.449

prior_predictive
Dimensions:             (developer: 5, prior_draw: 200)
Coordinates:
  * developer           (developer) object 'Alice' 'Bob' ... 'Danielle' 'Erika'
  * prior_draw          (prior_draw) object MultiIndex
  * chain               (prior_draw) int64 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1
  * draw                (prior_draw) int64 0 1 2 3 4 5 6 ... 94 95 96 97 98 99
Data variables:
    slack_comments_hat  (developer, prior_draw) float64 701.6 ... 4.755e+03
    github_commits_hat  (developer, prior_draw) float64 82.57 -39.29 ... -270.4

sample_stats_prior
Dimensions:          (prior_draw: 200)
Coordinates:
  * prior_draw       (prior_draw) object MultiIndex
  * chain            (prior_draw) int64 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1
  * draw             (prior_draw) int64 0 1 2 3 4 5 6 7 ... 93 94 95 96 97 98 99
Data variables:
    lp               (prior_draw) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    acceptance_rate  (prior_draw) float64 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
  • for next (week, month) meeting, need help in
  1. loglik-based plots
  2. rvar support from posterior? Mike's comment

The nc-to-rvars conversion is done by the class_composition R6 class here, specifically the nc_to_rvar method here

  1. plots for ode model (e.g. degeneracy diagnosis for prey-predator and inventory model)
  • funnel only from hierarchical model? (trying to introduce hierarchical prey-predator) + funnel degeneracy is included in..?
    a. multimodality of hierarchical prey-predator from this issue
    image

This adds 4 regions with variety in the alpha/beta/gamma/delta parameters, loosely structured after https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html that @hyunjimoon linked. Part of the intent was to create the opportunity for funnels. I haven't run it much yet, but I think it does show evidence of multimodality, or of pathologies that cause Powell to get stuck occasionally. I haven't tested MCMC either, but will get there soon.

b. how to set # of estimated parameters in ode model? inventory model from this issue

  1. how often do you use diagnose() function, and does arviz provide plot which use the output of diagnose()?
02:49:23 - cmdstanpy - WARNING - Some chains may have failed to converge.
	Chain 1 had 48 divergent transitions (96.0%)
	Chain 2 had 47 divergent transitions (94.0%)
	Use function "diagnose()" to see further information.

@hyunjimoon hyunjimoon changed the title Making hierarchy Putting prior, prior predictive, posterior into one inferencedata for hierarchical sbc Nov 20, 2022
@hyunjimoon
Copy link
Contributor Author

One example of prior predictive (synthetic data generated from s'th prior draw) are as follows, on which posterior is fit.

Dimensions:             (predator_obs_dim_0: 200, prey_obs_dim_0: 200)
Coordinates:
  * predator_obs_dim_0  (predator_obs_dim_0) int64 0 1 2 3 4 ... 196 197 198 199
  * prey_obs_dim_0      (prey_obs_dim_0) int64 0 1 2 3 4 ... 195 196 197 198 199
    prior_draw          object (0, 0)
    chain               int64 0
    draw                int64 0
Data variables:
    predator_obs        (predator_obs_dim_0) float64 4.027 4.049 ... 8.142 7.973
    prey_obs            (prey_obs_dim_0) float64 30.16 30.63 ... 6.133 6.205
Attributes: (4)

Error I am facing is:

ValueError: different number of dimensions on data and dims: 3 vs 4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant