Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider defaulting to mh posterior_samples arguement rather than gaussian. #95

Open
Tracked by #42
jonathonmellor opened this issue Dec 3, 2024 · 8 comments
Open
Tracked by #42
Labels

Comments

@jonathonmellor
Copy link

[disclosure] If you've been thinking about this already please ignore! But it's one of those things that has bitten us in the past so raising an issue to discuss.

Splines have challenging extrapolation properties when informed by limited data, this problem is exacerbated by the gaussian approximation for posterior sampling in gratia.

Describe the solution you'd like
The current gratia::posterior_samples and gratia::fitted_samples defaults to method = "gaussian", which can be problematic on the edge of predictions (such as near a forecast) or the tail of a delay distribution. This becomes a bigger problem when modelling hierarchically and or with a data triangle.

A solution would be to set the default method to mh in the functions within this package.

Describe alternatives you've considered
Instead, there could be documentation that the user can change the sampling method desired (this is already coded up, so is a simpler change). Alternatively, there could be a diagnostic for when the gaussian approximation method has failed (some posterior draws and in the implausibly high values).

Additional context
Gavin has outlined the issue here.

@jonathonmellor
Copy link
Author

And if not already aware there is an approach for getting derivatives from gratia for the growth rate / Rt derivative approach. Obviously there's more control coding up your own finite difference scheme, but if gratia is already a package dependency then could be nice to use it's method.

https://gavinsimpson.github.io/gratia/reference/derivative_samples.html

We user the method here, the nice bit being you can get the derivative for a specific variable (such as the temporal component of a nowcast) which Maria can speak to more about in the measles work.

@zsusswein
Copy link
Collaborator

Thanks for the quick and thoughtful feedback Jon!

We'll handle the M-H sampler in the next release.

For the derivative samples, we ran into exactly the same issue as you did here gavinsimpson/gratia#293. I was hesitant to force a dependence on the Github version of gratia.

Either way, excited to hear more about the measles work!

@jonathonmellor
Copy link
Author

@zsusswein sounds good!

@zsusswein
Copy link
Collaborator

In working on this issue, the tests uncovered a bug where calling gratia::*_samples(..., n = 1, method = "mh") causes an internal gratia error:

Screenshot 2024-12-09 at 8 44 20 PM

I can reliably reproduce the bug:

Screenshot 2024-12-09 at 8 45 49 PM

Thoughts @seabbs? I don't see a great way of patching this that isn't super hacky (i.e., if n = 1, n +=1, thin at the end)

@seabbs
Copy link
Contributor

seabbs commented Dec 10, 2024

Do you have any idea what parts of a mgcv model cause this bug? I assume its some edge case in our use case if everyone isn't running into it.

@zsusswein
Copy link
Collaborator

It's an error in gratia, not mgcv and occurs with a minimal reprex:

fit <- mgcv::gam(y ~ s(x), data = list(y = 1:10 + rnorm(10), x = 1:10))

gratia::fitted_samples(fit, n = 1, method = "gaussian")
# A tibble: 10 × 4
    .row .draw .parameter .fitted
   <int> <int> <chr>        <dbl>
 1     1     1 location      1.81
 2     2     1 location      2.63
 3     3     1 location      3.44
 4     4     1 location      4.26
 5     5     1 location      5.08
 6     6     1 location      5.89
 7     7     1 location      6.71
 8     8     1 location      7.53
 9     9     1 location      8.34
10    10     1 location      9.16


gratia::fitted_samples(fit, n = 1, method = "mh")
Error in betas[, lss_loc, drop = FALSE] : incorrect number of dimensions

@zsusswein
Copy link
Collaborator

Stepping through with the debugger, it looks like the offending line is in gratia's fitted_samples.gam method:

fitted_samples.gam(betas)

 betas
     (Intercept)    s(timestep).1    s(timestep).2    s(timestep).3 
      7.79489966      -1.62233890      -0.79350657      -0.28999486 
   s(timestep).4    s(timestep).5    s(timestep).6    s(timestep).7 
      0.16740137       0.66096996       0.97932203       1.26868179 
   s(timestep).8    s(timestep).9   s(timestep).10   s(timestep).11 
      1.25728181       1.23197312       1.14937605       1.11560953 
  s(timestep).12   s(timestep).13   s(timestep).14   s(timestep).15 
      0.97296386       0.77191219       0.61295051       0.49893579 
  s(timestep).16   s(timestep).17   s(timestep).18   s(timestep).19 
      0.48924434       0.44196940       0.37559775       0.30176837 
  s(timestep).20   s(timestep).21   s(timestep).22   s(timestep).23 
      0.32920075       0.36566637       0.40864730       0.23157060 
  s(timestep).24 s(day_of_week).1 s(day_of_week).2 s(day_of_week).3 
     -0.03944733      -0.03862237       0.52471717      -0.40646693 
s(day_of_week).4 s(day_of_week).5 s(day_of_week).6 s(day_of_week).7 
      0.66223748      -0.04885560       0.43981059       0.01357301 

t(betas[, lss_loc, drop = FALSE])
Error in betas[, lss_loc, drop = FALSE] : incorrect number of dimensions

lss_loc
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32

betas[, lss_loc]
Error in betas[, lss_loc] : incorrect number of dimensions

 betas[lss_loc]
     (Intercept)    s(timestep).1    s(timestep).2    s(timestep).3 
      7.79489966      -1.62233890      -0.79350657      -0.28999486 
   s(timestep).4    s(timestep).5    s(timestep).6    s(timestep).7 
      0.16740137       0.66096996       0.97932203       1.26868179 
   s(timestep).8    s(timestep).9   s(timestep).10   s(timestep).11 
      1.25728181       1.23197312       1.14937605       1.11560953 
  s(timestep).12   s(timestep).13   s(timestep).14   s(timestep).15 
      0.97296386       0.77191219       0.61295051       0.49893579 
  s(timestep).16   s(timestep).17   s(timestep).18   s(timestep).19 
      0.48924434       0.44196940       0.37559775       0.30176837 
  s(timestep).20   s(timestep).21   s(timestep).22   s(timestep).23 
      0.32920075       0.36566637       0.40864730       0.23157060 
  s(timestep).24 s(day_of_week).1 s(day_of_week).2 s(day_of_week).3 
     -0.03944733      -0.03862237       0.52471717      -0.40646693 
s(day_of_week).4 s(day_of_week).5 s(day_of_week).6 s(day_of_week).7 
      0.66223748      -0.04885560       0.43981059       0.01357301 

It looks to me like the code expects betas to be a data frame of samples from the M-H sampler. But because it's only one row R has reduced it to a list and the index slicing is failing. It's just a problem with the n=1 case.

@jonathonmellor
Copy link
Author

It must be xmas... https://gavinsimpson.github.io/gratia/news/index.html new package version with the scale problem fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants