Understand how to predict new subjects in a mixed effect model #12

danielinteractive · 2022-06-28T09:50:18Z

Background: What if a new patient in the test set comes with new random effects, e.g. their own SLD model parameters (such as ks)?

To do:

Think about the methods side:
- If we have a much simpler linear mixed model, e.g. fitted with the lme4 package in R, how do the predictions work for new subjects?
- E.g. with a random intercept and a random slope for a continuous covariate, how does the prediction work?

Consider the covariates that are modelled with random effects (corresponding to our SLD) as something we provide / have, and then predict from that.
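For the simpler case in the two bullets above, here is a minimal sketch with lme4 and its bundled `sleepstudy` data (random intercept and random slope for the continuous covariate `Days`). It assumes lme4 is installed, and the subject label `"new_subject"` is made up. For a grouping level not seen during fitting, `allow.new.levels = TRUE` makes lme4 fall back to the population-level prediction, i.e. random effects of 0, which is the same as requesting `re.form = NA` explicitly.

```r
library(lme4)

# Random intercept and random slope for the continuous covariate Days
fit <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# A hypothetical new subject observed on days 0 to 9
new_pt <- data.frame(Days = 0:9, Subject = "new_subject")

# Unseen grouping level: its random effects are taken as 0
pred_new <- predict(fit, newdata = new_pt, allow.new.levels = TRUE)

# Explicit population-level prediction (random effects ignored)
pred_pop <- predict(fit, newdata = new_pt, re.form = NA)

# Both reduce to the fixed effects only: beta0 + beta1 * Days
pred_fixed <- fixef(fit)[["(Intercept)"]] + fixef(fit)[["Days"]] * new_pt$Days
```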

gowerc · 2022-06-30T08:45:40Z

@danielinteractive, apologies if I'm mistaken, but I thought the general practice for prediction with random effects was to just set the random effects to 0? At least this makes intuitive sense to me: they are essentially nuisance parameters that are unique to the subject and that you have no prior information on, so the best you can do is assume the subject lies in the centre of the distribution (i.e. 0).

I would have thought the only case where we wouldn't set them to 0 is if we wanted to simulate from the model (e.g. some form of parametric bootstrapping).
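A minimal base-R sketch contrasting the two options for a random intercept and slope model, using made-up population values (`beta`, `Sigma` and the covariate grid `x` are hypothetical, not from any fitted model). The plug-in prediction sets the random effects to 0; the simulation draws them from their fitted distribution N(0, Sigma) via a Cholesky factor, as one would in a parametric bootstrap.

```r
set.seed(1)

# Hypothetical fixed effects (intercept, slope) and random-effect covariance
beta  <- c(250, 10)
Sigma <- matrix(c(600, 10, 10, 35), nrow = 2)
x     <- 0:9

# Plug-in prediction: random effects set to 0
pred_zero <- beta[1] + beta[2] * x

# Simulation: draw b ~ N(0, Sigma); if z ~ N(0, I) then z %*% chol(Sigma) ~ N(0, Sigma)
n_sim <- 4000
z <- matrix(rnorm(n_sim * 2), ncol = 2)
b <- z %*% chol(Sigma)              # each row is one simulated (b0, b1)

# One predicted trajectory per simulated subject (rows = subjects, cols = x)
pred_sim <- outer(beta[1] + b[, 1], rep(1, length(x))) +
  outer(beta[2] + b[, 2], x)

# Between-subject spread of predictions at each x
pred_sd <- apply(pred_sim, 2, sd)
```

Because this model is linear in the random effects, the average simulated trajectory matches the plug-in prediction; for a nonlinear model (such as an SLD curve feeding a hazard) the two generally differ.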

danielinteractive · 2022-06-30T08:50:22Z

Thanks @gowerc for looking at this. Yeah, generally I think the same; we just need to understand how it works here, since here SLD is a covariate for the OS prediction. I guess we need to somehow fit the SLD curve to this individual's SLD observations to obtain meaningful OS predictions.

gowerc · 2022-06-30T09:02:02Z

I feel like I'm potentially not understanding this properly :) Wouldn't you just apply both models, i.e. take the model coefficients and use them to predict the patient's SLD values based on their baseline covariates? Then you can use the SLD values to predict the patient's OS hazard and convert that into survival via S(t) = exp(-H(t)), where H(t) is the cumulative hazard.

I guess one challenge would be extracting the final OS hazard model from the Stan code to calculate the hazard values. I think rstan has functions for exporting Stan functions into R (`rstan::expose_stan_functions()`), but I'm not sure if we have the same for cmdstan. Perhaps we could manipulate the Stan code to create a standalone Stan program that simply takes the OS inputs and returns the hazard values from day 0 to 1000?
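A minimal base-R sketch of the hazard-to-survival step on a day 0 to 1000 grid. Everything model-specific is an assumption for illustration only: a Weibull baseline hazard, a log-hazard that is linear in the subject's current SLD, and a made-up, exponentially shrinking SLD curve; the names `sld`, `lambda`, `k` and `link` are hypothetical. The cumulative hazard H(t) is approximated with the trapezoidal rule and then converted via S(t) = exp(-H(t)).

```r
# Day grid 0 to 1000
times <- 0:1000

# Hypothetical SLD trajectory for one subject (mm), shrinking over time
sld <- 60 * exp(-0.002 * times)

# Hypothetical OS hazard: Weibull baseline times exp(link * SLD(t))
lambda <- 1 / 1500   # baseline scale (per day)
k      <- 1.2        # baseline shape
link   <- 0.01       # log-hazard change per mm of SLD
baseline <- function(t) lambda * k * (lambda * t)^(k - 1)
hazard <- baseline(times) * exp(link * sld)

# Cumulative hazard H(t) by the trapezoidal rule on the day grid
H <- c(0, cumsum(diff(times) * (head(hazard, -1) + tail(hazard, -1)) / 2))

# Survival S(t) = exp(-H(t))
surv <- exp(-H)
```

With a time-varying covariate such as SLD(t) there is usually no closed form for H(t), which is why the sketch integrates numerically; a finer grid reduces the approximation error.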

danielinteractive · 2022-06-30T09:06:51Z

No, we don't need to predict SLD values for patients. The application here is that we observe SLD values and want to predict OS. SLD is not one of the outcomes in that sense.

gowerc · 2022-06-30T10:52:03Z

Oh I see, sorry, I thought we meant a new patient as in we had no information about them other than baseline covariates. I guess if they are a brand new patient but have SLD data, then we would need a way of re-fitting the model to them, keeping the parameters and hyperparameters fixed, to estimate just their individual random effects for the kinetics model. I wonder if this is even possible...
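One way to make this concrete is a minimal base-R sketch that estimates only the new subject's random effects while every population-level quantity stays fixed, by maximizing their posterior density (a MAP estimate via `optim`). Assumptions, all hypothetical: a GSF-style SLD curve b * (phi * exp(-s t) + (1 - phi) * exp(g t)), log-normal random effects on b, s and g, a fixed phi, additive normal residuals, and population values picked for illustration. The observations are simulated, not real data.

```r
set.seed(42)

# Assumed GSF-style SLD curve (hypothetical form for illustration)
gsf <- function(t, b, s, g, phi) {
  b * (phi * exp(-s * t) + (1 - phi) * exp(g * t))
}

# Population-level quantities held FIXED (hypothetical values)
mu    <- log(c(b = 60, s = 0.01, g = 0.002))  # log-scale population means
omega <- c(0.3, 0.5, 0.5)                      # random-effect SDs (log scale)
phi   <- 0.5                                   # held fixed, no random effect
sigma <- 2                                     # residual SD (mm)

# Simulated SLD observations for one new subject (not real data)
t_obs    <- c(0, 42, 84, 126, 168, 252)
eta_true <- c(0.2, -0.3, 0.4)
p_true   <- exp(mu + eta_true)
y_obs    <- gsf(t_obs, p_true[1], p_true[2], p_true[3], phi) +
  rnorm(length(t_obs), 0, sigma)

# Negative log posterior of the subject's random effects eta
neg_log_post <- function(eta) {
  p <- exp(mu + eta)
  fitted <- gsf(t_obs, p[1], p[2], p[3], phi)
  -sum(dnorm(y_obs, fitted, sigma, log = TRUE)) -
    sum(dnorm(eta, 0, omega, log = TRUE))
}

# MAP estimate: only eta is optimised, everything else stays fixed
opt       <- optim(c(0, 0, 0), neg_log_post, method = "BFGS")
eta_hat   <- opt$par
theta_hat <- exp(mu + eta_hat)   # individual b, s, g for this subject
```

A full Bayesian treatment would instead sample the posterior of `eta` (and could also propagate uncertainty in the population parameters); the MAP point estimate is just the simplest version of re-fitting with the hyperparameters fixed.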

danielinteractive · 2022-06-30T11:00:37Z

Yeah, exactly, something like that.

gowerc · 2023-03-16T13:00:39Z

This is a methods question; we potentially need to involve MCO / Francois.

gowerc · 2024-02-08T15:25:23Z

We need to clarify what high-level steps are required to answer actual questions from trial applications, e.g. how is the model intended to be used in practice? Do people need to be able to pass longitudinal data into an already fitted OS model? Need to clarify with @mercifr1 what the intended prediction applications of the model are.

We potentially need to add a workflow vignette to clarify the desired application, e.g. if you are a new statistician using this package, how are you expected to use it? That is, what problem are we solving, and how do you use the package to solve it?

gowerc · 2024-04-12T06:05:25Z

@danielinteractive, talking with @mercifr1 about this yesterday, we were discussing making this more general so that it's up to the end user to decide what they put through the model (this is also kinda linked to #296).

That is, for predicting OS the user would specify the baseline covariates and the TGI parameter values that they want to use to make the predictions. They are then free to set the TGI parameter values either to the population medians or to arbitrary values if they want to predict a hypothetical patient.

Perhaps then an interface could be something like:

```r
predict(
    model,
    new.data = data.frame(sex = "F", age = 30, ECOG = "3"),
    link.data = data.frame("s" = 0.6, "g" = 0.3, "phi" = 0.4, "b" = 60)
)
```
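A minimal sketch of what such an interface could compute, assuming a Weibull proportional-hazards OS model. Everything here is hypothetical and not the package's API: the list `os_model`, the helper `predict_os()`, the coefficient values, and the choice that only the TGI growth rate `g` enters the log-hazard, linearly. It only illustrates how baseline covariates and user-chosen TGI parameter values together form the linear predictor.

```r
# Hypothetical fitted OS model: Weibull baseline, covariate and link coefficients
os_model <- list(
  lambda = 1 / 800,                      # baseline scale (per day)
  k      = 1.1,                          # baseline shape
  beta   = c(sexM = 0.1, age = 0.01, ECOG1 = 0.2, ECOG2 = 0.4, ECOG3 = 0.7),
  link   = c(g = 1.5),                   # log-hazard effect of TGI parameter g
  levels = list(sex = c("F", "M"), ECOG = c("0", "1", "2", "3"))
)

# Hypothetical helper: survival at `times` for each row of new.data / link.data
predict_os <- function(model, new.data, link.data, times) {
  # Fix factor levels so the design matrix matches the (hypothetical) fit
  for (v in names(model$levels)) {
    new.data[[v]] <- factor(new.data[[v]], levels = model$levels[[v]])
  }
  X <- model.matrix(~ sex + age + ECOG, data = new.data)[, -1, drop = FALSE]
  lp <- drop(X %*% model$beta[colnames(X)]) +
    drop(as.matrix(link.data[names(model$link)]) %*% model$link)
  # Weibull proportional hazards: S(t) = exp(-(lambda t)^k * exp(lp))
  sapply(lp, function(eta) exp(-(model$lambda * times)^model$k * exp(eta)))
}

surv <- predict_os(
  os_model,
  new.data  = data.frame(sex = "F", age = 30, ECOG = "3"),
  link.data = data.frame(s = 0.6, g = 0.3, phi = 0.4, b = 60),
  times     = c(0, 365, 730)
)
```

Setting `link.data` to the population medians gives a "typical patient" prediction; any other values describe a hypothetical patient.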

danielinteractive · 2024-04-12T09:30:45Z

Yeah, this could work nicely, I think.

gowerc · 2024-05-08T12:46:50Z

This is more just an FYI...

Just to say, it appears lme4 provides no functionality for manually setting the random effect values; it looks like you can only predict either (a) at "the population level", i.e. setting all random effects to 0, or (b) with the random effects set to the estimated values of a specific patient. Their API for (b) is:

```r
predict(
    mod,
    newdata = tibble(
        age = 0.5,
        pt = "pt_000003"
    ),
    re.form = ~ (1 + age | pt)
)
```

In particular, the `re.form` argument allows you to specify which random effects to condition on; for example, setting `re.form = ~ (0 + age | pt)` would keep the random effect for age but set the random intercept to 0.
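A runnable sketch of options (a) and (b) using lme4's bundled `sleepstudy` data (assumes lme4 is installed; subject "308" is one of the dataset's levels). It also reconstructs both predictions by hand from `fixef()` and `ranef()`, showing that (b) simply adds the subject's estimated random intercept and slope to the population-level prediction.

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
nd  <- data.frame(Days = 5, Subject = "308")

# (a) Population level: all random effects set to 0
p_pop <- predict(fit, newdata = nd, re.form = NA)

# (b) Conditional on subject 308's estimated random intercept and slope
p_subj <- predict(fit, newdata = nd, re.form = ~ (1 + Days | Subject))

# The same two numbers by hand from the fixed effects and conditional modes
fe <- fixef(fit)
re <- ranef(fit)$Subject["308", ]
by_hand_pop  <- fe[["(Intercept)"]] + fe[["Days"]] * 5
by_hand_subj <- by_hand_pop + re[["(Intercept)"]] + re[["Days"]] * 5
```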

From what I've been reading, there is some theory, and there are supporting packages, for Bayesian extensions that, given some observed data for a new subject, can be used to calculate the posterior distribution of the subject's individual random effects. I've not looked that deeply into this though.
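For a linear mixed model with Gaussian random effects, that posterior has a closed form when the population parameters are held fixed. With y = X beta + Z b + e, e ~ N(0, sigma^2 I) and b ~ N(0, Sigma), the new subject's random effects satisfy b | y ~ N(m, V) with V = (Z'Z / sigma^2 + Sigma^-1)^-1 and m = V Z'(y - X beta) / sigma^2. A minimal base-R sketch with hypothetical population values and simulated observations (a fully Bayesian version would repeat this for each posterior draw of beta, Sigma and sigma):

```r
set.seed(7)

# Hypothetical population values, treated as known
beta  <- c(250, 10)                              # fixed intercept and slope
Sigma <- matrix(c(600, 10, 10, 35), nrow = 2)    # random-effect covariance
sigma <- 25                                      # residual SD

# Simulated observations for one new subject at x = 0, 1, 2, 3
x <- c(0, 1, 2, 3)
X <- cbind(1, x, deparse.level = 0)              # fixed-effect design
Z <- X                                           # random intercept and slope
b_true <- c(20, -5)
y <- drop(X %*% beta + Z %*% b_true) + rnorm(length(x), 0, sigma)

# Conjugate Gaussian posterior b | y ~ N(m, V)
V <- solve(crossprod(Z) / sigma^2 + solve(Sigma))
m <- drop(V %*% crossprod(Z, y - X %*% beta)) / sigma^2

# Posterior draws of b and the implied subject-specific mean at x = 5
n_draws <- 2000
draws <- matrix(rnorm(n_draws * 2), ncol = 2) %*% chol(V) +
  matrix(m, nrow = n_draws, ncol = 2, byrow = TRUE)
mean_at_5 <- drop(cbind(1, 5) %*% (beta + t(draws)))
```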

gowerc · 2024-05-13T11:21:41Z

With #313 merged, I am going to push this onto the backlog, as we have enough of a basic feature set even if the full feature isn't available.

gowerc mentioned this issue Jun 30, 2022

Create predict function to generate survival / sld estimates for new subjects #9 (Closed)

gowerc added the "help wanted" label (Stuck or unsure how to handle) Jul 15, 2022

danielinteractive added the "methods" label (Question relating to methods and/or theoretical details) May 10, 2023

mercifr1 self-assigned this Nov 2, 2023

gowerc added this to the Initial Internal Release milestone Feb 8, 2024

This was referenced May 8, 2024:

- Refactor of QuantityGenerators #312 (Merged)
- Support for Survival Predictions #313 (Merged)

gowerc unassigned mercifr1 May 13, 2024

gowerc removed this from the Initial Internal Release milestone May 13, 2024

gowerc added the "Priority" label (Is a priority issue) Aug 27, 2024
