Understand how to predict new subjects in a mixed effect model #12

Open
4 tasks
danielinteractive opened this issue Jun 28, 2022 · 12 comments
Labels
help wanted (Stuck or unsure how to handle), methods (Question relating to methods and/or theoretical details), Priority (Is a priority issue)

Comments

@danielinteractive
Collaborator

Background: what if the test set contains a new patient who has their own new random effects, e.g. SLD (sum of longest diameters) kinetics parameters such as ks?

To do:

  • think about methods side
    • for a much simpler linear mixed model, e.g. one fitted with the lme4 package in R, how do the predictions work for new subjects?
    • e.g. with a random intercept and a random slope for a continuous covariate, how does the prediction work?
  • Consider treating the covariates that are modelled with random effects (corresponding to our SLD) as something we provide / have, then predict from that
@gowerc
Collaborator

gowerc commented Jun 30, 2022

@danielinteractive, apologies if I'm mistaken, but I thought the general practice when predicting with random effects was to just set the random effects to 0? At least this makes intuitive sense to me: they are essentially nuisance parameters unique to each subject, and for a new subject you have no prior information about them, so the best you can do is assume the subject lies at the centre of the distribution (i.e. 0).

I would have thought the only case where we wouldn't set them to 0 is if we wanted to simulate from the model (e.g. some form of parametric bootstrapping).
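A minimal base-R sketch of the two cases above, for a hypothetical linear mixed model with a random intercept and a random slope on a continuous covariate x. The fixed effects `beta` and the random-effects covariance `D` are made-up values standing in for an already fitted model, and the function names are hypothetical.

```r
# Hypothetical fitted values: fixed effects and random-effects covariance
beta <- c(intercept = 10, slope = 2)
D <- matrix(c(1.0, 0.1,
              0.1, 0.25), nrow = 2)

# Prediction at covariate value x given random effects b = (b0, b1)
predict_lmm <- function(x, beta, b = c(0, 0)) {
  (beta[[1]] + b[[1]]) + (beta[[2]] + b[[2]]) * x
}

# (a) Population-level prediction: random effects set to 0
pop_pred <- predict_lmm(2, beta)

# (b) Simulation: draw random effects for new hypothetical subjects from N(0, D)
simulate_new_subjects <- function(n, x, beta, D) {
  L <- t(chol(D))                             # lower factor, D = L %*% t(L)
  b <- L %*% matrix(rnorm(2 * n), nrow = 2)   # 2 x n matrix of draws
  apply(b, 2, function(bi) predict_lmm(x, beta, bi))
}

set.seed(42)
sim_pred <- simulate_new_subjects(10000, 2, beta, D)
```

The mean of the simulated predictions agrees with the population-level prediction; their spread reflects the between-subject variability that a simulation or parametric bootstrap would need to carry.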

@danielinteractive
Collaborator Author

Thanks @gowerc for looking at this; yes, in general I think the same. We just need to understand how it works here, since SLD acts as a covariate for the OS prediction. I guess we would need to somehow fit the SLD curve to this individual's SLD observations to obtain meaningful OS predictions.

@gowerc
Collaborator

gowerc commented Jun 30, 2022

I feel like I'm potentially not understanding this properly :) Wouldn't you just apply both models, i.e. take the model coefficients and use them to predict the patient's SLD values based on their baseline covariates? Then you can use the SLD values to predict the patient's OS hazard and convert that into a survival probability via S(t) = exp(-H(t)), where H(t) is the cumulative hazard?

I guess one challenge would be extracting the final OS hazard model from the Stan code to calculate the hazard values. I think rstan has functions for exporting Stan functions into R, but I'm not sure if we have the same for cmdstan. Perhaps we can manipulate the Stan code to create a standalone Stan program that simply takes the OS inputs and returns the hazard values for days 0 to 1000?
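The last step suggested above, S(t) = exp(-H(t)), can be sketched by numerically integrating the hazard. This example assumes a Weibull baseline hazard, a baseline linear predictor `lp`, and a time-varying link `alpha * SLD(t)`; the Stein-Fojo-type SLD curve, the function names and all parameter values are illustrative assumptions, not the model's actual specification.

```r
# Assumed SLD trajectory (Stein-Fojo-type curve); illustrative only
sld_curve <- function(t, b = 60, s = 0.6, g = 0.3) {
  b * (exp(-s * t) + exp(g * t) - 1)
}

# S(t) = exp(-H(t)), where H(t) is the integral of the hazard from 0 to t
survival_prob <- function(times, lambda, gamma, lp, alpha, sld_fun) {
  # Assumed hazard: Weibull baseline times exp(lp + alpha * SLD(u))
  hazard <- function(u) {
    lambda * gamma * u^(gamma - 1) * exp(lp + alpha * sld_fun(u))
  }
  vapply(times, function(t) {
    if (t == 0) return(1)
    H <- integrate(hazard, lower = 0, upper = t)$value
    exp(-H)
  }, numeric(1))
}

times <- c(0, 0.5, 1, 2, 3)  # years
surv <- survival_prob(times, lambda = 0.2, gamma = 1.5, lp = -0.5,
                      alpha = 0.01, sld_fun = sld_curve)
```

Numerical integration is used because, with a time-varying link, H(t) generally has no closed form; with `alpha = 0` the result reduces to the usual Weibull survival function.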

@danielinteractive
Collaborator Author

No, we don't need to predict SLD values for patients. The application here is that we observe SLD values and want to predict OS. SLD is not part of the outcomes in that sense.

@gowerc
Collaborator

gowerc commented Jun 30, 2022

Oh, I see, sorry; I thought by "new patient" we meant one for whom we have no information other than baseline covariates. I guess if they are a brand-new patient but have SLD data, then we would need a way of re-fitting the model to them, keeping the parameters and hyperparameters fixed and just estimating their individual random effects for the kinetics model. I wonder if this is even possible...
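For intuition, here is a minimal sketch of that idea for a single new patient: the population parameters are held fixed and only the patient's random effects are estimated, by maximizing their conditional posterior (a MAP / empirical Bayes estimate, with a Laplace approximation for its uncertainty). The Stein-Fojo-type SLD model, the log-normal random effects, all parameter values and the simulated data are assumptions for illustration; a full Bayesian treatment would instead sample these random effects, e.g. in Stan.

```r
# Assumed SLD model (Stein-Fojo-type curve)
sf_sld <- function(t, b, s, g) b * (exp(-s * t) + exp(g * t) - 1)

# Population parameters, assumed already estimated and held fixed
pop <- list(
  mu    = c(b = log(60), s = log(0.6), g = log(0.3)),  # log-scale means
  omega = c(b = 0.3, s = 0.5, g = 0.5),               # random-effect SDs
  sigma = 3                                            # residual SD (mm)
)

# Negative log conditional posterior of one patient's random effects eta
neg_log_post <- function(eta, times, y, pop) {
  theta <- exp(pop$mu + eta)  # individual parameters b, s, g
  pred <- sf_sld(times, theta[["b"]], theta[["s"]], theta[["g"]])
  -sum(dnorm(y, mean = pred, sd = pop$sigma, log = TRUE)) -
    sum(dnorm(eta, mean = 0, sd = pop$omega, log = TRUE))
}

# Simulated SLD observations for a hypothetical new patient
set.seed(1)
times <- c(0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0)  # years
true_theta <- exp(pop$mu + c(0.2, -0.3, 0.4))
y <- sf_sld(times, true_theta[["b"]], true_theta[["s"]], true_theta[["g"]]) +
  rnorm(length(times), sd = pop$sigma)

# MAP estimate of eta, starting from the population level (eta = 0)
fit <- optim(c(b = 0, s = 0, g = 0), neg_log_post, times = times, y = y,
             pop = pop, method = "BFGS", hessian = TRUE)
eta_hat   <- fit$par
eta_se    <- sqrt(diag(solve(fit$hessian)))  # Laplace approximation
theta_hat <- exp(pop$mu + eta_hat)           # patient-specific b, s, g
```

The patient-specific `theta_hat` could then be plugged into the OS part of the model in place of population values.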

@danielinteractive
Collaborator Author

Yeah exactly something like that

@gowerc added the help wanted label (Stuck or unsure how to handle) on Jul 15, 2022
@gowerc
Collaborator

gowerc commented Mar 16, 2023

This is a methods question, potentially need to involve MCO / Francois

@danielinteractive added the methods label (Question relating to methods and/or theoretical details) on May 10, 2023
@mercifr1 self-assigned this on Nov 2, 2023
@gowerc
Collaborator

gowerc commented Feb 8, 2024

Need to clarify the high-level steps required to answer actual questions from trial applications, e.g. how is the model intended to be used in practice? Do people need to be able to pass longitudinal data into an already fitted OS model? Need to clarify with @mercifr1 what the intended applications of the model for prediction are.

Potentially need to add a workflow vignette to clarify the desired application, e.g. if you are a new statistician using this package, how are you expected to use it? What problem are we solving, and how do you use the package to solve that problem?

@gowerc added this to the Initial Internal Release milestone on Feb 8, 2024
@gowerc
Collaborator

gowerc commented Apr 12, 2024

@danielinteractive, talking with @mercifr1 about this yesterday, we discussed making this more general so that it's up to the end user to decide what they put through the model (this is also kind of linked to #296).

That is, for predicting OS the user would specify the baseline covariates and the TGI parameter values that they want to use to make the predictions. They are then free to set the TGI parameter values either to the population medians or to arbitrary values if they want to predict for a hypothetical patient.

Perhaps then an interface could be something like:

predict(
    model,
    new.data = data.frame(sex = "F", age = 30, ECOG = "3" ),
    link.data = data.frame("s" = 0.6, "g" = 0.3, "phi" = 0.4, "b" = 60)
)
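To illustrate what such an interface would compute (not the package's implementation), a hypothetical function can combine user-supplied baseline covariates and TGI parameter values into survival predictions. This sketch assumes a Weibull proportional hazards model whose log-hazard is linear in age, sex and the supplied TGI parameters (time-constant links); the function name, coefficient names and values are all made up, and `phi` and `b` are accepted but unused because no link coefficient is assumed for them.

```r
# Hypothetical OS prediction from baseline covariates plus user-supplied
# TGI parameter values; all coefficients are illustrative assumptions.
predict_os_sketch <- function(new_data, link_data, coefs, times) {
  # Baseline covariate contribution to the log-hazard
  lp <- coefs$beta[["age"]] * new_data$age +
    coefs$beta[["sexF"]] * (new_data$sex == "F")
  # Link contribution from the user-supplied TGI parameters (population
  # medians, or arbitrary values for a hypothetical patient)
  link <- as.matrix(link_data[, names(coefs$alpha), drop = FALSE])
  lp <- lp + drop(link %*% coefs$alpha)
  # Weibull survival S(t) = exp(-lambda * t^gamma * exp(lp)); rows = patients
  exp(-coefs$lambda * outer(exp(lp), times^coefs$gamma))
}

coefs <- list(
  lambda = 0.1, gamma = 1.2,
  beta   = c(age = 0.01, sexF = -0.2),
  alpha  = c(g = 1.5, s = -0.5)
)
surv <- predict_os_sketch(
  new_data  = data.frame(sex = "F", age = 30),
  link_data = data.frame(s = 0.6, g = 0.3, phi = 0.4, b = 60),
  coefs     = coefs,
  times     = c(0.5, 1, 2)
)
```

Because the TGI values are just inputs, the same call covers both population-median and hypothetical-patient predictions.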

@danielinteractive
Collaborator Author

Yeah this could work nicely I think

@gowerc
Collaborator

gowerc commented May 8, 2024

This is more just an FYI...

Just to say, it appears lme4 provides no functionality for manually setting the random effect values; it looks like you can only predict (a) at "the population level", i.e. with all random effects set to 0, or (b) with the random effects set to the values of a specific patient. Their API for (b) is:

predict(
    mod,
    newdata = tibble(
        age = 0.5,
        pt = "pt_000003"
    ),
    re.form = ~ (1 + age | pt)
)

In particular, the re.form argument allows you to specify which random effects to include; for example, setting re.form = ~ (0 + age | pt) would keep the random effect for age but set the random intercept to 0.
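The effect of `re.form` can be mimicked by hand for a model with a random intercept and a random slope on `age`: population-level prediction drops both random effects, while `~ (0 + age | pt)` keeps only the slope deviation. The helper below is a hypothetical base-R illustration, not lme4 code, and the coefficient and random-effect values are made up.

```r
# Hypothetical fixed effects and one patient's estimated random effects
fixef_hat <- c(intercept = 5, age = 1.5)
ranef_pt  <- c(intercept = 0.8, age = -0.4)

# keep: which random effects to include (mirrors the choice made via re.form)
predict_re <- function(age, fixef, ranef, keep = names(ranef)) {
  ranef[!names(ranef) %in% keep] <- 0  # excluded random effects set to 0
  (fixef[["intercept"]] + ranef[["intercept"]]) +
    (fixef[["age"]] + ranef[["age"]]) * age
}

predict_re(0.5, fixef_hat, ranef_pt)                       # 6.35, like ~ (1 + age | pt)
predict_re(0.5, fixef_hat, ranef_pt, keep = "age")         # 5.55, like ~ (0 + age | pt)
predict_re(0.5, fixef_hat, ranef_pt, keep = character(0))  # 5.75, population level
```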

From what I've been reading, there is some theory, and there are supporting packages, for Bayesian extensions that, given some observed data for a new subject, can be used to calculate the posterior distribution of that subject's individual random effects. I've not looked that deeply into this though.
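In the linear Gaussian case this posterior has a closed form when the fixed effects beta, the random-effects covariance D and the residual SD sigma are held at their estimates: for a new subject with design matrices X and Z and observations y, b | y ~ N(m, C) with C = (D^-1 + Z'Z / sigma^2)^-1 and m = C Z'(y - X beta) / sigma^2, whose mean is the usual BLUP. A minimal base-R sketch with made-up values (the function name is hypothetical):

```r
# Conditional posterior of a new subject's random effects in the linear
# mixed model y = X beta + Z b + e, b ~ N(0, D), e ~ N(0, sigma^2 I),
# treating beta, D and sigma as known (fixed at their estimates).
posterior_re <- function(y, X, Z, beta, D, sigma) {
  C <- solve(solve(D) + crossprod(Z) / sigma^2)      # posterior covariance
  m <- C %*% crossprod(Z, y - X %*% beta) / sigma^2  # posterior mean (BLUP)
  list(mean = drop(m), cov = C)
}

# Hypothetical new subject: random intercept and slope on visit time
visit_time <- c(0, 1, 2, 3)
X <- cbind(1, visit_time)
Z <- X
beta <- c(10, 2)
b_true <- c(1.5, -0.5)
y <- drop(X %*% beta + Z %*% b_true)  # noise-free for illustration

# Weak prior on b: the posterior mean is close to b_true
post <- posterior_re(y, X, Z, beta, D = diag(c(1e6, 1e6)), sigma = 1)
# Very tight prior on b: shrinks to the population level (b close to 0)
post0 <- posterior_re(y, X, Z, beta, D = diag(c(1e-8, 1e-8)), sigma = 1)
```

The two extremes show the shrinkage at work: with little data or a tight D the prediction falls back to the population level, and with informative data it approaches the subject's own trajectory.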

@gowerc
Collaborator

gowerc commented May 13, 2024

With #313 being merged I am going to push this onto the backlog as we have enough of a basic feature set even if the full feature isn't available.

@gowerc removed this from the Initial Internal Release milestone on May 13, 2024
@gowerc added the Priority label (Is a priority issue) on Aug 27, 2024
Projects
None yet
Development

No branches or pull requests

3 participants