use of data argument in broom::augment() is unnecessary and potentially misleading #292
Comments
Where did you find examples of |
Every use of
glm(
  exposure ~ confounder_1 + confounder_2,
  data = df,
  family = binomial()
) |>
  augment(type.predict = "response", data = df)
Also here in chapter 2:
library(rsample)
fit_ipw <- function(.split, ...) {
  # get bootstrapped data frame
  .df <- as.data.frame(.split)
  # fit propensity score model
  propensity_model <- glm(
    net ~ income + health + temperature,
    data = .df,
    family = binomial()
  )
  # calculate inverse probability weights
  .df <- propensity_model |>
    augment(type.predict = "response", data = .df) |>
    mutate(wts = wt_ate(.fitted, net))
  # fit correctly bootstrapped ipw model
  lm(malaria_risk ~ net, data = .df, weights = wts) |>
    tidy()
}
Chapter 9 mostly uses
library(broom)
library(touringplans)
seven_dwarfs <- seven_dwarfs_train_2018 |>
  filter(wait_hour == 9) |>
  mutate(park_extra_magic_morning = factor(
    park_extra_magic_morning,
    labels = c("No Magic Hours", "Extra Magic Hours")
  ))
seven_dwarfs_with_ps <- glm(
  park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,
  data = seven_dwarfs,
  family = binomial()
) |>
  augment(type.predict = "response", data = seven_dwarfs)
I let myself get tripped up here because it is indeed all a little confusing. I think I am settled on we should be using |
Since glm models store the data used to fit them, the data argument to augment() is not needed when computing propensity scores. The more interesting argument to broom::augment() is newdata, which lets you compute propensity scores for a data set other than the one used to fit the model (for example, the matched pairs after matching, or any other data set you like).
From the help for augment():
data
A base::data.frame or tibble::tibble() containing the original data that was used to produce the object x. Defaults to stats::model.frame(x) so that augment(my_fit) returns the augmented original data. Do not pass new data to the data argument. Augment will report information such as influence and cooks distance for data passed to the data argument. These measures are only defined for the original training data.
newdata
A base::data.frame() or tibble::tibble() containing all the original predictors used to create x. Defaults to NULL, indicating that nothing has been passed to newdata. If newdata is specified, the data argument will be ignored.
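To make the three cases in the help text concrete, here is a minimal sketch. It uses the built-in mtcars data with am as a stand-in exposure; the model, the object names, and the new_cars data frame are illustrative assumptions and are not taken from the book.

```r
library(broom)

# Illustrative propensity-score model (assumption: am plays the role of the
# exposure, wt and hp the role of the confounders).
ps_model <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# 1. No data argument: augment() falls back to model.frame(ps_model),
#    so only the variables that appear in the formula come back.
aug_default <- augment(ps_model, type.predict = "response")

# 2. data = the data used to fit the model: all of its columns are kept.
aug_data <- augment(ps_model, data = mtcars, type.predict = "response")

# 3. newdata = a different data set containing the predictors:
#    fitted propensity scores are computed for the new rows.
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 200))
aug_new <- augment(ps_model, newdata = new_cars, type.predict = "response")

names(aug_default)
names(aug_data)
aug_new
```

In this sketch the fitted values from cases 1 and 2 are the same; what differs is which columns are returned alongside them, since the default is the model frame rather than the full original data frame.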