use of data argument in broom::augment() is unnecessary and potentially misleading #292

rpruim · 2024-11-09T19:19:11Z

Since glm models store the data used to fit them, the data argument to augment() is not needed when computing propensity scores. The more interesting argument to broom::augment() is newdata, which lets you compute propensity scores for a data set other than the one used to fit the model (for example, the matched pairs after matching, or any other data set you like).

From the help for augment():

data
A base::data.frame or tibble::tibble() containing the original data that was used to produce the object x. Defaults to stats::model.frame(x) so that augment(my_fit) returns the augmented original data. Do not pass new data to the data argument. Augment will report information such as influence and cooks distance for data passed to the data argument. These measures are only defined for the original training data.

newdata
A base::data.frame() or tibble::tibble() containing all the original predictors used to create x. Defaults to NULL, indicating that nothing has been passed to newdata. If newdata is specified, the data argument will be ignored.
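A minimal sketch of the distinction, assuming the built-in mtcars data as a stand-in for the book's data sets (the model formula and the `other_cars` data frame are made up for illustration):

```r
library(broom)

# Illustrative propensity-style model: mtcars and this formula are
# assumptions for the sketch, not taken from the book.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# Default: no data argument, so augment() works from the model frame
# and returns one row per observation used in the fit.
ps_default <- augment(fit, type.predict = "response")

# newdata: predicted probabilities for rows that were not used to fit
# the model. Only the predictors need to be present.
other_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
ps_new <- augment(fit, newdata = other_cars, type.predict = "response")

ps_new$.fitted
```

With type.predict = "response", the .fitted column is on the probability scale in both calls.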


malcolmbarrett · 2024-11-10T19:26:20Z

Where did you find examples of data?

rpruim · 2024-11-11T02:07:06Z

Every use of augment() in chapter 8, for example. This includes the template for adding propensity scores to data:

glm(
  exposure ~ confounder_1 + confounder_2,
  data = df,
  family = binomial()
) |>
  augment(type.predict = "response", data = df)

rpruim · 2024-11-11T02:11:25Z

Also here in chapter 2:

library(rsample)

fit_ipw <- function(.split, ...) {
  # get bootstrapped data frame
  .df <- as.data.frame(.split)

  # fit propensity score model
  propensity_model <- glm(
    net ~ income + health + temperature,
    data = .df,
    family = binomial()
  )

  # calculate inverse probability weights
  .df <- propensity_model |>
    augment(type.predict = "response", data = .df) |>
    mutate(wts = wt_ate(.fitted, net))

  # fit correctly bootstrapped ipw model
  lm(malaria_risk ~ net, data = .df, weights = wts) |>
    tidy()
}

rpruim · 2024-11-11T02:13:38Z

Chapter 9 mostly uses newdata, but there is one example using data:

library(broom)
library(touringplans)

seven_dwarfs <- seven_dwarfs_train_2018 |>
  filter(wait_hour == 9) |>
  mutate(park_extra_magic_morning = factor(
    park_extra_magic_morning,
    labels = c("No Magic Hours", "Extra Magic Hours")
  ))

seven_dwarfs_with_ps <- glm(
  park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,
  data = seven_dwarfs,
  family = binomial()
) |>
  augment(type.predict = "response", data = seven_dwarfs)

malcolmbarrett · 2024-12-23T22:51:58Z

I let myself get tripped up here because it is indeed all a little confusing. I think I've settled on this: we should be using data, rather than newdata or the default argument value. The reason is that we are providing the original data, but we also still want its other columns (for instance, the outcome column, which would not be in the propensity score model frame). That's philosophically consistent with the argument description for data and, in fact, I learned that we depend on that argument for a few outputs of augment() that you get with the original data.
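The point about extra columns can be checked with a small sketch. It assumes the built-in mtcars data, with mpg playing the role of an outcome that is not part of the propensity score model (both choices are illustrative, not from the book):

```r
library(broom)

# Illustrative propensity-style model: mpg is deliberately left out of
# the formula, as an outcome column would be.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# Default: augment() starts from model.frame(fit), which only holds
# the variables that appear in the formula.
with_default <- augment(fit, type.predict = "response")

# data = original data: every original column is carried along.
with_data <- augment(fit, data = mtcars, type.predict = "response")

"mpg" %in% names(with_default)  # FALSE: outcome column is dropped
"mpg" %in% names(with_data)     # TRUE: outcome column is kept
```

A later step such as fitting a weighted outcome model needs the outcome column, which is why passing the original data is convenient here.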

malcolmbarrett added this to the Chapter 08: Propensity scores milestone Nov 11, 2024

malcolmbarrett mentioned this issue in Make code choices more consistent #301 (merged) Dec 23, 2024

malcolmbarrett closed this as completed in #301 Dec 24, 2024
