Add test for proportion missing #12

Closed
gorcha opened this issue Apr 16, 2020 · 6 comments
Labels
Effort: ⭐⭐ Intermediate · Importance: ❗ Desirable · Task: Enhancement (New feature or request)
Milestone

Comments

@gorcha
Collaborator

gorcha commented Apr 16, 2020

expect_prop_miss() or similar, to capture variables where some level of missingness is acceptable, but not too much.
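A minimal base-R sketch of the check being proposed, assuming "missing" means `NA` or one of a set of missing codes (here just the empty string) and assuming a default threshold of 0.05. `prop_missing()` and `check_prop_miss()` are hypothetical helpers for illustration, not testdat functions.

```r
# Hypothetical helpers (not testdat functions): a value counts as missing
# if it is NA or matches one of the codes in `miss` (assumed default: "").
prop_missing <- function(x, miss = c(NA, "")) {
  if (length(x) == 0) return(0)
  mean(is.na(x) | x %in% miss)
}

# Passes when the missing share is at or below the allowed threshold `prop`.
check_prop_miss <- function(x, prop = 0.05, miss = c(NA, "")) {
  prop_missing(x, miss) <= prop
}

x <- c("a", "", NA, "b", "c", "d", "e", "f", "g", "h")
prop_missing(x)                  # 0.2
check_prop_miss(x, prop = 0.25)  # TRUE
check_prop_miss(x, prop = 0.1)   # FALSE
```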

@gorcha gorcha added the Task: Enhancement label Apr 16, 2020
@kinto-b
Contributor

kinto-b commented Feb 2, 2021

Something like this?

expect_prop_miss <- function(var, prop = 0.05, miss = getOption("testdat.miss_text"), flt = TRUE, data = get_testdata(), ...) {
  prop_miss <- sum(chk_text_miss(data[[var]], miss = miss)) / length(data[[var]])
  
  expect_custom(
    prop_miss <= prop,
    "Some error message",
    ...
  )
}

@gorcha
Collaborator Author

gorcha commented Feb 5, 2021

Yeah, something along those lines.

Just make sure you handle the flt argument and do all the quoting needed to get the variable name etc. for the error message. It's easiest to copy an existing expectation as a guide.
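As a rough illustration of the quoting step, here is a base-R sketch that uses `substitute()`/`deparse()` in place of the rlang tools (`enquo()`, `as_label()`) used in the code elsewhere in this thread. It evaluates `flt` and `var` inside `data` and deparses them into labels for the failure message. `prop_miss_report()` is a hypothetical name, and treating an `NA` filter result as "drop the row" is an assumption.

```r
# Hypothetical sketch using base-R quoting (substitute/deparse) rather than rlang.
# `var` and `flt` are captured unevaluated, evaluated inside `data`, and
# deparsed into labels for the failure message.
prop_miss_report <- function(data, var, flt = TRUE, prop = 0.05, miss = c(NA, "")) {
  data_lab <- paste(deparse(substitute(data)), collapse = " ")
  var_expr <- substitute(var)
  flt_expr <- substitute(flt)

  # Evaluate the filter inside `data`; recycle a scalar and drop NA results (assumption).
  keep <- rep_len(eval(flt_expr, data, parent.frame()), nrow(data))
  keep <- !is.na(keep) & keep

  values <- eval(var_expr, data, parent.frame())[keep]
  n_miss <- sum(is.na(values) | values %in% miss)
  share <- if (length(values) == 0) 0 else n_miss / length(values)

  msg <- sprintf("`%s` has %d missing records (%s of total) on variable `%s`\nFilter: %s",
                 data_lab, n_miss, signif(share, 2),
                 paste(deparse(var_expr), collapse = " "),
                 paste(deparse(flt_expr), collapse = " "))
  list(passed = share <= prop, message = msg)
}

dat <- data.frame(grp = c("a", "a", "a", "b", "b", "b"),
                  x   = c("u", "", NA, "v", "w", ""),
                  stringsAsFactors = FALSE)

res <- prop_miss_report(dat, x, flt = grp == "a", prop = 0.5)
res$passed  # FALSE: 2 of the 3 "a" rows are missing
cat(res$message)
```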

@kinto-b kinto-b added this to the testdat 0.2.0 milestone Feb 8, 2021
@kinto-b kinto-b added Effort: ⭐ Straightforward Importance: ❗ Desirable labels Feb 8, 2021
@kinto-b
Contributor

kinto-b commented Feb 9, 2021

Going to use getOption("testdat.miss") rather than getOption("testdat.miss_text") as the default for miss.

@kinto-b
Contributor

kinto-b commented Feb 9, 2021

On second thought, we could define a generic expect_prop() function:

expect_prop <- function(var, func, prop, cmp, flt = TRUE, data = get_testdata(), args = list(), func_desc = NULL) {
  act <- quasi_label(enquo(data))
  act$func_desc <- if (is.null(func_desc)) paste0("`", as_label(enquo(func)), "`") else func_desc
  act$var_desc  <- as_label_vars(enquo(var))
  act$flt_desc  <- as_label_flt(enquo(flt))
  act$args_desc <- as_label_repl(args, "(^list\\()|(\\)$)", "")

  act$result <- data %>%
    filter(!!flt) %>%
    mutate(across(!!var, func, !!!args)) %>%
    pull(!!var)

  act$result_prop <- sum(act$result, na.rm = TRUE) / length(act$result)

  expect_custom(
    cmp(act$result_prop, prop),
    glue("{act$lab} has {sum(act$result, na.rm = TRUE)} records \\
         ({signif(act$result_prop, 2)} of total) satisfying {act$func_desc} \\
         on variable `{act$var_desc}`
         Filter: {act$flt_desc}
         Arguments: `{act$args_desc}`"),
    failed_count = sum(act$result, na.rm = TRUE),
    total_count = length(act$result),
    result = act$result
  )

  invisible(act$result)
}

expect_prop_lte <- function(var, func, prop, flt = TRUE, data = get_testdata(), args = list(), func_desc = NULL) {
  expect_prop(var, func, cmp = `<=`, prop, flt, data, args, func_desc)
}

expect_prop_gte <- function(var, func, prop, flt = TRUE, data = get_testdata(), args = list(), func_desc = NULL) {
  expect_prop(var, func, cmp = `>=`, prop, flt, data, args, func_desc)
}
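For comparison, a stripped-down base-R analogue of the generic `expect_prop()` idea: `func` flags each value, and `cmp` (a comparison operator passed as a function, e.g. `` `<=` ``) decides whether the flagged share is acceptable. The names are hypothetical and this is not testdat code; the share is computed with the same formula as above, so `NA` flags are excluded from the numerator but still counted in the denominator.

```r
# Hypothetical base-R analogue of the generic expect_prop() idea (not testdat code).
# `func` returns a logical flag per value; `cmp` compares the flagged share to `prop`.
prop_check <- function(x, func, prop, cmp, ...) {
  flags <- func(x, ...)
  # Same formula as above: NA flags are dropped from the numerator only.
  share <- if (length(flags) == 0) 0 else sum(flags, na.rm = TRUE) / length(flags)
  list(share = share, passed = cmp(share, prop))
}

# The variants differ only in the comparison operator passed as `cmp`.
prop_check_lte <- function(x, func, prop, ...) prop_check(x, func, prop, cmp = `<=`, ...)
prop_check_gte <- function(x, func, prop, ...) prop_check(x, func, prop, cmp = `>=`, ...)

x <- c(1, 5, NA, 12, 3)
is_big <- function(v, limit) v > limit

prop_check_lte(x, is_big, prop = 0.25, limit = 10)$passed   # TRUE  (1 of 5 flagged)
prop_check_gte(x, function(v) !is.na(v), prop = 0.9)$passed # FALSE (4 of 5 non-missing)
```

Passing the operator as a function keeps the `_lte`/`_gte` variants to one line each, which is the main appeal of the generic version.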

And then define expect_prop_miss() as:

expect_prop_miss <- function(var, prop, miss = getOption("testdat.miss"), flt = TRUE, data = get_testdata()) {
  var <- enquo(var)
  expect_prop_lte(
    var = var, 
    func = function(x) chk_text_miss(x, miss = miss), 
    prop = prop, 
    flt = flt, 
    data = data, 
    func_desc = "missing check"
  )
}

This would be easy to extend to the other chk_*() functions:

expect_prop_values <- function(var, prop, ..., flt = TRUE, data = get_testdata()) {
  var <- enquo(var)
  expect_prop_gte(
    var = var, 
    func = function(x) chk_values(x, ...), 
    prop = prop, 
    flt = flt, 
    data = data, 
    func_desc = "values check"
  )
}
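One subtlety in `expect_prop_values()`: the `...` inside `function(x) chk_values(x, ...)` is not an argument of the anonymous function, so R looks it up lexically and finds the wrapper's own `...`. A small base-R sketch showing that this forwarding works; `flag_values()` and `prop_values_share()` are hypothetical stand-ins, not testdat code.

```r
# Hypothetical helpers for illustration only (not testdat functions).
# flag_values() marks values that appear in the allowed set passed via `...`.
flag_values <- function(x, ...) x %in% c(...)

prop_values_share <- function(x, ...) {
  # The anonymous function has no `...` formal of its own, so `...` here
  # resolves to prop_values_share()'s dots through lexical scoping.
  func <- function(v) flag_values(v, ...)
  flags <- func(x)
  sum(flags, na.rm = TRUE) / length(flags)
}

prop_values_share(c("a", "b", "z", "a"), "a", "b")  # 0.75
```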

@kinto-b
Contributor

kinto-b commented Feb 9, 2021

For the record, this was the function I initially had:

expect_prop_miss <- function(var, prop, miss = getOption("testdat.miss"),
                             flt = TRUE, data = get_testdata()) {
  act <- quasi_label(enquo(data))
  act$var_desc <- as_label_vars(enquo(var))
  act$flt_desc <- as_label_flt(enquo(flt))
  var <- enquo(var)

  act$result <- data %>%
    filter(!!flt) %>%
    pull(!!var) %>%
    chk_text_miss(miss = miss)

  act$result_prop <- sum(act$result, na.rm = TRUE) / length(act$result)


  expect_custom(
    act$result_prop <= prop,
    glue("{act$lab} has {sum(act$result, na.rm = TRUE)} missing records \\
         ({signif(act$result_prop, 2)} of total) on variable `{act$var_desc}`
         Filter: {act$flt_desc}"),
    failed_count = sum(act$result, na.rm = TRUE),
    total_count = length(act$result),
    result = act$result
  )
}

@kinto-b kinto-b added Effort: ⭐⭐ Intermediate and removed Effort: ⭐ Straightforward labels Feb 9, 2021
@kinto-b
Contributor

kinto-b commented Mar 1, 2021

@gorcha What do you think about this approach? (See the commit above)

@kinto-b kinto-b closed this as completed in cbccfc2 Sep 9, 2021