Implement expect_unique_combine #22

Closed
kinto-b opened this issue Jan 21, 2021 · 3 comments
Labels
Effort: ⭐ Straightforward Importance: ❗ Desirable Task: Enhancement New feature or request
Milestone

Comments

@kinto-b
Contributor

kinto-b commented Jan 21, 2021

Currently this is sitting as a TODO in the script. I'll remove it from the script, but we need to implement it at some point.

testdat/R/expect-value.R

Lines 168 to 193 in 9f3ef72

# TODO - check if any values exist more than once across multiple variables in
# the entire dataset
# #' @export
# #' @rdname value-expectations
expect_unique_combine <- function(vars, flt = TRUE, data = get_testdata()) {
  browser()
  act <- quasi_label(enquo(data))
  act$var_desc <- as_label_vars(enquo(vars))
  act$flt_desc <- as_label_flt(enquo(flt))
  flt <- enquo(flt)
  act$result <- data %>%
    filter(!!flt) %>%
    select(!!!vars)
  expect_custom(
    all(act$result, na.rm = TRUE),
    glue("{act$lab} has {sum(!act$result, na.rm = TRUE)} records with \\
         duplicates across variables `{act$var_desc}`.
         Filter: {act$flt_desc}"),
    failed_count = sum(!act$result, na.rm = TRUE),
    total_count = sum(!is.na(act$result))
  )
  invisible(act$result)
}

@kinto-b kinto-b added the Task: Enhancement New feature or request label Jan 21, 2021
kinto-b pushed a commit that referenced this issue Jan 21, 2021
@kinto-b kinto-b added the Discussion Further information is requested label Feb 2, 2021
@kinto-b
Contributor Author

kinto-b commented Feb 2, 2021

What's the envisioned use-case for this function?

@gorcha
Collaborator

gorcha commented Feb 5, 2021

This is for e.g. checking that no phone number appears more than once in the dataset.

There are 3 kinds of uniqueness check:

  • Check that each record is unique for a given set of variables (expect_unique())
  • Check that a set of variables each have unique values for each individual record (expect_unique_across())
  • Check that a set of variables has unique values across the entire dataset (expect_unique_combine())

A little confusing, but naming things is hard 😛
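
To make the distinction between the three checks concrete, here is a rough base R sketch on a toy data frame. The column names and values are made up for illustration, and this only mimics the logic of each check; it is not how testdat implements the expectations.

```r
# Toy data: two phone-number columns per record (made-up values).
dat <- data.frame(
  id     = c(1, 2, 3),
  home   = c("111", "222", "333"),
  mobile = c("444", "111", "555"),
  stringsAsFactors = FALSE
)
phone_vars <- c("home", "mobile")

# 1. Record-level uniqueness: no two rows share the same (home, mobile) pair.
unique_records <- !any(duplicated(dat[phone_vars]))

# 2. Uniqueness across variables within each record: home differs from mobile
#    on every row. One logical value per row.
unique_across <- apply(dat[phone_vars], 1, function(row) !any(duplicated(row)))

# 3. Uniqueness over the pooled values of all variables in the whole dataset.
pooled <- c(dat$home, dat$mobile)
unique_combined <- !any(duplicated(pooled))
```

Here the first two checks pass, but the third does not: "111" is record 1's home number and record 2's mobile number, which only the pooled check picks up.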

@kinto-b
Contributor Author

kinto-b commented Feb 5, 2021

Nah, I think those are good names. I just couldn't think of an instance where you'd want to check that any(duplicated(c(dat$x, dat$y, dat$z))) is false, but the phone numbers check makes sense to me : )
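
For reference, the `any(duplicated(c(dat$x, dat$y, dat$z)))` idea could be wrapped up roughly as below. `combined_duplicates()` is a hypothetical helper, not part of testdat, and it assumes the pooled columns share a type (otherwise `unlist()` will coerce them). Dropping `NA` by default is also an assumption, chosen so that missing phone numbers don't count as duplicates of each other.

```r
# Hypothetical helper (not part of testdat): pools the values of the named
# columns and returns the distinct values that occur more than once.
combined_duplicates <- function(data, vars, na.rm = TRUE) {
  pooled <- unlist(data[vars], use.names = FALSE)
  if (na.rm) pooled <- pooled[!is.na(pooled)]
  unique(pooled[duplicated(pooled)])
}

# Made-up example data.
dat <- data.frame(
  x = c("a", "b", NA),
  y = c("c", "a", NA),
  z = c("d", "e", "f"),
  stringsAsFactors = FALSE
)

dups <- combined_duplicates(dat, c("x", "y", "z"))
any_dups <- length(dups) > 0
```

Returning the offending values rather than a single `TRUE`/`FALSE` makes a failure easier to report, since the message can say which values are repeated.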

@kinto-b kinto-b removed the Discussion Further information is requested label Feb 5, 2021
@kinto-b kinto-b added this to the testdat 0.2.0 milestone Feb 8, 2021
@kinto-b kinto-b added Effort: ⭐ Straightforward Importance: ❗ Desirable labels Feb 8, 2021
@kinto-b kinto-b closed this as completed in 2f1064c Sep 9, 2021
2 participants