Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions #15

Closed
kinto-b opened this issue Jan 18, 2021 · 7 comments
Closed

Questions #15

kinto-b opened this issue Jan 18, 2021 · 7 comments
Assignees
Labels
Discussion Further information is requested

Comments

@kinto-b
Copy link
Contributor

kinto-b commented Jan 18, 2021

I have been going through the package, trying to get acquainted with all of the functions, and I have been coming across some that I don't understand / I don't think work / I don't see the utility of.

  • filter_expect (link) - is supposed to receive an expectation function as an argument and return the input data frame filtered to records failing that expectation. But it doesn't work as expected with any of the expectation functions defined (see below). It doesn't appear to be used anywhere else in the package. Consider putting on the chopping block?
filter_expect(mtcars, expect_range, min = 0, max = 100)
# > [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
# > <0 rows> (or 0-length row.names)
  • expect_similar (link) - looks like it's supposed to take two dataframes, join them, and then check categorical variables to see whether the proportion of data falling into each level matches. But it doesn't work (see below). It doesn't appear to be used anywhere else in the package. Consider putting on the chopping block?
testdat::expect_similar(
    data = mtcars,
    var = am, 
    data2 = data.frame(my_am = rbinom(10, 1, 0.5)), 
    var2 = my_am
)
#> Error: Join columns must be present in data.
#> x Problem with `my_am`.
  • expect_func (link)- I don't know what this function is supposed to do. It seems not to work? It's not used anywhere else in the package.
testdat::set_testdata(mtcars)
testdat::expect_func(vars = disp, func = chk_range, args = list(min = 0, max = 100))
#> Error in check_dot_cols(.vars, .cols): object 'disp' not found
  • expect-value functions (link) - these functions are defined explicitly, but I don't see why. They have corresponding chk_* functions, so why not use the expect-make approach?

I'll keep adding to this issue as I find more functions like this

@kinto-b kinto-b added the Discussion Further information is requested label Jan 18, 2021
@gorcha
Copy link
Collaborator

gorcha commented Jan 20, 2021

Yo! Hope you guys are well.

A few comments:

  • filter_expect() is very rough and hasn't been used much. The call above isn't quite right since it doesn't reference a variable, but I tried and it still doesn't quite do what's expected since the expectation throws an error.
filter_expect(mtcars, expect_range, disp, min = 0, max = 100)
#> Error: `data` has 27 records failing range check on variable `disp`.
#> Filter: None
#> Arguments: `<named list>`
  • expect_similar() is very rough so not super surprised it's not working. I probably only tested it with vars that have the same name across both data frames.

  • expect_func() is a convenience function for calling a chk function on an individual variable, although the same can be achieved using expect_all() or expect_any() with a single variable. It currently works but the argument is var instead of vars:

set_testdata(mtcars)
expect_func(var = disp, func = chk_range, args = list(min = 0, max = 100))
#> Error: get_testdata() has 27 records failing `chk_range` on variable `disp`.
#> Filter: None
#> Arguments: `<named list>`
  • expect_values() itself could be converted into an expect_make() call, but the others are doing things across multiple variables (e.g. expect_unique() checking uniqueness based on a combination of variables), where the chk_*() functions only ever check a single vector.

@kinto-b
Copy link
Contributor Author

kinto-b commented Jan 21, 2021

@gorcha

Hey! Thanks for this! That's really helpful.

Given what you've said, my inclination is to fix up expect_similar(), implement expect_values() using expect_make(), and drop filter_expect() and expect_func().

Do you have any thoughts? (Also feel free not to reply to this 😛 Al and I can work it out)

@gorcha
Copy link
Collaborator

gorcha commented Jan 22, 2021

No problemo!

Yeah that sounds good to me, although I would leave expect_func() (doesn't hurt to leave it in)

@kinto-b
Copy link
Contributor Author

kinto-b commented Jan 26, 2021

Just looking at expect_exclusive, having a bit of a tough time figuring out exactly what it's supposed to do.

Is this right:

expect_exclusive(vars = vars(a), exc_vars = vars(a, b, c, d))

fails for records where a = 1 and also b=1 or c=1 or d=1

expect_exclusive(vars(a, b), vars(a, b, c, d))

fails for records where a = 1 & b = 1 and also c = 1 or d = 1.

If so, would the following be equivalent:

my_expect_exclusive <- function(vars, val, flt, data = get_testdata()) {
  # Filter data to flt

  # Check that each record has at most one of `vars` equal to `val`
}

my_expect_exclusive(vars = vars(a, b, c, d), val = 1, flt = a %in% 1) # Equivalent to the first expectation above
my_expect_exclusive(vars = vars(b, c, d), val = 1, flt = a %in% 1) # Not possible with the expectation above

The second expectation would fail only if a = 1 and also b = 1 & c = 1 or b = 1 & d = 1 or c = 1 & d = 1.

@gorcha
Copy link
Collaborator

gorcha commented Jan 27, 2021

Yep that's how it should work.

This is for "exclusive" response codes in multi response sets.
For e.g. where a multi-response item has a "None of the above" option it should never be selected alongside other responses.

So a common call might look like this:

expect_exclusive(vars = vars(q1_11, q1_98, q1_99), exc_vars = vars(matches("^q1_")))

I prefer the current setup because using a filter isn't quite what we're doing conceptually (although that might be how it's done internally) - the current API call feels more natural, saying "make sure these variables are exclusive given this variable set". Could do with better argument naming though.

@kinto-b
Copy link
Contributor Author

kinto-b commented Feb 1, 2021

Yeah, I see what you mean. It was initially counter-intuitive to me, and I had to look at the internals to figure out what it was doing. But I've added in documentation for it now, and I reckon it's fine as is with that as supplementation.

Circling back to expect_func, I still favour dropping it on the basis of parsimony. It only saves writing 6 extra characters (vars()) and costs one extra (func vs all). And using a custom expectation isn't something that one would do all that often anyway

@kinto-b
Copy link
Contributor Author

kinto-b commented Feb 2, 2021

I think everything's been addressed here and I've finished going through testdat. Going to close this out. Can put up specific issues if anything further arises

@kinto-b kinto-b closed this as completed Feb 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants