Questions #15

kinto-b · 2021-01-18T22:28:50Z

I have been going through the package, trying to get acquainted with all of the functions, and I have been coming across some that I don't understand / I don't think work / I don't see the utility of.

filter_expect (link) - is supposed to receive an expectation function as an argument and return the input data frame filtered to records failing that expectation. But it doesn't work as expected with any of the expectation functions defined (see below). It doesn't appear to be used anywhere else in the package. Consider putting on the chopping block?

filter_expect(mtcars, expect_range, min = 0, max = 100)
# > [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
# > <0 rows> (or 0-length row.names)

expect_similar (link) - looks like it's supposed to take two dataframes, join them, and then check categorical variables to see whether the proportion of data falling into each level matches. But it doesn't work (see below). It doesn't appear to be used anywhere else in the package. Consider putting on the chopping block?

testdat::expect_similar(
    data = mtcars,
    var = am, 
    data2 = data.frame(my_am = rbinom(10, 1, 0.5)), 
    var2 = my_am
)
#> Error: Join columns must be present in data.
#> x Problem with `my_am`.

expect_func (link)- I don't know what this function is supposed to do. It seems not to work? It's not used anywhere else in the package.

testdat::set_testdata(mtcars)
testdat::expect_func(vars = disp, func = chk_range, args = list(min = 0, max = 100))
#> Error in check_dot_cols(.vars, .cols): object 'disp' not found

expect-value functions (link) - these functions are defined explicitly, but I don't see why. They have corresponding chk_* functions, so why not use the expect-make approach?

I'll keep adding to this issue as I find more functions like this

The text was updated successfully, but these errors were encountered:

gorcha · 2021-01-20T09:13:36Z

Yo! Hope you guys are well.

A few comments:

filter_expect() is very rough and hasn't been used much. The call above isn't quite right since it doesn't reference a variable, but I tried and it still doesn't quite do what's expected since the expectation throws an error.

filter_expect(mtcars, expect_range, disp, min = 0, max = 100)
#> Error: `data` has 27 records failing range check on variable `disp`.
#> Filter: None
#> Arguments: `<named list>`

expect_similar() is very rough so not super surprised it's not working. I probably only tested it with vars that have the same name across both data frames.
expect_func() is a convenience function for calling a chk function on an individual variable, although the same can be achieved using expect_all() or expect_any() with a single variable. It currently works but the argument is var instead of vars:

set_testdata(mtcars)
expect_func(var = disp, func = chk_range, args = list(min = 0, max = 100))
#> Error: get_testdata() has 27 records failing `chk_range` on variable `disp`.
#> Filter: None
#> Arguments: `<named list>`

expect_values() itself could be converted into an expect_make() call, but the others are doing things across multiple variables (e.g. expect_unique() checking uniqueness based on a combination of variables), where the chk_*() functions only ever check a single vector.

kinto-b · 2021-01-21T00:17:24Z

@gorcha

Hey! Thanks for this! That's really helpful.

Given what you've said, my inclination is to fix up expect_similar(), implement expect_values() using expect_make(), and drop filter_expect() and expect_func().

Do you have any thoughts? (Also feel free not to reply to this 😛 Al and I can work it out)

gorcha · 2021-01-22T03:29:28Z

No problemo!

Yeah that sounds good to me, although I would leave expect_func() (doesn't hurt to leave it in)

kinto-b · 2021-01-26T23:25:02Z

Just looking at expect_exclusive, having a bit of a tough time figuring out exactly what it's supposed to do.

Is this right:

expect_exclusive(vars = vars(a), exc_vars = vars(a, b, c, d))

fails for records where a = 1 and also b=1 or c=1 or d=1

expect_exclusive(vars(a, b), vars(a, b, c, d))

fails for records where a = 1 & b = 1 and also c = 1 or d = 1.

If so, would the following be equivalent:

my_expect_exclusive <- function(vars, val, flt, data = get_testdata()) {
  # Filter data to flt

  # Check that each record has at most one of `vars` equal to `val`
}

my_expect_exclusive(vars = vars(a, b, c, d), val = 1, flt = a %in% 1) # Equivalent to the first expectation above
my_expect_exclusive(vars = vars(b, c, d), val = 1, flt = a %in% 1) # Not possible with the expectation above

The second expectation would fail only if a = 1 and also b = 1 & c = 1 or b = 1 & d = 1 or c = 1 & d = 1.

gorcha · 2021-01-27T02:16:50Z

Yep that's how it should work.

This is for "exclusive" response codes in multi response sets.
For e.g. where a multi-response item has a "None of the above" option it should never be selected alongside other responses.

So a common call might look like this:

expect_exclusive(vars = vars(q1_11, q1_98, q1_99), exc_vars = vars(matches("^q1_")))

I prefer the current setup because using a filter isn't quite what we're doing conceptually (although that might be how it's done internally) - the current API call feels more natural, saying "make sure these variables are exclusive given this variable set". Could do with better argument naming though.

kinto-b · 2021-02-01T02:28:32Z

Yeah, I see what you mean. It was initially counter-intuitive to me, and I had to look at the internals to figure out what it was doing. But I've added in documentation for it now, and I reckon it's fine as is with that as supplementation.

Circling back to expect_func, I still favour dropping it on the basis of parsimony. It only saves writing 6 extra characters (vars()) and costs one extra (func vs all). And using a custom expectation isn't something that one would do all that often anyway

kinto-b · 2021-02-02T01:02:26Z

I think everything's been addressed here and I've finished going through testdat. Going to close this out. Can put up specific issues if anything further arises

kinto-b added the Discussion Further information is requested label Jan 18, 2021

kinto-b assigned wilcoxa and kinto-b Jan 18, 2021

kinto-b mentioned this issue Jan 21, 2021

expect_similar() fails when comparing variables of different names #17

Closed

kinto-b mentioned this issue Feb 1, 2021

Soft deprecate expect_func #24

Closed

kinto-b closed this as completed Feb 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Questions #15

Questions #15

kinto-b commented Jan 18, 2021 •

edited

Loading

gorcha commented Jan 20, 2021

kinto-b commented Jan 21, 2021

gorcha commented Jan 22, 2021

kinto-b commented Jan 26, 2021 •

edited

Loading

gorcha commented Jan 27, 2021

kinto-b commented Feb 1, 2021

kinto-b commented Feb 2, 2021

Questions #15

Questions #15

Comments

kinto-b commented Jan 18, 2021 • edited Loading

gorcha commented Jan 20, 2021

kinto-b commented Jan 21, 2021

gorcha commented Jan 22, 2021

kinto-b commented Jan 26, 2021 • edited Loading

gorcha commented Jan 27, 2021

kinto-b commented Feb 1, 2021

kinto-b commented Feb 2, 2021

kinto-b commented Jan 18, 2021 •

edited

Loading

kinto-b commented Jan 26, 2021 •

edited

Loading