Deprecate expect_similar() #18

kinto-b · 2021-01-21T01:16:43Z

Suppose we compare the following:

df1 <- data.frame(key = 1:100, binom = rbinom(100, 1, 0.5))
df2 <- data.frame(key = 1:100, binom = rbinom(100, 1, 0.9))
testdat::expect_similar(binom, df2, binom, data = df1, min = 0)

How it currently works

What does expect_similar() do internally? First it does this

testdat/R/expect-datacomp.R, lines 33 to 34 in 9f3ef72:

data_tb  <- data  %>% group_by(!!var)  %>% summarise(freq = n())
data2_tb <- data2 %>% group_by(!!var2) %>% summarise(freq = n())

which yields

Browse[2]> data_tb
# A tibble: 2 x 2
  binom  freq
  <int> <int>
1     0    56
2     1    44
Browse[2]> data2_tb
# A tibble: 2 x 2
  binom  freq
  <int> <int>
1     0     8
2     1    92

Then it does this

testdat/R/expect-datacomp.R, lines 36 to 40 in 9f3ef72:

by_var <- structure(as_name(var), names = as_name(var2))
act$result <-
  left_join(data_tb, data2_tb, by = by_var) %>%
  mutate(prop_diff = abs(.data$freq.x - .data$freq.y) / .data$freq.x,
         pass = .data$prop_diff < threshold | .data$freq.x < min)

which yields

Browse[2]> act$result
# A tibble: 2 x 5
  binom freq.x freq.y prop_diff pass 
  <int>  <int>  <int>     <dbl> <lgl>
1     0     56      8     0.857 FALSE 
2     1     44     92     1.09  FALSE
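The prop_diff column above is the count difference relative to the first data frame's count, so it is not symmetric in its two inputs and can exceed 1. A minimal base R sketch of that arithmetic, using the counts from the output above (the vector names are only for illustration):

```r
# Counts copied from the act$result output above:
# freq_x comes from df1 (data), freq_y from df2 (data2).
freq_x <- c(`0` = 56, `1` = 44)
freq_y <- c(`0` = 8, `1` = 92)

# Current measure: count difference relative to the first data frame's count.
rel_diff_xy <- abs(freq_x - freq_y) / freq_x

# The same measure with the roles of the two data frames swapped.
rel_diff_yx <- abs(freq_y - freq_x) / freq_y

round(rel_diff_xy, 3)
round(rel_diff_yx, 3)
```

With the roles swapped, level 0 goes from 48/56 to 48/8 = 6, so the result depends heavily on which data frame is passed as data.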

How I thought it would work

But I was expecting it to run the comparison in this way:

data_tb  <- data  %>% group_by(!!var)  %>% summarise(freq = n(), .groups = "drop") %>% mutate(prop = freq / sum(freq))
data2_tb <- data2 %>% group_by(!!var2) %>% summarise(freq = n(), .groups = "drop") %>% mutate(prop = freq / sum(freq))

by_var <- structure(as_name(var), names = as_name(var2))
act$result <-
  left_join(data_tb, data2_tb, by = by_var) %>%
  mutate(prop_diff = abs(.data$prop.x - .data$prop.y),
         pass = .data$prop_diff < threshold | .data$freq.x < min | .data$freq.y < min)

which yields

Browse[2]> data_tb
# A tibble: 2 x 3
  binom  freq  prop
  <int> <int> <dbl>
1     0    56  0.56
2     1    44  0.44

Browse[2]> data2_tb
# A tibble: 2 x 3
  binom  freq  prop
  <int> <int> <dbl>
1     0     8  0.08
2     1    92  0.92

Browse[2]> act$result
# A tibble: 2 x 7
  binom freq.x prop.x freq.y prop.y prop_diff pass 
  <int>  <int>  <dbl>  <int>  <dbl>     <dbl> <lgl>
1     0     56   0.56      8   0.08      0.48 FALSE
2     1     44   0.44     92   0.92      0.48 FALSE


gorcha · 2021-01-22T03:47:59Z

To answer the issue title - no :P

This was a hacky, rough attempt to incorporate the original frequency, to allow more leeway for high-frequency responses: in checking, a jump from 2 to 12 percent is usually very different from a jump from 72 to 82.

Should probably use an actual statistical measure instead.
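To put numbers on the 2 to 12 versus 72 to 82 comparison, here is a small base R sketch (the vector names are made up for illustration). Both jumps are the same size in absolute terms, but relative to the starting proportion they are very different, which is what the existing prop_diff measure picks up.

```r
# Proportions before and after the jump, for a rare and a common response.
before <- c(rare = 0.02, common = 0.72)
after  <- c(rare = 0.12, common = 0.82)

# Absolute difference in proportions: 10 percentage points for both.
abs_diff <- abs(after - before)

# Difference relative to the starting proportion.
rel_diff <- abs_diff / before

round(rbind(abs_diff, rel_diff), 3)
```

The rare response has grown sixfold (a relative difference of 5), while the common one has moved by roughly 14 percent of its starting value.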

kinto-b · 2021-01-22T05:04:28Z

Yeah, good point.

What about chisq.test()?

kinto-b · 2021-02-10T04:27:54Z

Coming back to this, chisq.test() is very sensitive to small differences when the sample is large. For instance:

df1 <- data.frame(
  a = sample(1:5, 100000, TRUE),
  b = sample(c(rep(1:5, 5), 1:3), 100000, TRUE),
  c = sample(c(rep(1:5, 25), 1:3), 100000, TRUE),
  d = sample(c(rep(1:5, 125), 1:3), 100000, TRUE)
)

df1 %>% group_by(level = a) %>% summarise(n_a = n()) %>% 
  left_join(
    df1 %>% group_by(level = b) %>% summarise(n_b = n()), "level"
  ) %>% 
  left_join(
    df1 %>% group_by(level = c) %>% summarise(n_c = n()), "level"
  ) %>% 
  left_join(
    df1 %>% group_by(level = d) %>% summarise(n_d = n()), "level"
  )

#> # A tibble: 5 x 5
#>   level   n_a   n_b   n_c   n_d
#>   <int> <int> <int> <int> <int>
#> 1     1 20180 21234 20241 20026
#> 2     2 19932 21382 20398 20159
#> 3     3 19820 21494 20255 20050
#> 4     4 20031 17956 19654 19905
#> 5     5 20037 17934 19452 19860

chisq.test(table(df1$a), p = table(df1$b), rescale.p = TRUE)

#> 	Chi-squared test for given probabilities
#> 
#> data:  table(df1$a)
#> X-squared = 767.42, df = 4, p-value < 2.2e-16

chisq.test(table(df1$a), p = table(df1$c), rescale.p = TRUE)

#> 	Chi-squared test for given probabilities
#> 
#> data:  table(df1$a)
#> X-squared = 44.997, df = 4, p-value = 3.982e-09

chisq.test(table(df1$a), p = table(df1$d), rescale.p = TRUE)

#> 	Chi-squared test for given probabilities
#> 
#> data:  table(df1$a)
#> X-squared = 8.7539, df = 4, p-value = 0.06755

If we consider the p-values, we would never say that a is similar to b or c. But to my eyes c looks very similar to a.

We could potentially use the chi-squared statistic directly. But a small chi-square value doesn't necessarily indicate that the distributions are similar, only that we can't confidently tell them apart: that could be because they are similar or because there aren't many data points.
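The dependence on sample size can be seen directly from the counts in the table above. The sketch below is only illustrative: chisq_stat() and cohen_w() are hypothetical helpers written for this comment, and Cohen's w (the chi-squared statistic rescaled by sample size) is an effect-size measure I'm swapping in for comparison, not something proposed elsewhere in this thread.

```r
# Counts copied from the n_a and n_c columns of the table above.
n_a <- c(20180, 19932, 19820, 20031, 20037)
n_c <- c(20241, 20398, 20255, 19654, 19452)

# Pearson chi-squared statistic of observed counts against reference
# proportions (the reference is rescaled to sum to 1).
chisq_stat <- function(observed, reference) {
  expected <- sum(observed) * reference / sum(reference)
  sum((observed - expected)^2 / expected)
}

# Cohen's w: the statistic divided by sample size, then square-rooted.
# It measures how far apart the distributions are, not how sure we are.
cohen_w <- function(observed, reference) {
  sqrt(chisq_stat(observed, reference) / sum(observed))
}

full  <- chisq_stat(n_a, n_c)        # about 45, as reported above
small <- chisq_stat(n_a / 100, n_c)  # same proportions, 1/100 of the data
                                     # (fractional counts, scaling demo only)

c(full = full, small = small)
c(w_full = cohen_w(n_a, n_c), w_small = cohen_w(n_a / 100, n_c))
```

The statistic shrinks by a factor of 100 when the counts do, while w stays the same. Where to put a cut-off on a measure like w would still be a judgement call.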

kinto-b · 2021-03-01T00:51:32Z

Spoke to Andrew about this and he recommended using chi-square.

I'm now questioning this function. It might be best to leave similarity testing to users, given that it's fairly hairy.

kinto-b · 2021-03-30T01:46:57Z

@wilcoxa @tonoplast RE: similarity testing discussion this morning

kinto-b · 2021-04-07T04:04:17Z

@wilcoxa I'm leaning towards deprecating this one. I've slapped an experimental badge on it anyway.

Once this is sorted out, 0.2.0 will be ready to review/merge/release. I think the other issues that are currently outstanding can wait.

wilcoxa · 2021-04-08T00:18:53Z

@kinto-b Yeah, experimental is good for the moment - worth keeping track of the progress made at the very least. There's definitely a need for this type of function, but it needs some work to be useful in the wild.

kinto-b · 2021-09-09T23:25:58Z

@gorcha I think this one should be deprecated before CRAN-ing

gorcha · 2021-09-10T01:01:21Z

Yep, agreed

kinto-b added the Discussion Further information is requested label Jan 21, 2021

kinto-b assigned wilcoxa and kinto-b Jan 21, 2021

kinto-b added this to the testdat 0.2.0 milestone Feb 2, 2021

kinto-b added the Importance: ❗❗ Important label Feb 8, 2021

kinto-b mentioned this issue Apr 7, 2021

0.2.0 #31

Merged

kinto-b unassigned wilcoxa Apr 8, 2021

paddytobias added the Requires further information label May 3, 2021

kinto-b modified the milestones: testdat 0.2.0, CRAN Sep 9, 2021

kinto-b removed the Requires further information label Sep 9, 2021

kinto-b changed the title ~~Is the expect_similar() comparison the best?~~ Deprecate expect_similar() Sep 10, 2021

gorcha added a commit that referenced this issue Sep 13, 2021

Soft deprecate expect_similar(). Close #18

d2b1d1c

gorcha mentioned this issue Sep 13, 2021

Updates for CRAN submission #45

Merged

gorcha closed this as completed in d12eaa8 Sep 16, 2021
