Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate expect_similar() #18

Closed
kinto-b opened this issue Jan 21, 2021 · 9 comments
Closed

Deprecate expect_similar() #18

kinto-b opened this issue Jan 21, 2021 · 9 comments
Assignees
Labels
Discussion Further information is requested Importance: ❗❗ Important

Comments

@kinto-b
Copy link
Contributor

kinto-b commented Jan 21, 2021

Suppose we compare the following,

df1 <- data.frame(key = 1:100, binom = rbinom(100, 1, 0.5))
df2 <- data.frame(key = 1:100, binom = rbinom(100, 1, 0.9))
testdat::expect_similar(binom, df2, binom, data = df1, min = 0)

How it currently works

What does expect_similar() do internally? First it does this

data_tb <- data %>% group_by(!!var) %>% summarise(freq = n())
data2_tb <- data2 %>% group_by(!!var2) %>% summarise(freq = n())

which yields

Browse[2]> data_tb
# A tibble: 2 x 2
  binom  freq
  <int> <int>
1     0    56
2     1    44
Browse[2]> data2_tb
# A tibble: 2 x 2
  binom  freq
  <int> <int>
1     0     8
2     1    92

Then it does this

by_var <- structure(as_name(var), names = as_name(var2))
act$result <-
left_join(data_tb, data2_tb, by = by_var) %>%
mutate(prop_diff = abs(.data$freq.x - .data$freq.y) / .data$freq.x,
pass = .data$prop_diff < threshold | .data$freq.x < min)

which yields

Browse[2]> act$result
# A tibble: 2 x 5
  binom freq.x freq.y prop_diff pass 
  <int>  <int>  <int>     <dbl> <lgl>
1     0     56      8     0.857 FALSE 
2     1     44     92     1.09  FALSE 

How I thought it would work

But I was expecting it to run the comparison in this way:

 data_tb  <- data  %>% group_by(!!var)  %>% summarise(freq = n(), .groups = "drop") %>% mutate(prop = freq/sum(freq))
 data2_tb <- data2 %>% group_by(!!var2) %>% summarise(freq = n(), .groups = "drop") %>% mutate(prop = freq/sum(freq))

  by_var <- structure(as_name(var), names = as_name(var2))
  act$result <-
    left_join(data_tb, data2_tb, by = by_var) %>%
    mutate(prop_diff = abs(.data$prop.x - .data$prop.y),
           pass = .data$prop_diff < threshold | .data$freq.x < min | .data$freq.y < min)

which yields

Browse[2]> data_tb
# A tibble: 2 x 3
  binom  freq  prop
  <int> <int> <dbl>
1     0    56  0.56
2     1    44  0.44

Browse[2]> data2_tb
# A tibble: 2 x 3
  binom  freq  prop
  <int> <int> <dbl>
1     0     8  0.08
2     1    92  0.92

Browse[2]> act$result
# A tibble: 2 x 7
  binom freq.x prop.x freq.y prop.y prop_diff pass 
  <int>  <int>  <dbl>  <int>  <dbl>     <dbl> <lgl>
1     0     56   0.56      8   0.08      0.48 FALSE
2     1     44   0.44     92   0.92      0.48 FALSE
@kinto-b kinto-b added the Discussion Further information is requested label Jan 21, 2021
@gorcha
Copy link
Collaborator

gorcha commented Jan 22, 2021

To answer the issue title - no :P

This was a hacky rough attempt to incorporate the original frequency (to allow more leeway for high frequency responses, since for e.g. a jump from 2 to 12 percent is usually very different in checking than a jump from 72 to 82).

Should probably use an actual statistical measure instead.

@kinto-b
Copy link
Contributor Author

kinto-b commented Jan 22, 2021

Yeah, good point.

What about chisq.test()?

@kinto-b kinto-b added this to the testdat 0.2.0 milestone Feb 2, 2021
@kinto-b kinto-b added the Importance: ❗❗ Important label Feb 8, 2021
@kinto-b
Copy link
Contributor Author

kinto-b commented Feb 10, 2021

Coming back to this, chisq.test() is very sensitive to differences. For instance:

df1 <- data.frame(
  a = sample(1:5, 100000, TRUE),
  b = sample(c(rep(1:5, 5), 1:3), 100000, TRUE),
  c = sample(c(rep(1:5, 25), 1:3), 100000, TRUE),
  d = sample(c(rep(1:5, 125), 1:3), 100000, TRUE)  
 )

df1 %>% group_by(level = a) %>% summarise(n_a = n()) %>% 
  left_join(
    df1 %>% group_by(level = b) %>% summarise(n_b = n()), "level"
  ) %>% 
  left_join(
    df1 %>% group_by(level = c) %>% summarise(n_c = n()), "level"
  ) %>% 
  left_join(
    df1 %>% group_by(level = d) %>% summarise(n_d = n()), "level"
  )

#> # A tibble: 5 x 5
#>   level   n_a   n_b   n_c   n_d
#>   <int> <int> <int> <int> <int>
#> 1     1 20180 21234 20241 20026
#> 2     2 19932 21382 20398 20159
#> 3     3 19820 21494 20255 20050
#> 4     4 20031 17956 19654 19905
#> 5     5 20037 17934 19452 19860

chisq.test(table(df1$a), p = table(df1$b), rescale.p = TRUE)

#> 	Chi-squared test for given probabilities
#> 
#> data:  table(df1$a)
#> X-squared = 767.42, df = 4, p-value < 2.2e-16

chisq.test(table(df1$a), p = table(df1$c), rescale.p = TRUE)

#> 	Chi-squared test for given probabilities
#> 
#> data:  table(df1$a)
#> X-squared = 44.997, df = 4, p-value = 3.982e-09

chisq.test(table(df1$a), p = table(df1$d), rescale.p = TRUE)

#> 	Chi-squared test for given probabilities
#> 
#> data:  table(df1$a)
#> X-squared = 8.7539, df = 4, p-value = 0.06755

If we consider the p-values, we would never say that a is similar to b or c. But to my eyes c looks very similar to a.

We could potentially use the chi-squared statistic directly. But a small chi-square value doesn't necessarily indicate that the distributions are similar, only that we can't confidently tell them apart: that could be because they are similar or because there aren't many data points.

@kinto-b
Copy link
Contributor Author

kinto-b commented Mar 1, 2021

Spoke to Andrew about this and he recommended using chi-square.

I'm now questioning this function. It might be best to leave similarity testing to users given that it's fairly hairy

@kinto-b
Copy link
Contributor Author

kinto-b commented Mar 30, 2021

@wilcoxa @tonoplast RE: similarity testing discussion this morning

@kinto-b
Copy link
Contributor Author

kinto-b commented Apr 7, 2021

@wilcoxa I'm leaning towards deprecating this one. I've slapped an experimental badge on it anyway.

Once this is sorted out, 0.2.0 will be ready to review/merge/release. I think the other issues that are currently outstanding can wait

@kinto-b kinto-b mentioned this issue Apr 7, 2021
@wilcoxa
Copy link
Contributor

wilcoxa commented Apr 8, 2021

@kinto-b Yeah experimental is good for the moment - worth keeping track of the progress made at the very least. There's definitely a need for this type of function, but needs some work to make this useful in the wild.

@kinto-b
Copy link
Contributor Author

kinto-b commented Sep 9, 2021

@gorcha I think this one should be deprecated before CRAN-ing

@gorcha
Copy link
Collaborator

gorcha commented Sep 10, 2021

Yep, agreed

@kinto-b kinto-b changed the title Is the expect_similar() comparison the best? Deprecate expect_similar() Sep 10, 2021
gorcha added a commit that referenced this issue Sep 13, 2021
@gorcha gorcha closed this as completed in d12eaa8 Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Further information is requested Importance: ❗❗ Important
Projects
None yet
Development

No branches or pull requests

4 participants