Regression versus failure #834

Closed · lionel- opened this issue Dec 28, 2018 · 9 comments · Fixed by #906
Labels: feature (a feature request or enhancement), wip (work in progress)

Comments

lionel- (Member) commented Dec 28, 2018

This is about better support for tests that should not cause R CMD check failures because they test output that is not entirely under the developer's control. For instance, rlang has a lot of expect_known_output() tests for backtraces. These outputs sometimes change across R versions, e.g. because eval() produces a different backtrace. Another case is vdiffr tests, because the appearance of plots is sensitive to upstream code such as the computation of margins or spacing between elements.

Such tests should only fail on platforms where the CI envvar is set (Travis, Appveyor) or where the NOT_CRAN envvar is set (tests run locally), not on CRAN. This allows the developer to monitor and assess regressions during development.

@hadley suggested calling these tests regression tests. They could be implemented with a new expectation class expectation_regression.
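To make this concrete, such a test currently looks roughly like this (print_backtrace() and the reference file are placeholders, not actual rlang or testthat objects):

test_that("backtraces print as expected", {
  # The recorded output can legitimately drift across R versions, so a
  # mismatch here is a regression to review rather than necessarily a bug.
  expect_known_output(print_backtrace(), "backtrace.txt")
})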

jimhester (Member) commented Dec 28, 2018

I don't really see the advantage of what you describe over using skip_on_cran() for these blocks.
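With the existing helper, the placeholder test above would simply become:

test_that("backtraces print as expected", {
  # skip_on_cran() skips the whole block unless NOT_CRAN is set, so CRAN
  # never sees failures from drifting output.
  skip_on_cran()
  expect_known_output(print_backtrace(), "backtrace.txt")
})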

lionel- (Member, Author) commented Dec 28, 2018

This is what I planned to do initially, but Hadley suggested using a special expectation. Perhaps the point is to give more structure to this class of expectation?

Is NOT_CRAN set on Appveyor as well (I'm assuming it is set on Travis since you suggest it)?

hadley (Member) commented Dec 28, 2018

I think there's a class of expectations that should be ignored on CRAN by default (i.e. regression tests). I don't have any strong feelings about how it should be implemented.

lionel- (Member, Author) commented Dec 28, 2018

If these tests are mostly about outputs, it would be useful to signal a structured expectation with before/after fields. We could then provide tools to review and validate regressions, similar to what we have in vdiffr.
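A rough sketch of such a structured condition (hypothetical constructor, not an existing testthat API):

# Carries the expected ("before") and observed ("after") output so that
# review tooling could later display a diff, as vdiffr does for plots.
expectation_regression <- function(message, before, after) {
  structure(
    list(message = message, before = before, after = after),
    class = c("expectation_regression", "expectation", "condition")
  )
}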

jimhester (Member) commented:
Would it make sense to use the standard testing machinery, but name the test files differently? e.g. reg-xyz.R, and these tests would only get run when NOT_CRAN is set, or when some other regression-specific switch is enabled?
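One way to approximate this today from tests/testthat.R (a sketch only: the regressions/ directory and the mypackage name are hypothetical, and testthat still only discovers files named test*.R inside that directory):

library(testthat)
library(mypackage)

# The regular suite always runs, including on CRAN.
test_check("mypackage")

# The regression-only suite runs only when NOT_CRAN is set.
if (nzchar(Sys.getenv("NOT_CRAN"))) {
  test_dir("regressions", stop_on_failure = TRUE)
}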

lionel- (Member, Author) commented Dec 28, 2018

Interesting idea. Advantages:

  • It works with all expectations.
  • It would be easy to move a test from one file to the other to change its semantics.

Disadvantages:

  • Most users would probably just keep using the test- files, even for things that should clearly be in reg- files.
  • This would multiply the number of files in the test folders. That could be a problem in ggplot2, which already has many files, though I guess that's an extreme case. It could also be mitigated by supporting setup, helper, teardown, and reg folders.

About NOT_CRAN, I think a different skip that checks for NOT_CRAN || CI would be better, because r-appveyor is documented not to set NOT_CRAN by default.
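Such a skip could look roughly like this (hypothetical name, simplified envvar handling):

# Skip unless we are in a development environment, i.e. when neither
# NOT_CRAN nor CI is set (as on CRAN itself).
skip_unless_dev <- function() {
  if (nzchar(Sys.getenv("NOT_CRAN")) || nzchar(Sys.getenv("CI"))) {
    return(invisible(TRUE))
  }
  testthat::skip("Regression tests only run when NOT_CRAN or CI is set")
}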

lionel- (Member, Author) commented Dec 30, 2018

How about using a special block instead of a different file?

see_that("this thing looks like this", {
  expect_true(...)

  expect_known_output(...)

  vdiffr::expect_doppelganger(...)
})

hadley (Member) commented Dec 31, 2018

Oh yeah, I like that idea too. Maybe check_that()?

lionel- (Member, Author) commented Jan 6, 2019

skip_on_cran() (or similar) might not be right because we still need the code to be evaluated. If the code throws an unexpected error, this should be a hard failure.

The motivation for this is that while we cannot always control how an exact output evolves over time, the code generating the output should be robust and always work.
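A rough sketch of those semantics (a hypothetical check_that() along the lines discussed above, not the eventual implementation; envvar handling simplified):

check_that <- function(desc, code) {
  on_dev <- nzchar(Sys.getenv("NOT_CRAN")) || nzchar(Sys.getenv("CI"))
  if (on_dev) {
    # During development the block behaves exactly like test_that().
    test_that(desc, code)
  } else {
    # Elsewhere (e.g. on CRAN) the code is still evaluated, so genuine
    # errors fail hard, but expectation failures are downgraded to skips.
    test_that(desc, {
      tryCatch(
        code,
        expectation_failure = function(cnd) skip("Regression ignored outside development")
      )
    })
  }
}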

hadley added the feature label on Mar 28, 2019
hadley added a commit that referenced this issue on Jul 18, 2019
hadley added the wip label on Jul 18, 2019
hadley added a commit that referenced this issue on Jul 19, 2019