
Should we really have skipped tests in the assignments? #856

Closed
etrepum opened this issue Oct 13, 2013 · 25 comments

Comments

@etrepum

etrepum commented Oct 13, 2013

When completing exercises for languages that skip tests, it's easy to forget that you must remove all of the skips. What's the benefit to having these skips at all?

Another reason to remove them would be to make it easier to do continuous integration for our assignments. If we were to do that and leave the skips in, we'd have to transform the code: in Ruby we'd have to remove the lines containing skip, in JavaScript we'd have to rewrite the functions whose names start with x, in some other languages we'd have to remove comments (but only the comments that are really code!), etc.
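
For concreteness, here is roughly what this looks like on the Ruby track, where every test after the first opens with a bare skip that the learner has to delete by hand. The Leap/Year names are just the familiar exercise, and the strip-for-CI one-liner in the comment is only a sketch of the transformation described above, not an actual exercism script:

require "minitest/autorun"
require_relative "leap"   # the learner's solution defines Year

# Roughly how a Ruby-track test file is delivered: every test after the
# first begins with a bare `skip`, which the learner must remove by hand
# before that test will actually run.
class LeapTest < Minitest::Test
  def test_year_not_divisible_by_4
    refute Year.leap?(2015)
  end

  def test_year_divisible_by_4
    skip
    assert Year.leap?(2016)
  end
end

# A CI job would first have to strip those lines, for example:
#   File.write(path, File.read(path).lines.reject { |l| l.strip == "skip" }.join)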

@kytrinyx
Member

For people who are very new to programming, it helps simulate the rhythm of TDD, and reduces the noise and panic that come with lots of failures.

I'd be open to hearing more thoughts and anecdotes about the pros and cons of this.

I'm currently playing with an experiment to extract the inputs/outputs to a common format, which all the languages could use to generate test suites. In that case we could pass a flag to generate with or without skips: https://github.com/exercism/xmetadata
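
To make the idea concrete, here is a hedged sketch of what such a generator might look like for the Ruby track. The JSON shape, file names, and Year.leap? API below are assumptions for illustration, not xmetadata's actual format:

require "json"

# Hypothetical sketch: emit a minitest file from a shared list of cases.
# The metadata shape ("cases" with description/input/expected) is an
# assumption, not xmetadata's real format.
def generate_leap_tests(metadata_path, with_skips: true)
  cases = JSON.parse(File.read(metadata_path)).fetch("cases")

  body = cases.each_with_index.map do |c, i|
    lines = ["  def test_#{c["description"].downcase.gsub(/\W+/, "_")}"]
    lines << "    skip" if with_skips && i > 0   # regenerate with or without skips
    lines << "    assert_equal #{c["expected"].inspect}, Year.leap?(#{c["input"].inspect})"
    lines << "  end"
    lines.join("\n")
  end.join("\n\n")

  <<~RUBY
    require "minitest/autorun"
    require_relative "leap"

    class LeapTest < Minitest::Test
    #{body}
    end
  RUBY
end

# puts generate_leap_tests("leap.json", with_skips: true)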

@etrepum
Author

etrepum commented Oct 14, 2013

I suppose I don't really believe in TDD, at least not in a "pass one unit test at a time" sort of way. In many of the exercises here, a unit test doesn't really encapsulate a feature, just some expectation for a particular input. That said, I don't have much experience teaching people who are very new to programming, and when I have done it they were doing more open ended graphics programming tasks which aren't very easy to unit test.

With regard to xmetadata, I am not convinced that it will be so helpful for all the exercises to try and automatically generate test suites. If we have the test suites parse the JSON at runtime then it will be hard for programmers new to the language to decipher how the test suites work. If we try and generate the test suite code from the JSON, then in many cases it will require multiple custom code generators and templates even for a single language, which significantly raises the barrier to building the paths. On top of that, JSON isn't very good at data types, and in languages other than JavaScript we will want to use idiomatic data types like symbols and classes in Ruby, abstract data types in Haskell, etc. and I don't see an obvious way to encode that kind of thing in a portable way that will result in idiomatic code.

Perhaps it would be much easier to shake out edge cases if property based testing were used. The way this works is that you declaratively specify how tests can be generated and how to verify the result, and the infrastructure can automatically generate specific tests that way. It's used to great effect in the Haskell and Erlang communities with QuickCheck (also available commercially for Erlang) or one of the open source clones such as PropEr or triq. There are ports to other languages of varying quality, but I only have experience with these.
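
Since most of this thread is about the Ruby track, here is a hand-rolled analogue of the idea in plain minitest rather than a real QuickCheck port. Octal.to_decimal is a hypothetical name for the exercise's API, and unlike real property-based tools this loop does no shrinking:

require "minitest/autorun"
require_relative "octal"   # the learner's solution; Octal.to_decimal is a hypothetical name

# A hand-rolled analogue of a property-based test: generate random inputs,
# state a property that relates the solution to the standard library, and
# check it many times. Real QuickCheck-style tools would also shrink a
# failing input down to a minimal counterexample.
class OctalPropertyTest < Minitest::Test
  def test_round_trips_against_the_standard_library
    100.times do
      n = rand(0..10_000)
      octal_string = n.to_s(8)   # the standard library produces the octal form
      assert_equal n, Octal.to_decimal(octal_string),
                   "expected #{octal_string.inspect} (octal) to decode to #{n}"
    end
  end
end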

I used this for the octal and trinary test suites because I could very easily test the output using functions from the standard library. Here's an example of what that looks like: https://github.com/kytrinyx/exercism.io/blob/master/assignments/haskell/octal/octal_test.hs
Note that Haskell "cheats" quite a bit here because the type signature is used to encode how to generate the tests, but you can also write code to do this sort of thing when you are testing something more specific than a type. Also at least in the Haskell and Erlang cases, when a failure occurs it can automatically shrink the failing input to a simpler failing input (e.g. a smaller number, a list with fewer elements, a tree with fewer nodes, etc.) so that it's often easier for the developer to understand why it went sideways.

It would be possible to use QuickCheck or something like it to automatically generate this sort of input file for other languages, similar to what xmetadata does, but able to scale to generating as many cases as you want with a bit less manual effort.

@kytrinyx
Member

For what it's worth, I am not a rabid believer in TDD. The thing that tests help teach new programmers is to take small steps, break big problems down into smaller problems, and know what you're trying to achieve.

Beginners are virtually incapable of reading error messages, and anything more than one thing at a time leaves most of them completely flustered. Hence the skips.

With regard to xmetadata, I am not convinced that it will be so helpful for all the exercises to try and automatically generate test suites.

I don't intend to do it for all the exercises, I just want to see if it is possible to do something that is less haphazard than the current approach.

If we have the test suites parse the JSON at runtime then it will be hard for programmers new to the language to decipher how the test suites work.

Agreed, that is not my intention.

If we try and generate the test suite code from the JSON, then in many cases it will require multiple custom code generators and templates even for a single language, which significantly raises the barrier to building the paths.

The way I see it, you have to do work to get an exercise set up in a given language anyway. The thing that bugs me right now is that when we find a new edge case it takes a considerable effort to get this fixed across all languages. If it is based on some set of rules and data, it would be trivial to regenerate a test suite.

Using the metadata would be entirely optional.

JSON isn't very good at data types, and in languages other than JavaScript we will want to use idiomatic data types like symbols and classes in Ruby, abstract data types in Haskell, etc. and I don't see an obvious way to encode that kind of thing in a portable way that will result in idiomatic code.

Right, I have no intention of trying to automate the entire thing across languages. I'm just going to see if having some boilerplate or templates (even on a per-exercise basis) might make maintenance easier than it is right now.

Perhaps it would be much easier to shake out edge cases if property based testing were used.

That sounds promising. I have no experience with it, though, so I will need to spend some time reading up on it to have any meaningful discussion about it.

@etrepum
Author

etrepum commented Oct 14, 2013

With regard to the undecipherable error messages, what about a friendlier face for the test runners?

This would be a test runner that, by default, would run all of the tests but display only the first error. It would also make a best effort to reformat that error in a beginner-friendly way. It could be distributed with, or be part of, the command-line tool. That way, the tests wouldn't need to be edited when completing an exercise, and we wouldn't have any inconsistencies where some test suites/languages skip and others don't.
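
A minimal sketch of what that could look like for minitest, assuming its reporter API (AbstractReporter); a real version would ship as a minitest plugin and also silence the default progress/summary reporters, which is glossed over here:

require "minitest"

# Sketch of a beginner-friendly reporter: let every test run, remember only
# the first failure, and report it in gentler wording at the end.
class FirstFailureReporter < Minitest::AbstractReporter
  def initialize
    super()
    @first_failure = nil
    @count = 0
  end

  def record(result)
    @count += 1
    @first_failure ||= result unless result.passed? || result.skipped?
  end

  def report
    if @first_failure
      puts "One thing to work on next: #{@first_failure.name}"
      puts @first_failure.failure.message
    else
      puts "All #{@count} tests passed. Nice work!"
    end
  end
end

# Registration would happen in a plugin init hook, roughly:
#   Minitest.reporter << FirstFailureReporter.new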

@kytrinyx
Member

Yeah, that would be brilliant. Much, much better than skips etc.

@seanhussey

The runner you describe reminds me of the one used in Jim Weirich's Ruby Koans:

http://rubykoans.com
https://github.com/neo/ruby_koans

There's one command to run. When you first run it (without having completed any koans), the output looks like this:

$ rake test
/Users/shussey/.rvm/rubies/ruby-2.0.0-p247/bin/ruby path_to_enlightenment.rb
AboutAsserts#test_assert_truth has damaged your karma.

The Master says:
  You have not yet reached enlightenment.

The answers you seek...
  Failed assertion.

Please meditate on the following code:
  /Users/shussey/Downloads/koans/about_asserts.rb:10:in `test_assert_truth'

mountains are merely mountains
your path thus far [X_________________________________________________] 0/282
$

It's nicely colored red and green, which doesn't show up here.

Anyway, what about something like that?

@kytrinyx
Member

That would be perfect! I'll take a look at how they've implemented that. Thanks :)

@rsslldnphy

Having worked through a number of the Ruby exercises with a group very new to Ruby, I've found the skips brilliant - it would have been almost unworkable without them (other than by adding them in manually).

However, I'm not sure having a more sophisticated & friendlier runner would be better - one of the great things about having just plain tests with skips is that it gets people used to seeing and acting on real test output, and it encourages them to look at and understand the test code itself while removing the skips. Maybe a friendlier test runner for the first few exercises (as you kind of already have with the first error with Ruby Bob), then a migration path to real tests?

@kytrinyx
Member

kytrinyx commented Jan 2, 2014

@rsslldnphy - what if the test runner was just a fail early type thing? You get the same output, but you only get one failure (the first one).

@rsslldnphy

Hmm, what's the advantage over skips? I like that having to remove skips encourages you to interact with the test file a bit more, and my conjecture is that this is more likely to lead to people thinking about tests as an integral part of writing the code, rather than the test runner just being "how exercism works". If you know what I mean.

I know it's only a tiny thing, and I'm only guessing, but my bet is that just editing a file a few times to remove the word skip is going to make you more comfortable with that type of file than just reading it would.

@kytrinyx
Member

kytrinyx commented Jan 2, 2014

I never thought of it that way. I would be willing to do things differently on a language-by-language basis. In particular, I suspect that a lot more people completely new to programming will do the ruby exercises than the haskell exercises, so perhaps skips make inherently more sense in Ruby (and perhaps JavaScript) than in the other languages for the reasons you mention.

@kytrinyx
Member

It's been a while since we had this discussion.

My current take on this is that it should be decided on a language-by-language basis, with the default suggestion being to have it when starting out, unless there's a good argument against it.

I'm going to close the issue, but I'm always open to taking another look at this with a fresh perspective if someone wants to discuss it further.

@katrinleinweber

@kytrinyx: Please have another look at this issue. I'm collecting a few items of recent personal experience from first-time contact with the Kotlin and Ruby tracks, after having helped maintain the R track, where we never considered skipping tests until now. And with R being common in the sciences, that track probably attracts mostly programming novices.

  1. inconsistency between tracks => confusing for learners
  2. needing to maintain documentation about that (which failed in the 2nd track I tried: Document the need to remove the @Ignore from other tests in each exercise kotlin#130)
  3. adding complexity to maintainer workflows (https://github.com/exercism/rust/pull/361/files)
  4. learners having to manually remove skip or @Ignore keywords OR overwrite them en masse
  5. learners overlooking inactive tests and submitting incomplete solutions

Each consequence is not a big deal in itself, but in aggregate they seem... I don't know... removed from the reality that learners should be working towards, no? Is all of that worth avoiding an (admittedly) intimidating barrage of failures?

Apologies for any long-windedness, and in case this sounds slightly rant-y. I intend this list as a comparison with the advantages described above, and as a prompt to think about alternative compromises that could reduce the intimidation.

  • The custom runner idea seems neatest, but has that happened?

  • Could we maybe reorder the tests so that the next useful failure is always printed at the end of the console output, combined with telling learners not to scroll around but to focus on one test case at a time?

  • Or how about throwing our weight as a larger learner community into the ring and submitting PRs to the languages or testing tools (or wherever the failure messages are defined) that improve the situation for everyone? For example, we could help implement flags à la --reverse-order-report, so that the outcome of the previous bullet point is achieved at a higher level than Exercism.

@iHiD iHiD reopened this Oct 5, 2018
@iHiD
Member

iHiD commented Oct 5, 2018

what if the test runner was just a fail early type thing? You get the same output, but you only get one failure (the first one).

So I think it's essential that there is only one test that fails each time. I personally like the whole skip approach. I also like the "fail early" test runner that achieves the same thing. However, we would need to write that for each language and that feels maybe like a lot of work?

I've reopened the discussion as a lot has changed over time and it's worth rechecking the thinking from 4 yrs ago :)

@katrinleinweber

Thanks for continuing the discussion :-)

@j2jensen

j2jensen commented Nov 5, 2018

My two cents:

  1. I didn't notice anything in the initial instructions about "how to run tests" (C# or Python track) that mentioned I would need to change the test code to activate the skipped tests, and I submitted my Leap exercise in C# before noticing that tests were being skipped. One of the first Leap submissions I mentored included the tests in their submission, and they hadn't removed the Skips from their tests. So that's two out of the two data points I've seen saying it's not extremely obvious that there's an extra step you're expected to take.
  2. The Tests tab in the mentoring interface shows the tests with Skips in them, and if you download a user's solution the tests have skips in them. So in order to do the most basic check that all the tests are passing, mentors have to perform a manual step with every single submission. I'll admit that I resorted to eyeballing the code and assuming they passed all the tests if the logic looked right to me, but there's a decent chance that neither the submitter nor the mentor is actually checking that all tests pass before approving a submission.
  3. If we want to promote TDD, it seems like we should encourage users to write their own tests, separate from the ones we're using to "grade" them. I don't think that's what we're going for, though.
  4. If people want to follow a TDD-ish pattern without writing their own tests, they're free to do so. The test results are reported in order, so they can just write a simple solution and watch one test pass, then improve it and watch the next test pass, and so on.
  5. If people don't want to follow a TDD-ish pattern, then we're forcing them to take an extra, tedious step to un-skip all the tests. I doubt most users are actually un-skipping one test at a time: they're probably just going into the test file and removing all the Skips before running the tests.
  6. When I came here looking for a discussion about skipping, I noticed that the most recent issue posted was Javascript leap, 4 tests be skipped. #4455. I imagine we'll be getting a lot of duplicate issues like this, because having the tests skipped does not appear to be the expected behavior for most users.

So while I recognize the intent in encouraging people to play around with the test class, I think mentioning in the instructions that they are encouraged to do so might be a better approach. And while I understand the desire to avoid hitting people with a barrage of failed test cases right off the bat, I don't think it's that big a deal compared to the downsides.

@iHiD
Member

iHiD commented Nov 5, 2018

I think it's important to note that tracks are different here. Ruby has a very explicit hello-world that talks you through unskipping the tests and explains the process. Many other tracks don't. You can see that in action here.

I doubt most users are actually un-skipping one test at a time: they're probably just going into the test file and removing all the Skips before running the tests.

So I absolutely unskip one test at a time, and this whole process was the thing that made me fall in love with Exercism in the first place. It was really nice just to be solving sequential mini-steps. Again, this might be a language thing. I've not had anyone I've mentored complain about this, and maybe one person (out of lots) has submitted code that doesn't pass the tests - maybe this is because testing is so deeply ingrained in Ruby compared to other languages? I don't even download and run the code in Ruby, because failing tests just aren't an issue I experience.

The test results are reported in order.

Again, this is language dependent. In some worlds, having 20 failing tests could result in pages of text flying by, which is totally intimidating. I'm also not sure that results are reported in order. Having tests report out-of-order is very standard in Ruby - I'm not sure what the settings are for minitest by default nowadays. I'm very much for a fail-fast test runner, but as I said above, that feels like a lot of work.

I didn't notice anything in the initial instructions about "how to run tests" (C# or Python track) that mentioned I would need to change the test code to activate the skipped tests, and I submitted my Leap exercise in C# before noticing that tests were being skipped.

I think this is the real nub of the issue. If we're not educating people well, then they won't know what to do. So we need to make this a lot clearer somehow.

@iHiD
Member

iHiD commented Nov 5, 2018

I appreciate your well-considered thoughts, btw. I should have started by saying that.

@j2jensen

j2jensen commented Nov 6, 2018

Likewise, I'm glad you shared your experiences coming from a mostly Ruby perspective. It helps me to get a better feel for things. I'll chat with some of the mentors on the C# track and see if they'd prefer to improve the documentation up front or switch to non-skipped tests.

@iHiD
Member

iHiD commented Nov 6, 2018

Re the documentation, this is something we should probably bake into the actual site better, rather than rely on individual tracks to get right. I don't know how we do that though.

I would recommend trying the hello world in ruby, as if you'd not done exercism before. Just download and run the test file and see what happens. It's quite a good process.

@iHiD
Member

iHiD commented Nov 6, 2018

I also like the "fail early" test runner that achieves the same thing. However, we would need to write that for each language and that feels maybe like a lot of work?

@kytrinyx What are your thoughts on us spiking this for one or two languages as a prototype, to see if it improves things for those languages? If so, we could then make it a project that the OSS community rolls out for each language.

@katrinleinweber

[…] In some worlds, having 20 failing tests could result in pages of text flying by, which is totally intimidating.

That situation could still be used as training to focus on one failure at a time.

Having tests report out-of-order is very standard in Ruby - I'm not sure what the settings are for minitest by default nowadays.

Still parallel by default, apparently, but it seems possible to force a sorted order. We could prepend z-a prefixes to the test names so that the first test is reported last on the console.
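
A rough sketch of that renaming trick in a minitest file: with the deterministic (alphabetical) ordering mentioned above, reverse-alphabetical prefixes make the conceptually first test run, and therefore get reported, last:

require "minitest/autorun"
require_relative "leap"   # the learner's solution defines Year

class LeapTest < Minitest::Test
  i_suck_and_my_tests_are_order_dependent!   # deterministic (alphabetical) order

  # Prefixes count down z, y, x, ... so the conceptually first case is
  # executed last and its failure ends up nearest the prompt.
  def test_z_year_not_divisible_by_4
    refute Year.leap?(2015)
  end

  def test_y_year_divisible_by_4
    assert Year.leap?(2016)
  end
end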

@kytrinyx
Member

I'm not sure what the settings are for minitest by default nowadays. I'm very much for a fail-fast test runner, but as I said above, that feels like a lot of work.

For minitest you have several options, but none of them is appropriate for us. The default is "run everything in random order". You can set the tests to run in a deterministic order, but that order happens to be "alphabetical by test name". There is no fail-fast.

A few years ago I wrote a plugin for minitest (it was gross, I don't think I even published it) which would allow you to keep the order as "whatever was defined in the file", and which also did fail fast.

I hesitate to suggest that we write our own dependency for something like this, because it means we have to maintain it forever and I am already feeling the weight of maintaining existing code... on the other hand I am convinced that having a way to fail fast in the order defined would be a significantly better user experience than the skipping, which is fraught with all sorts of user experience hurdles.
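
Purely as a sketch of that shape, without publishing a gem: a base test class that keeps definition order and skips everything after the first failure. The names are hypothetical, and the definition-order trick relies on MRI returning instance methods in the order they were defined, which holds in practice but isn't formally guaranteed:

require "minitest/autorun"

# Hypothetical "defined order + fail fast" base class. Once any test fails,
# the remaining tests are skipped, so the learner sees one failure at a time
# without having to edit the test file.
class FailFastTest < Minitest::Test
  @@halted = false

  def self.runnable_methods
    methods_matching(/^test_/)   # bypass minitest's shuffle/sort and keep file order
  end

  def setup
    skip "an earlier test already failed" if @@halted
  end

  def teardown
    @@halted = true unless passed? || skipped?
  end
end

# An exercise's tests would then simply inherit from it:
#
#   class LeapTest < FailFastTest
#     def test_year_not_divisible_by_4
#       refute Year.leap?(2015)
#     end
#   end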

We could prepend z-a to the test names to have the first test be reported last on the console.

That's a good point. This would work for minitest too, which means that if zenspider would be ok with a flag or something that sets fail-fast, we would be all set in Ruby.

@kenliu

kenliu commented Mar 25, 2019

Personally I really like that the tests are skipped by default on the ruby track, and it's not very cumbersome to unskip them one at a time (ctrl-/ to uncomment in my editor).

@blueglyph

blueglyph commented May 30, 2019

Redirected here from Errors in: Leap #301

I'm with the OP on that one.

TDD is an interesting concept, but it's not didactic: it doesn't encourage someone to focus on a problem, but instead to 'mess around' with code until it finally (or hopefully?) passes the tests. A specification of the problem to solve, by contrast, gives a direct understanding of what must be done, and the developer has all the information needed to think of an architecture and execute it properly.

Also, with regard to TDD, what if the test suite is incomplete? Will developers be happy with themselves for not using a bit of common sense, just following the tests and releasing an incomplete product?

As for the 'noise' mentioned in the 2nd post, that's usually how it works in a real job. You get partial information with a lot of noise, from several people with different priorities, and it's up to you to crystallize a specification and validate it with the stakeholders. But that is another story. Giving a clear problem description is not noise; I encourage you to look at the problems on HackerRank for a good example. If a problem is unclear or incomplete, they tune it based on feedback, and the end result is rather clean and efficient.

They also go one step further (which may answer other concerns in this thread) and hide all but the first, obvious tests. When you encounter a failure, you have to think about what is missing instead of hacking your solution together from the test code. It helps with one major problem people have today: they don't pay attention to the problem/task description.

Deducing code from a problem description is also how most interviews are conducted. Look at Google interviews: they give a problem, let the candidate explain how to solve it (the architecture is elaborated), write the code (execution), then add more constraints to the problem to push in one direction or another (Agile-like thinking).

So in the end, I see no positive reason to use TDD, especially in a learning environment. And certainly not with a step-by-step approach, which really seems like a recipe for chaotic results: how can you possibly choose the correct tools without knowing the whole story, and by guessing from partial results?

Regarding the test suite, it seems from the discussion that it has two purposes: 1) to let the student verify their answer, and 2) to check that the exercise module works, as a regression test. Either way, it had better be complete from the start; ignoring some of the tests can only lead to problems (but that's already been said).

My 2 cents; I see that this has been discussed at length already.
