Should we really have skipped tests in the assignments? #856
For people who are very new to programming, it helps simulate the rhythm of TDD, and reduces the noise and panic that come with lots of failures. I'd be open to hearing more thoughts and anecdotes about the pros and cons of this. I'm currently playing with an experiment to extract the inputs/outputs to a common format, which all the languages could use to generate test suites. In that case we could pass a flag to generate with or without skips: https://github.com/exercism/xmetadata
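A rough sketch of what that generation step could look like in Ruby; the JSON layout, the `--with-skips` flag, and the `Bob.hey` API here are illustrative guesses, not the actual xmetadata format:

```ruby
require "json"

# Hypothetical shared format: {"cases": [{"name": ..., "input": ..., "expected": ...}]}
cases = JSON.parse(File.read("bob.json"))["cases"]
with_skips = ARGV.include?("--with-skips")

File.open("bob_test.rb", "w") do |f|
  f.puts "require 'minitest/autorun'"
  f.puts "require_relative 'bob'"
  f.puts
  f.puts "class BobTest < Minitest::Test"
  cases.each_with_index do |c, i|
    f.puts "  def test_#{c["name"]}"
    f.puts "    skip" if with_skips && i > 0 # leave only the first test live
    f.puts "    assert_equal #{c["expected"].inspect}, Bob.hey(#{c["input"].inspect})"
    f.puts "  end"
  end
  f.puts "end"
end
```

When a new edge case turns up, you'd add one entry to the JSON and regenerate every track's suite from it.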
I suppose I don't really believe in TDD, at least not in a "pass one unit test at a time" sort of way. In many of the exercises here, a unit test doesn't really encapsulate a feature, just some expectation for a particular input. That said, I don't have much experience teaching people who are very new to programming, and when I have done it they were doing more open-ended graphics programming tasks which aren't very easy to unit test.

With regard to xmetadata, I am not convinced that it will be so helpful for all the exercises to try to automatically generate test suites. If we have the test suites parse the JSON at runtime, it will be hard for programmers new to the language to decipher how the test suites work. If we try to generate the test suite code from the JSON, then in many cases it will require multiple custom code generators and templates even for a single language, which significantly raises the barrier to building the paths. On top of that, JSON isn't very good at data types, and in languages other than JavaScript we will want to use idiomatic data types (symbols and classes in Ruby, abstract data types in Haskell, etc.), and I don't see an obvious way to encode that kind of thing in a portable way that will result in idiomatic code.

Perhaps it would be much easier to shake out edge cases if property-based testing were used. The way this works is that you declaratively specify how tests can be generated and how to verify the result, and the infrastructure can automatically generate specific tests that way. It's used to great effect in the Haskell and Erlang communities with QuickCheck (also available commercially for Erlang) or one of the open-source clones such as PropEr or triq. There are ports to other languages of varying quality, but I only have experience with these. I used this for the octal and trinary test suites because I could very easily test the output using functions from the standard library. Here's an example of what that looks like: https://github.com/kytrinyx/exercism.io/blob/master/assignments/haskell/octal/octal_test.hs It would be possible to use QuickCheck or something like it to automatically generate this sort of input file for other languages, like what xmetadata does, but able to scale to generating as many cases as you want with a bit less manual effort.
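For languages without a solid QuickCheck port, the same property-based idea can be hand-rolled with plain random generation. A minimal sketch in Ruby, assuming the exercise exposes `Octal.to_decimal`; the oracle trick is the same as in the linked octal_test.hs, leaning on the standard library for the expected answer:

```ruby
require "minitest/autorun"
require_relative "octal" # assumed to define Octal.to_decimal

class OctalPropertyTest < Minitest::Test
  def test_round_trips_against_stdlib
    # Generate random numbers, render them in octal via the standard
    # library, and check that the solution converts them back correctly.
    100.times do
      n = rand(0..10_000)
      assert_equal n, Octal.to_decimal(n.to_s(8))
    end
  end
end
```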
For what it's worth, I am not a rabid believer in TDD. The thing that tests help teach new programmers is to take small steps, break big problems down into smaller problems, and know what you're trying to achieve. Beginners are virtually incapable of reading error messages, and anything more than one thing at a time leaves most of them completely flustered. Hence the skips.
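For anyone unfamiliar with the mechanism under discussion: a skipped suite on the Ruby track looks roughly like the sketch below (the exercise and assertions are illustrative), and the learner deletes each `skip` in turn to activate the next test.

```ruby
require "minitest/autorun"
require_relative "bob"

class BobTest < Minitest::Test
  # The first test is live, so the learner starts with exactly one failure.
  def test_stating_something
    assert_equal "Whatever.", Bob.hey("Tom-ay-to, tom-aaaah-to.")
  end

  # Later tests are skipped; delete the `skip` line to activate each one.
  def test_shouting
    skip
    assert_equal "Whoa, chill out!", Bob.hey("WATCH OUT!")
  end
end
```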
I don't intend to do it for all the exercises, I just want to see if it is possible to do something that is less haphazard than the current approach.
Yes, that is not my intention.
The way I see it, you have to do work to get an exercise set up in a given language anyway. The thing that bugs me right now is that when we find a new edge case it takes a considerable effort to get this fixed across all languages. If it is based on some set of rules and data, it would be trivial to regenerate a test suite. Using the metadata would be entirely optional.
Right, I have no intention of trying to automate the entire thing across languages. I'm just going to see if having some boilerplate or templates (even on a per-exercise basis) might make maintenance easier than it is right now.
That sounds promising. I have no experience with it, though, so I will need to spend some time reading up on it to have any meaningful discussion about it.
With regard to the undecipherable error messages, what about a friendlier face for the test runners? This would be a test runner that, by default, will run all of the tests but display only the first error. It would also make a best effort to reformat it in a beginner-friendly way. This could be distributed with, or be part of, the command-line tool. This way, the tests don't need to be edited when completing an exercise, and we won't have any inconsistencies where some test suites/languages skip and others don't.
Yeah, that would be brilliant. Much, much better than skips etc.
The runner you describe reminds me of the one used in Jim Weirich's Ruby Koans (http://rubykoans.com). There's one command to run. When you first run it (without having completed any koans), the output looks like this:
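(Approximated from memory rather than pasted verbatim:)

```
AboutAsserts#test_assert_truth has damaged your karma.

The Master says:
  You have not yet reached enlightenment.

Please meditate on the following code:
  .../koans/about_asserts.rb:10:in `test_assert_truth'

mountains are merely mountains
your path thus far [X_________________________________________] 0/282
```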
It's nicely colored red and green, which doesn't show up here. Anyway, what about something like that?
That would be perfect! I'll take a look at how they've implemented that. Thanks :)
Having worked through a number of the Ruby exercises with a group very new to Ruby, having the skips has been brilliant - it would have been almost unworkable without them (other than by adding them in manually). However, I'm not sure having a more sophisticated and friendlier runner would be better - one of the great things about having just plain tests with skips is that it gets people used to seeing and acting on real test output, and it encourages them to look at and understand the test code itself while removing the skips. Maybe a friendlier test runner for the first few exercises (as you kind of already have with the first error with Ruby Bob), then a migration path to real tests?
@rsslldnphy - what if the test runner was just a fail-early type of thing? You get the same output, but you only get one failure (the first one).
Hmm, what's the advantage over skips? I like that having to remove skips encourages you to interact with the test file a bit more, and my conjecture is that this is more likely to lead to people thinking about tests as an integral part of writing the code, rather than the test runner just being "how exercism works". If you know what I mean. I know it's only a tiny thing, and I'm only guessing, but my bet is that just editing a file a few times to remove the word `skip` nudges people in that direction.
I never thought of it that way. I would be willing to do things differently on a language-by-language basis. In particular, I suspect that a lot more people completely new to programming will do the Ruby exercises than the Haskell exercises, so perhaps skips make more sense there than elsewhere.
It's been a while since we had this discussion. My current take on this is that it should be decided on a language-by-language basis, with the default suggestion being to have it when starting out, unless there's a good argument against it. I'm going to close the issue, but I'm always open to taking another look at this with a fresh perspective if someone wants to discuss it further.
@kytrinyx: Please have another look at this issue. I'm collecting a few items of recent personal experience from first-time contact with the track.
Each consequence is not a big deal in itself, but in aggregate they seem... I don't know... removed from the reality that learners should be approaching, no? Is that worth avoiding an (admittedly) intimidating barrage of failures? Apologies for any long-windedness and in case this sounds slightly ranty. I intend this list as a comparison to the advantages described above, and as a way to think about alternative compromises to reduce any intimidation.
So I think it's essential that there is only one test that fails each time. I personally like the whole skip/unskip flow. I've reopened the discussion as a lot has changed over time and it's worth rechecking the thinking from 4 years ago :)
Thanks for continuing the discussion :-) |
My two cents:
So while I recognize the intent in encouraging people to play around with the test class, I think mentioning in the instructions that they are encouraged to do so might be a better approach. And while I understand the desire to avoid hitting people with a barrage of failed test cases right off the bat, I don't think it's that big a deal compared to the downsides.
I think it's important to note that tracks are different here. Ruby has a very explicit hello-world that talks you through unskipping the tests and explains the process. Many other tracks don't. You can see that in action here.
So I absolutely unskip one test at a time and this whole process was the thing that meant I fell in love with Exercism in the first place. It was really nice just to be solving sequential mini-steps. Again, this might be a language thing. I've not had anyone I've mentored complain about this and maybe one person (out of lots) submit code that doesn't pass the tests - maybe this is because testing is so deeply ingrained in Ruby compared to other languages? I don't even download and run the code in Ruby, because failing tests just aren't an issue I experience.
Again, this is language-dependent. In some worlds, having 20 failing tests could result in pages of text flying by, which is totally intimidating. I'm also not sure that results are reported in order. Having tests report out of order is very standard in Ruby - I'm not sure what the settings are for minitest by default nowadays. I'm very much for a fail-fast test runner, but as I said above, that feels like a lot of work.
I think this is the real nub of the issue. If we're not educating people well, then they won't know what to do. So we need to make this a lot clearer somehow.
I appreciate your well-considered thoughts, btw. I should have started by saying that.
Likewise, I'm glad you shared your experiences coming from a mostly Ruby perspective. It helps me get a better feel for things. I'll chat with some of the mentors on the C# track and see if they'd prefer to improve the documentation up front or switch to non-skipped tests.
Re the documentation, this is something we should probably bake into the actual site better, rather than rely on individual tracks to get right. I don't know how we do that though. I would recommend trying the hello world in ruby, as if you'd not done exercism before. Just download and run the test file and see what happens. It's quite a good process. |
@kytrinyx What are your thoughts on us spiking this for one or two languages as a prototype to see if it improves things for those languages. If so, we could then have it as a project that the OSS community can roll out for each language? |
That situation could still be used as training to focus on one failure at a time.
Still parallel apparently, but it seems possible to force a sorting. We could prepend a numeric index to each test name so that the alphabetical sort matches the order the tests are defined in.
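Something like the following, say; the naming scheme is hypothetical, and the zero-padding keeps alphabetical order aligned with the intended progression:

```ruby
require "minitest/autorun"
require_relative "bob"

class BobTest < Minitest::Test
  # Numbered prefixes give a stable alphabetical sort order.
  def test_01_responds_to_a_statement
    assert_equal "Whatever.", Bob.hey("Tom-ay-to, tom-aaaah-to.")
  end

  def test_02_responds_to_shouting
    assert_equal "Whoa, chill out!", Bob.hey("WATCH OUT!")
  end
end
```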
For minitest you have several options, none of them are appropriate for us. Default is "run everything in random order". You can set the tests to be in a deterministic order, but that order happens to be "alphabetical by test name". There is no fail fast. A few years ago I wrote a plugin for minitest (it was gross, I don't think I even published it) which would allow you to keep the order as "whatever was defined in the file", and which also did fail fast. I hesitate to suggest that we write our own dependency for something like this, because it means we have to maintain it forever and I am already feeling the weight of maintaining existing code... on the other hand I am convinced that having a way to fail fast in the order defined would be a significantly better user experience than the skipping, which is fraught with all sorts of user experience hurdles.
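For what it's worth, the plugin surface in minitest 5 is small. A sketch of definition-order plus fail-fast might look like the following; treat it as an illustration of the shape rather than a published gem, and note it relies on minitest's plugin autodiscovery (a `minitest/*_plugin.rb` file on the load path):

```ruby
# minitest/exercism_plugin.rb
require "minitest"

module Minitest
  class Test
    # Return tests in definition order instead of shuffling them.
    # (MRI happens to preserve definition order in instance method
    # listings; that's an implementation detail we'd be leaning on.)
    def self.runnable_methods
      methods_matching(/^test_/)
    end
  end

  # A reporter that aborts the run on the first failure or error.
  class FailFastReporter < AbstractReporter
    def record(result)
      return if result.passed? || result.skipped?
      puts result
      exit 1
    end
  end

  # Called automatically by minitest's plugin system.
  def self.plugin_exercism_init(_options)
    reporter << FailFastReporter.new
  end
end
```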
That's a good point. This would work for minitest too, which means that if zenspider would be OK with a flag or something that sets fail-fast, we would be all set in Ruby.
Personally I really like that the tests are skipped by default on the Ruby track, and it's not very cumbersome to unskip them one at a time (just delete the `skip` line).
Redirected here from Errors in: Leap #301. I'm with the OP on that one.

TDD is an interesting concept but it's not didactic; it doesn't encourage someone to focus on a problem, but instead to 'mess around' with code until it finally (or hopefully?) passes the tests. A specification of a problem to solve, on the other hand, gives a direct understanding of what must be done, and the developer has all the data to think of an architecture and execute it properly. Also, in regard to TDD, what if the test suite is incomplete? Will the developer be happy with himself not using a bit of common sense, but following tests and releasing an incomplete product?

As for the 'noise' mentioned in the 2nd post, that's usually how it works in a real job. You get partial information with a lot of noise, from several people with different priorities, and it's up to you to crystallize a specification and validate it with the stakeholders. But this is another story. Giving a clear problem description is no noise. I encourage you to look at the problems on HackerRank for a good example: if the problem is unclear or incomplete, they tune it after feedback, and the end result is rather clean and efficient. They also go one step further (which may answer other concerns in this thread) and hide the actual tests except the first obvious ones. When you encounter a failure, you have to think about what is missing instead of hacking your solution from the test code. It helps with one major problem people have today: they don't pay attention to the problem/task description.

Deducing code from a problem description is also how most interviews are conducted. Look at Google interviews: they give a problem, let the person explain how to solve it (the architecture is elaborated), have them write the code (execution), then add more constraints to the problem to push in one direction or another (Agile-like thinking).

So in the end, I see no positive reason to use TDD, especially in a learning environment. And certainly not with a step-by-step approach, which really seems a recipe for chaotic results: how can you possibly choose the correct tools without knowing the whole story, and by guessing from partial results?

Regarding the test suite, it seems from the discussions that it has two purposes: 1) to let the student verify the answer, and 2) to check that the exercise module works, as non-regression tests. Either way, it had better be complete from the start; ignoring some of the tests can only lead to problems (but that's already been said). My 2 cents, I see that it has been discussed at length already.
When completing exercises for languages that skip tests, it's easy to forget that you must remove all of the skips. What's the benefit of having these skips at all?
Another reason to remove them would be to make it easier to do continuous integration for our assignments. If we were to do this and leave skips in, we'd have to transform the code: in Ruby we'd have to remove the lines with skip, in JavaScript we'd have to rewrite the test functions that start with x, in some other languages we'd have to remove comments (but only the comments that are really code!), etc.
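For the Ruby case at least, that transformation is mechanical; a sketch, assuming the skips appear as bare `skip` lines:

```ruby
# strip_skips.rb - rewrite a test file with every bare `skip` line removed.
path = ARGV.fetch(0)
lines = File.readlines(path).reject { |line| line.strip == "skip" }
File.write(path, lines.join)
```

CI would run something like `ruby strip_skips.rb bob_test.rb` before executing the suite.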