Discussion: Testing 'invalid' input. #902

Open · Insti opened this issue Sep 16, 2017 · 12 comments

@Insti (Contributor) commented Sep 16, 2017

"Given a DNA string, compute how many times each nucleotide occurs in the string."

-- nucleotide-count description

This issue is a discussion about what kind of invalid test input it is appropriate to have as part of the canonical-data for an exercise.

Part of the PR "nucleotide-count: refactoring tests (discussion)" (#895) proposed adding tests for various "error" input cases: null input, strings that contain non-ASCII characters, and other tests that involve strings containing various other non-DNA characters.

In the discussion that arose around that, @Vankog wrote: #895 (comment)

But as far as I have experienced it, test suites should check for edge cases. null checks are especially important (at least in almost all languages).
In every educational piece of material, the advice is to check not only for usual inputs, but for all possible (equivalent kinds of) inputs. null checks, empty checks, and others are always at the top of the example lists given. Just take any arbitrary course about QA and this will be part of it.

I think this is especially important because we are in an educational context. Users should learn from the get-go what it means to write (arguably good) tests (first), because it is still a major problem, even among seasoned programmers.
From my own business experience I know that new programmers are completely lost when it comes to writing tests. But experienced programmers also fall into the traps of "I'll do it later", "This is a trivial method", "I can't test this", "There are too many dependencies", etc. Just take any talk or tutorial about writing tests or even TDD and you will see the same old counter-arguments and questions again and again from the comments or the audience. There is a reason for this: writing good tests is hard! Very hard, to be exact, and it is a skill that needs exercise. And where better to begin than right from the start?

I agree that writing good tests is hard. One trap that is often fallen into is testing too much; the skill is in knowing where to draw the line between what is important to test and what is not.

Part of the challenge of programming is designing clean and self-contained abstractions so that you have to worry about as few things as possible at any one time.

The exercises on Exercism help by taking away much of the ambiguity and uncertainty: they provide small, self-contained, and well-defined exercises for which the 'interesting' part is implementing the algorithm that solves a specific problem, rather than working out what the problem is and splitting it up.

In this case, where the problem is "counting nucleotides in a DNA string", it is OK to assume that all input to the function will be valid DNA strings, and testing against things that are NOT valid DNA strings is inappropriate.

Think of it as if there were another step prior to every problem that ensures the input is valid. In this case:

Unvalidated input -> DNA Parser -> nucleotide-count

We should be able to trust that the DNA Parser is doing its job and turning whatever it gets as input into a valid DNA string before passing it to nucleotide-count.
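
To make that layering concrete, here is a minimal sketch in Haskell. The names (`DNA`, `parseDNA`, `nucleotideCounts`) are illustrative only, not any track's actual API: a parser guards the boundary, and the counting function only ever receives values it can trust.

```haskell
import qualified Data.Map as Map

-- The validated type: in a real module the constructor would not be
-- exported, so a DNA value could only be obtained via parseDNA.
newtype DNA = DNA String

-- The "DNA Parser" step: unvalidated input in, valid DNA or an error out.
parseDNA :: String -> Either String DNA
parseDNA s
  | all (`elem` "ACGT") s = Right (DNA s)
  | otherwise             = Left ("invalid nucleotide in: " ++ s)

-- nucleotide-count itself can now assume its input is valid DNA.
-- (Nucleotides absent from the strand simply get no entry here.)
nucleotideCounts :: DNA -> Map.Map Char Int
nucleotideCounts (DNA s) = Map.fromListWith (+) [(c, 1) | c <- s]
```

Every invalid-input concern lives in `parseDNA`; `nucleotideCounts` never needs to re-check.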

Input validation and string parsing are legitimate problems in their own right, but they should be split out rather than mixed into every problem.

I encourage the creation of a 'string-cleaning' exercise that takes all kinds of wacky input values and ensures that the result is a clean string.

But these tests do not belong in nucleotide-count, or any of the other exercises that happen to take strings as input.

See also #428, where we discussed whether it was appropriate for every test that handled strings to also have to deal with strings that contained non-ASCII characters.

@Vankog (Contributor) commented Sep 16, 2017

Thanks for opening a dedicated discussion. :-)

I understand your point:

  • Users should focus on the task at hand instead of worrying about minor details.
  • Checking for the same input errors over and over again is tedious and boring.

I really understand this. Sometimes I want to do a puzzle for the puzzle's sake and don't care about edge cases that might never occur.

Now, let me lay some groundwork first. Here are some quotes from the Exercism homepage:

Level up your programming skills
Download and solve practice problems in over 30 different languages.
Submit the solution to the site for feedback (beta).
For code newbies and experienced programmers.

First: "Level up your programming skills"
This implies that you learn something new and improve significantly.

Second: "solve practice problems"
These problems are for practice purposes, which means they should provide a challenge, but also a lesson to be learned.

Third: "[get] feedback"
Meaning you might get ideas from others on how to improve.
BUT: this is currently a joke, to be honest. Nobody cares about other submissions at the moment. I tried to comment on at least three other submissions for the first few challenges. A reply almost never came, and feedback on my own submissions never came at all.
SO: in reality you can't improve through feedback, only through the given exercises at hand.

Fourth: "For code newbies and experienced programmers"
So the exercises should provide enough input to be of value to both new and seasoned programmers.

And one last thing:

Are you hooked on clean code? [...]
The exercises encourage you to explore trade-offs and best-practices.


OK, now that we are on the same page, let me give you my point of view:

Exercism uses TDD in a broad sense. It is not TDD by the book, because we provide the tests, one at a time, and the user completes the cycle by making them green and refactoring. That's fine.

TDD is generally considered a good practice nowadays and continues to gain traction. However, I'd wager that most seasoned programmers and nearly all new programmers are still inexperienced when it comes to writing particularly good tests, especially when it comes to writing tests first. For example, it is an ongoing and everlasting topic at almost every developer conference.

There are reasons for this: on the one hand, the benefits to a project, its architecture, the craft, and the business are well established.
On the other hand, it is still underutilized, because it is very, very hard to do well, not least due to the traditional division between requirements engineers, implementing engineers, and QA engineers, which is softening more and more.

This means:

  • It needs practice and commitment.
  • Besides coding knowledge, it also needs basics in Quality Assurance. This includes knowledge about what to test, how much to test, what a Critical Path is, what a unit test is in comparison to an integration test, what equivalence classes are, what good testing design patterns and practices are (like the proxy pattern, dependency injection/inversion, mocking, pure functions/methods...), and so on.
  • And it needs basics in Requirements Engineering. This includes how to write unambiguous specifications, how to make sure that all stakeholders and developers have a common understanding, which parameters are valid, and which behaviour is explicitly wished for, while also keeping an eye on special cases.

Now, we provide programming exercises for new and experienced programmers. This means at least providing interesting challenges that let users increase their problem-solving skills.
But it should also include lessons in how to write good tests (first) and what tests a specification should include, giving an example of how real TDD can be accomplished.

As I already said, TDD, or writing tests in general, is a notoriously underutilized practice among new and experienced programmers alike. So providing good examples and well-defined test specifications is the least we can do to teach programmers of all levels about it.
New programmers should be in contact with well-defined tests and specs from the beginning, while seasoned programmers need new impulses to improve upon their old habits.

So now the core question:
Should we provide tests for input validation?

Well, just look at a few exercise submissions at random and you will see something interesting:

  • Beginners (or at least authors of code that looks like it is from beginners) usually don't care about input handling. They trust in the specified inputs of the tests alone. Within the context of these exercises this is generally fine. However, I think everyone agrees that in real projects input handling and edge cases matter a lot. Newbies should learn about these common traps and should be motivated to make a habit of handling them. To be clear: I don't mean full-blown Unicode or date/time madness or the like, but common-sense traps like null pointers or invalid inputs or wrong (primitive) types in general. So we need, in general, a mechanism to specify which inputs are valid and which are not, thus leading beginners to a mindset that considers such cases.
  • Seasoned programmers, on the other hand, usually already handle invalid inputs and common mistakes in their implementations.

But I said "usually". From experience in production, I know that implementation of such checks greatly depend on the dev's mood and the depth of the current implementation level. The deeper a method is buried, the higher the confidence grows that a null check is probably not necessary. On the contrary: The more complex a project is, the fear to hit a null pointer exception grows, because "who knows if this methods already checks for it, so better we check it here as well...". This does not occur with well documented pieces of software that tell exactly what they expect, what they can handle and what they return. Everyone who ever read such a JDoc for example, is grateful to have this information.
You can't always assume in daily programming life, that inputs are always valid. Sure, you can add a validator in front. But this only helps as long as the validator is in place. Code is modular and will be reused and recomposed. With this in mind you need to specify where a validation takes places. It does not have to be in the current implementation, but it has to be somewhere. However, our exercises don't have such a guaranteed layer.

So, I urge you to make these constraints explicit:
-> It helps new programmers increase their awareness of such checks.
-> Seasoned programmers usually do them already, so we could just as well make them explicit.
-> Users' implementations for these tests are usually very trivial and do not need much effort to pass, so it is unlikely to get boring or tedious too fast.
-> Well-defined inputs and outputs and edge-case handling improve confidence in their usage and in a user's implementation. We want to promote such behaviour.
-> Showing what test suites should generally test for in daily work helps new users and seasoned developers get into TDD, or writing tests in general. More often than not, test suites test unnecessary things while missing important cases.

So I have got two suggestions:

  • Either: Explicitly specify guaranteed input and output constraints in all the exercise descriptions.
    Something like: "Assume a guaranteed input that consists of at least one x of type foo and is not greater than n in length."
    It's easy to add, and users know exactly what they can expect.

  • Or: Well... if we are already considering the first suggestion, why not specify it explicitly as a test instead? It is not much more code than the sentence above, but it is more precise and has the benefits mentioned in the previous paragraph (see the sketch below).
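
For illustration only, such a constraint expressed directly as a canonical-data case might look roughly like this; the exact schema fields and error wording here are my assumption, not a quote from problem-specifications:

```json
{
  "description": "strand with an invalid nucleotide raises an error",
  "property": "nucleotideCounts",
  "input": { "strand": "AGXXACT" },
  "expected": { "error": "invalid nucleotide in strand" }
}
```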

To be clear:
I don't intend to introduce complex input-validation tests and error handlers, but general ones like:

  • checking for null,
  • checking for empty object/list/etc.,
  • checking for logically unexpected values or types,
  • checking for over- and underflows (especially important for security reasons as well)

And if so, just let them throw an error for simplicity's sake.
Easy and fast tests with easy and simple implementations.

e.g. for the nucleotide-count exercise: DNA strands and RNA strands differ in one nucleotide. We want to check DNA strands only, so an RNA strand should throw an error, as should all other inputs that are not exactly A, a, C, c, G, g, or T, t.
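
As a hedged sketch of what such a check could look like in a track's example solution (Haskell here; `countNucleotides` and the error text are illustrative, not the canonical API):

```haskell
import Data.Char (toUpper)

-- Rejects anything that is not A/C/G/T (case-insensitively), so an RNA
-- strand such as "GGU" yields Left "invalid nucleotide".
countNucleotides :: String -> Either String [(Char, Int)]
countNucleotides strand
  | any (`notElem` "ACGT") upper = Left "invalid nucleotide"
  | otherwise = Right [(n, length (filter (== n) upper)) | n <- "ACGT"]
  where
    upper = map toUpper strand  -- accept a, c, g, t as well
```

The validation is a single guard clause, so the tests for it are as cheap as the implementation.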

You don't need tests or checks for special Unicode characters in particular, because these are usually invalid anyway, but it does not hurt to have them either. Such test cases are usually so fast that it does not matter whether they are executed or not.
e.g. in the pangram exercise: the user's implementation of some form of check for a-zA-Z can invalidate all other characters, Unicode included, with one line of code. There is no need for special handling; just add a test for some invalid characters, including Unicode, and it's fine.
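
A sketch of that pangram point, under the same assumptions (Haskell; treat the function name as illustrative):

```haskell
import Data.Char (isAsciiLower, toLower)

-- Keeping only a-z after lowercasing invalidates every other character,
-- Unicode included, with no special handling.
isPangram :: String -> Bool
isPangram text = all (`elem` letters) ['a' .. 'z']
  where
    letters = filter isAsciiLower (map toLower text)
```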

However, string handling, proper escaping of forms, date and time etc. should be part of a separate exercise.

@petertseng (Member) commented:

Unfortunately the only guidance I found on this was in https://github.com/exercism/docs/blob/master/you-can-help/improve-exercise-metadata.md#extracting-canonical-test-data

All tests should be essential to the problem.

So if we need better documentation on what is or is not essential, adding it would be appropriate.

I was once told not to add a case in #287 (comment).

I know that if I am told a function has a documented precondition as its contract, I feel no obligation to test what happens if the precondition is violated. If our problem descriptions need more documented preconditions, that would be helpful in any case since #869 alleges (and I agree) that the number of descriptions that are lacking is non-zero.

@Vankog (Contributor) commented Sep 16, 2017

Interesting.
So I am quoting @IanWhitney here from your reference:

I don't think we need to test empty input. I've definitely seen some exercises suffer from a case of Edge-itis, where we try to test every weird edge case. Some edges can be interesting, but it also helps to remember that we're not writing production-ready code here.

and the read from @derifatives is also a nice one.

To summarize, we have two, maybe three, problems:

  1. The specs (either prose or test specs) are far from good descriptions.
  2. There is confusion about what context should be assumed for the exercises. Just a playground? Production-level code? Non-production-level code? Educational pragmatism? Educational dogmatism?
  3. (The main problem of what to test results from these.)

@NobbZ (Member) commented Sep 16, 2017

First things first: I'm tired as hell and will only read the full thread after I've had some hours of sleep, but I want to give a quick shot right now:

In general I consider error handling a good thing which should be tested, but in the given example the purpose of the exercise is counting occurrences of a letter in a string, where the string is only allowed to contain four different characters. So either we restrict ourselves to valid input only, or we additionally check strings which contain characters outside that simple set.


null input,

In some languages this is not possible at all, and in my opinion we should assume some safety net early on. Also, those languages that constantly deal with null errors are free to add such tests as necessary.

strings that contain non-ASCII characters

We agreed a while ago to restrict ourselves to 7-bit US-ASCII, because all the exercises that dealt with characters outside this range were the subject of questions and problems due to difficulties getting the encoding right in the editor/language/library. We considered 7-bit ASCII a safe subset of the most commonly used encodings. We also decided that we could create exercises that deal with Unicode and/or i18n and/or l10n later on as the need arises.

and other tests that involve strings containing various other non-DNA characters.

In my opinion, a single test for an invalid DNA strand and one for an invalid nucleotide would be sufficient.

@Vankog (Contributor) commented Sep 17, 2017

@NobbZ

So either we restrict ourselves to valid input only, or we additionally check strings which contain characters outside that simple set.

That's the question. :-)

null input,

In some languages this is not possible at all, and in my opinion we should assume some safety net early on. Also, those languages that constantly deal with null errors are free to add such tests as necessary.

I am not aware of how many languages do not have a null concept or something alike, but I think only a few. Following this logic we would also have to remove all the "expected: error" specs, because not all languages have a concept of throwing exceptions. Generalization is good for canonical data, but hard to comply with fully. If a canonical test is not applicable to a few tracks, they can just omit it, can't they?

We agreed to restrict ourself to 7bit US-ASCII a while ago

Correct, that's out of the question.
(Well, to be precise, not exactly as you said. The discussion agreed not to use non-ASCII characters in tests (unless integral). So the full 8-bit ASCII should be considered, in my understanding: #428 (comment))

@NobbZ (Member) commented Sep 17, 2017

So the full 8-bit ASCII should be considered, in my understanding:

Unless otherwise specified, I read ASCII as 7-bit, since that is what it originally was. The 8-bit version is called "extended" for a reason.

I am not aware of how many languages do not have a null concept or something alike

Haskell, Rust, OCaml, Erlang, Elixir, maybe others.

Even those languages that do have a null value treat it differently in their idioms…

In Go, it seems as if null is often used synonymously with the corresponding zero value, especially when dealing with arrays, slices, or strings.

In other languages I've seen it used as "please insert your default value here" when calling a function, or as an indicator of a computation error/argument error when returned from a function.

And some languages do not even have strict typing to enforce a string input; shall we therefore create a canonical test that throws when given an integer where a string is expected?

In some languages we could even decide to use the type system to keep out invalid input and create a datatype data Nucleotide = A | C | G | T. Again, this would make many of the tests obsolete…
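
A minimal sketch of that idea (the `count` helper is illustrative): once the type admits only the four valid nucleotides, there is nothing left for null or invalid-character tests to exercise.

```haskell
data Nucleotide = A | C | G | T deriving (Eq, Show)

-- Invalid input is unrepresentable, so no runtime validation is needed.
count :: Nucleotide -> [Nucleotide] -> Int
count wanted = length . filter (== wanted)
```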

Therefore, as I said, the canonical data should only contain a small, limited set of test data which deals with correct input and expectations, while handling erroneous input should be the responsibility of the track.

Another reason it should be the track's responsibility is the differing idioms and possibilities for error signaling. In Go we have multi-return and return errors as values when they occur, or nil if not. In Haskell, Rust, OCaml, Elixir, and Erlang we have the habit of returning Maybe, Either, Result, Some, :ok and :error tuples, or whatnot. In Java and Ruby we throw or raise exceptions (which we do in Erlang/Elixir as well if we consider an error irrecoverable). In C we return a magic value which is documented (I hope), and sometimes even set some global state which gives numerical and human-readable info about the error.
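
To make the Haskell end of that spectrum concrete, a small sketch with illustrative helpers (not from any exercise): the same failure can be signaled with or without a reason attached, and a canonical "expected: error" entry would have to map onto whichever idiom the track chose.

```haskell
-- Failure with no detail attached: the caller only learns "it failed".
safeHead :: [a] -> Maybe a
safeHead (x:_) = Just x
safeHead []    = Nothing

-- Failure with a reason attached, closer to an exception message.
headOr :: String -> [a] -> Either String a
headOr msg []    = Left msg
headOr _   (x:_) = Right x
```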

@Vankog (Contributor) commented Sep 17, 2017

So you say we should indeed test for invalid inputs, correct?
But delegate null checks and expected error handling to the tracks, right?

@Insti (Contributor, Author) commented Sep 17, 2017

@NobbZ wrote:

... the canonical data should only contain a small, limited set of test data which deals with correct input and expectations, while handling erroneous input should be the responsibility of the track.

Well said, I agree with this.

@Vankog wrote:

So you say we should indeed test for invalid inputs, correct?

(Incorrect.) I did not interpret what @NobbZ said that way.

But delegate null checks and expected error handling to the tracks, right?

It's not about delegating; the tracks choose to use the canonical data to aid the writing of their own test suites. They are also free to not use the canonical data at all, and to add and/or change tests which are present in the canonical data.
So even if the tests you are proposing are considered relevant for a particular track, they do not belong in the canonical data.

@Vankog (Contributor) commented Sep 17, 2017

I think you misunderstood me ^^

... handling erroneous input should be the responsibility of the track.

means in my eyes exactly what I inferred here (see annotations):

So you say we should indeed test for invalid inputs [meaning: in general, not necessarily in canonical], correct? But delegate [meaning: entrusting the tracks to do it] null checks and expected error handling to the tracks, right?

The word "delegate" was not particularly well chosen.


the tracks choose to use the canonical data to aid the writing of their own test suites. They are also free to not use the canonical data at all, and to add and/or change tests which are present in the canonical data.

That's what I thought, too. However, when I proposed changing the track tests, I was asked to change the canonical tests first. Only language-specific tests should be the exception.

@Dysp (Contributor) commented Sep 28, 2017

I have no expertise with which to discuss the programming part of the problem, but I do want to point out something not quite as obvious, and almost as unimportant to the problem as it can be...

DNA is double-stranded. If you are trying to count the number of nucleotides in DNA, you must consider that each nucleotide exists only in pairs: each time you have T, you also have A on the opposing strand.
If the problem is 'count how many times each nucleotide occurs in the DNA string', then you are underestimating the total nucleotide count by half.
This makes the problem more complex, so I think it would be easier to just specify that the problem is about counting in a single strand.

@rpottsoh (Member) commented:

@Dysp thanks for pointing this out.

so I think it would be easier to just specify that the problem is about counting in a single strand.

I think the problem is OK as is. But that doesn't mean anything; I might just be feeling lazy. 😃 If you have not already, take a look at its description.md and see what you think. If you think the wording could be improved, we would all be very grateful for any changes you might suggest. A PR would be the best way to go about this. Let us know if you think improvements could be made to the wording and whether you need any help with the PR process.

@Dysp (Contributor) commented Sep 28, 2017

I created a PR: #913
Let me know if you need any further help (or if you disagree or want me to clarify something).
Also feel free to fix my potential grammar errors. English isn't my first language.
