series: Improve documentation #1020

N-Parsons · 2017-11-21T14:21:50Z

Resolves #584 by implementing canonical data for series.

I've noticed that lots of implementations expect the output to be a list of lists of integers rather than a list of strings:

However, I would argue that a list of strings is more reasonable, particularly given the current description, which mentions substrings. Some tracks have implemented it this way:

- Go
- Ruby
- Python, subject to "series: Update exercise to canonical-data v1.0.0" (python#1117), which came about as a result of "Exercise named series: Poor Test Cases ?" (python#1109)

As a result, this implementation of the canonical data expects the output to be a list of strings.
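A minimal Ruby sketch of the list-of-strings interpretation, for illustration only. The method name `slices` and its signature are assumptions (each track picks its own), and the sketch performs no input validation:

```ruby
# Hypothetical helper: returns every contiguous substring of length
# slice_length, in order of appearance, as an array of strings.
# No validation: each_cons raises ArgumentError for a length <= 0.
def slices(series, slice_length)
  series.each_char.each_cons(slice_length).map(&:join)
end

p slices("01234", 3) # prints ["012", "123", "234"]
```

`each_cons` yields overlapping windows, so the slices come back in order and any duplicates are kept.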

ErikSchierboom

I think moving to use a string representation is actually a very sensible thing to do. Overall a great effort, with some small nits. Thanks for working on this!

ErikSchierboom · 2017-11-21T15:25:23Z

exercises/series/canonical-data.json

+            "description": "slices of one",
+            "property": "slices",
+            "series": "01234",
+            "slice_length": 1,


We are in the process of enforcing that all keys should be lowerCamelCase (see #987). Could you rename this property to sliceLength (and also in the other cases)?

ErikSchierboom · 2017-11-21T15:26:18Z

exercises/series/canonical-data.json

+        {
+            "description": "slices of one",
+            "property": "slices",
+            "series": "01234",


We are in the process of ensuring that all input properties are gathered in a single input key (see #996). Could you put the series and slice_length properties in an input field (and also in the other cases)?

ErikSchierboom · 2017-11-21T15:27:09Z

exercises/series/canonical-data.json

+            "expected": ["123", "231", "312", "123"]
+        },
+        {
+            "description": "overly long slice",


Maybe rename to "too long slice"?

ErikSchierboom · 2017-11-21T15:28:27Z

exercises/series/canonical-data.json

+            }
+        },
+        {
+            "description": "overly short slice",


Maybe rename to "slice length cannot be zero" or something like that?

N-Parsons · 2017-11-21T16:04:21Z

Thanks for the feedback @ErikSchierboom! I've made the suggested changes.

I've also changed the indentation level to 2 spaces instead of 4 for consistency with the rest of the repo.

ErikSchierboom · 2017-11-21T17:52:45Z

Looking much better already! I think the next step is to think about the various test cases. Ideally, each subsequent test case should introduce one new concept that the previous tests didn't cover, and the cases should be in ascending order of difficulty.

As an example, consider the following two test cases:

{
  "description": "slices of three",
  "property": "slices",
  "input": {
    "series": "01234",
    "sliceLength": 3
  },
  "expected": ["012", "123", "234"]
},
...
{
  "description": "order preserved with descending numbers",
  "property": "slices",
  "input": {
    "series": "43210",
    "sliceLength": 3
  },
  "expected": ["432", "321", "210"]
},

Does the second test case actually introduce a new concept that the first test doesn't cover? I don't think it does in this case (the preservation of the ordering is also tested in the first test case). So perhaps we can try to come up with a set of test cases that gradually introduce more complex concepts not covered previously?

Insti · 2017-11-21T22:03:42Z

Looks good.

Should there be a test for a series made up of a single repeating digit? 7777777
Most of the tested series are of length 5; what about different lengths?
Maybe start with some really easy cases ("1",1), ("12",1)?

Insti · 2017-11-21T22:03:58Z

https://github.com/exercism/problem-specifications/blob/master/exercises/series/description.md

ErikSchierboom · 2017-11-22T08:00:30Z

One way to come up with test cases is to look at the description. E.g. the description notes:

"the digits need not be numerically consecutive."

So maybe one test could be with consecutive numbers, whereas a later test could test with non-consecutive numbers.

AtelesPaniscus · 2017-11-22T11:43:42Z

> One way to come up with test cases is to look at the description.

Hello. It was me who kicked off this review of this exercise. Why? Because I looked at the description, saw that it said "substrings", and was miffed, because it did not occur to me that whoever wrote the test cases for Python must have read a version that said "lists of integers".

I read the description and it said nothing about handling duplicates (not tested), nothing about the order in which results were to be returned and nothing about returning a list.

Ergo: the test cases were expecting a set containing results in no particular order with no duplicates.

Perhaps the choice of return type is, or should be, track-specific. I would suggest that all 'generated' READMEs add a track-specific note of "what to return unless the generic README specifies otherwise".

AtelesPaniscus · 2017-11-22T12:14:38Z

> (the preservation of the ordering is also tested in the first test case)

It is, but on its own that does not prove the solution preserves ordering in the general case ... a pass for one test may simply be a coincidence.

I suspect somewhere in the TDD bible it is written that each test case should test only one requirement. Amen. That is not the same as each requirement should have only one test case.

What one test case does is test your assumptions about the solution's implementation. I would suggest that a good test suite makes no such assumptions.

Even two test cases don't prove ordering; they simply reduce the chance of a buggy solution passing all tests.

Consider again the 'set'. Its contents may be ordered (I don't assume so). An ordered set converted back to a list might pass one of the two test cases but not both.

It is possible that converting a list (without duplicates) to an unordered set and back again preserves order (because nothing happened to alter the initial order). This would pass all the original tests.

For an unordered set, in some languages a duplicate is discarded; in others it replaces the original. That, I expect, would make a difference to the order of results.

The general principle for all systems (road networks, for example), including software systems and also test suites for software, is that optimisation tends to eliminate redundancy. The corollary of this is that it makes the system 'brittle'. Road networks experience gridlock; test suites pass buggy code.

Redundancy is necessary for robust and, ultimately, secure systems.
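To make the ordering argument concrete, here is a hypothetical buggy Ruby solution that treats the result as a sorted set (`uniq.sort`). It is an illustration under that assumption, not code from any track. It passes the ascending "01234" case, but a descending series, or a series with repeated slices such as "123123", exposes it:

```ruby
# Correct behaviour: contiguous slices in order, duplicates kept.
def slices(series, slice_length)
  series.each_char.each_cons(slice_length).map(&:join)
end

# Hypothetical buggy variant: de-duplicates and sorts the slices,
# as if the result were a set.
def set_like_slices(series, slice_length)
  slices(series, slice_length).uniq.sort
end

# Ascending input: the bug goes unnoticed.
p set_like_slices("01234", 3) == ["012", "123", "234"]         # prints true
# Descending input: sorting reverses the order.
p set_like_slices("43210", 3) == ["432", "321", "210"]         # prints false
# Repeated slices: uniq drops the second "123".
p set_like_slices("123123", 3) == ["123", "231", "312", "123"] # prints false
```

A suite containing only the first case would pass this solution; each extra case closes off a different way of being wrong.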

AtelesPaniscus · 2017-11-22T12:17:53Z

> I've noticed that lots of implementations expect the output to be a list of lists of integers rather than a list of strings:

I suspect this exercise was imported from somewhere and something in the description was lost in the process. After all, there must be a reason the input consists of only digits.

N-Parsons · 2017-12-01T10:14:36Z

Sorry for the delay in getting back to this - I should have some time to look at this again at the weekend.

rpottsoh · 2017-12-15T19:57:07Z

This PR is changing two files. I think the description.md changes should be moved to a separate PR or the title of this PR should be re-worded. It doesn't matter to me which occurs.

ErikSchierboom · 2018-01-04T07:54:36Z

@N-Parsons Small bump. Are you still working on this?

N-Parsons · 2018-01-18T11:56:08Z

Sorry for the delay, @ErikSchierboom - I've been quite ill recently. I'll take another look at this in the next few days.

ErikSchierboom · 2018-01-18T12:16:10Z

@N-Parsons Sorry to hear that! I hope you have recovered fully. Good luck continuing the work on this!

ErikSchierboom

Overall I think this is starting to look very, very nice! The one remaining thing, I feel, is selecting the correct set of test cases. Ideally, the test cases have increasing difficulty, which I think you handle quite well. Another thing we aim for in the canonical data is to have each test case introduce a new concept/difficulty. I think that the current canonical data doesn't completely adhere to this. I'm hoping for some feedback from @petertseng or @rpottsoh, who often have great insight in these matters.

ErikSchierboom · 2018-02-02T07:12:25Z

exercises/series/canonical-data.json

+      }
+    },
+    {
+      "description": "empty string is invalid even if slice 0",


This is basically a combination of the "empty string is invalid" and "slice length cannot be zero" test cases, so I think this can be removed.

> This is basically a combination of the "empty string is invalid" and "slice length cannot be zero" test cases, so I think this can be removed.

Done.

ErikSchierboom · 2018-02-02T07:13:10Z

exercises/series/canonical-data.json

+        "sliceLength": 1
+      },
+      "expected": {
+        "error": "string cannot be empty"


Maybe instead of "string cannot be empty" we could use "series cannot be empty"?

> Maybe instead of "string cannot be empty" we could use "series cannot be empty"?

Done.

ErikSchierboom · 2018-02-02T07:14:08Z

exercises/series/canonical-data.json

+      "expected": ["123", "231", "312", "123"]
+    },
+    {
+      "description": "strings can have repeating digits",


Does this test case add anything to the "slices can include duplicates" test case?

These two cases are essentially the same in my opinion.

> Does this test case add anything to the "slices can include duplicates" test case?

It seemed to be duplicating efforts. I combined "slices can include duplicates" and "strings can have repeating digits".

rpottsoh · 2018-02-02T20:48:50Z

In the original README the input is described as a string of digits and the output appears to be in integer form. 0540d50 saw fit to redefine the output as strings. What was the reasoning behind this? I see from the initial comment that some tracks seem to have implemented the exercise as described and other tracks didn't. I understand the comment regarding the word substring appearing in the README. Another way that the README could have been changed would have been to replace the word output with the word find. Now I am led to think the expected output should be in integer form, since the sample output is in integer form.

I don't like the word digit in the README. Strings contain characters, not digits. To me, digits are numbers. https://www.eskimo.com/~scs/cclass/progintro/sx6.html

I am not saying that the change to the README should be reversed. I have not completed the exercise myself nor have I implemented it. I think requiring integer output over strings is more difficult (interesting maybe), but only in a trivial way. If we are talking about only strings, why limit the input to only digits? Since the exercise is only interested in digits, why does the input have to be a string? Enough rambling and babbling from me for now.

I'll try to take some time over the weekend and read through the JSON.

Thanks for the vote of confidence @ErikSchierboom 👍

rpottsoh · 2018-02-04T16:10:11Z

exercises/series/canonical-data.json

+        "sliceLength": 2
+      },
+      "expected": ["35"]
+    },


Since we are dealing with characters and not numbers / digits how is "slices of two" any different than "slices of two with nonsequential numbers"?

I don't think this test case adds anything over the previous one.

@ErikSchierboom I agree. Would a slice of 3 from 3 be worth adding to replace one of these two, or am I just beating a dead horse? I am fine with just removing one of the two instances and not adding in a replacement case.

ErikSchierboom · 2018-02-09T15:02:19Z

> If we are talking about only strings, why limit the input to only digits? Since the exercise is only interested in digits, why does the input have to be a string? Enough rambling and babbling from me for now.

This captures my thoughts nicely. The current exercise seems to be leaning on two ideas:

- Capturing slices of characters from a string.
- Capturing a slice of digits from a number.

Although I think the second case is slightly harder (and thus more interesting), I would actually be fine with just going with the first option: capturing a slice of characters from a string. In that case, the test cases can probably be significantly reduced and should also include not just digits but also alphabetic characters (and even other characters like spaces, punctuation, etc.).
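The two readings differ mainly in the element type of the result. A Ruby sketch of both, with hypothetical method names `char_slices` and `digit_slices` (neither name comes from the canonical data):

```ruby
# Reading 1: slices of characters from a string; any characters work.
def char_slices(series, slice_length)
  series.each_char.each_cons(slice_length).map(&:join)
end

# Reading 2: slices of digits, each returned as an array of Integers.
# Integer(c, 10) raises ArgumentError for a character that is not a digit.
def digit_slices(series, slice_length)
  series.each_char.map { |c| Integer(c, 10) }.each_cons(slice_length).to_a
end

p char_slices("49142", 4)  # prints ["4914", "9142"]
p digit_slices("49142", 4) # prints [[4, 9, 1, 4], [9, 1, 4, 2]]
p char_slices("a, b", 2)   # prints ["a,", ", ", " b"]
```

The string version accepts letters, spaces and punctuation unchanged, while the integer version has to decide what to do with non-digits; here it simply lets `Integer` raise.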

petertseng · 2018-02-09T16:16:09Z

I wouldn't object to straying from https://github.com/exercism/problem-specifications/blob/master/exercises/series/metadata.yml, because we shouldn't force people to solve largest-series-product in a particular way anyway, so that's fine. On the other hand, that also means I'm not at all interested in this problem standalone.

> This PR is changing two files. I think the description.md changes should be moved to a separate PR or the title of this PR should be re-worded. It doesn't matter to me which occurs.

Agreed. If the description.md changes are NOT moved to a separate PR, I am only comfortable with the following arrangement of commits:

- one commit "series: clarify description"; 0540d50 is fine as-is
- one commit "series: add canonical-data.json"

The reason for this:

- clarifying the description and adding canonical data don't depend on each other, so should be different commits
- the net effect of canonical-data.json was to add a new file to this repo, so we don't need to see all the drafts that needed changes

I would object to anything else, including without limitation:

- squashing the description change together along with the canonical-data addition
- multiple commits for canonical-data.json

Feel free to delay rearranging commits until the very end, of course.

petertseng · 2018-02-09T16:33:45Z

The only choices to really make are:

- Negative slice length? Probably an error, rather than [].
- Slice length too large? [] or error?
- Slice length 0? [[]]? [[]] * (series.length + 1)? error?

substitute [[]] with [""] if using strings, of course.

I don't particularly care which choice is made in any of the above cases

ErikSchierboom · 2018-02-22T08:17:47Z

@N-Parsons What do you think about the above comments made by @petertseng?

petertseng · 2018-03-04T01:29:12Z

I have decided to justify each alternative I present.

The only choices to really make are:

- Negative slice length?
  - Probably should be an error.
  - Probably unjustifiable alternative: [], but I'd rather not.
- Slice length too large?
  - Justifiable alternative: []. There are zero ways to make a slice of length six from five elements, for example.
  - Justifiable alternative: error, if we define the acceptable range of slice lengths to be bounded above by the series length.
- Slice length 0?
  - Justifiable alternative: [[]] * (series.length + 1). Consider three elements. There are:
    - One way to make a slice of length three.
    - Two ways to make a slice of length two.
    - Three ways to make a slice of length one.
    - Therefore, to fit the pattern, there should be four slices of length zero in three elements.
  - Justifiable alternative: error, if we define the acceptable range of slice lengths to be bounded below at 1.
  - Unjustifiable alternative: []. This implies there are zero ways to make a slice of length zero, which is trivially demonstrable to be false: [] is a slice of length zero, so there must be at least one way to do it.
  - Unjustifiable alternative: [[]]. Just as slices("555", 2) results in two instances of [5, 5], so too should slices("555", 0) result in four instances of [], not just one.

substitute [[]] with [""] if using strings, of course.

I don't particularly care which justifiable alternative is chosen in any of the above decisions.
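One way to combine the justifiable alternatives above, sketched in Ruby under the string representation: a slice length of 0 gives `series.length + 1` empty strings, a too-large length gives `[]`, and a negative length raises. The method name and error message are hypothetical, and this is one possible set of choices, not necessarily the behaviour the merged canonical data expects:

```ruby
# Returns series.length - slice_length + 1 slices when
# 0 <= slice_length <= series.length, [] when slice_length is too large,
# and raises for a negative length.
def slices(series, slice_length)
  raise ArgumentError, "slice length cannot be negative" if slice_length.negative?
  return [] if slice_length > series.length

  (0..series.length - slice_length).map { |start| series[start, slice_length] }
end

p slices("555", 2) # prints ["55", "55"]
p slices("555", 0) # prints ["", "", "", ""]
p slices("555", 4) # prints []
```

The count follows the pattern described above: for `0 <= k <= n` there are `n - k + 1` slices of length `k` in `n` elements, so `k = 0` gives `n + 1`.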

ErikSchierboom · 2018-03-05T07:10:03Z

To me, all three scenarios presented above should result in an error.

coriolinus · 2018-03-05T09:52:17Z

I disagree: exceptions should be exceptional, whether they're implemented as throws or Result monads or whatever. If there exists a justifiable non-error alternative, it is to be preferred over an error. GIGO applies as always, but it's the caller's responsibility to watch for that. In the Rust track, this exercise is implemented such that the function cannot return an error: unsigned inputs mean that there can be no negative slice length, and in the other cases the slice-based return value is used. I believe that this is fundamentally more elegant than designing it to return an error. If we specify here that the function should return errors in these scenarios, that means that the Rust track's implementation of this exercise is not in compliance, and should be changed to a fundamentally worse design in order to conform with the canonical data. I believe that the case testing negative slice length should return an error, as there is no other justifiable return for that case, and languages which support unsigned ints can simply discard the test case. The other two cases should return the slices as described by @petertseng.

ErikSchierboom · 2018-03-20T13:20:05Z

Can we perhaps come to a decision here? I'm fine with any option.

rpottsoh · 2018-03-20T13:44:04Z

> I'm fine with any option.

As am I. @N-Parsons what do you think? Have you been able to follow all of the discussion that has been happening since the beginning of February?

ErikSchierboom · 2018-04-13T05:59:47Z

@rpottsoh @petertseng @coriolinus There has not been a response for 24 days. I prefer not to have long-standing PRs open. Any ideas on how to proceed?

coriolinus · 2018-04-13T12:04:36Z

My stance remains:

> I believe that the case testing negative slice length should return an error, as there is no other justifiable return for that case, and languages which support unsigned ints can simply discard the test case. The other two cases should return the slices as described by @petertseng.

ErikSchierboom · 2018-04-13T12:09:03Z

@coriolinus Okay, clear. After considering those changes, I'm fine with them. However, since there has been no response for a while, the question is whether we want to wait for someone to work on this PR, or whether someone should create a new PR.

coriolinus · 2018-04-13T12:14:06Z

I'm fine leaving this open and waiting for someone with interest to come in and implement the requested changes.

petertseng · 2018-04-13T17:55:06Z

I am not able to offer my time to make any changes personally.

In general, from my point of view, I have no qualms about leaving any PR open indefinitely as long as:

- our total number of open PRs does not exceed one page (as defined by GitHub pagination)
- any PR that I don't need to spend further time on has Changes Requested, because that status is visible from https://github.com/exercism/problem-specifications/pulls.
- the branch is on a fork instead of on Exercism's copy of the repo

If some means of clearly indicating that an interested party may work on this PR would make it more attractive to those interested parties, I support applying those means. For example, we might label, comment on, or close this PR. I don't feel a need to express a preference between these options at this time.

rpottsoh · 2018-04-16T23:43:01Z

I had an hour this evening so I have donated it here. I am not sure that how I have committed my changes is kosher. If there is another / better way I should have gone about contributing to this PR / issue please let me know.

rpottsoh · 2018-04-17T14:30:41Z

exercises/series/description.md

- 4914
- 9142
+- "4914"
+- "9142"

 And if you ask for a 6-digit series from a 5-digit string, you deserve
 whatever you get.


I interpret this last statement to mean that each track can decide how to deal with potentially erroneous input. I know that has been debated to an extent in this PR. If the consensus were to treat erroneous input a certain way, then this statement should probably be reworded to reflect that decision.

rpottsoh · 2018-04-17T14:40:04Z

At present there are three different error messages related to "sliceLength". Would it be preferable to have one error message that states the criteria for a good "sliceLength", or to leave things as they are, pointing out what is wrong with "sliceLength" each time there is an error? I don't really have a strong preference either way.

cmccandless · 2018-04-17T15:34:55Z

Personally, I prefer specific feedback as it makes it easier to immediately see what the current issue is.

If I am playing a game and I violate a rule, it is not useful for another player who has observed the violation to simply hand me the rule book and tell me "you can't do that" (ignoring the fact that this in itself is a rule in some games). It is much better to cite the violated rule and reference the rules (or, to bring the example back home, the README) for further clarification.
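Following that preference, here is a Ruby sketch in which each invalid input produces its own message. Only "series cannot be empty" is quoted in this thread; the other three message texts, and the order of the checks, are assumptions for illustration:

```ruby
# Validates inputs one rule at a time so the error names the rule broken.
# Message texts other than "series cannot be empty" are illustrative.
def slices(series, slice_length)
  raise ArgumentError, "series cannot be empty" if series.empty?
  raise ArgumentError, "slice length cannot be negative" if slice_length.negative?
  raise ArgumentError, "slice length cannot be zero" if slice_length.zero?
  if slice_length > series.length
    raise ArgumentError, "slice length cannot be greater than series length"
  end

  series.each_char.each_cons(slice_length).map(&:join)
end

begin
  slices("12345", 6)
rescue ArgumentError => e
  puts e.message # prints: slice length cannot be greater than series length
end
```

Checking the empty series first means `slices("", 1)` reports the empty series rather than an over-long slice.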

rpottsoh · 2018-04-17T15:45:08Z

> Personally, I prefer specific feedback as it makes it easier to immediately see what the current issue is.

Thanks for the feedback @cmccandless.

ErikSchierboom · 2018-04-18T06:50:27Z

exercises/series/canonical-data.json

+        "sliceLength": 2
+      },
+      "expected": ["35"]
+    },


I don't think this test case adds anything over the previous one.

ErikSchierboom · 2018-04-18T06:51:22Z

I think this is looking great, with just one single nit. Thanks for doing this @rpottsoh!

ErikSchierboom · 2018-04-19T09:04:21Z

@petertseng Could you perhaps also review this PR, as you have also given it some thought previously.

petertseng · 2018-04-19T09:07:49Z

Please do not wait for my review. I can, however, vouch for the fact that the following code verifies the proposed JSON file:

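require 'json'
require_relative '../../verify' # shared verification helper (not shown in this thread)

json = JSON.parse(File.read(File.join(__dir__, 'canonical-data.json')))

# For each "slices" case, take every run of sliceLength consecutive characters.
# Raise when no slices are produced (empty series, or slice longer than series);
# each_cons itself raises ArgumentError for a zero or negative sliceLength.
verify(json['cases'], property: 'slices') { |i, _|
  i['series'].each_char.each_cons(i['sliceLength']).map(&:join).tap { |x|
    raise 'Nope, empty is bad' if x.empty?
  }
}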

I have no additional commentary to add at this time.
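For readers unfamiliar with `Enumerable#each_cons`, the core expression of the verifier above can be run on its own. In this standalone Ruby sketch the repository's `verify` helper is not reproduced; `verifier_result` is a hypothetical wrapper that returns either the slices or the class of the exception raised:

```ruby
# Same expression as in the verifier, wrapped so it returns either the
# slices or the class of the exception raised.
def verifier_result(series, slice_length)
  series.each_char.each_cons(slice_length).map(&:join).tap { |x|
    raise 'Nope, empty is bad' if x.empty?
  }
rescue ArgumentError, RuntimeError => e
  e.class
end

p verifier_result("01234", 2) # prints ["01", "12", "23", "34"]
p verifier_result("", 1)      # prints RuntimeError (no slices at all)
p verifier_result("123", 4)   # prints RuntimeError (slice longer than series)
p verifier_result("123", 0)   # prints ArgumentError (each_cons rejects size 0)
p verifier_result("123", -1)  # prints ArgumentError (each_cons rejects a negative size)
```

So the expression produces slices only for valid input, and every edge case debated in this thread (empty series, slice too long, zero, negative) ends in an exception.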

ErikSchierboom · 2018-04-19T09:16:46Z

Then I find this ready to be merged. Thanks a lot @N-Parsons for starting with this and @rpottsoh for finishing it!

series: Clarify description

0540d50

N-Parsons mentioned this pull request Nov 21, 2017

series: Update exercise to canonical-data v1.0.0 exercism/python#1117

Closed

ErikSchierboom requested changes Nov 21, 2017

N-Parsons force-pushed the series-canonical branch 3 times, most recently from 8510ed3 to 97f42dc on November 21, 2017 15:58

N-Parsons changed the title ~~series: Implement canonical-data.json~~ series: Implement canonical data and clarify description Feb 1, 2018

ErikSchierboom requested changes Feb 2, 2018

rpottsoh reviewed Feb 4, 2018

rpottsoh changed the title ~~series: Implement canonical data and clarify description~~ series: Improve documentation of exercise Feb 4, 2018

rpottsoh assigned N-Parsons Feb 4, 2018

rpottsoh force-pushed the series-canonical branch from c4b4d02 to 41c9c6e on April 16, 2018 23:25

cmccandless approved these changes Apr 17, 2018

rpottsoh reviewed Apr 17, 2018

ErikSchierboom requested changes Apr 18, 2018

ErikSchierboom approved these changes Apr 18, 2018

rpottsoh force-pushed the series-canonical branch from 5cbb311 to 048c7a7 on April 18, 2018 22:59

series: Add canonical-data.json

01a7bb3

rpottsoh force-pushed the series-canonical branch from 048c7a7 to 01a7bb3 on April 18, 2018 23:00

rpottsoh changed the title ~~series: Improve documentation of exercise~~ series: Improve documentation Apr 18, 2018

ErikSchierboom merged commit 2c43f88 into exercism:master Apr 19, 2018

petertseng mentioned this pull request Apr 19, 2018

series 1.0.0.3: Add duplicates/long series; Expect (n+1) zero-length series; Change test from ab to 01 exercism/haskell#679

Merged
