Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

series: Improve documentation #1020

Merged
merged 2 commits into from
Apr 19, 2018

Conversation

N-Parsons
Copy link
Contributor

@N-Parsons N-Parsons commented Nov 21, 2017

Resolves #584 by implementing canonical data for series.

I've noticed that lots of implementations expect the output to be a list of lists of integers rather than a list of strings:

However, I would argue that a list of strings is more reasonable, particularly given the current description, which mentions substrings. Some tracks have implemented it this way:

As a result, this implementation of the canonical data expects the output to be a list of strings.

Copy link
Member

@ErikSchierboom ErikSchierboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think moving to use a string representation is actually a very sensible thing to do. Overall a great effort, with some small nits. Thanks for working on this!

"description": "slices of one",
"property": "slices",
"series": "01234",
"slice_length": 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are in the process of enforcing that all keys should be lowerCamelCase (see #987). Could you rename this property to sliceLength (and also in the other cases)?

{
"description": "slices of one",
"property": "slices",
"series": "01234",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are in the processing of ensuring that all input properties are gathered in a single input key (see #996). Could you put the series and slice_length properties in an input field (and also in the other cases)?

"expected": ["123", "231", "312", "123"]
},
{
"description": "overly long slice",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to "too long slice"?

}
},
{
"description": "overly short slice",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to "slice length cannot be zero" or something like that?

@N-Parsons N-Parsons force-pushed the series-canonical branch 3 times, most recently from 8510ed3 to 97f42dc Compare November 21, 2017 15:58
@N-Parsons
Copy link
Contributor Author

Thanks for the feedback @ErikSchierboom! I've made the suggested changes.

I've also changed the indentation level to 2-spaces instead of 4 for consistency with the rest of the repo.

@ErikSchierboom
Copy link
Member

Looking much better already! I think the next step is to think about the various test cases. Ideally, each subsequent test case should introduce one new concept, that the previous tests didn't cover. And also in ascending difficulty.

As an example, consider the following two test cases:

{
  "description": "slices of three",
  "property": "slices",
  "input": {
    "series": "01234",
    "sliceLength": 3
  },
  "expected": ["012", "123", "234"]
},
...
{
  "description": "order preserved with descending numbers",
  "property": "slices",
  "input": {
    "series": "43210",
    "sliceLength": 3
  },
  "expected": ["432", "321", "210"]
},

Does the second test case actually introduce a new concept that the first test doesn't cover? I don't think it does in this case (the preservation of the ordering is also tested in the first test case). So perhaps we can try to come up with a set of test cases that gradually introduce more complex concepts not covered previously?

@Insti
Copy link
Contributor

Insti commented Nov 21, 2017

Looks good.

Should there be a test for a series made up of a single repeating digit? 7777777
Most of the tested series are length 5, what about different lengths?
Maybe start with some really easy cases ("1",1), ("12",1)?

@ErikSchierboom
Copy link
Member

One way to come up with test cases is to look at the description. E.g. the description notes:

"the digits need not be numerically consecutive."

So maybe one test could be with consecutive numbers, whereas a later test could test with non-consecutive numbers.

@AtelesPaniscus
Copy link

One way to come up with test cases is to look at the description.

Hello. It was me who kicked off this review of this exercise. Why ? Because I looked at the description and saw that it said "substrings" and was miffed because it did not occur to me that whoever wrote the test cases for Python must have read a version that read "lists of integers".

I read the description and it said nothing about handling duplicates (not tested), nothing about the order in which results were to be returned and nothing about returning a list.

Ergo: the test cases were expecting a set containing results in no particular order with no duplicates.

Perhaps the choice of return type is, or should be, track specific. I would suggest that all 'generated' READMEs add a track specific note of "what to return unless the generic read me specifies otherwise".

@AtelesPaniscus
Copy link

(the preservation of the ordering is also tested in the first test case)

It is but on its own does not prove the solution preserves ordering in the general case ... a pass for one test may simply be a coincidence.

I suspect somewhere in the TDD bible it is written that each test case should test only one requirement. Amen. That is not the same as each requirement should have only one test case.

What one test case does is test your assumptions about the solution's implementation. I would suggest that a good test suite makes no such assumptions.

Even two test cases don't prove ordering, they simply reduce the chance of a buggy solution passing all tests.

Consider again the 'set'. Contents may be ordered (I don't assume so). An ordered set converted back to a list might pass one of the two tests case but not both.

It is possible that converting a list (without duplicates) to an unordered set and back again preserves order (because nothing happened to alter the initial order) . This would pass all the original tests.

For an unordered set. In some languages, a duplicate is discarded, in others it replaces the original. That, I expect, would make a difference to the order of results.

The general principle for all systems (road networks for example) including software systems and also test suites for software is that optimisation tends to eliminate redundancy. The corollary of this is that is makes the system 'brittle'. Road networks experience gridlock, test suites pass buggy code.

Redundancy is necessary for robust and, ultimately, secure systems.

@AtelesPaniscus
Copy link

AtelesPaniscus commented Nov 22, 2017

I've noticed that lots of implementations expect the output to be a list of lists of integers rather than a list of strings:

I suspect this exercise was imported from somewhere and something in the description was lost in the process. After all, there must be a reason the input consists of only digits.

@N-Parsons
Copy link
Contributor Author

Sorry for the delay in getting back to this - I should have some time to look at this again at the weekend.

@rpottsoh
Copy link
Member

This PR is changing two files. I think the description.md changes should be moved to a separate PR or the title of this PR should be re-worded. It doesn't matter to me which occurs.

@ErikSchierboom
Copy link
Member

@N-Parsons Small bump. Are you still working on this?

@N-Parsons
Copy link
Contributor Author

Sorry for the delay, @ErikSchierboom - I've been quite ill recently. I'll take another look at this in the next few days.

@ErikSchierboom
Copy link
Member

@N-Parsons Sorry to hear that! I hope you have recovered fully. Good luck continuing the work on this!

@N-Parsons N-Parsons changed the title series: Implement canonical-data.json series: Implement canonical data and clarify description Feb 1, 2018
Copy link
Member

@ErikSchierboom ErikSchierboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is starting to look very, very nice! The one remaining thing I feel is selecting the correct set of test cases. Ideally, the test cases have increasing difficulty, which I think you handle quite well. Another thing we aim for in the canonical data is to have each test case introduce a new concept/difficulty. I think that the current canonical data doesn't completely adhere to this. I'm hoping for some feedback of @petertseng or @rpottsoh, which often have great insight in these matters.

}
},
{
"description": "empty string is invalid even if slice 0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically a combination of the "empty string is invalid" and "slice length cannot be zero" test cases, so I think this can be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically a combination of the "empty string is invalid" and "slice length cannot be zero" test cases, so I think this can be removed.

Done.

"sliceLength": 1
},
"expected": {
"error": "string cannot be empty"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of "string cannot be empty" we could use "series cannot be empty'?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of "string cannot be empty" we could use "series cannot be empty'?

Done.

"expected": ["123", "231", "312", "123"]
},
{
"description": "strings can have repeating digits",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test case add anything to the "slices can include duplicates" test case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two cases are essentially the same in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test case add anything to the "slices can include duplicates" test case?

It seemed to be duplicating efforts. I combined "slices can include duplicates" and "strings can have repeating digits".

@rpottsoh
Copy link
Member

rpottsoh commented Feb 2, 2018

In the original README the input is described as a string of digits and the output appears to be in integer form. 0540d50 saw fit to redefine the output as strings. What was the reasoning behind this? I see from the initial comment that some tracks seem to have implemented the exercise as described and other tracks didn't. I understand the comment regarding the word substring appearing in the README. Another way that the README could have been changed would have been to replace the word output with the word find. Now I am lead to think the expected output should be in integer form since the sample output is in integer form.

I don't like the word digit in the README. Strings contain characters, not digits. To me, digits are numbers. https://www.eskimo.com/~scs/cclass/progintro/sx6.html

I am not saying that the change to the README should be reversed. I have not completed the exercise myself nor have I implemented it. I think requiring integer output over strings is more difficult (interesting maybe), but only in a trivial way. If we are talking about only strings, why limit the input to only digits? Since the exercise is only interested in digits, why does the input have to be a string? Enough rambling and babbling from me for now.

I'll try to take some time over the weekend and read through the JSON.

Thanks for the vote of confidence @ErikSchierboom 👍

"sliceLength": 2
},
"expected": ["35"]
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are dealing with characters and not numbers / digits how is "slices of two" any different than "slices of two with nonsequential numbers"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test case adds anything over the previous one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ErikSchierboom I agree. Would a slice of 3 from 3 be worth adding to replace one of these two, or am I just beating a dead horse? I am fine with just removing one of the two instances and not adding in a replacement case.

@rpottsoh rpottsoh changed the title series: Implement canonical data and clarify description series: Improve documentation of exercise Feb 4, 2018
@ErikSchierboom
Copy link
Member

If we are talking about only strings, why limit the input to only digits? Since the exercise is only interested in digits, why does the input have to be a string? Enough rambling and babbling from me for now.

This captures my thoughts nicely. The current exercise seems to be leaning on two ideas:

  1. Capturing slices of characters from a string.
  2. Capturing a slice of digits from a number.

Although I think the second case is slightly harder (and thus more interesting), I would actually be fine with just going with the first option: capturing a slice of characters from a string. In that case, the test cases can probably be significantly reduced and should also include not just digits but also alphabetic characters (and even other characters like spaces, punctuation, etc.).

@petertseng
Copy link
Member

petertseng commented Feb 9, 2018

I wouldn't object to straying from https://github.com/exercism/problem-specifications/blob/master/exercises/series/metadata.yml because we shouldn't force people on how to solve largest-series-product anyway, so that's fine. On the other hand, that also means I'm not at all interested in this problem standalone.

This PR is changing two files. I think the description.md changes should be moved to a separate PR or the title of this PR should be re-worded. It doesn't matter to me which occurs.

Agreed. If the description.md changes are NOT moved to a separate PR, I only am comfortable with the following arrangement of commits:

  • one commit "series: clarify description"; 0540d50 is fine as-is
  • one commit "series: add canonical-data.json"

The reason for this:

  • clarifying the description and adding canonical data don't depend on each other, so should be different commits
  • the net effect of canonical-data.json was to add a new file to this repo, so we don't need to see all the drafts that needed changes

I would object to anything else, including without limitation:

  • squashing the description change together along with the canonical-data addition
  • multiple commits for canonical-data.json

Feel free to delay rearranging commits until the very end, of course.

@petertseng
Copy link
Member

petertseng commented Feb 9, 2018

The only choices to really make are:

  • Negative slice length? Probably error. Rather than [].
  • Slice length too large? [] or error?
  • Slice length 0? [[]]? [[]] * (series.length + 1)? error?

substitute [[]] with [""] if using strings, of course.

I don't particularly care which choice is made in any of the above cases

@ErikSchierboom
Copy link
Member

@N-Parsons What do you think about the above comments made by @petertseng?

@petertseng
Copy link
Member

I have decided to justify each alternative I present.

The only choices to really make are:

  • Negative slice length?
    • Probably should be an error.
    • Probably unjustifiable alternative: [], but I'd rather not.
  • Slice length too large?
    • Justifiable alternative: []. There are zero ways to make a slice of length six from five elements, for example.
    • Justifiable alternative: error. If we define the acceptable range of inputs to be bounded above at the slice length.
  • Slice length 0?
    • Justifiable alternative: [[]] * (series.length + 1). Consider three elements. There are:
      • One way to make a slice of length three.
      • Two ways to make a slice of length two.
      • Three ways to make a slice of length one.
      • Therefore, to fit the pattern, there should be four slices of length zero in three elements.
    • Justifiable alternative: error. If we define the acceptable range of inputs to be bounded below at 1.
    • Unjustifiable alternative: []. This implies there are zero ways to make a slice of length zero, which is trivially demonstrable to be false: [] is a slice of length zero, so there must be at least one way to do it.
    • Unjustifiable alternative: [[]]. Just as slices("555", 2) results in two instances of [5, 5], so too should slices("555", 0) result in four instances of [], not just one.

substitute [[]] with [""] if using strings, of course.

I don't particularly care which justifiable alternative is chosen in any of the above decisions.

@ErikSchierboom
Copy link
Member

To me, all three scenarios presented above should result in an error.

@coriolinus
Copy link
Member

coriolinus commented Mar 5, 2018 via email

@ErikSchierboom
Copy link
Member

Can we perhaps come to a decision here? I'm fine with any option.

@rpottsoh
Copy link
Member

I'm fine with any option.

As am I. @N-Parsons what do you think? Have you been able to follow all of the discussion that has been happening since the beginning of February?

@ErikSchierboom
Copy link
Member

ErikSchierboom commented Apr 13, 2018

@rpottsoh @petertseng @coriolinus There has not been a response for 24 days. I prefer not to have long standing PR's open. Any ideas on how to proceed?

@coriolinus
Copy link
Member

My stance remains:

I believe that the case testing negative slice length should return an
error, as there is no other justifiable return for that case, and languages
which support unsigned ints can simply discard the test case. The other two
cases should return the slices as described by @petertseng.

@ErikSchierboom
Copy link
Member

@coriolinus Okay, clear. After considering those changes, I'm fine with them. However, there being no response for a while, begs the question if we want to wait for someone to work on this PR, or should someone create a new PR?

@coriolinus
Copy link
Member

I'm fine leaving this open and waiting for someone with interest to come in and implement the requested changes.

@petertseng
Copy link
Member

I am not able to offer my time to make any changes personally.

In general, from my point of view, I have no qualms about leaving any PR open indefinitely as long as:

  1. our total number of open PRs does not exceed one page (as defined by GitHub pagination)
  2. any PR that I don't need to spend further time on has Changes Requested, because that status is visible from https://github.com/exercism/problem-specifications/pulls.
  3. the branch is on a fork instead of on Exercism's copy of the repo

If some means of clearly indicating that an interested party may work on this PR would make this more attractive to those interested parties, I support applying those means. For example, that might be to either: Label, comment, or close this PR. I don't feel a need to express a preference between these options at this time.

@rpottsoh
Copy link
Member

rpottsoh commented Apr 16, 2018

I had an hour this evening so I have donated it here. I am not sure that how I have committed my changes is kosher. If there is another / better way I should have gone about contributing to this PR / issue please let me know.

- 4914
- 9142
- "4914"
- "9142"

And if you ask for a 6-digit series from a 5-digit string, you deserve
whatever you get.
Copy link
Member

@rpottsoh rpottsoh Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interpret this last statement to mean that each track can decide how to deal with potentially erroneous input. I know that has been debated to an extent in this PR. If the consent were to treat erroneous input a certain way then this statement should probably be reworded to reflect that decision.

@rpottsoh
Copy link
Member

At present there are three different error messages related to "sliceLength". Would it be preferred to have one error message that states what the criteria is for a good "sliceLength" or to leave things as they are, pointing out what is wrong with "sliceLength" each time there is an error? I don't really have a strong preference either way.

@cmccandless
Copy link
Contributor

Personally, I prefer specific feedback as it makes it easier to immediately see what the current issue is.

If I am playing a game, and I violate a rule, it is not useful for another player who has observed the violation to simply hand me the rule book and tell me "you can't do that" (ignoring the fact this is in itself is a rule in some games). It is much better to cite the violated rule and reference the rules (or, to bring the example back home, the README) for further clarification.

@rpottsoh
Copy link
Member

Personally, I prefer specific feedback as it makes it easier to immediately see what the current issue is.

Thanks for the feedback @cmccandless.

"sliceLength": 2
},
"expected": ["35"]
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test case adds anything over the previous one.

@ErikSchierboom
Copy link
Member

I think this is looking great, with just one single nit. Thanks for doing this @rpottsoh!

@rpottsoh rpottsoh changed the title series: Improve documentation of exercise series: Improve documentation Apr 18, 2018
@ErikSchierboom
Copy link
Member

@petertseng Could you perhaps also review this PR, as you have also given it some thought previously.

@petertseng
Copy link
Member

Please do not wait for my review, other than I can vouch for the fact that the following code verifies the proposed JSON file:

require 'json'
require_relative '../../verify'

json = JSON.parse(File.read(File.join(__dir__, 'canonical-data.json')))

verify(json['cases'], property: 'slices') { |i, _|
  i['series'].each_char.each_cons(i['sliceLength']).map(&:join).tap { |x|
    raise 'Nope, empty is bad' if x.empty?
  }
}

I have no additional commentary to add at this time.

@ErikSchierboom ErikSchierboom merged commit 2c43f88 into exercism:master Apr 19, 2018
@ErikSchierboom
Copy link
Member

Then I find this ready to be merged. Thanks a lot @N-Parsons for starting with this and @rpottsoh for finishing it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants