Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Schema proposal: All inputs should appear under the input key #996

Closed
petertseng opened this issue Nov 8, 2017 · 16 comments
Closed
Labels

Comments

@petertseng
Copy link
Member

petertseng commented Nov 8, 2017

In our schema, we know that description, expected, and property are fixed, but the inputs to each test case depend on the problem!
That means we can't know ahead of time what key(s) constitute the inputs; we have to examine the JSON in order to do so.

According to those test generators listed at exercism/discussions#155 the current solutions to this problem take on one of two forms:

  • Look for any key that isn't description, expected, property, or comments. That key (or those keys) must be the input, by process of elimination.
  • Per-exercise configuration or code describing what keys hold the input for each exercise.

Well, what if we could do better?

Let's look at https://github.com/exercism/problem-specifications/blob/master/exercises/change/canonical-data.json where each case has two inputs: coins and target. Let's say that instead a case looks like:

{
  "description": "single coin change",
  "property": "findFewestCoins",
  "input": {
    "coins": [1, 5, 10, 25, 100],
    "target": 25
  },
  "expected": [25]
}

As you can see, the key input contains an object whose keys and values represent the inputs.

And what about https://github.com/exercism/problem-specifications/blob/master/exercises/leap/canonical-data.json where each case has only one input? One might consider two possible ways to do this:

The first way is actually how the JSON is already: The key input contains a scalar, the one and only input.

{
  "description": "year not divisible by 4: common year",
  "property": "leapYear",
  "input": 2015,
  "expected": false
}

The second way we might consider is that it will also be an object, for consistency. So then:

{
  "description": "year not divisible by 4: common year",
  "property": "leapYear",
  "input": {
    "year": 2015
  },
  "expected": false
}

One could consider making that a one-liner: "input": {"year": 2015}) if vertical space savings are desired.
This has the advantage of consistency, of course: input always contains an object, but the file also becomes more verbose.

Were one of the two forms of this proposal to be implemented, would this benefit you and/or your track? Would it instead cause harm?

Alternative proposals to solve the problem of "we don't know the names of the keys that hold the input(s) ahead of time"?

@petertseng
Copy link
Member Author

I thought about trying to do this, but my current reaction to "would this benefit you and/or your track" is "meh" since I don't maintain any track that uses generators, and the humans reading the JSON files seem to have no trouble understanding which keys represent the inputs. But hey, it's here if people want it, and maybe it'll be good to have.

@Insti
Copy link
Contributor

Insti commented Nov 8, 2017

Having input as a key with an object containing the well-named arguments is my preference.

@ErikSchierboom
Copy link
Member

Were one of the two forms of this proposal to be implemented, would this benefit you and/or your track?

Absolutely! It would make processing the JSON simpler and less error-prone.

Having input as a key with an object containing the well-named arguments is my preference.

Same here.

@NobbZ
Copy link
Member

NobbZ commented Nov 8, 2017

I would be happy about consilidating those keys under input as well!

@petertseng
Copy link
Member Author

I see a comment https://github.com/exercism/problem-specifications/blob/master/canonical-schema.json#L6-L10 in the schema file that explains why input is not a key: To allow for tests that are not example-based (for example, property-based).

I allege that since the vast majority of our specifications are example-based that we can proceed with input. In general, it is hard to see how to encode a property-based test in JSON, and to date we have not made an attempt at doing so.

So I think all we will ask is that this schema does not prevent a future expansion where we might add the ability to encode a property-based test.

I think such an expansion can be added in another key that is not input. So that is achieved.

@petertseng
Copy link
Member Author

petertseng commented Nov 8, 2017

If we take action on this (I would wait until a week has passed since the filing of the proposal), plan what to do with the version.

Assume that an exercise that is converted to the new schema is currently versioned x.y.z.

  • Leave current version scheme in place, which would require a major version bump to (x+1).0.0.
  • Adopt the two-component version proposed in Versioning guidelines #938, which would require a minor version bump to x.(y+1).
  • Adopt the one-component version proposed in Versioning guidelines #938 (comment), which would require simply dropping y and z, resulting in version x.
  • Remove the version number completely.
  • Your suggestion here.

Adopting a different version schema simultaneously allows us to take the interesting shortcut of "an exercise is converted to the new input schema if and only if it has an N-component version", so consider it an incentive to adopt a different version schema simultaneously.

My preferences are as I noted earlier. Single-component version > No version at all > Two-component version with CI enforcement that some version component changes ≈ Two-component version without CI enforcement > current version scheme.

@coriolinus
Copy link
Member

I think that for exercises who have their canonical-data.json restructured by this, it would be appropriate to leave the current version scheme in place, but increment the version: (x+1).0.0. It's possible that a flexibly-written test generator will generate identical output after the fact, but many (including mine for Rust) will generate something new after the fact. That's a breaking change, appropriate for a major version upgrade.

The proposals to reduce the version number to a two- or one-component number discard clarity for simplicity. I can't deny that they're simpler, but I believe that knowledge of standard three-component semantic versioning is fairly widespread in the programming community at this point.

The proposal to abandon versioning in favor of git SHAs or other schemes have the drawback that it becomes harder for humans to tell what's going on at a glance, and impossible for them to estimate the difference between two files at a glance.

@Insti
Copy link
Contributor

Insti commented Nov 9, 2017

Can we try and keep the versioning discussions in #938 so they will be more easily discoverable to future generations.

@petertseng
Copy link
Member Author

Well, to my surprise, in 24 * 7 hours there were no objection and there was also nonzero support. I suppose we can call it open season to convert JSON files to this, then. #998 will be able to tell us which exercises were converted or not. I would make a big checklist like #625 did but since my allotted time is up now, I can't.

@snahor
Copy link

snahor commented Nov 30, 2017

I'm late here, but these are all the invalid exercises using @petertseng #998

  • acronym
  • all-your-base
  • allergies
  • alphametics
  • anagram
  • atbash-cipher
  • beer-song
  • binary-search
  • binary
  • bob
  • book-store
  • bowling
  • bracket-push
  • change
  • circular-buffer
  • clock
  • collatz-conjecture
  • complex-numbers
  • connect
  • crypto-square
  • custom-set
  • diamond
  • difference-of-squares
  • dominoes
  • flatten-array
  • food-chain
  • forth
  • gigasecond
  • grains
  • grep
  • hamming
  • hello-world
  • house
  • isbn-verifier
  • isogram
  • kindergarten-garden
  • largest-series-product
  • leap
  • list-ops
  • luhn
  • markdown
  • meetup
  • minesweeper
  • nth-prime
  • nucleotide-count
  • ocr-numbers
  • palindrome-products
  • pangram
  • pascals-triangle
  • perfect-numbers
  • phone-number
  • pig-latin
  • poker
  • pov
  • prime-factors
  • protein-translation
  • proverb
  • queen-attack
  • rail-fence-cipher
  • raindrops
  • react
  • rectangles
  • reverse-string
  • rna-transcription
  • robot-simulator
  • roman-numerals
  • rotational-cipher
  • run-length-encoding
  • saddle-points
  • say
  • scrabble-score
  • secret-handshake
  • sieve
  • space-age
  • spiral-matrix
  • sublist
  • sum-of-multiples
  • tournament
  • transpose
  • triangle
  • trinary
  • twelve-days
  • two-bucket
  • two-fer
  • variable-length-quantity
  • word-count
  • word-search
  • wordy
  • zebra-puzzle
  • zipper

@petertseng
Copy link
Member Author

This makes some progress on the concerns listed in #336 which were:

  • We don't know which keys are inputs (now we will! They're all the key/value pairs under input!)
  • Are we able to have meaningful names for the inputs, so that they are easy for humans to read? (We are able!)

In fact, having an input object was proposed there. So this was merely a revival of it, 15 months later. Thanks to those who originally proposed it (#336 (comment), #336 (comment), #336 (comment))

The remaining questions to resolve would probably be:

This proposal deliberately chose not to solve that problem because it chooses to take a small step at a time. If you were to predict that I will put forward a proposal to solve that problem, I advise you that that is an extremely unwise prediction to make.

@petertseng
Copy link
Member Author

petertseng commented Feb 10, 2018

Everyone is reminded that, when submitting a PR that purports to move inputs to the input object, you will need to remove the USE_OLD_SCHEMA file in that exercise's directory because I just merged #1074. The CI has got your back in case you forget to do that (I mean the CI will fail if you forget).

@ErikSchierboom
Copy link
Member

Great work @petertseng!

@petertseng
Copy link
Member Author

Since this proposal was accepted and is now being checked by CI, I believe it's safe to close the issue. Tracking of progress is done by counting the number of USE_OLD_SCHEMA files. even though it's a policy there's no need to keep it open since policies can be found with https://github.com/exercism/problem-specifications/issues?utf8=%E2%9C%93&q=label%3Apolicy

@petertseng
Copy link
Member Author

Note that in react (#1130) and circular-buffer (#1186) to accommodate the schema (which requires expected we had to add expected: {} to each case, because the expectations are already encoded in each operation under operations.

If there is desire and demand for it, consider allowing operations to be an alternative to expected. Remember that bank-account will want the same treatment (#554).

petertseng added a commit that referenced this issue Feb 15, 2018
grains 1.1.0

As proposed and accepted in #996

```ruby
have_input = false

ARGF.each_line { |l|
  if l.include?('version')
    ver = l.split(?")[3]
    ver_components = ver.split(?.).map(&:to_i)
    ver_components[1] += 1
    ver_components[2] = 0
    l[ver] = ver_components.join(?.)
  end

  have_input &&= !l.include?('"description"')

  first_non_space = l.index(/\S/)
  if l.include?('"input"')
    have_input = true
    input = l.split(?:).last.to_i
    puts ' ' * first_non_space + '"input": {'
    puts ' ' * first_non_space + '  "square": ' + input.to_s
    puts ' ' * first_non_space + '},'
    next
  end

  puts ' ' * first_non_space + '"input": {},' if l.include?('"expected"') && !have_input

  puts l
}
```
@petertseng
Copy link
Member Author

The remaining questions to resolve would probably be:

My advice to the person who will resolve this question: You will need to be able to account for language differences. Take https://github.com/exercism/problem-specifications/blob/master/exercises/accumulate/description.md for example. This is the higher-order function map. Consider how a few existing languages would order the inputs:

The solution to this problem will need to take into account the possibility that different languages have different conventions.

Whoops, I just typed this up and realized it's already been said before. #336 (comment). I'm sorry that I failed to add anything new to the discussion.

petertseng added a commit that referenced this issue Feb 16, 2018
This reverts commit a866c43.

Observe that there are no more USE_OLD_SCHEMA files in this repository.
This indicates that henceforth all canonical-data.json files should use
the schema proposed in
#996 and
defined in #1074.
rpottsoh added a commit to rpottsoh/exercism-problem-specifications that referenced this issue Mar 7, 2018
rpottsoh added a commit to rpottsoh/exercism-problem-specifications that referenced this issue Mar 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants