All tracks that have test generators #155

Closed
petertseng opened this issue Jun 5, 2017 · 7 comments

Comments

@petertseng
Member

petertseng commented Jun 5, 2017

Welcome to another issue of "All tracks that have X".
Today's issue is about test generators.

A test generator is anything that uses the canonical-data.json file from x-common to generate a test suite that is delivered to students of a given track.

If your track has these, I would be interested to hear about it.

I hope this can help tracks that don't have generators evaluate whether to have them, and allow tracks that already have generators to learn from each other.

Questions I would like to ask:

  • How much additional code must you write to generate tests for each new exercise?
    • On one extreme, zero additional code is needed: A single generator can generate code for every single exercise.
    • On the other extreme, maximal additional code is needed: No code at all is shared between generators of any two exercises.
    • Where on this spectrum is your track currently?
    • Where on this spectrum would you like your track to be, ideally?
  • How do you deal with the fact that the keys/values of a test case are dependent on the exercise? The output is always found under expected, but the input values take on many different names.
    • Vimscript: Take the first key that isn't any of comments, description, expected, property (TBD: tests that take multiple inputs). See Add lib/generate.vim vimscript#32.
    • Various other tracks: Additional code required per exercise that specifies what key(s) contain(s) the input.
  • Statically typed languages: How do you deal with the fact that you cannot determine what types the keys/values of a test case will have until you read the value at the property key?
  • Are there any possible changes to the canonical JSON schema that would make generation easier?

This issue may be closed when, in the issue-closer's opinion, there has been enough discussion to get an idea of how some tracks are answering these questions. Of course, even after it is closed, please feel free to comment with any additional answers.

If as a result there are any proposed changes to the schema, an appropriate issue can be created for that.

To give us a head start, here is what I know of some languages' generators.
Please forgive me for being greedy and filling in information for tracks that I am unfamiliar with.
Please correct these or add any additional tracks I missed.
In alphabetical order:

C#

ColdFusion

  • exercism/cfml@ef2544b
  • Probably zero additional config per exercise, given single-input exercises, though I didn't read carefully enough to verify that this is true.

Factor

Go

JavaScript

OCaml

Perl 6

Ruby

Edit: Most tests use a common default template.

Rust

Scala

Vimscript

@mhinz

mhinz commented Jun 5, 2017

How do you deal with the fact that the keys/values of a test case are dependent on the exercise?

This was my biggest pain point when I wrote the generator for Vimscript. As you said, I'm essentially guessing. I remove the keys comments, description, expected, property and hope that only one key will be left. Then I use its value.

It's easy to see that this approach is very fragile, nonetheless it works in a lot of cases.
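The guessing approach described above can be sketched in Go (the Vimscript generator itself is not reproduced here). This is only an illustration under stated assumptions: the helper name inputKeys and the first sample case are made up, and the sketch simply reports when more than one candidate key remains instead of picking one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// knownKeys lists the keys that are discarded before guessing the input key.
var knownKeys = map[string]bool{
	"comments":    true,
	"description": true,
	"expected":    true,
	"property":    true,
}

// inputKeys (a hypothetical helper) returns, sorted, every key of one test
// case that is not in knownKeys.
func inputKeys(rawCase []byte) ([]string, error) {
	var c map[string]interface{}
	if err := json.Unmarshal(rawCase, &c); err != nil {
		return nil, err
	}
	keys := []string{}
	for k := range c {
		if !knownKeys[k] {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	// Made-up single-input case: exactly one key is left, so the guess works.
	single := []byte(`{"description": "d", "property": "p", "input": 1, "expected": 2}`)
	// A clock-style case with three inputs: the guess cannot succeed.
	clockAdd := []byte(`{"description": "add minutes", "property": "add", "hour": 10, "minute": 0, "add": 3, "expected": "10:03"}`)

	for _, raw := range [][]byte{single, clockAdd} {
		keys, err := inputKeys(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if len(keys) == 1 {
			fmt.Println("input key:", keys[0])
		} else {
			fmt.Println("cannot guess, candidates:", keys)
		}
	}
}
```

The second case shows where the fragility comes from: as soon as a test case carries more than one input, nothing in the data says which remaining key is which argument.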

I intended to open an issue for this in x-common, because we need a proper "standard" that describes what canonical data should look like. One should never have to guess.

But you also raised other points I didn't encounter, e.g. type issues, so I hope we can compile a list of typical issues here and use those to create a standard for canonical data. This would make generators less complex and more correct.

@kotp
Member

kotp commented Jun 5, 2017

The VimScript approach in that regard so far is brilliant... remove the known keys, use the unknown.

@petertseng
Member Author

petertseng commented Jun 6, 2017

Statically typed languages: How do you deal with the fact that you cannot determine what types the keys/values of a test case will have until you read the value at the property key?

I need to clarify this and why I am interested in the answer to this question.

In some JSON parsers in statically-typed languages, you must declare the types of all key/value pairs in a JSON object before you can parse it. This poses a challenge for, say, the clock data in https://github.com/exercism/x-common/blob/master/exercises/clock/canonical-data.json, which has multiple property values, and the keys/values (and their types) differ depending on the property.

An excerpt:

  "cases": [
    {
      "description": "Create a new clock with an initial time",
      "cases": [
        {
          "description": "on the hour",
          "property": "create",
          "hour": 8,
          "minute": 0,
          "expected": "08:00"
        }
      ]
    },
    {
      "description": "Add minutes",
      "cases": [
        {
          "description": "add minutes",
          "property": "add",
          "hour": 10,
          "minute": 0,
          "add": 3,
          "expected": "10:03"
        }
      ]
    },
    {
      "description": "Compare two clocks for equality",
      "cases": [
        {
          "description": "clocks with same time",
          "property": "equal",
          "clock1": {
            "hour": 15,
            "minute": 37
          },
          "clock2": {
            "hour": 15,
            "minute": 37
          },
          "expected": true
        }
      ]
    }
  ]

Well, since the property is the only way we can tell these apart (without any other prior knowledge! more on prior knowledge later!), we are challenged to find a type that describes the possible objects that may lie in cases.

There will now be some examples in Go, but you might imagine how you would do it in your statically-typed language of choice.

The current solution at https://github.com/exercism/xgo/blob/master/exercises/clock/.meta/gen.go is to union all the keys/values.

type js struct {
	Groups TestGroups `json:"Cases"`
}

type TestGroups []struct {
	Description string
	Cases       []OneCase
}

type OneCase struct {
	Description string
	Property    string
	Hour        int // "create"/"add" cases
	Minute      int // "create"/"add" cases
	Add         int // "add" cases only

	Clock1   struct{ Hour, Minute int } // "equal" cases only
	Clock2   struct{ Hour, Minute int } // "equal" cases only
	Expected interface{}                // string or bool
}

How does this compare to the state of the world before property? Before then, the clock tests looked something like https://github.com/exercism/x-common/blob/cda8f9800a33d997f8c6146a10b8caf66e25ec4b/exercises/clock/canonical-data.json:

   "create": {
      "description": [
         "Test creating a new clock with an initial time."
      ],
      "cases": [
         {
            "description": "on the hour",
            "hour": 8,
            "minute": 0,
            "expected": "08:00"
         }
      ]
   },
   "add": {
      "description": [
         "Test adding and subtracting minutes."
      ],
      "cases": [
         {
            "description": "add minutes",
            "hour": 10,
            "minute": 0,
            "add": 3,
            "expected": "10:03"
         }
      ]
   },
   "equal": {
      "description": [
         "Construct two separate clocks, set times, test if they are equal."
      ],
      "cases": [
         {
            "description": "clocks with same time",
            "clock1": {
               "hour": 15,
               "minute": 37
            },
            "clock2": {
               "hour": 15,
               "minute": 37
            },
            "expected": true
         }
      ]
   }

For this layout, it is possible to use the structure at https://github.com/exercism/xgo/blob/d8dbcece4b6bbdd8f82099645c4defa02daca2c0/exercises/clock/.meta/gen.go:

type js struct {
  Create struct {
    Description []string
    Cases       []struct {
      Description  string
      Hour, Minute int
      Expected     string
    }
  }
  Add struct {
    Description []string
    Cases       []struct {
      Description       string
      Hour, Minute, Add int
      Expected          string
    }
  }
  Equal struct {
    Description []string
    Cases       []struct {
      Description    string
      Clock1, Clock2 struct{ Hour, Minute int }
      Expected       bool
    }
  }
}

Let's remind everyone of why we moved away from this approach: exercism/problem-specifications#336 (comment) :

In most of the test suites with more than one type of test, the test's type is encoded
in a property-key, in an object describing a test group.

There are two problem with that approach:

  • It mixes two different concepts regarding the tests: grouping and identification
  • It doesn't allow nesting of test groups. That would be nice to have, but is not really needed.
  • It doesn't allow grouping of test of different types, which would be really great.

Moving the test type near the test data, we solved all the above problem easily. It is theoretically sound and adds functionality.

So what do we do? If you think this problem is insurmountable, then you might think to propose a schema change.

  • Should it go back to the way it once was, with the schema supporting keys on the top level instead of using property? Then the create, add, and equal keys are also exercise-dependent. We would need a strong reason to move back to this way.
  • Should the schema be changed in some other way? Your suggestion is welcome of course since I have not thought of one yet.

There are of course various choices NOT involving schema changes! If every language wanting to parse the JSON finds at least one of these choices satisfactory, the schema doesn't need to change for this reason (it might change for other reasons).

  • Parse the JSON object into a generic map of string -> value-of-unknown-type (however this is done in your language). This seems to make it difficult to tell what keys each individual property has, and maybe you don't like to use your language's unknown-type type. The C# and Scala tracks are known to use this solution, and I imagine they are satisfied with it. The resulting generators look clean, which is why I make that guess.
  • Union the keys/values. A bit ugly, hard to tell what keys each individual property has, and tricky if the same key can have different types (this problem is not unique to multiple-property exercises though, so I guess that does not detract from this solution).
  • Delay parsing of the JSON object until you have read the property, then parse it into the correct type based on the property. Seems reasonable, and the generators look understandable enough.
  • Use prior knowledge not encoded in the schema. For clock, we know that currently cases[0] has all create cases, currently cases[1] and cases[2] have all add cases, and currently cases[3] has all equal cases. We can use this prior knowledge to parse while being able to declare types. But tracks have to be careful when adopting that approach, because x-common could change in a way that invalidates this prior knowledge.

@m-dango
Member

m-dango commented Jun 6, 2017

The use of prior knowledge has already required me to make changes to the test the last time clock was updated. I would not recommend it unless it's a necessity.

The approach I intend to take instead is a for loop with a switch case for each property. For example with clock:

for @($c-data<cases>) {
  for @(.<cases>) -> $case {
    given $case<property> {
      when 'create' {
        ...
      }
      when 'add' {
        ...
      }
      when 'equal' {
        ...
      }
    }
  }
}

@Stargator

Stargator commented Sep 13, 2017

@petertseng is it a goal for all tracks to implement test generators?

@petertseng
Member Author

petertseng commented Sep 13, 2017

is it a goal for all tracks to implement test generators?

I won't presume to speak for Exercism (and I shouldn't answer at all because you aren't asking me, you are asking peter seng, but I will answer anyway), but I can say for sure it's not one of my goals.

Each track can do as its maintainers please. Whatever makes their lives easier.

@petertseng
Member Author

This issue may be closed when, in the issue-closer's opinion, there has been enough discussion to get an idea of how some tracks are answering these questions

Way overdue.

Reminder that exercism/problem-specifications#996 should make it easy to determine what the inputs are.
