Test Runners have the single responsibility of taking a solution, running all tests and returning a standardized output. All interactions with the Exercism website are handled automatically and are not part of this specification.
- A Test Runner should provide an executable script. You can find more information in the `docker.md` file.
- The script will receive three parameters:
  - The slug of the exercise (e.g. `two-fer`).
  - A path to an input directory (with a trailing slash) containing the submitted solution file(s) and any other exercise file(s). This directory should be considered as read-only. It's technically possible to write into it, but it's better to use `/tmp` for temporary files (e.g. for compiling sources).
  - A path to an output directory (with a trailing slash). This directory is writable.
- The script must write a `results.json` file to the output directory.
- The runner must exit with an exit code of 0 if it has run successfully, regardless of the status of the tests.
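To make this contract concrete, here is a minimal sketch of what such an entrypoint could look like in Ruby; the `run_tests` method is a hypothetical stand-in for invoking the track's actual test tooling:

```ruby
#!/usr/bin/env ruby
require "json"

# Usage: run.rb <exercise-slug> <input-dir>/ <output-dir>/
slug, input_dir, output_dir = ARGV

# Hypothetical stand-in for the track's real test tooling: a real runner
# would execute the tests for the solution in `input_dir` here and collect
# per-test results into a hash matching this specification.
def run_tests(slug, input_dir)
  { "version" => 2, "status" => "pass", "tests" => [] }
end

# Write the standardized results file into the writable output directory.
results = run_tests(slug, input_dir)
File.write(File.join(output_dir, "results.json"), JSON.generate(results))

# Exit with 0 whenever the runner itself completed successfully,
# regardless of whether the tests passed or failed.
exit 0
```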
The test runner gets 100% CPU with 3GB of memory for a 20-second window per solution. After 20 seconds, the process is halted and reports a time-out.
We highly recommend following our [Performance Best Practices document](/docs/building/tooling/best-practices#h-performance) to reduce the chance of timeouts.
The following fields are supported in `results.json` files:

key: `version`, type: `number`, presence: required
version: 1, 2, 3
The version of the spec that this file adheres to:

- `1`: For tracks whose test runner cannot provide information on individual tests.
- `2`: For tracks whose test runner can output individual test information. Minimal required version for tracks with Concept Exercises.
- `3`: For tracks whose test runner can link individual tests to a task.
key: `status`, type: `string`, presence: required
version: 1, 2, 3
The following overall statuses are valid:

- `pass`: All tests passed
- `fail`: At least one test has the status `fail` or `error`
- `error`: No test was executed (this usually means a compile error or a syntax error)

The `error` status should only be used if none of the tests were run. For compiled languages this is generally a result of the code not being able to compile. For interpreted languages, this is generally the result of a syntax error that stops the file from parsing.
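Assuming per-test results have already been collected (versions 2 and 3), the overall status can be derived mechanically from them; a minimal sketch:

```ruby
# Derive the top-level status from per-test statuses, assuming the test
# suite could actually be run. If nothing was executed at all (compile or
# syntax error), report "error" instead of calling this.
def overall_status(test_statuses)
  test_statuses.any? { |s| %w[fail error].include?(s) } ? "fail" : "pass"
end

overall_status(%w[pass fail pass]) # => "fail"
overall_status(%w[pass pass])      # => "pass"
```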
key: `message`, type: `string`, presence: required if `status` = `error`, or when `status` = `fail` and `version` = 1
version: 1, 2, 3
Where the status is `error` (no test was executed correctly), the top level `message` key should be provided. It should tell the user which error occurred. As it is the only piece of information a user will receive on how to debug their issue, it must be as clear as possible:
- Simplify the paths to be something like `<solution-dir>/relative/path` instead of `/full/path/to`, as the full path will include unhelpful ECR-specific data (see the path-sanitization sketch below)
- When possible or applicable, collapse non-user-code stack frames
- Never show call stacks without context (i.e. the error message)
- Don't change the error message (if possible), as keeping it intact makes it easier to search for the error
In Ruby, in the case of a syntax error, we provide the runtime error and stack trace. In compiled languages, the compilation error should be provided.
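As an illustration of the path simplification mentioned above, a runner could sanitize messages along these lines (the directory and error text are illustrative, not taken from a real runner):

```ruby
# Replace the absolute solution path with a neutral placeholder so that
# students never see container-specific paths in error messages.
def simplify_paths(message, input_dir)
  message.gsub(input_dir.chomp("/"), "<solution-dir>")
end

simplify_paths(
  "SyntaxError: /tmp/submission/two_fer.rb:3: unexpected end",
  "/tmp/submission/"
)
# => "SyntaxError: <solution-dir>/two_fer.rb:3: unexpected end"
```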
The top level `message` value is limited to 65535 characters. The effective maximum length is less if the value contains multibyte characters.

When the status is not `error`, either set the value to `null` or omit the key entirely.
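If a runner wants to guard against overly long messages, one approach is to truncate by bytes rather than characters, since multibyte characters consume more of the budget; a sketch under that assumption:

```ruby
# Truncate a message to a maximum byte size without splitting a multibyte
# character in half, on the assumption that the limit is byte-based.
def truncate_message(message, max_bytes = 65_535)
  return message if message.bytesize <= max_bytes

  # byteslice can cut through a multibyte character; scrub drops the
  # resulting invalid trailing bytes.
  message.byteslice(0, max_bytes).scrub("")
end
```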
key: `tests`, type: `array`, presence: required if `status` = `fail` or `status` = `pass`
version: 2, 3
This is an array of the test results, specified in the "Per-test" section below.
The tests MUST be returned in the order they are specified in the tests file. For languages that execute tests in a random order, this may mean re-ordering the results in line with the order specified in the tests file.
The rationale for this is that only the first failure is shown to students and therefore it is important that the correct failure is shown. Because tests are generally ordered in the tests file in a TDD way, and because for Practice Exercises the students see the tests file in the editor, aligning the results with the tests file is critical.
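One way to honor this ordering is to re-sort the collected results against the order in which the test names appear in the tests file; a sketch, assuming the names can be matched between the two:

```ruby
# Re-order collected results to match the order in which the tests are
# defined in the tests file, for runners that execute tests randomly.
# Unknown names sort to the end.
def reorder_results(results, ordered_test_names)
  results.sort_by { |test| ordered_test_names.index(test["name"]) || Float::INFINITY }
end

ordered = ["adds one item", "removes an item"]
results = [
  { "name" => "removes an item", "status" => "pass" },
  { "name" => "adds one item", "status" => "fail" }
]
reorder_results(results, ordered).map { |t| t["name"] }
# => ["adds one item", "removes an item"]
```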
key: `name`, type: `string`, presence: required
version: 2, 3
This is the name of the test in a human-readable format.
key: `test_code`, type: `string`, presence: required if the exercise is a Concept Exercise
version: 2, 3
This MUST be present for Concept Exercises and SHOULD be present for Practice Exercises. The difference in this requirement comes from the fact that students are not shown the tests in Concept Exercises, so solving the exercise may be impossible without the `test_code` being shown, whereas the tests are shown for Practice Exercises.
This is the body of the test that is being run. For example, the following Ruby test:
```ruby
def test_duplicate_items_uniqs_list
  cart = ShoppingCart.new
  cart.add(:STARIC)
  cart.add(:MEDNEW)
  cart.add(:MEDNEW)
  assert_equal 'Newspaper, Rice', cart.items_list
end
```
should return a `test_code` value of:

```
"cart = ShoppingCart.new
cart.add(:STARIC)
cart.add(:MEDNEW)
cart.add(:MEDNEW)
assert_equal 'Newspaper, Rice', cart.items_list"
```

(with linebreaks replaced by `\n` to make the JSON valid).
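A JSON library performs this escaping automatically when serializing, so no manual replacement is needed; for instance, in Ruby:

```ruby
require "json"

test_body = <<~TEST.strip
  cart = ShoppingCart.new
  cart.add(:STARIC)
  assert_equal 'Rice', cart.items_list
TEST

# JSON.generate escapes the embedded linebreaks as \n automatically.
puts JSON.generate("test_code" => test_body)
```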
key: `status`, type: `string`, presence: required
version: 2, 3
The following per-test statuses are valid:

- `pass`: The test passed
- `fail`: The test failed
- `error`: The test errored - that is, it did not return a value
key: `message`, type: `string`, presence: required if `status` is `fail` or `error`
version: 2, 3
The per-test `message` key is used to return the results of a test with a `status` of `fail` or `error`. It should be as human-readable as possible. Whatever is written here will be displayed to the student when their test does not pass. If there is no test failure message or error message, either set the value to `null` or omit the key entirely. It is also permissible to output test suite output here. The `message` value is not limited in length.
key: `output`, type: `string`, presence: optional
version: 2, 3
The per-test `output` key should be used to store and output anything that a user deliberately outputs for a test.

- It should be attached to all test results that produce user output.
- Only content deliberately output by the user should be shown - not automatic output from the test runner.
- You may either capture content that is output through normal means (e.g. `puts` in Ruby, `print` in Python or `Debug.WriteLine` in C#), or you may provide a method for the user to call (e.g. the Ruby Test Runner provides a globally available `debug` method, which has the same characteristics as the standard `puts` method).
- The output must be limited to 500 chars. Either truncating with a message of "Output was truncated. Please limit to 500 chars" or returning an error in this situation is acceptable (see the sketch below).
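A sketch of how user output could be captured and limited, assuming the runner redirects stdout around each test (the helper names are illustrative):

```ruby
require "stringio"

TRUNCATION_NOTICE = "Output was truncated. Please limit to 500 chars".freeze

# Capture anything the user prints to stdout while the given block runs.
def capture_output
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original
end

# Enforce the 500-character limit on captured output.
def limit_output(output)
  return output if output.length <= 500

  output[0, 500] + "\n" + TRUNCATION_NOTICE
end

output = capture_output { puts "debugging value: 42" }
limit_output(output) # => "debugging value: 42\n"
```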
key: `task_id`, type: `number`, presence: optional
version: 3
Link a test to a specific task via the task's ID, which is the number used at the start of the task heading. Only link a test to a task if it can be linked to precisely one task.
At the moment, only Concept Exercises have well-defined tasks that you can link tests to, but this might change in the future.
For example, consider the following `instructions.md` file:

```markdown
# Instructions

You're going to write some code to help Lucian cook an exquisite lasagna from his favorite cook book.

## 1. Define the expected oven time in minutes

...

## 2. Calculate the remaining oven time in minutes

...
```
These instructions define two tasks:
- Define the expected oven time in minutes
- Calculate the remaining oven time in minutes
The `results.json` file could then have an entry like this:

```json
{
  "name": "Expected oven time in minutes",
  "status": "pass",
  "task_id": 1,
  "test_code": "Assert.Equal(40, Lasagna.ExpectedMinutesInOven());"
}
```
This test is now linked to the first task: "Define the expected oven time in minutes". Note that the name does not have to match the task's description.
There are various ways tracks could implement this:
- Add metadata to the tests within the test file (e.g. using attributes/annotations/comments) and have the test runner read this metadata when running the tests.
- Store the test name/task ID mapping in a separate file (like the exercise's `.meta/config.json` file) and merge this information into the generated `results.json` file (see the sketch below).
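For the second approach, here is a sketch of merging a name-to-task-ID mapping into the generated results; the mapping contents and file locations are illustrative, not prescribed by this spec:

```ruby
require "json"

# Hypothetical mapping of test names to task IDs, e.g. maintained by the
# track in a separate file alongside the exercise.
task_ids = {
  "Expected oven time in minutes"  => 1,
  "Remaining oven time in minutes" => 2
}

results = JSON.parse(File.read("results.json"))

# Attach a task_id to every test that maps to exactly one task.
results["tests"].each do |test|
  task_id = task_ids[test["name"]]
  test["task_id"] = task_id if task_id
end

File.write("results.json", JSON.generate(results))
```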
These are examples of what a valid `results.json` file can look like for the different versions:
```json
{
  "version": 1,
  "status": "fail",
  "message": "Failed: test_answer\nExpected: 42, actual: 3"
}
```
```json
{
  "version": 2,
  "status": "fail",
  "message": null,
  "tests": [
    {
      "name": "Test that the thing works",
      "status": "fail",
      "message": "Expected 42 but got 123123",
      "output": "Debugging information output by the user",
      "test_code": "assert_equal 42, answerToTheUltimateQuestion()"
    }
  ]
}
```
```json
{
  "version": 3,
  "status": "fail",
  "message": null,
  "tests": [
    {
      "name": "Test that the thing works",
      "status": "fail",
      "message": "Expected 42 but got 123123",
      "output": "Debugging information output by the user",
      "test_code": "assert_equal 42, answerToTheUltimateQuestion()",
      "task_id": 1
    }
  ]
}
```
When a student's solution fails a test, it should display something like:

```
Test Code:
<test_code>

Test Result:
<message>
```

When the solution passes a test, it should display something like:

```
Test Code:
<test_code>
```
All roads lead to Rome and there is no prescribed pattern to arrive at this. There are several approaches taken so far:
- Auxiliary JSON files compiled manually, merged with test results during the test run-time.
- Automated static analysis of the test suite, merged with test results during the test run-time.
  - This may be accomplished by AST analysis or text-parsing.
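As an illustration of the text-parsing route, here is a deliberately naive Ruby sketch that maps minitest method names to their bodies; a real implementation would need to cope with nested blocks and formatting, or use an AST instead:

```ruby
# Naively map minitest method names to their bodies by scanning the test
# file line by line. This breaks on nested do/end blocks; an AST-based
# approach (e.g. via a parser library) is more robust.
def extract_test_bodies(test_file)
  bodies = {}
  current = nil

  File.readlines(test_file).each do |line|
    if (match = line.match(/^\s*def (test_\w+)/))
      current = match[1]
      bodies[current] = []
    elsif line.strip == "end" && current
      bodies[current] = bodies[current].join("\n")
      current = nil
    elsif current
      bodies[current] << line.strip
    end
  end

  bodies
end
```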