# WDL Automated Testing Tool
Watt is a command line tool for automating the testing of WDL workflows. It supports customization of directory structure, testing for expected failures, full file comparison across `File`-type outputs (even when gzipped, though not other compression formats), and high-level summaries that include explanations for why tests failed. It can run tests concurrently using multiprocessing, and it fits nicely into a CI framework by allowing user-configurable filtering of which tests should be run.
Click to jump to a specific section below.

1. [Getting Started](#getting-started)
2. [The Test Configuration File](#the-test-configuration-file)
3. [Usage and Runtime Options](#usage-and-runtime-options)
4. [Options](#options)
5. [Example Usage](#example-usage)
6. [Interpreting the Results](#interpreting-the-results)
7. [Testing Watt](#testing-watt)
8. [Non-Use Cases](#non-use-cases)
9. [Developer's Notes](#developers-notes)
## Getting Started

To use this script, follow these steps:

- Copy the `watt.py` script into your repo.
- Create test inputs and outputs JSON files (along with any necessary test files).
- Make a `wdl_test_config.yml` file holding metadata for the existing tests in your repo.
- Set up the environment: make a virtual environment with the required packages. For example, copy the `requirements.txt` here to your own repo's `test_requirements.txt` and then run:

  ```
  python -m venv venv
  source venv/bin/activate
  pip install -r test_requirements.txt
  ```

- Run using `python watt.py [ARGS]` with possible options described below.
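For reference, one possible repo layout (loosely mirroring the example in this repo; the JSON file names here are hypothetical) might look like:

```
your-repo/
├── watt.py
├── watt_config.yml
├── test_requirements.txt
├── workflows/
│   └── say_hello.wdl
└── tests/
    ├── say_hello_inputs.json
    └── say_hello_outputs.json
```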
## The Test Configuration File

The configuration file controls how Watt accesses the WDL scripts along with the test data. By default, it should be located in the runtime directory and named `watt_config.yml`, but this can be configured with a flag.

The file uses `yaml` format, with each top-level entry corresponding to a different workflow and its tests. An example test might be recorded as:
```yaml
workflow_name:
  path: "/workflows/something.wdl"
  tests:
    test_name:
      test_inputs: "/tests/inputs.json"
      expected_outputs: "/tests/outputs.json"
```
The `workflow_name` and `test_name` can be used separately or together to filter which tests you'd like to run when invoking Watt. See below for details.

The `path` points to the WDL to be run for the test, and the `test_inputs` are given to Cromwell to configure that run. The `expected_outputs` is what should match the Cromwell outputs JSON. You can set this field to `null` in the configuration to indicate that you expect the Cromwell job to fail for the given inputs, e.g. to check error handling or edge cases in your workflow.
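For example, an expect-fail test might be configured like this (a minimal sketch using hypothetical workflow and file names):

```yaml
my_workflow:
  path: "/workflows/my_workflow.wdl"
  tests:
    bad_input_should_fail:
      test_inputs: "/tests/bad_inputs.json"
      expected_outputs: null    # the test passes only if this Cromwell run fails
```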
You can set an individual output in the `expected_outputs` JSON to `null` to tell Watt not to compare that output. In this case, the test will still fail if the specified output is not produced by the workflow, but its value will not be checked against anything.
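For instance, an `expected_outputs` JSON like the following (hypothetical keys) would check the value of `my_workflow.report` but only require that `my_workflow.log` is produced:

```json
{
  "my_workflow.report": "/tests/expected_report.txt",
  "my_workflow.log": null
}
```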
Otherwise, each key will have its value compared to the matching key in the other file. See Interpreting the Results below for the different types of test outcomes that can occur.
## Usage and Runtime Options

### Options

- `-h`: help menu
- `-e`: [REQUIRED] path to execution engine jar (i.e. Cromwell jar); checks the `EXECUTION_ENGINE` environment variable by default
- `-w`: restrict to workflow(s) with given name(s)
- `-t`: restrict to test(s) with given name(s)
- `--executor-log-prefix`: prefix to use for writing execution engine logs (defaults to `watt_logs/cromwell`)
- `-c`: location of Watt config file (default: `watt_config.yml`)
- `-l`: location to write Watt logs (defaults to stdout)
- `-p`: number of processes to use for testing
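As a single combined example (the paths here are hypothetical), the options above might be used together like this:

```
python watt.py -e /path/to/cromwell.jar -c configs/watt_config.yml -w say_hello -t simple -l watt.log -p 4
```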
### Example Usage

You can look at the `watt_config.yml` provided in this repo to see some example tests. The test data is stored in the `/tests` directory. You can run `python watt.py -e <cromwell_path>` to run all tests in this repo.
If you have a lot of tests, you probably don't want to run them all when just developing or modifying a few workflows. You can restrict which tests are run by either `workflow_name` or `test_name`, as defined in the config file. For example, this repo has tests organized in the following way:
WDL | workflow_name | test_name |
---|---|---|
workflows/say_hello.wdl | say_hello | simple |
workflows/say_hello.wdl | say_hello | mismatch |
workflows/say_hello.wdl | say_hello | file_type_mismatch |
workflows/extract_stat.wdl | extract_stat | simple |
workflows/extract_stat.wdl | extract_stat | mismatch |
workflows/extract_stat.wdl | extract_stat | expect_fail |
workflows/extract_stat.wdl | extract_stat | array_shape_mismatch |
workflows/bad_workflow.wdl | bad_workflow | bad_workflow |
So, for example, running `python watt.py -e <cromwell> -w say_hello` would run only the first three tests. You can add the `-p 3` flag to run these in 3 processes, which is significantly faster than the default single process, at the cost of some out-of-order logs. The final summary will still be in the right order.

Another example: running `python watt.py -e <cromwell> -t simple` would run the `say_hello/simple` test and the `extract_stat/simple` test. You can combine these flags to run a specific test, like `python watt.py -e <cromwell> -w say_hello -t simple` to run just the first test.

Both flags accept a comma-separated list, so e.g. `python watt.py -e <cromwell> -w say_hello,extract_stat` would run all tests except the `bad_workflow/bad_workflow` test.
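Since the tool returns a non-zero exit code when any test fails (see Interpreting the Results below), these same commands drop straight into CI. Here is a rough sketch assuming GitHub Actions, a hypothetical workflow file name, and whichever Cromwell release you choose to pin:

```yaml
# .github/workflows/wdl-tests.yml (hypothetical example)
name: WDL tests
on: [pull_request]

jobs:
  watt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - uses: actions/setup-java@v4   # Cromwell needs a JVM
        with:
          distribution: temurin
          java-version: "17"
      - name: Install test requirements
        run: pip install -r test_requirements.txt
      - name: Download Cromwell
        # assumes the standard Cromwell release asset naming; pin the version you actually use
        run: wget -q https://github.com/broadinstitute/cromwell/releases/download/87/cromwell-87.jar
      - name: Run Watt
        # add -w/-t here if CI should only run a subset of tests
        run: python watt.py -e cromwell-87.jar -p 4
```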
### Interpreting the Results

If the Watt logs aren't redirected to a file, they will be printed to stdout during runtime. Some logs report the status of the current test, and a final summary is printed at the end in the correct order (even when running concurrently with multiprocessing). In this repo, the summary should look like:

```
Final Test Summary (Workflow Name / Test Name: Result)
say_hello/simple
Keys unique to expected output: 0
Keys unique to actual output: 0
Matches: 1
Mismatches: 0
ArrayShapeMismatches: 0
FileTypeMismatches: 0
say_hello/mismatch
Keys unique to expected output: 0
Keys unique to actual output: 0
Matches: 0
Mismatches: 1 -- say_hello.announcement do not match <====================!
ArrayShapeMismatches: 0
FileTypeMismatches: 0
say_hello/file_type_mismatch
Keys unique to expected output: 0
Keys unique to actual output: 0
Matches: 0
Mismatches: 0
ArrayShapeMismatches: 0
FileTypeMismatches: 1 -- say_hello.announcement do not match <====================!
say_hello/compress_file
Keys unique to expected output: 0
Keys unique to actual output: 0
Matches: 1
Mismatches: 0
ArrayShapeMismatches: 0
FileTypeMismatches: 0
extract_stat/simple
Keys unique to expected output: 0
Keys unique to actual output: 0
Matches: 4
Mismatches: 0
ArrayShapeMismatches: 0
FileTypeMismatches: 0
extract_stat/mismatch
Keys unique to expected output: 0
Keys unique to actual output: 0
Matches: 1
Mismatches: 3 -- extract_stat.name extract_stat.stat extract_stat.wdl_table do not match <====================!
ArrayShapeMismatches: 0
FileTypeMismatches: 0
extract_stat/expect_fail
Success (expected no outputs)
extract_stat/array_shape_mismatch
Keys unique to expected output: 0
Keys unique to actual output: 0
Matches: 1
Mismatches: 0
ArrayShapeMismatches: 3 -- extract_stat.name extract_stat.entries extract_stat.wdl_table do not match <====================!
FileTypeMismatches: 0
bad_workflow/bad_workflow
Failure (Cromwell failed to finish unexpectedly)
Some tests failed. See logs for full summary.
```
Running all of these tests with `-p 9` takes less than 2 minutes. The tool returns exit code 1 if any test failed, and 0 otherwise. There are a few types of test results that can occur:

1. `Match`: Every expected output matched the actual Cromwell output, with keys perfectly aligned, file contents agreeing where appropriate, etc. This is the only way a test passes other than #6 below.
2. `Keys unique`: Some keys in the expected or actual outputs JSON were unique to one file but not the other. If this happens, the keys unique to either side are printed here.
3. `Mismatch`: The keys lined up properly, but some value was not equal, e.g. some file contents were off, integers were not equal, etc.
4. `ArrayShapeMismatch`: One side of a comparison with matching keys had an array shape not matching the other side, e.g. `[1, 2, 3, 4] != [[1, 2], [3, 4]]`.
5. `FileTypeMismatch`: One side of the comparison seemed to be a file while the other side was not, so it did not make sense to try to compare them. File types are inferred heuristically by checking whether the raw string exists as a path; if so, the type is assumed to be file.
6. `Success (expected no outputs)`: If the config file had a `null` for the expected outputs, then the test succeeds only if the job fails.
7. `Failure (Cromwell failed to finish unexpectedly)`: In this case, Cromwell failed during runtime, but this wasn't expected (there was a non-null value for expected outputs).
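To make a few of these concrete, here are some hypothetical value pairs (invented for illustration, not taken from this repo) compared under a matching key, and the result Watt would be expected to report for each:

| expected value | actual value | reported result |
|---|---|---|
| `3` | `3` | Match |
| `3` | `4` | Mismatch |
| `[1, 2, 3, 4]` | `[[1, 2], [3, 4]]` | ArrayShapeMismatch |
| `"/tests/report.txt"` (an existing file) | `"not a file path"` | FileTypeMismatch |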
By looking at the examples in the repo, you can see test configurations and workflows that produce each of these outcomes.
## Testing Watt

This repository contains a sample `watt_config.yml` along with some toy example workflows and test data, so you can demo the tool before committing to setting it up in your own repo. It also demonstrates some ideas about how you might structure your tests. Simply clone this repo and run `python watt.py`, toggling the arguments to select which tests to run (or leave them empty to run them all).
## Non-Use Cases

This tool should not be used for:

- validating that your WDLs have proper syntax (use womtool)
- linting your WDLs (try sprocket)
- catching regressions in performance for complex workflows (use Carrot)
- testing WDLs whose outputs change slightly over time and would require a wrapper WDL to check that outputs have only changed within a threshold (use Carrot); Watt currently supports only exact matching on outputs (although it could test such wrapper WDLs using the same techniques if you really want)
## Developer's Notes

Hopefully one day this tool will be superseded by a more robust tool that does some WDL parsing to infer data types, among other fancy improvements, which is why this tool isn't currently available as a Python package. In the meantime, you can open an issue or submit a PR if you find any bugs or easy extra features to add. I'll tag releases so you can keep your script pinned to a fixed version, and you can "watch" the repo to stay up to date on new releases.