A template test workflows repository that works with CPG Flow
Background • Key Features • How To Use • Editing in an IDE • Related • License
The test_workflows_shared repository serves as a dedicated testing space for both cpg_flow and pipeline developers. It is designed to facilitate manual, integrated end-to-end (E2E) validation of the cpg_flow package, ensuring its robustness and reliability in production-like environments. By interfacing with Metamist and leveraging a cohort from the fewgenomes project, the repository enables testing of new builds and modifications before deployment.
For pipeline developers who are new to cpg_flow, this repository provides a practical trial workflow, offering a hands-on introduction to its core functionalities and best practices. This dual-purpose approach not only supports continuous improvement of cpg_flow but also accelerates onboarding and skill development for new contributors.
Beyond its primary focus on testing, the repository promotes standardization through:
- Enforcement of consistent naming conventions aligned with CPG standards.
- Automated package and dependency updates using Renovate.
- Dependency management facilitated by uv.
By combining rigorous testing capabilities with a standardized development framework, test_workflows_shared supports high-quality pipeline development and a consistent developer experience.
- Uses uv to manage dependencies
- Uses renovate for package upgrades
- Uses analysis-runner to run the test workflow
- The jobs and stages are defined in separate files:
  - The `cpg_flow_test/jobs/` directory contains the job definitions that can be reused across stages.
  - The `cpg_flow_test/stages.py` file contains the stage definitions, which call the jobs.
- The `cpg_flow_test/workflow.py` file contains the test workflow definition.
From your command line:
# Clone this repository
$ git clone https://github.com/populationgenomics/test_workflows_shared
# Go into the repository
$ cd test_workflows_shared
# Go to the test folder
$ cd cpg_flow_test
# Make the bash script executable
$ chmod +x run-test-workflow.sh
# Run the test with the bash script.
# See the notes below on how to find a valid path/tag.
# The default path is australia-southeast1-docker.pkg.dev/cpg-common/images/cpg_flow:0.1.0-alpha.14
$ ./run-test-workflow.sh --image "australia-southeast1-docker.pkg.dev/cpg-common/images/cpg_flow:<tag_id>"
If the job is successfully created, the analysis-runner output will include a job URL. This driver job will trigger additional jobs, which can be monitored via the /batches page on Hail. Monitoring these jobs helps verify that the workflow ran successfully. When all expected jobs complete without errors, this confirms the successful execution of the test workflow and indicates that the cpg_flow package is functioning as intended.
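When switching between tags often, it can help to build the `--image` value from the tag alone. The following is a minimal sketch, not part of this repository: the function name `cpg_flow_image` and the variable names are hypothetical, while the registry path and default tag are copied from the commands above.

```shell
# Hypothetical helper (not provided by this repository).
# Registry path and default tag are taken from the usage commands above.
CPG_FLOW_REPO="australia-southeast1-docker.pkg.dev/cpg-common/images/cpg_flow"
DEFAULT_TAG="0.1.0-alpha.14"

# Print the full image path for a tag; falls back to DEFAULT_TAG
# when no argument is given.
cpg_flow_image() {
    printf '%s:%s\n' "$CPG_FLOW_REPO" "${1:-$DEFAULT_TAG}"
}

# Example, run from the cpg_flow_test directory:
#   ./run-test-workflow.sh --image "$(cpg_flow_image "<tag_id>")"
```

Keeping the registry path in one variable means only the tag changes between runs, which reduces the chance of a typo in the long path.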
- You will need to have analysis-runner installed in your environment. See the analysis-runner project for more information, or install it with `pipx install analysis-runner`.
- Testing with Different Image Tags: Running the pipeline on different tags of the cpg_flow image is valuable for validating unmerged functionality in the cpg_flow repository. When you want a stable baseline instead, default to a recent release tag.
- Finding a Valid Tag: A valid tag can be obtained from the most recent cpg-flow Docker workflow runs. Look under the print docker tag job of the workflow. Be mindful of the distinction between images (stable) and images-tmp (test images pruned fortnightly).
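The notes above suggest two quick pre-flight checks before launching a run: that analysis-runner is on your `PATH`, and whether the chosen image comes from `images` or `images-tmp`. The sketch below is one possible way to do this. Both function names are hypothetical, and it assumes that an `images-tmp` path has the same shape as the default path with `images-tmp` in place of `images`.

```shell
# Hypothetical pre-flight helpers (not provided by this repository).

# Succeeds if the named command can be found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Classify an image path by its repository segment.
# Assumption: the repository name appears as a path segment,
# as it does in the default image path.
image_kind() {
    case "$1" in
        */images-tmp/*) echo "temporary" ;;  # test images, pruned fortnightly
        */images/*)     echo "stable" ;;
        *)              echo "unknown" ;;
    esac
}

# Example:
#   have_command analysis-runner || echo "try: pipx install analysis-runner"
#   image_kind "australia-southeast1-docker.pkg.dev/cpg-common/images/cpg_flow:<tag_id>"
```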
To enable syntax highlighting in your IDE, you will need to install dependencies.
# Install dependencies
# `uv` documentation: https://docs.astral.sh/uv/
$ uv sync
# Activate the virtual environment
$ source .venv/bin/activate
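If the IDE still cannot resolve imports, it may be worth confirming that the environment is actually active in the current shell. This is a small sketch with a hypothetical function name; it relies on the `VIRTUAL_ENV` variable that the activate script exports.

```shell
# Hypothetical check (not provided by this repository).
# The activate script exports VIRTUAL_ENV while the environment is active.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "inactive"
    fi
}

# Example:
#   venv_status
```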
cpg-flow - supports various stages of genomic data processing, from raw data ingestion to final analysis outputs, making it easier for researchers to manage and scale their population genomics workflows.