ClinVar, reimagined

motivation

Talos, a tool for identifying clinically relevant variants in large cohorts, uses ClinVar ratings as a contributing factor in determining pathogenicity. While developing Talos we found that the default summaries generated by ClinVar were highly conservative; see the table here describing the aggregate classification logic.

content

This repository contains an alternative algorithm (described here) for re-aggregating the individual ClinVar submissions, generating decisions which favour clear assignment of pathogenic/benign ratings instead of defaulting to 'conflicting'. These ratings are not intended as a replacement for ClinVar's own decisions, but may provide value by showing that, although conflicting submissions exist, there is a clear bias towards either benign or pathogenic ratings.
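To illustrate the general idea of re-aggregation, here is a minimal sketch that collapses a list of per-submission ratings into a single label using a simple majority rule. It is not the repository's algorithm: the function name `reaggregate`, the 0.6 threshold, and the choice to ignore uncertain submissions are all assumptions made for this example only.

```python
from collections import Counter

# Hypothetical threshold, chosen only for illustration.
MAJORITY_RATIO = 0.6


def reaggregate(submissions: list[str], majority_ratio: float = MAJORITY_RATIO) -> str:
    """Collapse per-submission ratings into one label with a simple majority rule."""
    counts = Counter(s.lower() for s in submissions)
    pathogenic = counts["pathogenic"] + counts["likely pathogenic"]
    benign = counts["benign"] + counts["likely benign"]
    decisive = pathogenic + benign

    # No pathogenic or benign submissions at all: nothing to arbitrate.
    if decisive == 0:
        return "uncertain"
    # A clear bias towards one side wins, even when a minority disagrees.
    if pathogenic / decisive >= majority_ratio:
        return "pathogenic"
    if benign / decisive >= majority_ratio:
        return "benign"
    return "conflicting"


ratings = ["Pathogenic", "Pathogenic", "Likely pathogenic", "Pathogenic", "Benign"]
print(reaggregate(ratings))
```

In this sketch a single dissenting benign submission no longer forces a 'conflicting' label, while an even split still does.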

outputs

Our intention with this repository is to make this code and process available, as well as periodically producing releases containing the resulting data files for consumption in other analyses.

This currently generates a few key outputs:

JSON file of all revised decisions
Hail Table of all revised decisions
VCF of all revised decisions; this can be used as a custom annotation source in VEP
VCF of all pathogenic SNVs, for annotation and use as input to the second stage: ACMG criterion PM5 analysis

This script shows the steps involved in generating the data indicated above. The VCF form can be used as a custom annotation source in VEP, using the following syntax, available in VEP >= 110:

./vep [...] --custom file=clinvar_for_VEP.vcf.gz,short_name=CPG_ClinVar,format=vcf,type=exact,coords=0,fields=allele_id%gold_stars%clinical_significance
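When the VEP call is assembled from a pipeline rather than typed by hand, the `--custom` value can be built programmatically. The sketch below reproduces the string shown above from its parts; the helper name `build_custom_arg` is hypothetical and is not part of this repository or of VEP.

```python
def build_custom_arg(
    file: str,
    short_name: str,
    fields: list[str],
    fmt: str = "vcf",
    match_type: str = "exact",
    coords: int = 0,
) -> str:
    """Assemble the key=value string passed to VEP's --custom option."""
    pairs = [
        ("file", file),
        ("short_name", short_name),
        ("format", fmt),
        ("type", match_type),
        ("coords", str(coords)),
        # VEP separates the requested INFO fields with '%'.
        ("fields", "%".join(fields)),
    ]
    return ",".join(f"{key}={value}" for key, value in pairs)


custom = build_custom_arg(
    file="clinvar_for_VEP.vcf.gz",
    short_name="CPG_ClinVar",
    fields=["allele_id", "gold_stars", "clinical_significance"],
)
print(f"--custom {custom}")
```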

acknowledgements

ClinVar, for providing the data which this process is based on

At CPG we use Hail, a Python-based analysis framework which leverages Apache Spark to perform distributed computation. This repository contains a number of scripts which were initially designed to be run using Hail; they can be executed either once Hail is installed or using a public Hail Docker image sourced from Docker Hub. A Dockerfile included in this repository will build a custom image capable of locally executing all scripts.

Our aim with this repository is to periodically reprocess ClinVar data and to make the results available to the wider community in a range of formats. We hope this will be a useful resource for others working with ClinVar data or on similar projects.
