Genomic Variant Prioritization

Snakemake workflow that runs after genotype calling to prioritize disease-causing variants on biowulf2.

Quick-ish Test Start Using Demo vcf

Log into your biowulf2 account.
sinteractive
mkdir -p ~/R/3.5/library
module load R/3.5.2
R
devtools::install_github('davemcg/see_gem', build_vignettes=T)
# THE ABOVE MUST INSTALL WITHOUT ERROR. IF IT DOES NOT, FIGURE IT OUT / ASK ME TO HELP.
q()
cd ~/
mkdir -p ~/git
cd ~/git
git clone https://github.com/davemcg/variant_prioritization.git
cd variant_prioritization
sed -i -E 's/mcgaugheyd|guanb/YOUR_BIOWULF2_USERNAME/g' src/vcfanno_v4.conf
cd tests
sbatch --time=12:0:0 ../Snakemake.wrapper.sh config_variant_prioritization.yaml
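
Before the sbatch step above, it can be worth confirming that neither original username remains in src/vcfanno_v4.conf after the sed substitution. The following is a minimal sketch and not part of the repository; the function name no_leftover_usernames is made up for illustration.

```shell
# Hypothetical check (not part of variant_prioritization):
# succeed only if neither original username is still present in the file.
no_leftover_usernames() {
    # grep -E treats | as alternation; -n prints line numbers of any matches
    if grep -nE 'mcgaugheyd|guanb' "$1"; then
        echo "the lines above still contain an original username in $1" >&2
        return 1
    fi
}

# Usage, from the repository root:
#   no_leftover_usernames src/vcfanno_v4.conf && echo "conf looks updated"
```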

Input

VCF from NGS_genotype_calling. It has to be bgzipped.
PED with samples in VCF. The samples in PED and VCF must match. The PED file has to be tab ("\t") delimited. If the PED has a header, it has to start with #.
SampleID in fastq files and PED files CANNOT contain "-" or "_". Support for these characters may be added in a later version.
"Default" Gemini queries for samples and families will be included. Use

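The PED requirements listed under Input can be checked before a job is submitted. The following is a minimal sketch and not part of the pipeline. The function name check_ped is hypothetical, and it assumes the standard six-column PED layout with the sample ID in column 2.

```shell
# Hypothetical helper (not part of variant_prioritization): sanity-check a PED file.
# Assumptions: six tab-delimited columns, sample ID in column 2.
check_ped() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        /^#/    { next }    # header lines are expected to start with #
        NF == 0 { next }    # ignore empty lines
        NF < 6  { printf "line %d: expected 6 tab-delimited columns, found %d\n", NR, NF; bad = 1 }
        $2 ~ /[-_]/ { printf "line %d: sample ID \"%s\" contains - or _\n", NR, $2; bad = 1 }
        END { exit bad }    # non-zero exit status if any problem was reported
    ' "$1"
}

# Usage:
#   check_ped my_samples.ped && echo "PED looks OK"
```

A header line that does not start with # is treated as a data row here, so it is only caught if it also breaks one of the other rules.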
Set up

Copy src/config_variant_prioritization.yaml to your local folder and edit the ped field to give a path to your ped file. You will also need to edit the family_name to instruct Snakemake which families (must match ped family field, column 1) to create reports from. You can either give one family like so:

family_name: 'gupta_fam'

If you leave this blank (family_name: '') then only the GEMINI database will be created (no family reports).

Or give a list of families to process like so:

family_name: ['gupta_fam', 'smith_fam', 'chan_fam']

family_name will be generated from PED file by the pipeline
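
Since family names must match column 1 of the PED, it can help to list the family IDs the PED actually contains. This is a minimal sketch under the assumption of a tab-delimited PED with # header lines; list_families is a hypothetical name and not a script in this repository.

```shell
# Hypothetical helper: print the unique family IDs (column 1) of a tab-delimited PED.
list_families() {
    # skip header lines starting with # and empty lines, then de-duplicate
    awk -F'\t' '!/^#/ && NF { print $1 }' "$1" | LC_ALL=C sort -u
}

# Usage:
#   list_families my_samples.ped
```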

Install SeeGEM in R on biowulf2 to produce the html report.

sinteractive
module load R
R
devtools::install_github('davemcg/see_gem', build_vignettes=T)

Finally, edit the first line of src/config_variant_prioritization.yaml so that it points to your vcf (bgzip'ed and tabix'ed).
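
A quick way to catch a VCF that has not been compressed or indexed is to look at its name and for an index file next to it. The sketch below is an illustration only: vcf_ready is a hypothetical name, it assumes the index is a .tbi file beside the VCF, and it checks file names rather than file contents. The bgzip and tabix commands it suggests are the standard htslib tools.

```shell
# Hypothetical helper: report whether a VCF looks bgzipped and tabix-indexed.
# It only inspects file names; it does not verify the compression format itself.
vcf_ready() {
    local vcf="$1"
    case "$vcf" in
        *.vcf.gz) ;;
        *) echo "not bgzipped (expected a .vcf.gz name); try: bgzip $vcf"
           return 1 ;;
    esac
    if [ ! -f "$vcf.tbi" ]; then
        echo "no .tbi index found; try: tabix -p vcf $vcf"
        return 1
    fi
    echo "ok: $vcf"
}

# Usage:
#   vcf_ready my_cohort.vcf.gz
```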

Run (in biowulf2)

Use freen to pick a gpu type: p100 (the default when $2 below is not specified), v100 (you need to edit the cluster.json file), or k80 (pass $2 below). The workflow currently uses 4 gpus, and thus one node. When the gpu nodes are busy, spliceai can take a while to start.

sbatch --time=12:00:00 ~/git/variant_prioritization/Snakemake.wrapper.sh COPIED_OVER_YAML_FILE.yaml [optional: ~/git/variant_prioritization/src/k80cluster.json]
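
The choice between the default p100 submission and the k80 variant can be wrapped in a small function that prints the matching sbatch command. This is a sketch only: submit_cmd is a hypothetical name, the repository is assumed to be cloned to ~/git/variant_prioritization as in the quick start, and v100 is rejected because it needs a hand-edited cluster.json.

```shell
# Hypothetical helper: print (not run) the sbatch command for a given gpu type.
# Assumes the repository lives in $HOME/git/variant_prioritization.
submit_cmd() {
    local yaml="$1"
    local gpu="${2:-p100}"    # p100 is the default when no second argument is given
    local repo="$HOME/git/variant_prioritization"
    case "$gpu" in
        p100) echo "sbatch --time=12:00:00 $repo/Snakemake.wrapper.sh $yaml" ;;
        k80)  echo "sbatch --time=12:00:00 $repo/Snakemake.wrapper.sh $yaml $repo/src/k80cluster.json" ;;
        *)    echo "not handled here: $gpu (v100 needs an edited cluster.json)" >&2
              return 1 ;;
    esac
}

# Usage:
#   submit_cmd COPIED_OVER_YAML_FILE.yaml k80
```

Printing the command instead of running it keeps the sketch safe to try on a login node; the output can be inspected and then pasted into the shell.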
