-
Notifications
You must be signed in to change notification settings - Fork 23
rare disease
for de novo variants, this command:
slivar expr \
--ped $ped \
--vcf $vcf \
--pass-only \
--js functions.js \
--info 'variant.FILTER == "PASS" && (!variant.is_multiallelic) && INFO.gnomad_popmax_af < 0.001' \
--trio 'DN:trio_denovo(kid, dad, mom) && !("gnomad_popmax_af_controls_filter" in INFO)' \
-g ./gnomad.hg37.zip \
-g ./spliceai.hg37.zip \
| bcftools csq -f $ref37 -g $gff -l -
works quite well. Note that it:
- annotates with the gnomad and spliceai files (via -g) linked here and then uses the gnomad allele frequency for filtering.
- filters out variants that are PASS in the query VCF, but non-PASS in gnomad. This turns out to be quite a good filter.
- uses function.js to define
trio_denovo
. - pipes to
bcftools csq
for annotation with e.g.missense
,stop_gained
, etc.
This command results in ~3-6 high-quality candidate variants per trio exome and takes ~30 seconds to run (faster on a fast disk).
slivar
provides some machinery for finding compound heterozygotes in trios using phase-by-inheritance. It also reports variant-pairs where 1 side is a de novo. It assumes the input has already been filtered to high-quality variants. The steps are:
- use slivar to filter to rare heterozygous (or denovo) variants
- pipe to an effect annotator like bcftools csq, VEP, or snpEff
- pipe to
slivar compound-hets
for example:
slivar expr \
--ped $ped \
--vcf $vcf \
--pass-only \
--js functions.js \
--info 'INFO.MQ > 40 && variant.FILTER == "PASS" && (!variant.is_multiallelic) && INFO.gnomad_popmax_af < 0.005' \
--trio 'recessive:trio_hets(kid, dad, mom) && !("gnomad_popmax_af_controls_filter" in INFO)' \
-g ./gnomad.hg37.zip \
# annotate with bcftools. this can also be VEP or snpEff
| bcftools csq -f $ref37 -g $gff -l - \
# tell slivar that the gene info is in the 2nd column of the BCSQ INFO field
| ./src/slivar compound-hets -p $ped -f BCSQ -i 2 \
> comphets.vcf
here, trio_hets
defined in functions.js
finds variants that are heterozygous in the kid and 1 (and only 1) parent. The result is annotated using bcftools
and then sent to slivar compound-hets
.
The -f BCSQ
argument tells slivar
that the effect is in the BCSQ
field in the INFO column of the VCF (this will be, for example CSQ
on a VEP annotated VCF).
The -i 2
argument tells slivar
that the 2nd (| delimited) column in the BCSQ record contains the gene name used for grouping.
slivar
then finds pairs of variants that are both het in the kid and het opposite parents so that we know they came from opposite haplotypes (this is phase-by-inheritance
).
The output is a VCF of only comp-het variants with an extra INFO field of e.g.
slivar_comphet=16/1250304/CGAG/C/$gene/$sample
which indicates that the current variant is compound-heterozygous in $sample
with the variant at 16:1250304
in $gene
.
Even with a less stringent frequency filter of < 0.005
and no filter on variant function (e.g. allowing synonymous). This can reduce the number of candidate pairs of variants down to about 1 dozen in a standard trio.
using my.js, users can find variants with uniparental-disomy using:
--trio "uniparent_disomy:uniparent_disomy(kid, mom, dad) && hqrv(variant, INFO, 1)" \
The following 2 command encapsulate the current best-practices for finding denovos, x-linked denovos, recessives for X and autosome (including compound hets. For exomes, these 2 commands will often result in ~20 variants per trio without even considering the variant effect (missense/synonymous/etc). And a majority of those variants are pairs of a compound-het so the number of unique genes is often under < 10. The commands use the javascript functions in a file named my.js those are easily modified to allow more/less stringency or different logic.
## set these variables
vcf=path/to/vcf
ped=path/to/ped
fasta=/path/to/fasta
gff=Homo_sapiens.GRCh37.82.gff.gz # ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/
name=mycohort
## end variables
slivar expr --vcf $vcf --ped $ped \
--pass-only \
--js my.js \
-g gnomad.hg37.zip \
--trio "denovo:denovo(kid, mom, dad) && hqrv(variant, INFO, 0.001)" \
--trio "x_denovo:x_denovo(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
--trio "recessive:recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01)" \
--trio "x_recessive:x_recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
| bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -o vcfs/$name.vcf
# find compound hets, including where 1 end is a denovo
slivar expr --vcf $vcf --ped $ped \
--js my.js \
-g gnomad.hg37.zip \
--pass-only \
--trio "lenient_denovo:denovo(kid, mom, dad) && hqrv(variant, INFO, 0.005)" \
--trio "lenient_ar:lenient_ar(kid, mom, dad) && hqrv(variant, INFO, 0.01) && INFO.gnomad_nhomalt_controls < 4" \
| bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -O u \
| slivar compound-hets -s lenient_denovo -s lenient_ar-p $ped > vcfs/$name.comp-hets.vcf
Note that for multi-sample VCFS, -s
arguments to compound-hets
make sure that only the trio(s) that passed the filters in the upstream slivar
command are used to compose the 2 ends of a compound heterozgyote.