Skip to content

rare disease

Brent Pedersen edited this page Apr 15, 2019 · 52 revisions

using slivar for rare disease research

denovo

for de novo variants, this command:

slivar expr \
    --ped $ped \
    --vcf $vcf \
    --pass-only \
    --js functions.js \
    --info 'variant.FILTER == "PASS" && (!variant.is_multiallelic) && INFO.gnomad_popmax_af < 0.001' \
    --trio 'DN:trio_denovo(kid, dad, mom) && !("gnomad_popmax_af_controls_filter" in INFO)' \
    -g ./gnomad.hg37.zip \
    -g ./spliceai.hg37.zip \
  | bcftools csq -f $ref37 -g $gff -l -

works quite well. Note that it:

  • annotates with the gnomad and spliceai files (via -g) linked here and then uses the gnomad allele frequency for filtering.
  • filters out variants that are PASS in the query VCF, but non-PASS in gnomad. This turns out to be quite a good filter.
  • uses function.js to define trio_denovo.
  • pipes to bcftools csq for annotation with e.g. missense, stop_gained, etc.

This command results in ~3-6 high-quality candidate variants per trio exome and takes ~30 seconds to run (faster on a fast disk).

compound heterozygotes

slivar provides some machinery for finding compound heterozygotes in trios using phase-by-inheritance. It also reports variant-pairs where 1 side is a de novo. It assumes the input has already been filtered to high-quality variants. The steps are:

  1. use slivar to filter to rare heterozygous (or denovo) variants
  2. pipe to an effect annotator like bcftools csq, VEP, or snpEff
  3. pipe to slivar compound-hets

for example:

slivar expr \
    --ped $ped \
    --vcf $vcf \
    --pass-only \
    --js functions.js \
    --info 'INFO.MQ > 40 && variant.FILTER == "PASS" && (!variant.is_multiallelic) && INFO.gnomad_popmax_af < 0.005' \
    --trio 'recessive:trio_hets(kid, dad, mom) && !("gnomad_popmax_af_controls_filter" in INFO)' \ 
    -g ./gnomad.hg37.zip \ 
  # annotate with bcftools. this can also be VEP or snpEff
  | bcftools csq -f $ref37 -g $gff -l - \
  # tell slivar that the gene info is in the 2nd column of the BCSQ INFO field
  |  ./src/slivar compound-hets -p $ped -f BCSQ -i 2  \
  > comphets.vcf

here, trio_hets defined in functions.js finds variants that are heterozygous in the kid and 1 (and only 1) parent. The result is annotated using bcftools and then sent to slivar compound-hets.

The -f BCSQ argument tells slivar that the effect is in the BCSQ field in the INFO column of the VCF (this will be, for example CSQ on a VEP annotated VCF). The -i 2 argument tells slivar that the 2nd (| delimited) column in the BCSQ record contains the gene name used for grouping. slivar then finds pairs of variants that are both het in the kid and het opposite parents so that we know they came from opposite haplotypes (this is phase-by-inheritance).

The output is a VCF of only comp-het variants with an extra INFO field of e.g.

slivar_comphet=16/1250304/CGAG/C/$gene/$sample

which indicates that the current variant is compound-heterozygous in $sample with the variant at 16:1250304 in $gene.

Even with a less stringent frequency filter of < 0.005 and no filter on variant function (e.g. allowing synonymous). This can reduce the number of candidate pairs of variants down to about 1 dozen in a standard trio.

uniparental disomy (or deletions)

using my.js, users can find variants with uniparental-disomy using:

       --trio "uniparent_disomy:uniparent_disomy(kid, mom, dad) && hqrv(variant, INFO, 1)" \

Full analysis for trios with unaffected parents

The following 2 command encapsulate the current best-practices for finding denovos, x-linked denovos, recessives for X and autosome (including compound hets. For exomes, these 2 commands will often result in ~20 variants per trio without even considering the variant effect (missense/synonymous/etc). And a majority of those variants are pairs of a compound-het so the number of unique genes is often under < 10. The commands use the javascript functions in a file named my.js those are easily modified to allow more/less stringency or different logic.

## set these variables
vcf=path/to/vcf
ped=path/to/ped
fasta=/path/to/fasta
gff=Homo_sapiens.GRCh37.82.gff.gz # ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/
name=mycohort
## end variables

slivar expr --vcf $vcf --ped $ped \
     --pass-only \
    --js my.js \
    -g gnomad.hg37.zip \
    --trio "denovo:denovo(kid, mom, dad) && hqrv(variant, INFO, 0.001)" \
    --trio "x_denovo:x_denovo(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
    --trio "recessive:recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01)" \
    --trio "x_recessive:x_recessive(kid, mom, dad) && hqrv(variant, INFO, 0.01) && (variant.CHROM == 'X' || variant.CHROM == 'chrX')" \
     | bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -o vcfs/$name.vcf

# find compound hets, including where 1 end is a denovo
slivar expr --vcf $vcf --ped $ped \
    --js my.js \
     -g gnomad.hg37.zip \
     --pass-only \
     --trio "lenient_denovo:denovo(kid, mom, dad) && hqrv(variant, INFO, 0.005)" \
     --trio "lenient_ar:lenient_ar(kid, mom, dad) && hqrv(variant, INFO, 0.01) && INFO.gnomad_nhomalt_controls < 4" \
      | bcftools csq -s - --ncsq 40 -g $gff -l -f $fasta - -O u \
      | slivar compound-hets -s lenient_denovo -s lenient_ar-p $ped > vcfs/$name.comp-hets.vcf

Note that for multi-sample VCFS, -s arguments to compound-hets make sure that only the trio(s) that passed the filters in the upstream slivar command are used to compose the 2 ends of a compound heterozgyote.

Clone this wiki locally