Han-Cao/HarmoniSV

HarmoniSV

A toolkit to harmonize and filter structural variations across methods and samples.

Important: This documentation is under construction. HarmoniSV has been tested for population-scale SV calling with SV calls from Sniffles2, cuteSV, and SVIM. It should work with any SV caller whose output follows the VCF specification. Please open an issue for questions or bug reports.

Features

  • Harmonize SVs discovered by different SV calling methods
  • Filter high-confidence SVs with a random forest classifier
  • Fast VCF manipulation, annotation, and conversion

Installation

git clone https://github.com/Han-Cao/HarmoniSV.git

Dependencies

HarmoniSV is written in Python 3.8. The following Python modules are required:

pysam
pandas
numpy
matplotlib
scikit-learn
pyranges
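
Before running HarmoniSV, it can help to confirm that the modules listed above are importable in the current environment. The following is a minimal sketch, not part of HarmoniSV: the helper name `missing_modules` and the `REQUIRED` mapping are hypothetical, and the only fact assumed beyond the list above is that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Package name (as installed) -> module name (as imported).
# scikit-learn is the only entry where the two differ.
REQUIRED = {
    "pysam": "pysam",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "pyranges": "pyranges",
}


def missing_modules(required=REQUIRED):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are available.")
```

The check only looks up each module without importing it, so it is quick and has no side effects. It does not verify module versions.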

Quick start

cd src/harmoniSV
./harmonisv

HarmoniSV: A toolkit to harmonize and filter structural variantions across methods and samples
Version: 0.1.0

Usage: harmonisv <command> [options]

Commands:

 -- VCF manipulation
    harmonize          Harmonize SV VCFs from across samples and SV calling methods
    harmonize-header   Harmonize VCF headers
    sample2pop         Convert single-sample VCF to multi-sample VCF
    intersect          Intersect SVs with genomic features

 -- Analysis on SV callset
    represent          Select the representative SV from merged SVs
    genotype           Genotype SVs across SV genotyping methods
    filter             Random forest filter for SVs
    concordance        Calculate genotype concordance between two VCFs


Note:
    1. All input VCFs MUST follow the VCF specification
    2. Some commands assume specific variant ID format to index SVs from different methods and samples, 
       please check the required ID format before you use
    3. The input/output VCF format (i.e., vcf, vcf.gz, bcf) will be automatically detected. However, a 
       temporary uncompressed VCF file will be generated if the output is vcf.gz or bcf.
     

For help on a specific command, run:
    harmonisv <command> -h
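
To browse the per-command help from a script, the `harmonisv <command> -h` pattern shown above can be wrapped in a small helper. This is a sketch under stated assumptions rather than part of HarmoniSV: the function names `help_command` and `show_help` are hypothetical, the command list is copied from the usage message above, and the default executable path `./harmonisv` assumes the script is run from `src/harmoniSV`.

```python
import subprocess

# Command names copied from the usage message above.
COMMANDS = [
    "harmonize",
    "harmonize-header",
    "sample2pop",
    "intersect",
    "represent",
    "genotype",
    "filter",
    "concordance",
]


def help_command(command, executable="./harmonisv"):
    """Build the argument list for `harmonisv <command> -h`."""
    if command not in COMMANDS:
        raise ValueError(f"Unknown command: {command}")
    return [executable, command, "-h"]


def show_help(command, executable="./harmonisv"):
    """Run the help command and return whatever text it printed."""
    result = subprocess.run(
        help_command(command, executable),
        capture_output=True,
        text=True,
        check=False,  # help output may come with a non-zero exit code
    )
    return result.stdout or result.stderr


if __name__ == "__main__":
    for name in COMMANDS:
        print(show_help(name))
```

Passing the arguments as a list avoids shell quoting issues, and validating the command name first catches typos before any process is started.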
