Brian Haas edited this page Feb 24, 2016 · 67 revisions

FusionInspector: In silico Validation of Fusion Transcript Predictions

FusionInspector is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). FusionInspector assists in fusion transcript discovery by performing a supervised analysis of fusion predictions, attempting to recover and re-score evidence for such predictions.

Given a list of candidate fusion genes (as derived from running any fusion transcript prediction tool, such as Prada, FusionCatcher, SoapFuse, TophatFusion, DISCASM/GMAP-Fusion, STAR-Fusion, or other), FusionInspector extracts the genomic regions for the fusion partners and constructs mini-fusion-contigs containing the pairs of genes in their proposed fused orientation. The original reads are aligned to these candidate fusion contigs; fusion-supporting reads that would normally align as discordant pairs or split reads should align as concordant 'normal' reads in this fusion-gene context. Those reads supporting each fusion (spanning fragments and fusion-breakpoint-containing reads) are identified, reported, and scored accordingly.

Optionally, Trinity de novo transcriptome assembly can be executed as part of the FusionInspector routine in order to de novo reconstruct fusion transcripts from the mapped reads.

Outputs generated by FusionInspector are easily viewed in a genome browser such as IGV so that the evidence for fusion transcripts can be manually assessed for read and alignment quality.

Installation Requirements

Software requirements:

FusionInspector requires the following companion software tools to be installed:

The cpanm tool is useful for local installations of these.

Be sure that STAR, samtools, and bgzip are available via your PATH environment variable, and set the TRINITY_HOME environment variable to the Trinity installation directory.
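A quick pre-flight check of the environment described above can be sketched in Python as follows. This is only an illustration: the function name `missing_requirements` is hypothetical and not part of FusionInspector, and it assumes the executables are installed under exactly the names STAR, samtools, and bgzip.

```python
import os
import shutil

# Tool and variable names come from the text above. That the executables are
# installed under exactly these names is an assumption.
REQUIRED_TOOLS = ("STAR", "samtools", "bgzip")
REQUIRED_ENV_VARS = ("TRINITY_HOME",)


def missing_requirements(tools=REQUIRED_TOOLS, env_vars=REQUIRED_ENV_VARS, environ=None):
    """Return (missing_tools, missing_env_vars) for the given environment.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    environ = os.environ if environ is None else environ
    search_path = environ.get("PATH")
    # shutil.which returns None when the executable is not found on the path.
    missing_tools = [t for t in tools if shutil.which(t, path=search_path) is None]
    # A variable that is unset or empty counts as missing.
    missing_vars = [v for v in env_vars if not environ.get(v)]
    return missing_tools, missing_vars
```

Calling `missing_requirements()` with no arguments inspects the current shell environment; two empty lists mean the tools and the variable named above were found.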

Data requirements

As with the other fusion-transcriptome components of CTAT, FusionInspector leverages the FusionFilter data resources. Visit the FusionFilter website for links to existing data resources for human fusion transcript detection, or for instructions on how to build your own data resources for use with CTAT.

Running FusionInspector

FusionInspector requires one or more lists of fusion candidates, with each formatted like so, as geneA--geneB:

B3GNT1--NPSR1
ZNF709--DYRK1A
ZNF844--NCBP2
RBX1--HAPLN2
FAM180B--TRIM60
CASP9--ADCYAP1
HS3ST3A1--C1QTNF2
OPTC--AP000347.4
GRIA2--ZW10

We'll call the file containing this list 'fusions.listA.txt'. Let's assume we have another such list from another source, and we'll call it 'fusions.listB.txt'.

It's ok to have a tab-delimited file containing other attributes (such as the raw output from some fusion-prediction tool) as long as the first column fits the above format.
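To check a candidate list before running the tool, the first-column rule can be sketched in Python. This is not FusionInspector's own parser: the function name `read_fusion_candidates` is hypothetical, and skipping lines that start with '#' (for example a header line in raw tool output) is an assumption.

```python
import re

# Assumption: gene symbols contain no whitespace and the two partners are
# separated by "--", as in the examples above.
FUSION_NAME = re.compile(r"^(\S+?)--(\S+)$")


def read_fusion_candidates(lines):
    """Return (geneA, geneB) pairs taken from the first tab-delimited column."""
    pairs = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and lines assumed to be headers or comments.
        if not line.strip() or line.startswith("#"):
            continue
        # Only the first column matters; any further columns are ignored.
        first_column = line.split("\t", 1)[0].strip()
        match = FUSION_NAME.match(first_column)
        if match is None:
            raise ValueError(
                f"line {line_number}: expected geneA--geneB, got {first_column!r}"
            )
        pairs.append((match.group(1), match.group(2)))
    return pairs
```

For instance, `read_fusion_candidates(open("fusions.listA.txt"))` would return one pair per candidate, or raise a `ValueError` naming the first line that does not fit the format.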

Given these lists of fusions, we'll run FusionInspector like so, passing multiple lists to --fusions as a comma-separated value:

FusionInspector --fusions fusions.listA.txt,fusions.listB.txt \
                --genome_lib /path/to/CTAT_genome_lib \
                --left_fq rnaseq_1.fq --right_fq rnaseq_2.fq \
                --out_dir my_FusionInspector_outdir \
                --out_prefix finspector \
                --prep_for_IGV
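When FusionInspector is launched from a script rather than typed by hand, the same command can be assembled programmatically. The sketch below uses only the flags shown in the command above; the function name `build_fusion_inspector_command` and its parameter names are hypothetical, and it assumes the FusionInspector executable is on your PATH.

```python
def build_fusion_inspector_command(fusion_lists, genome_lib, left_fq, right_fq,
                                   out_dir, out_prefix, prep_for_igv=True):
    """Return the FusionInspector argument list for the flags shown above."""
    command = [
        "FusionInspector",
        # Multiple candidate lists are joined with commas, as in the example.
        "--fusions", ",".join(fusion_lists),
        "--genome_lib", genome_lib,
        "--left_fq", left_fq,
        "--right_fq", right_fq,
        "--out_dir", out_dir,
        "--out_prefix", out_prefix,
    ]
    if prep_for_igv:
        command.append("--prep_for_IGV")
    return command
```

The returned list can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with unusual file names.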

Output of FusionInspector

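The final output of FusionInspector is a file called 'finspector.fusion_predictions.final', which you'll find in the --out_dir specified. This file is tab-delimited. Its fields are listed below by column index and column name, followed by an example record shown one field per line. In the example, columns 1 and 2 hold the evidence counts, while columns 8 and 9 list the names of the supporting reads and fragments: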

0       #fusion_name
1       JunctionReads
2       SpanningFrags
3       Splice_type
4       LeftGene 
5       LeftBreakpoint
6       RightGene
7       RightBreakpoint
8       JunctionReads
9       SpanningFrags
10      Annotations


0       HS3ST3A1--C1QTNF2
1       106
2       1254
3       ONLY_REF_SPLICE
4       HS3ST3A1
5       chr17:13503848:-
6       C1QTNF2
7       chr5:159776788:-
8       fragBp1383/1,fragBp1365/1,... # the exact reads identified as breakpoint-junction reads
9       fragBp692,fragBp277,fragBp389,... # the names of the fragments containing breakpoint-spanning paired reads.
10      .

The .final files can be large and difficult to navigate because they name every evidence read. Instead, try navigating the .final.abridged file, which contains all the above information but excludes the names of the reads.
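For downstream filtering, the leading columns of this output can be read with a few lines of Python. This is a minimal sketch, not an official parser: the names `parse_breakpoint` and `parse_predictions` are hypothetical, and it assumes that the first eight columns are laid out identically in the .final and .final.abridged files and that header lines start with '#'.

```python
def parse_breakpoint(text):
    """Split a breakpoint such as 'chr17:13503848:-' into (chromosome, position, strand)."""
    chromosome, position, strand = text.rsplit(":", 2)
    return chromosome, int(position), strand


def parse_predictions(lines):
    """Yield one dict per prediction, built from columns 0 through 7.

    Columns are addressed by position because two header names
    (JunctionReads and SpanningFrags) appear twice in the field listing.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header (assumed to start with '#')
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(fields)}")
        yield {
            "fusion_name": fields[0],
            "junction_read_count": int(fields[1]),
            "spanning_frag_count": int(fields[2]),
            "splice_type": fields[3],
            "left_gene": fields[4],
            "left_breakpoint": parse_breakpoint(fields[5]),
            "right_gene": fields[6],
            "right_breakpoint": parse_breakpoint(fields[7]),
        }
```

A typical use would be iterating over `parse_predictions(open(path))` and keeping only records whose junction and spanning counts exceed thresholds of your choosing.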

When the '--prep_for_IGV' parameter is specified, the following files are generated for viewing in IGV (or another genome browser):

finspector.fa : the candidate fusion-gene contigs
finspector.bed : the reference gene structure annotations for fusion partners
finspector.junction_reads.bam : alignments of the breakpoint-junction supporting reads.
finspector.spanning_reads.bam : alignments of the breakpoint-spanning paired-end reads.

[Figure: a fusion candidate with recovered read evidence, viewed in IGV.]

Example data and execution

See the 'test/' subdirectory and examine the README.txt file included. Example data and command execution info are provided.

User support

Contact us via our Google group: https://groups.google.com/forum/#!forum/trinity_ctat_users