Brian Haas edited this page Jul 5, 2023 · 34 revisions

CTAT-LR-Fusion : Detect Fusion Transcripts from Long Reads (PacBio Iso-seq or ONT transcriptomes)

CTAT-LR-Fusion is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT) used for detecting fusion transcripts from long-read transcriptome sequencing data, including PacBio Iso-seq and Oxford Nanopore Technologies (ONT) sequenced transcriptomes. If matched Illumina RNA-seq data are available, they can also be used for additional exploration and quantification of fusions initially detected via long reads.

CTAT-LR-Fusion was developed in the Broad Institute's Methods Development Laboratory (MDL) for characterizing long read transcriptome sequences such as those derived from MAS-seq.

Installing CTAT-LR-Fusion

Obtaining CTAT-LR-Fusion software

Docker and Singularity images are available and recommended.

If you would prefer to install from source code, download the latest 'FULL' release tarball from the CTAT-LR-Fusion release site. Unpack it, and run 'make' in the base installation directory.

Obtaining and configuring the CTAT Genome Lib

The CTAT genome lib is the same used for other CTAT tools and can be downloaded from https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/. The ctat genome lib software compatibility matrix indicates the version of STAR to use if you have companion Illumina short reads.

Configuring the CTAT Genome Lib for CTAT-LR-Fusion

The ctat-LR-fusion software comes with a customized version of minimap2 named ctat-minimap2, and CTAT-LR-Fusion requires a minimap2 index of the reference genome. To build this, initially run ctat-LR-fusion like so:

ctat-LR-fusion -T long_reads.fq \
               --genome_lib_dir  /path/to/ctat_genome_lib_build_dir \
               --prep_reference

and it will first build the minimap2 genome index before running ctat-LR-fusion to find fusion transcripts.

If you run with --prep_reference_only, it will stop after building the index.

For future runs, drop the --prep_reference argument, as the index only needs to be built once. If you forget, no worries. It'll only build it once anyway.
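The one-time index build can be kept separate from routine runs. The following is a minimal sketch, assuming the placeholder path and read file name used in the examples here. The `run` helper is hypothetical: it only prints the command when the tool is not on the PATH. Whether -T can be omitted together with --prep_reference_only is not stated here, so it is kept.

```shell
GENOME_LIB=/path/to/ctat_genome_lib_build_dir   # placeholder path

# Hypothetical helper: run the command if it is installed, otherwise print it.
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# One-time step: build the minimap2 genome index, then stop.
run ctat-LR-fusion -T long_reads.fq \
                   --genome_lib_dir "$GENOME_LIB" \
                   --prep_reference_only

# Later runs: the index already exists, so no --prep_reference is needed.
run ctat-LR-fusion -T long_reads.fq \
                   --genome_lib_dir "$GENOME_LIB"
```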

Running CTAT-LR-Fusion

Once you have the CTAT genome lib installed and configured as above, you need the long reads in a FASTA or FASTQ formatted file. Then, run ctat-LR-fusion like so:

ctat-LR-fusion -T long_reads.fq \
               --genome_lib_dir  /path/to/ctat_genome_lib_build_dir 

If you have the ctat genome lib dir set as the environment variable CTAT_GENOME_LIB, then you don't need to specify --genome_lib_dir, and only need to specify -T for the long reads.
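As a sketch of the environment-variable route, assuming the same placeholder path and read file name as in the command above (the PATH check is only there so the snippet fails gracefully where the tool is not installed):

```shell
# Set once per shell session (or in your shell profile); the path is a placeholder.
export CTAT_GENOME_LIB=/path/to/ctat_genome_lib_build_dir

# With CTAT_GENOME_LIB set, only the long reads need to be given.
if command -v ctat-LR-fusion >/dev/null 2>&1; then
    ctat-LR-fusion -T long_reads.fq
else
    echo "ctat-LR-fusion not found on PATH" >&2
fi
```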

If you have reads that align to the reference genome with <90% sequence identity, adjust the --min_per_id parameter (default: 90) accordingly.
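For example, a run that admits lower-identity alignments might look like the sketch below. The value 80 is purely illustrative, not a recommended setting, and the path and read file name are placeholders.

```shell
# Illustrative threshold for noisier reads; 80 is an assumed value.
MIN_PER_ID=80

if command -v ctat-LR-fusion >/dev/null 2>&1; then
    ctat-LR-fusion -T long_reads.fq \
                   --genome_lib_dir /path/to/ctat_genome_lib_build_dir \
                   --min_per_id "$MIN_PER_ID"
else
    echo "ctat-LR-fusion not found on PATH" >&2
fi
```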

Including Illumina RNA-seq

If you additionally have Illumina RNA-seq for the sample, you can include that as well like so:

ctat-LR-fusion -T long_reads.fq \
               --genome_lib_dir  /path/to/ctat_genome_lib_build_dir  \
               --left_fq illumina_reads_1.fq \
               --right_fq illumina_reads_2.fq

ctat-LR-fusion does not discover additional fusion gene pairs from short reads; it only examines short read support for those fusion gene pairs initially detected via long read sequences. For those gene pairs, however, it will identify fusion splicing isoforms that are uniquely supported by Illumina short read data.

See the full usage info (via --help or no parameters) for additional options and configurations.

Fusion Outputs

The output files consist of the following:

  • ctat-LR-fusion.fusion_predictions.tsv : the final fusion predictions including names for the evidence reads. See the .abridged version for simpler output lacking the read names.

  • ctat-LR-fusion.fusion_inspector_web.html : the results as an interactive igv-reports page for exploring the evidence supporting each fusion.

A preliminary list of fusions before any filtering is performed to generate the final list is provided as file 'ctat-LR-fusion.fusion_predictions.preliminary.tsv'. This is useful for additional exploration and for troubleshooting purposes.
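To see how much the filtering removed, the preliminary and final prediction files can be compared by row count. This is a minimal sketch, assuming header lines in these files start with '#' and that OUTDIR (a hypothetical variable, defaulting to the current directory) points at the directory holding the output files.

```shell
# count_predictions FILE: print the number of non-empty lines not starting with '#'.
# Assumption: header/comment lines in the TSV begin with '#'.
count_predictions() {
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}

OUTDIR=${OUTDIR:-.}   # hypothetical: wherever ctat-LR-fusion wrote its results

for f in ctat-LR-fusion.fusion_predictions.preliminary.tsv \
         ctat-LR-fusion.fusion_predictions.tsv; do
    if [ -f "$OUTDIR/$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_predictions "$OUTDIR/$f")"
    fi
done
```

Files that are missing are silently skipped, so the loop is safe to run in a directory that holds only some of the outputs.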

A screenshot of the interactive fusion html view is shown below:

In the image above, PacBio Iso-seq reads supporting the fusion are shown first, followed below by Illumina junction reads and spanning fragments that also support this fusion. If you only have long reads, the Illumina tiers will simply be empty.
