The CRISPR-Cas9 system is a type II CRISPR system that has rapidly become the most versatile and widespread tool for genome engineering. It consists of two components, the Cas9 effector protein, and a single guide RNA that combines the spacer (for identifying the target) with the tracrRNA, a trans-activating small RNA required for both crRNA maturation and interference. While there are well-established methods for screening Cas effector proteins and CRISPR arrays, the detection of tracrRNA remains the bottleneck in detecting Class 2 systems. Results: We introduce a new pipeline CRISPRtracer for screening and evaluation of tracrRNA candidates in genomes. This pipeline combines evidence from different components of the Cas9-sgRNA complex. The core is a newly developed structural model via covariance models from sequence-structure alignment of experimentally validated tracrRNAs. As additional evidence, we determine the terminator signal (required for the tracrRNA transcription) and the RNA-RNA interaction between the CRISPR array repeat and the 5’-tail of the tracrRNA. Repeats are detected via an ML-based approach (CRISPRidenifier). As additional evidence, we detect the cassette containing the Cas9 (type II CRISPR systems) and Cas12 (type V CRISPR systems) effector protein. Our tool is the first for detecting tracrRNA for type V systems.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
First you need to install Miniconda Then create an environment and install the required libraries in it
First you need to install Miniconda for python 3. Miniconda can be downloaded from here:
https://docs.conda.io/en/latest/miniconda.html
Then Miniconda should be installed. On a linux machine the command is similar to this one:
bash Miniconda3-latest-Linux-x86_64.sh
Then you have to create the CRISPRtracrRNA. The necessary setup is provided in the "environment.yml" file
In order to install the corresponding environment one can execute the following command
conda env create -f environment.yml
CRISPRtracrRNA utilizes CRISPRidentify for CRISPR-array search and CRISPRcasIdentifier for the detection of the cas genes.
You can find the CRISPRidentify tool and its description here
Please make sure that after you downloaded CRISPRidentify its relative path is:
tools/CRISPRidentify/CRISPRidentify/CRISPRidentify.py
You can find the CRISPRcasIdentifier tool and its description here
You need to make two steps:
Firstly, you need to download the CRISPRcasIdentifier tool:
wget https://github.com/BackofenLab/CRISPRcasIdentifier/archive/v1.1.0.tar.gz
tar -xzf v1.1.0.tar.gz
Please make sure that after you downloaded CRISPRcasIdentifier its relative path is:
tools/CRISPRcasIdentifier/CRISPRcasIdentifier/CRISPRcasIdentifier.py
Secondly, you need to download the models:
Due to GitHub's file size constraints, authors made their HMM and ML models available in Google Drive. You can download them here and here. Save both tar.gz files inside CRISPRcasIdentifier's directory. Note: If you experience problems with the cas detection, try to manually extract the tar.gz files.
Additionally, you need to give permissions to the erpin tool:
sudo chmod 777 tools/erpin/erpin
Before running CRISPRtracrRNA you need to activate the corresponding environment.
conda activate crispr_tracr_rna_env
We prepared the test folder which can be used for the test run.
Example of running CRISPRtracrRNA over a folder of files:
python CRISPRtracrRNA.py --input_folder test_input_complete
You can see the help by using the -h
option
python CRISPRtracrRNA.py -h
The only mandatory parameter which has to be specified is the input folder.
--input_folder <path_to_the_folder>
--output_folder [path_to_the_result_folder]
Specifies the path and name of the folder with the output results. If not specified the results will appear in "CRISPRtracrRNA_output" folder
For example: Specifies the path and name of the folder with the output results.
python CRISPRtracrRNA.py --input_folder test_input_complete --output_folder CRISPRtracrRNA_output
--temp_folder_path [path_to_the_result_folder]
Specifies the path and name of the folder with the intermediate results. If not specified the results will appear in "temp" folder
python CRISPRtracrRNA.py --input_folder test_input_complete --temp_folder_path temp
--run_type [execution_type]
Specifies the way CRISPRtracrRNA will be executed. The following options are available: "complete_run" and "model_run"
python CRISPRtracrRNA.py --input_folder test_input_complete --run_type complete_run
python CRISPRtracrRNA.py --input_folder test_input_model_only --run_type model_run
--model_type [model]
Specifies the type of pretrained model. The following options are available: "II" and "V"
python CRISPRtracrRNA.py --input_folder test_input_complete --model_type II
--anti_repeat_similarity_threshold [value]
Specifies the threshold for the similarity between anti-repeat and repeat. The default value is 0.7.
python CRISPRtracrRNA.py --input_folder test_input_complete --anti_repeat_similarity_threshold 0.7
--anti_repeat_coverage_threshold [value]
Specifies the threshold for the coverage of anti-repeat. The default value is 0.6.
python CRISPRtracrRNA.py --input_folder test_input_complete --anti_repeat_coverage_threshold 0.6
Weights which can be used for the ranking of the results. The default values are:
Description | Default value |
---|---|
weight_crispr_array_score | 0.5 |
weight_anti_repeat_sim | 0.5 |
weight_anti_repeat_coverage | 0.5 |
weight_anti_sim_coverage | 0.5 |
weight_interaction_score | 0.6 |
weight_model_hit_score | 0.9 |
weight_terminator_hit_score | 0.9 |
weight_consistency_orientation | 0.1 |
weight_consistency_anti_repeat_tail | 0.1 |
weight_consistency_tail_terminator | 0.1 |
--perform_type_v_anti_repeat_analysis [value]
python CRISPRtracrRNA.py --input_folder test_input_model_only_type_v --model_type v --run_type model_run --perform_type_v_anti_repeat_analysis True