Computational analysis of the evolutionarily conserved Missing In Metastasis / Metastasis Suppressor 1 gene predicts novel interactions, regulatory regions and transcriptional control
Petar Petrov1*, Alexey V. Sarapulov1, Lel Eöry2, Cristina Scielzo3,4, Lydia Scarfò3,4,5, Jacqueline Smith2, David W. Burt6 & Pieta K. Mattila1
1. Institute of Biomedicine, and MediCity Research Laboratories, University of Turku, Tykistökatu 6A, 20520, Turku, Finland.
2. Division of Genetics and Genomics, The Roslin Institute and R(D)SVS, University of Edinburgh, Roslin, Easter Bush campus, Midlothian, EH25 9RG, United Kingdom.
3. Unit of B Cell Neoplasia, Division of Molecular Oncology, IRCCS, San Raffaele Scientific Institute, Milano, Italy.
4. Università Vita-Salute San Raffaele, Milan, Italy.
5. Strategic Research Program on CLL, Division of Experimental Oncology, IRCCS, San Raffaele Scientific Institute, Milano, Italy.
6. University of Queensland, St. Lucia, QLD, 4072, Australia.
2019 Scientific Reports https://www.nature.com/articles/s41598-019-40697-1
- Correspondence and requests for the scripts should be addressed to Petar Petrov ([email protected])
These simple shell scripts were tested on Slackware GNU/Linux (http://www.slackware.com/). All prerequisites were installed from the scripts at SlackBuilds.org (http://slackbuilds.org).
You need the following:
- RepeatMasker (Screen DNA sequence for interspersed repeats)
- BedTools (A powerful toolset for genome arithmetic)
- MEME-suite (Motif based sequence analysis tools)
- Motif databases (used by the MEME Suite)
  - Download: http://meme-suite.org/doc/download.html
  - Location: /var/lib/meme-suite/motif_databases
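Before running anything, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the executable names (`RepeatMasker`, `bedtools`, `meme`, `mast`) are assumptions about how these packages install their commands on your system.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of the repository.
# Assumed executable names: RepeatMasker, bedtools, meme, mast.

# Print each command name from "$@" that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed only if all tools are found and the motif database directory exists.
# The default directory is the location given in the prerequisites list.
check_prereqs() {
    motif_dir="${1:-/var/lib/meme-suite/motif_databases}"
    status=0
    for tool in $(missing_tools RepeatMasker bedtools meme mast); do
        printf 'missing tool: %s\n' "$tool" >&2
        status=1
    done
    if [ ! -d "$motif_dir" ]; then
        printf 'missing motif database directory: %s\n' "$motif_dir" >&2
        status=1
    fi
    return "$status"
}
```

Calling `check_prereqs` with no argument checks the default location; pass a different directory as the first argument if your motif databases live elsewhere.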
Scripts are divided into two folders (check each sub-folder for detailed instructions):
This folder contains scripts that were used in our study to acquire data for transcription factor (TF) binding sequences. We used TFs reported for H. sapiens to also screen the corresponding genomic regions of other species. Contents:
- 00_batchFromNCBI/
  - batchFromNCBI: Download chromosomes from NCBI for a list of species
  - batchGunzip: Batch extract gzipped chromosomes
- 01_memePrepKnown/
  - memePrepKnown: Extract and sort corresponding genomic regions from multiple species
- 02_extractTFmatrix/
  - extractTFmatrix: Extract PFMs from 4 databases for a list of transcription factors
- 03_mastOnly/
  - mastOnly: Run MAST with selected PFMs on genomic regions of interest
- 04_mastBestSort/
  - mastBestSort: Create a CSV table of the best TF hits
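To illustrate the last step, here is a minimal sketch of how a "best hit per TF" CSV table could be built. It is not the mastBestSort script itself. It assumes a hypothetical tab-separated input with three columns (TF name, sequence name, p-value), which is not the native MAST output layout, and it takes "best" to mean the lowest p-value.

```shell
#!/bin/sh
# Sketch only: best_hits_csv is a hypothetical helper, not the repository's
# mastBestSort. Assumed input: TAB-separated lines of "tf<TAB>sequence<TAB>pvalue".

# Keep the lowest-p-value hit for each TF and print the result as CSV,
# with a header line followed by rows sorted by TF name.
best_hits_csv() {
    printf 'tf,sequence,pvalue\n'
    awk -F '\t' '
        BEGIN { OFS = "," }
        {
            p = $3 + 0                      # force numeric comparison (handles 1e-05)
            if (!($1 in best) || p < best[$1]) {
                best[$1]    = p
                bestseq[$1] = $2
                bestraw[$1] = $3            # keep the p-value as originally written
            }
        }
        END { for (tf in best) print tf, bestseq[tf], bestraw[tf] }
    ' "$@" | LC_ALL=C sort -t , -k 1,1
}
```

The function reads from the files given as arguments, or from standard input when none are given, for example `best_hits_csv hits.tsv > best_hits.csv` (both file names are placeholders).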
This folder contains scripts that were used in our study to predict novel TF binding sequences. We used TFs predicted for H. sapiens to also screen the corresponding genomic regions of other species. Contents:
- 01_memePrepUnknown/
  - memePrepUnknown: Extract and sort corresponding genomic regions from multiple species
  - sequenceRepeatMasker: Mask repeats on the sorted genomic regions
- 02_searchUnknown/
  - novelTF: Search for novel transcription factor binding sites with the MEME Suite
- 03_mastOnlyUnknown/
  - mastOnlyUnknown: Run MAST with selected PFMs on genomic regions of interest
- 04_mastBestSortUnknown/
  - mastBestSortUnknown: Create a CSV table of the best TF hits
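The order of tools in this folder (repeat masking, motif discovery, then motif scanning) can be sketched as a chain of three commands. This is an illustration under stated assumptions, not the repository's scripts: the function names, the output directory layout, the `-species human` setting, and the choice of three motifs are all placeholders, so check each sub-folder for the parameters actually used. Setting `DRY_RUN=1` prints the commands without executing them.

```shell
#!/bin/sh
# Sketch only: run and novel_tf_sketch are hypothetical helpers.
# Species, motif count, and directory layout are assumptions, not the study's settings.

# Print a command line; execute it unless DRY_RUN is set to 1.
run() {
    printf '%s\n' "$*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Usage: novel_tf_sketch <regions.fasta> <output-dir>
novel_tf_sketch() {
    fasta="$1"
    outdir="$2"

    # 1. Mask interspersed repeats; RepeatMasker writes <input>.masked into -dir.
    run RepeatMasker -species human -dir "$outdir/masked" "$fasta"
    masked="$outdir/masked/$(basename "$fasta").masked"

    # 2. Discover motifs de novo with MEME on the masked DNA sequences.
    run meme "$masked" -dna -nmotifs 3 -oc "$outdir/meme"

    # 3. Scan the same regions with the discovered motifs using MAST.
    run mast "$outdir/meme/meme.txt" "$masked" -oc "$outdir/mast"
}
```

Note that RepeatMasker does not produce a `.masked` file when it finds no repeats in the input, so a real script would need to fall back to the unmasked sequence in that case.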
The scripts use /tmp for their working and output dirs. The only exception is the scripts found in 00_batchFromNCBI, which use /var/tmp/chromosomes.