Tick cementome composition was compared across species using published sialome and cementome datasets.
Because the datasets analysed originated from various published studies, the data are heterogeneous and uneven.
To create some consistency across the datasets, only proteins with an existing UniProt ID were included in the analysis.
A template script was written to search for the FASTA file matching each UniProt ID listed in an input file.
Once the sequence for a protein was identified, it was appended to a new FASTA file, which was then used as input to OrthoFinder.
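The retrieval step can be sketched as below. This is a minimal illustration, not the original template script: it assumes the input file lists one UniProt accession per line and that sequences are downloaded from the public UniProt REST endpoint, whereas the original script may have searched local files instead. The function names (read_ids, fetch_fasta, build_fasta) are hypothetical.

```python
import urllib.request
from pathlib import Path

# Assumption: sequences come from the public UniProt REST endpoint.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def read_ids(id_file):
    """Return the non-empty accessions in id_file, one per line, without duplicates."""
    accessions = []
    for line in Path(id_file).read_text().splitlines():
        accession = line.strip()
        if accession and accession not in accessions:
            accessions.append(accession)
    return accessions


def fetch_fasta(accession):
    """Download one FASTA record from UniProt (requires network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def build_fasta(id_file, out_file, fetch=fetch_fasta):
    """Append the FASTA record of every accession in id_file to out_file.

    Returns the accessions for which no sequence could be retrieved.
    """
    missing = []
    with open(out_file, "a") as out:
        for accession in read_ids(id_file):
            try:
                record = fetch(accession)
            except Exception:
                record = ""  # treat any lookup failure as "not found"
            if not record.startswith(">"):
                missing.append(accession)
                continue
            # Ensure each record ends with exactly one newline.
            out.write(record.rstrip("\n") + "\n")
    return missing
```

Passing the lookup function as the fetch argument keeps the file handling separate from the source of the sequences, so a local directory search could be substituted without changing the rest of the sketch. One combined FASTA file would be built per species before running OrthoFinder.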
A list of shared orthogroups was obtained by removing rows with empty values from the file "Orthogroups.tsv" (generated by OrthoFinder). Each row of this file lists the names of the proteins assigned to one orthogroup, separated into columns by species name.
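The filtering of "Orthogroups.tsv" can be sketched as follows. This is an illustration under stated assumptions, not the original script: it assumes the file is tab-separated, that the first column holds the orthogroup identifier, and that multiple proteins within a cell are separated by commas. The function name shared_orthogroups is hypothetical.

```python
import csv


def shared_orthogroups(tsv_path):
    """Return the orthogroups that contain at least one protein from every species.

    The result maps each orthogroup ID to a {species: [protein names]} dict.
    """
    shared = {}
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        id_column = reader.fieldnames[0]   # assumed to hold the orthogroup ID
        species = reader.fieldnames[1:]    # one column per species
        for row in reader:
            # Skip rows in which any species column is empty or missing.
            if not all((row[name] or "").strip() for name in species):
                continue
            shared[row[id_column]] = {
                name: [protein.strip() for protein in row[name].split(",")]
                for name in species
            }
    return shared
```

Calling shared_orthogroups("Orthogroups.tsv") would return only the orthogroups represented in every species included in the OrthoFinder run.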
All scripts were written and executed, and all data wrangling was carried out, by Areda Elezi.
QMUL MSc Bioinformatics 2020/21