ai4all-sfu/2018_bio

Inferring the infection pathway of influenza

2018 SFU Invent the Future Summer Scholar Program

In this project, students learn how computational biology lets us 1) infer ancestral strains and 2) predict future strains of viruses, in order to understand the infection pathway of influenza and prepare vaccines before outbreaks occur.

Phylogenetic trees are widely used in biology to represent evolutionary relationships between species, such as how wolves are related to domesticated dogs. In these trees, leaves represent currently living species, and internal branch points (nodes) represent speciation events, where new species are thought to have arisen. The overall structure and shape of a phylogenetic tree reveals useful information such as the rate of new species formation and extinction. Tree balance describes how evenly the leaves are divided between the two sides of each branch point, and branch lengths show the time or genetic distance between branching or speciation events.
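As a small illustration of leaves, internal nodes, and branch lengths, the sketch below builds a toy tree with the ape package. It assumes ape is installed (it is pulled in as a dependency of phangorn); the three species and all branch lengths are made up for the example.

```r
# Assumes the 'ape' package is available; the tree itself is hypothetical.
library(ape)

# Newick text: wolf and dog share a recent ancestor, fox branches off earlier.
tree <- read.tree(text = "((wolf:1,dog:1):2,fox:3);")

n_leaves   <- Ntip(tree)             # leaves: the sampled species
n_internal <- Nnode(tree)            # internal nodes: inferred branching events
total_len  <- sum(tree$edge.length)  # total branch length in the tree

# Leaf-to-leaf distances measured along the branches
d <- cophenetic(tree)
print(d["wolf", "dog"])  # wolf -> shared ancestor -> dog
print(d["wolf", "fox"])  # path that passes through the root
```

Closely related leaves (wolf and dog) end up with a shorter path between them than distantly related ones (wolf and fox), which is the information a distance-based analysis works from.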

Phylogenetic trees can also be used to extract information from viral/bacterial/vector speciation events generated by disease outbreaks. This information can be used to analyze the rate and patterns at which new species of virus/bacteria/vectors evolve, providing valuable information for the development of vaccines and other preventative measures.

Homepage: https://sites.google.com/view/ai4all-sfu2018/projects/bioinformatics

Github repository: https://github.com/ai4all-sfu/bio

Slides: https://docs.google.com/presentation/d/1XpPjTZGQP1KkpAeiDjAUfz-uCqQMs2x1_a4lo_aJa3o/edit?usp=sharing

usage

  1. install mafft (for multiple sequence alignment) from https://mafft.cbrc.jp/alignment/software/

  2. download this repository and follow the R code in index.html; remember to set the variable root to the local folder you downloaded this repository to and, if you are starting a fresh run, set result_time_ = ""
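Before running the analysis, the two steps above can be sanity-checked from R. This is a minimal sketch, not part of the repository: check_setup and align_with_mafft are hypothetical helper names, the path assigned to root is a placeholder, and the way index.html itself invokes mafft may differ. The only mafft option used is --auto, which lets mafft choose an alignment strategy.

```r
# Hypothetical helpers; neither function is part of this repository.

# Report whether mafft is on the PATH and whether root contains a data folder.
check_setup <- function(root) {
  c(
    mafft_on_path = nzchar(unname(Sys.which("mafft"))),
    data_folder   = dir.exists(file.path(root, "data"))
  )
}

# Align a FASTA file with mafft and write the alignment to 'output'.
# Returns the exit status reported by system2 (0 means success).
align_with_mafft <- function(input, output) {
  system2("mafft", args = c("--auto", shQuote(input)), stdout = output)
}

root <- "~/2018_bio"   # placeholder: change to your local copy
result_time_ <- ""     # empty string when starting anew
print(check_setup(root))
```

If both entries print TRUE, mafft was found and root points at a folder that contains data.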

folder/file structure

data folder contains processed data

  • FASTA.fa: HA gene sequences of influenza subtype A/H3N2 from 1997 to 2017 (sorted by date; n = 10370)
  • meta.csv: merged metadata, sorted by date in the same order as FASTA.fa
  • alignment.fa: multiple sequence alignment of the viral strains, produced by mafft from FASTA.fa
  • Aux_data.csv: the name and date of every strain in FASTA.fa
  • FinalH3N2: maximum-likelihood phylogeny built from FASTA.fa
  • df.csv: features for each strain/clade used for prediction
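The .fa files listed above are plain-text FASTA: a header line starting with ">" followed by one or more lines of sequence. The base-R reader below is a rough sketch for peeking at such a file without extra packages; read_fasta is a hypothetical helper, and the toy strain names and bases are made up. For real work, a package reader such as read.FASTA in ape is likely the better choice.

```r
# Hypothetical helper: read a FASTA file into a named character vector.
read_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]        # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)          # record index for every line
  keep <- !is_header & record > 0      # sequence lines that follow a header
  # Join the sequence lines that belong to the same record
  seqs <- vapply(split(lines[keep], record[keep]), paste, character(1),
                 collapse = "")
  names(seqs) <- sub("^>", "", lines[is_header])
  seqs
}

# Demo on a two-record toy file (strain names and bases are made up)
demo <- tempfile(fileext = ".fa")
writeLines(c(">A/toy/1/1997", "ACGT", "AC", ">A/toy/2/2017", "GGTT"), demo)
seqs <- read_fasta(demo)
print(seqs)
print(nchar(seqs))  # sequence length per strain
```

The sketch assumes every header is followed by at least one sequence line. On the real data, length(read_fasta(file.path(root, "data", "FASTA.fa"))) should match the n = 10370 stated above.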

result/<date>_<time> folder contains the results of one run, named by the date and time the run was made (output of index.Rmd)

  • ind.csv: strain names of 200 randomly sampled recent viral sequences
  • FASTA_anc.fa: reconstructed ancestral sequences
  • FASTA_all.fa: data/FASTA.fa (minus the 200 sequences) + reconstructed ancestral sequences
  • alignment_anc.fa: alignment of sequences in FASTA_all.fa
  • dm_anc.Rdata: distance matrix made using alignment_anc.fa

extra packages

if you have issues installing the packages used in the script, install the following packages in the order listed (note: the terminal commands require miniconda); courtesy of raquel

terminal

source activate py27
conda install -c r r-e1071 
conda install -c r r-igraph 
conda install -c geraldmc r-phylotop 
conda install -c r r-nloptr
conda install -c r r-xml 

r

install.packages('phangorn') 
install.packages('phytools')
install.packages('nloptr') 
install.packages('lme4') 
install.packages('pbkrtest') 
install.packages('car') 
install.packages('NHPoisson') 
install.packages('RNeXML') 
install.packages('phylobase') 
install.packages('phyloTop') 
