This repository is a directory of training resources available in Cambridge and nearby for basic bioinformatics processing and computational expertise in handling big data. The resources listed here address the needs of researchers in the SOMX and PREM themes at the MRC Biostatistics Unit but should be broadly applicable to other themes and related research in other institutes.
The languages we focus on in particular are R and Python, although we are aware that MATLAB has many users and that Julia is increasingly popular in this space.
Please feel free to update the information here. We would appreciate your involvement to keep this resource up-to-date. If you know of any additional courses or resources not listed below please either let me know or update the repository directly. If you have taken any of the courses or used any of the training resources below and have opinions or information that are worth sharing, please add them to this document.
There are several initiatives and departments that offer training to members of the university. These include:
- A Bioinformatics Training initiative that runs courses on basic skills and programming and more specialised courses.
- The Cambridge Computational Biology Institute runs an MPhil in Computational Biology which has several relevant courses including genome informatics and scientific programming.
- The Computer Laboratory runs an MPhil in Advanced Computer Science that has several relevant courses including Machine Learning and Algorithms for Data Mining.
Researchers at the university are able to join the Cambridge Big Data Initiative. Membership includes a subscription to a mailing list that provides regular updates on big-data-related activities, including training.
Massive Open Online Courses (MOOCs) are an easily accessible way to pick up skills and expertise. The learning experience depends a lot on the instructor(s) and the quality of the materials they have created. Most courses use a mixture of instructional videos, multiple-choice tests and project work. Discussion forums are normally available to encourage students to interact. Courses typically run on set dates, perhaps a few times a year, although the instructional material may be available at other times. Most courses require a few hours of participation each week. Some charge a small fee; others are free. Official certification of a student's result often requires a fee.
Coursera is one of the largest MOOC websites. A few relevant courses out of the many are:
- A very popular bioinformatics specialisation run by UCSD consisting of 7 courses.
- A genomic data science specialisation out of Johns Hopkins with 8 courses.
- Applied Machine Learning in Python (U. of Michigan)
- Deep Learning Specialization (Stanford)
- Serverless Machine Learning with TensorFlow on Google Cloud Platform (Google)
- edX
- FutureLearn is the largest UK-based MOOC provider.
- MIT OpenCourseWare
- Stanford Online
The Wellcome Genome Campus, which hosts the Wellcome Trust Sanger Institute and EMBL-European Bioinformatics Institute, runs some bioinformatics courses. These are run on a commercial basis, although bursaries of up to 50% may be available.
The Alan Turing Institute is the national institute for data science and runs two-hour classes on all aspects of data science. It is easily accessible from Cambridge by train (under 50 minutes).
At the BSU, John Reid is available to talk about any of the resources listed above. Additionally, Colin Starr is the main point of contact regarding high performance computing.