Prof: Dr. Thomas Parchman; SFB 209; [email protected]
Co-instructor: Trevor Faske; [email protected]
Workshop hours: 9:00-10:45 AM, August 1, 2, 4, and 5
Modern biology and other fields of science are increasingly shaped by data sets that are orders of magnitude larger than those life scientists have traditionally been trained to work with. For example, recent advances in DNA sequencing technology have made it affordable to generate data spanning billions of DNA sequences in an extremely short period of time. Similar leaps in data acquisition technology are transforming other scientific disciplines as well, including but not limited to geography, communications, economics, chemistry, and physics. Spreadsheet software and graphical user interface statistical analysis packages (e.g., Excel, Statistica, JMP) are poorly suited to the now-common scale of data (an Excel worksheet, for instance, is capped at 1,048,576 rows). The ability to manipulate, process, and analyze large data sets with basic programming and data science skills should accelerate the research productivity and success of graduate students in this era.
Last year we introduced basic computational tools, focusing on Unix and Python, to support data proficiency. This year we will introduce and discuss a more idiosyncratic group of tools that facilitate code reproducibility, data visualization, and communication of results, and we will discuss opportunities outside of academia for those with computational and analytical skill sets.
By the end of the module, students should have learned enough to feel enabled and motivated to learn more about the concepts we covered.
Topics we will explore/introduce this week:
- Basic understanding of using R Markdown to organize code and visualize results
- Introduction to data visualization using ggplot2 in R
- Overview of High Performance Computing (HPC), with introduction to UNR HPC resources
- Job opportunities outside of academia

Requirements and resources:
- Computer with a Unix operating system: Students with Mac computers already have machines running Unix and are ready to go. Students without Mac computers will have the option of checking out a Mac laptop for the semester, or will need to install Linux or a Linux emulator on their computer.
- Installed text editor with syntax recognition: Students should have installed a text editor that recognizes syntax in code written for Unix, Python, Perl, etc. We suggest BBEdit (for Mac users), Visual Studio Code, or Sublime Text. All are free and easy to locate, download, and install.
- Supplemental primers, readings, and assignments are provided on the workshop GitHub page.
- RStudio installed, for enhanced use of R Markdown (Rmd): download and install RStudio.
- Practical Computing for Biologists. Haddock, S.H.D. and Dunn, C.W., 2011. Sunderland, MA, USA: Sinauer Associates. The book is very useful for both Unix and Python.
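
One quick way to confirm that a machine is ready is to check it from the terminal. The sketch below is only an illustration: the helper name `check_tool` is hypothetical, and the list of tools it checks (`git` and `R`) is an assumption based on the topics above, so adjust it to your own setup.

```shell
# Report whether a command-line tool is available on the PATH.
# Prints "<tool>: found" or "<tool>: missing"; returns non-zero if missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Kernel name: "Darwin" on macOS, "Linux" on Linux.
echo "Operating system: $(uname -s)"

# Tools assumed to be useful this week (an assumption, not an official list).
for tool in git R; do
  check_tool "$tool" || true
done
```

RStudio and text editors are graphical applications and are often not on the PATH, so it is easier to confirm those by opening them directly.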
We will meet from 9:00 to 10:45 on the mornings of Aug 1, 2, 4, and 5. At the beginning of each session, we will introduce new concepts and material that will form the basis of the exercises/tutorials we will work through during that session. We will cover questions regarding current or previous material, and then students will spend at least half of each class writing code independently or in small groups. Students will get the most out of each session if they review the primers and outlines of concepts ahead of time.
The material for each day of the workshop will be organized in separate directories on the workshop GitHub page. Each of these directories will contain the slides that we will use to introduce material, a primer covering example Unix and Python code along with explanations, and a worksheet of programming practice exercises. There are also general directories on the repository with supplementary resources for R Markdown, data visualization, and HPC systems, including cheat sheets, tutorials, and recommended resources for learning more.
While you can download individual files from GitHub using your preferred web browser, you can also access GitHub from the Unix command line using git. Git commands can get complicated very quickly, but git is a very useful skill to have for reproducibility, tracking changes, and collaboration. We do not go over git in this course, but there are many tutorials online (http://swcarpentry.github.io/git-novice/).
For this course, downloading individual files might suffice. If you would like to download the entire repo, you can do so from the command line.
Hint: make a directory somewhere on your computer for this workshop, then run the command below in that directory:
git clone https://github.com/tparchman/GAIN_summer2022
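
The hint and the clone command can be combined into one small shell function. This is a minimal sketch, not part of the official workshop material: the function name `get_workshop_repo` and the directory `$HOME/GAIN_workshop` are hypothetical, and the `git pull` branch is an added assumption for refreshing a copy you have already cloned, since the contents may change during the week.

```shell
# Clone a repository into a workshop directory, or update it if already cloned.
# Arguments: $1 = repository URL, $2 = parent directory (created if missing).
get_workshop_repo() {
  repo_url="$1"
  parent_dir="$2"
  # Directory name git would use for the clone, e.g. GAIN_summer2022.
  repo_name=$(basename "$repo_url" .git)

  mkdir -p "$parent_dir"
  if [ -d "$parent_dir/$repo_name/.git" ]; then
    # Already cloned: fetch and merge any new material.
    git -C "$parent_dir/$repo_name" pull
  else
    git clone "$repo_url" "$parent_dir/$repo_name"
  fi
}

# Example call (the directory name is a placeholder; uncomment to run):
# get_workshop_repo https://github.com/tparchman/GAIN_summer2022 "$HOME/GAIN_workshop"
```

Running the example call a second time later in the week would pull any updated slides or worksheets instead of cloning again.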
Tentative Workshop Schedule. All contents are subject to change.
Date | Topic | Assignment |
---|---|---|
Aug. 1 | R markdown | Tutorial |
Aug. 2 | Data Visualization | ggplot2 R tutorial |
Aug. 4 | HPC | Lecture, demo, discussion |
Aug. 5 | Life outside of Academia | Questions for Dr. Johan Grahnen |