-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path01_Session2_excel_and_import.Rmd
99 lines (67 loc) · 1.83 KB
/
01_Session2_excel_and_import.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
---
title: "Biochemistry R training"
subtitle: "Session 1a: Data organisation in Excel"
author: "James Boocock<BR>Department of Biochemistry"
date: "22 February 2016"
output: ioslides_presentation
---
## Data organisation.
Today, we are going to talk about:
- Good data entry practices
- How to avoid common formatting mistakes
- Dates
- Basic quality control
- Exporting data from spreadsheets
We are going to follow these lessons developed by data carpentry:
http://www.datacarpentry.org/spreadsheet-ecology-lesson/00-intro.html
## URL for the lesson files.
Follow the link below.
```bash
bit.ly/1oVvND1
```
View raw
Extract the zip file
## R data wrangling
###Benefits
- Needed for large datasets too big for excel e.g Sequencing data
- Can automate the quality control procedure.
- Reproducible.
###Downsides
- Pre-formatting required
- Excel sometimes better for simple wrangling
You will learn more about these possibilities this afternoon!
## Importing data into R.
```{r}
# read in the data file
setwd("~/Biochem_R_training/data/")
dat = read.csv("survey_data_spreadsheet_messy_fixed.csv",header=T)
# Summarise some columns
summary(dat$Sex)
summary(dat$Weight)
```
## Visualise your data
```{r}
# Hmmm let's see what plot does with weight, not that useful.
plot(dat$Weight)
```
## Visualise your data
```{r}
# A histogram is what we want
hist(dat$Weight, breaks=10)
```
## Plotting variables against each other
```{r}
# Let's see the weight by species.
# you can clearly see that DS is much heavier on average.
plot(dat$Weight ~ dat$Species)
```
## Plotting variables against each other
```{r}
# What about the relationship between plot and weight
plot(dat$Weight ~ dat$Plot)
```
## Box and whisker
```{r}
# We can make R give us what we want.
boxplot(dat$Weight ~ dat$Plot)
```