---
title: "Part 1 - Intro to the gRammar of gRaphics with `ggplot2`"
author: "Chester Ismay"
output:
  html_document:
    code_download: true
    code_folding: hide
---
```{r include=FALSE}
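library(tidyverse)
# Make sure filter() is dplyr's version, not stats::filter()
filter <- dplyr::filter
# Global chunk options: hide messages/warnings, set figure size, drop "##" output prefix
knitr::opts_chunk$set(warning = FALSE, message = FALSE, fig.width = 9.5, fig.height = 4.5,
                      comment = NA, rows.print = 16, out.width = "\\textwidth")
# Larger base font size for every plot
theme_set(theme_gray(base_size = 20))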
```
In this section, we'll discuss the Grammar of Graphics developed by [Leland Wilkinson](http://www.powells.com/book/the-grammar-of-graphics-9780387245447) and implemented in R by [Hadley Wickham](http://www.powells.com/book/ggplot2-elegant-graphics-for-data-analysis-9783319242750/68-428) in the `ggplot2` package. We'll see how it applies to a scatterplot, both with and without a regression line. These ideas will be extended in Part 2 of the workshop.

### The Grammar of Graphics

![](figure/gap.png)

- What are the variables here?
- What is the observational unit?
    - i.e., what is the THING being measured?
- How are the variables mapped to aesthetics?

---
## What is a statistical graphic?
### A `mapping` of `data` variables
### to `aes()`thetic attributes
### of `geom_`etric objects.
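To make these three pieces concrete, here is a minimal sketch that labels each part of a `ggplot()` call. It uses R's built-in `mtcars` data set purely as an assumed stand-in (it is not part of this workshop's data): the `data` is `mtcars`, the `mapping` ties the variable `wt` to the `x` aesthetic and `mpg` to the `y` aesthetic, and the `geom_` is `geom_point()`.

```{r}
library(ggplot2)

# data:    the built-in mtcars data frame (illustrative choice only)
# mapping: aes() maps variable wt to the x aesthetic and mpg to the y aesthetic
# geom:    geom_point() draws one point per row of the data
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

p
```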
---
**Back to basics**

Consider the following data in tidy format:
```{r}
simple_ex <- tibble(
  A = c(1980, 1990, 2000, 2010),
  B = c(1, 2, 4, 5),
  C = c(3, 2, 1, 2),
  D = c("low", "low", "high", "high")
)
simple_ex
```
Sketch the graphics below on paper, where the `x`-axis is variable `A` and the `y`-axis is variable `B`:

1. A scatterplot
2. A scatterplot with a fitted least-squares regression line

Intermediate folks:

3. A scatterplot where the `color` of the points corresponds to `D` and the `size` of the points corresponds to `C`
4. Show only a regression line colored "goldenrod" (no points and no error bounds)

---
1. A scatterplot
```{r}
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) +
  geom_point()
```
2. A scatterplot with fitted least-squares regression line
```{r}
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) +
  geom_point() +
  geom_smooth(method = "lm")
```
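`geom_smooth(method = "lm")` draws the ordinary least-squares line of `y` on `x`, here `B` regressed on `A`. One way to check this is to fit the same model with `lm()` and compare its predictions with the line ggplot2 computed, which `layer_data()` returns. The sketch below rebuilds `simple_ex` so it runs on its own and passes `formula = y ~ x` explicitly (the formula `geom_smooth()` uses for `method = "lm"` when none is given). For these four points the slope works out to 0.14 and the intercept to -276.3, and the final `all.equal()` should return `TRUE`.

```{r}
library(ggplot2)
library(tibble)

# Same toy data as simple_ex above
simple_ex <- tibble(
  A = c(1980, 1990, 2000, 2010),
  B = c(1, 2, 4, 5),
  C = c(3, 2, 1, 2),
  D = c("low", "low", "high", "high")
)

# Scatterplot (layer 1) plus least-squares line (layer 2)
p <- ggplot(data = simple_ex, mapping = aes(x = A, y = B)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Fit the same model directly: B regressed on A
fit <- lm(B ~ A, data = simple_ex)
coef(fit)

# x/y values ggplot2 computed for the smooth layer
line_data <- layer_data(p, 2)

# lm() predictions at those x values should match the drawn line
all.equal(
  line_data$y,
  unname(predict(fit, newdata = data.frame(A = line_data$x)))
)
```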
**Intermediate**

3. A scatterplot where the `color` of the points corresponds to `D` and the `size` of the points corresponds to `C`
```{r}
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) +
  geom_point(mapping = aes(color = D, size = C))
```
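In item 3 the `color` and `size` mappings sit inside `geom_point()` rather than inside `ggplot()`. Mappings given to `ggplot()` are inherited by every layer, whereas mappings given to a single `geom_` apply only to that layer. The sketch below (rebuilding the same toy data so it runs on its own) adds a regression line to show the practical difference: with `color = D` set globally, `geom_smooth()` also splits by `D` and fits one line per group; with `color = D` set only on the points, a single line is fit through all four points. `se = FALSE` is used because each `D` group has only two points, which leaves no residual degrees of freedom for a standard error.

```{r}
library(ggplot2)
library(tibble)

simple_ex <- tibble(
  A = c(1980, 1990, 2000, 2010),
  B = c(1, 2, 4, 5),
  C = c(3, 2, 1, 2),
  D = c("low", "low", "high", "high")
)

# Global mapping: color = D is inherited by both layers,
# so geom_smooth() fits a separate line for each level of D
p_global <- ggplot(data = simple_ex, mapping = aes(x = A, y = B, color = D)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local mapping: color = D applies only to the points,
# so geom_smooth() fits one line through all four points
p_local <- ggplot(data = simple_ex, mapping = aes(x = A, y = B)) +
  geom_point(mapping = aes(color = D)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Count the distinct fitted lines (groups) in the smooth layer
length(unique(layer_data(p_global, 2)$group))
length(unique(layer_data(p_local, 2)$group))
```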
4. Show only a regression line colored "goldenrod" (no points and no error bounds)
```{r}
ggplot(data = simple_ex, mapping = aes(x = A, y = B)) +
  geom_smooth(method = "lm", se = FALSE, color = "goldenrod")
```
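Item 4 passes `color = "goldenrod"` directly to `geom_smooth()`, outside `aes()`, which *sets* the aesthetic to a fixed value. Putting the same string inside `aes()` would instead *map* it as if it were a one-level data variable, so ggplot2 would run it through the default discrete color scale (giving a palette color, not goldenrod) and add a legend showing "goldenrod" as a category. The sketch below compares the two by inspecting the `colour` column that `layer_data()` reports for each line; it rebuilds the toy data so it runs on its own.

```{r}
library(ggplot2)
library(tibble)

simple_ex <- tibble(
  A = c(1980, 1990, 2000, 2010),
  B = c(1, 2, 4, 5),
  C = c(3, 2, 1, 2),
  D = c("low", "low", "high", "high")
)

# Setting: a constant outside aes(), so the line really is goldenrod
p_set <- ggplot(data = simple_ex, mapping = aes(x = A, y = B)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "goldenrod")

# Mapping: the string "goldenrod" is treated as data and passed through
# the default discrete color scale, so the line gets a palette color instead
p_mapped <- ggplot(data = simple_ex, mapping = aes(x = A, y = B)) +
  geom_smooth(mapping = aes(color = "goldenrod"),
              method = "lm", formula = y ~ x, se = FALSE)

unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```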
---
## Your Task
Recreate the gapminder plot shown at the beginning of this workshop (and below) using `ggplot2` and the `gapminder` data frame in the `gapminder` package. The [Data Visualization Cheat Sheet](https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf) from RStudio may be helpful.

**Note**: To focus on only the rows of the data frame corresponding to 1992, we use the `filter()` function from `dplyr`, which we will discuss in Part 3 of this workshop.

![](figure/gap.png)
```{r}
library(gapminder)
# Keep only the rows for 1992
gap1992 <- gapminder %>% filter(year == 1992)
# Space for your answer here.
```