---
title: "Bechdel Test"
author: "Name"
output: github_document
---
Below is a section that starts with three backticks (i.e., the key above your Tab key) - not to be confused with a single quotation mark.
This is called an R code chunk.
In this code chunk, I am setting some global options for this `.Rmd` report: forcing your report to knit even if there are errors, and setting the figure dimensions.
For the time being, you can "pay no attention to the code between lines 12 and 14."
```{r setup, include=FALSE}
knitr::opts_chunk$set(error = TRUE, fig.width = 6, fig.asp = 0.618)
```
In this mini analysis we work with the data used in the FiveThirtyEight story titled ["The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women"](https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/).
This activity is adapted from [Dr. Mine Çetinkaya-Rundel](http://www2.stat.duke.edu/~mc301/)'s [STA 199](http://www2.stat.duke.edu/courses/Spring18/Sta199/) course.
I do **not** expect the R code to make sense yet, but you should be able to interpret the output and use your Markdown skills to provide a response to the questions that I ask.
## Data and packages
In RMarkdown files, we can interweave narrative and R code chunks.
For example, in the code chunk below we are loading two packages: `{fivethirtyeight}` and `{tidyverse}`.
There are a number of ways to run this code, but for the time being **knit** your document (in the toolbar of this Rmd file, click on ![knit icon](README-img/knit-icon.png) **Knit**).
```{r load-packages, message=FALSE}
library(fivethirtyeight)
library(tidyverse)
```
You might see a warning message at the top of your `.Rmd` file that says, "Package fivethirtyeight required but is not installed."
In your **Console** (lower-left-hand area) type `install.packages("fivethirtyeight")`.
You only need to install a package once and this should always be done in your **Console**.
That is, do not install packages within RMarkdown files!
Another nice feature of RMarkdown files is that we can write R code within a narrative block.
The next paragraph has three inline R code calls that will display the results of the code when you knit this document.
The dataset contains information on `r nrow(bechdel)` movies released between `r min(bechdel$year)` and `r max(bechdel$year)`.
However, we will focus our analysis on movies released between 1990 and 2013.
```{r}
bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

bechdel90_13
```
How many movies are there?
**Response**:
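
The `filter(between(...))` step above keeps only the rows whose `year` falls inside the range, endpoints included. Here is a minimal sketch on made-up data; the `toy_movies` tibble and its values are hypothetical and not taken from the `bechdel` dataset.

```{r}
library(dplyr)

# Hypothetical toy data, not part of the bechdel dataset
toy_movies <- tibble(
  title = c("A", "B", "C", "D"),
  year  = c(1989, 1990, 2013, 2014)
)

# between(x, left, right) is shorthand for x >= left & x <= right,
# so the boundary years 1990 and 2013 are both kept
toy_kept <- toy_movies %>%
  filter(between(year, 1990, 2013))

toy_kept
```

Movies "B" and "C" remain, which shows that both boundary years are included.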
The financial variables that we will focus on are:
- `budget_2013`: Budget in inflation-adjusted 2013 dollars
- `domgross_2013`: Domestic (US) gross in inflation-adjusted 2013 dollars
- `intgross_2013`: Total international (i.e., worldwide) gross in inflation-adjusted 2013 dollars
Also, we will use the `binary` and `clean_test` variables for grouping.
## Analysis
First, we will take a look at how median budget and median gross (both domestic and international) vary by whether the movie passed the Bechdel test.
```{r}
bechdel90_13 %>%
  group_by(binary) %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))
```
What patterns do you notice?
**Response**:
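
The `na.rm = TRUE` argument in the summary above tells `median()` to drop missing values before computing; without it, a single `NA` in a group makes that group's median `NA`. Here is a minimal sketch on made-up data; the `toy_gross` tibble and its numbers are hypothetical.

```{r}
library(dplyr)

# Hypothetical toy data: one PASS movie has a missing gross value
toy_gross <- tibble(
  binary = c("PASS", "PASS", "PASS", "FAIL", "FAIL"),
  gross  = c(10, 30, NA, 5, 15)
)

# Compute the group medians with and without na.rm = TRUE
toy_summary <- toy_gross %>%
  group_by(binary) %>%
  summarise(med_default = median(gross),
            med_na_rm   = median(gross, na.rm = TRUE))

toy_summary
```

The two columns agree for the `FAIL` group, which has no missing values. For the `PASS` group, `med_default` is `NA`, while `med_na_rm` is the median of the two observed values.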
Next, we will take a look at how median budget and median gross vary by a more granular (detailed) indicator of the Bechdel test result (where `ok` = passes test, `dubious`, `men` = women only talk about men, `notalk` = women don't talk to each other, `nowomen` = fewer than two women).
```{r}
bechdel90_13 %>%
  group_by(clean_test) %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))
```
What patterns do you notice?
**Response**:
In order to evaluate how return on investment varies among movies that pass and fail the Bechdel test, we first create a new variable called `roi` as the ratio of the gross to budget.
However, I missed part of the calculation.
Update the missing variable (i.e., replace the `___` after the minus sign with the appropriate variable name) in the code chunk below so that we obtain the correct `roi`.
```{r}
bechdel90_13 <- bechdel90_13 %>%
  mutate(roi = (intgross_2013 + domgross_2013 - ___) / budget_2013)
```
Now we can see which movies have the highest return on investment.
```{r}
bechdel90_13 %>%
  arrange(desc(roi)) %>%
  select(title, clean_test, binary, roi, budget_2013, intgross_2013)
```
Below is a visualization of the return on investment by test result. However, it's difficult to see the distributions due to a few extreme observations.
```{r}
ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(title = "Return on investment vs. Bechdel test result",
       x = "Detailed Bechdel result",
       y = "Return on investment",
       color = "Binary Bechdel result")
```
Zooming in on the movies with `roi < 10` provides a better view of how the medians across the categories compare:
```{r}
ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  ylim(0, 10) +
  labs(title = "Return on investment vs. Bechdel test result",
       subtitle = "For ROI less than 10",
       x = "Detailed Bechdel result",
       y = "Return on investment",
       color = "Binary Bechdel result")
```
What patterns do you notice?
**Response**:
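
One thing to keep in mind about the `ylim(0, 10)` call above: in `{ggplot2}`, setting limits on the scale drops observations outside the range before the boxplot statistics are computed, so the medians shown describe only the movies whose ROI falls within the limits. `coord_cartesian()` zooms the view instead, without dropping data. Here is a minimal sketch on made-up values; the `toy_roi` data frame and its numbers are hypothetical.

```{r}
library(ggplot2)

# Hypothetical toy data with one extreme ROI value
toy_roi <- data.frame(group = "a", roi = c(1, 2, 3, 4, 50))

# ylim() removes the value 50 before the boxplot statistics are computed
# (ggplot2 warns about the removed row)
p_ylim <- ggplot(toy_roi, aes(x = group, y = roi)) +
  geom_boxplot() +
  ylim(0, 10)

# coord_cartesian() only zooms the view; all five values are still used
p_zoom <- ggplot(toy_roi, aes(x = group, y = roi)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 10))

# layer_data() returns the computed boxplot statistics; `middle` is the median
med_ylim <- layer_data(p_ylim, 1)$middle
med_zoom <- layer_data(p_zoom, 1)$middle

c(ylim = med_ylim, coord_cartesian = med_zoom)
```

The two medians differ because the extreme value is excluded in the first plot and kept in the second. Which behavior you want depends on whether the question is about all movies or only those with a low ROI.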
Go back to the `README` document.