---
output: md_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
# OpenCaseStudies
<!-- badges: start -->
[![render-README](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/workflows/render-README/badge.svg)](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/actions)
[![render-index](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/workflows/render-index/badge.svg)](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard/actions)
<!-- badges: end -->
### Important links
- HTML: https://www.opencasestudies.org/ocs-bp-school-shootings-dashboard
- GitHub: https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard
- Bloomberg American Health Initiative: https://americanhealth.jhu.edu/open-case-studies
- Dashboard: https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/
- GitHub repo for dashboard: https://github.com/opencasestudies/ocs-bp-school-shootings-flexdashboard
### Disclaimer
The purpose of the [Open Case
Studies](https://opencasestudies.github.io) project is **to demonstrate
the use of various data science methods, tools, and software in the
context of messy, real-world data**. A given case study does not cover
all aspects of the research process, is not claiming to be the most
appropriate way to analyze a given dataset, and should not be used in
the context of making policy decisions without external consultation
from scientific experts.
### License
This case study is part of the [OpenCaseStudies](https://opencasestudies.github.io) project.
This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 ([CC BY-NC 3.0](https://creativecommons.org/licenses/by-nc/3.0/us/)) United States License.
### Citation
To cite this case study:
Wright, Carrie and Ontiveros, Michael and Breshock, Michael and Meng, Qier and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). [https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard](https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard). Open Case Studies: School Shootings in the United States (Version v1.0.0).
### Acknowledgments
We would like to acknowledge [Elizabeth Stuart](https://www.jhsph.edu/faculty/directory/profile/1792/elizabeth-a-stuart) for assisting in framing the major direction of the case study.
We would also like to acknowledge the [Bloomberg American Health Initiative](https://americanhealth.jhu.edu/) for funding this work.
### Reading Metrics
The total reading time for this case study was calculated with [koRpus](https://github.com/unDocUMeantIt/koRpus): **About 110 minutes**
The Flesch-Kincaid Readability Index was also calculated with [koRpus](https://github.com/unDocUMeantIt/koRpus): **Grade 9, Age 14**
### Title
School Shootings in the United States
### Motivation
According to this [report](https://siepr.stanford.edu/sites/default/files/publications/19-036.pdf), school shootings can have long-lasting impacts on those who witness them. The report states that:
> Over **240,000** American students experienced a school shooting in the last two decades.
Because the number of school shootings appears to be increasing, it is useful to examine the characteristics of these shootings to better understand why they happen and how to prevent them in the future. To that end, we will make a dashboard to display this data.
The dashboard created in this case study can be found [here](https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/).
### Motivating questions
<b><u> Our main questions: </u></b>
1) What has been the yearly rate of school shootings and where in the country have they occurred in the last 50 years (from January 1970 to June 2020)?
2) How many individuals are typically killed in a shooting?
3) What were the characteristics of the shooters: How often was a shooter male? How often did a shooter attempt or commit suicide?
### Data
In this case study we will be using data related to school shootings in the US from 1970 to June 2020 from the **Center for Homeland Defense and Security (CHDS)** [**K-12 School Shooting Database**](https://www.chds.us/ssdb/dataset/).
Their methods for identifying and authenticating incidents are outlined [here](https://www.chds.us/ssdb/methods/).
According to a previous version of their website:
*"The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. **All of the information is based on open-source information and 3rd party reporting... and may include reporting errors.**"*
#### Learning Objectives
The skills, methods, and concepts that students will be familiar with by the end of this case study are:
<u>**Data Science Learning Objectives:**</u>
1. Importing text from a Google Sheets document (`googlesheets4`)
2. Converting date formats (`lubridate`)
3. Geocoding data (`ggmap`) and jittering geocoded points so that they do not overlap on a map (`sf`)
4. How to reshape data by pivoting between "long" and "wide" formats and drop rows with `NA` values (`tidyr`)
5. How to create data visualizations with `ggplot2`
6. An introductory understanding of R Markdown
7. How to create an interactive table (`DT`)
8. How to create a map (`leaflet`)
9. How to create an interactive dashboard with `flexdashboard` and `shiny`
<u>**Statistical Learning Objectives:**</u>
1. Calculating percentages for data with missing values
#### Data import
In this case study we demonstrate how to import data from Google Sheets. We have also downloaded the data as a CSV file, and we demonstrate how to import the data in this format as well.
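The two import routes can be sketched roughly as follows. This is a minimal illustration and not the code from the case study: the CSV contents, the column names, and the sheet URL are all hypothetical placeholders. The Google Sheets call is left commented out because it needs network access.

```r
library(readr)

# Hypothetical stand-in for the downloaded CSV: a tiny file written to a
# temporary location so that the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write_lines(
  c("Date,School,State,Killed",
    "1/5/1970,Example High School,MD,1",
    "3/2/1971,Sample Middle School,TX,0"),
  csv_path
)

# Route 1: import the CSV copy with readr.
shootings_csv <- read_csv(csv_path, show_col_types = FALSE)

# Route 2 (not run): import directly from Google Sheets.
# The URL is a placeholder; gs4_deauth() skips sign-in for public sheets.
# library(googlesheets4)
# gs4_deauth()
# shootings_gs <- read_sheet("https://docs.google.com/spreadsheets/d/<sheet-id>")

shootings_csv
```

Keeping a CSV copy alongside the live sheet means the analysis can still be reproduced if the sheet changes or becomes unavailable.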
#### Data wrangling
This case study covers the differences between the various `*_join()` functions of the `dplyr` package, as well as use of the `case_when()` function to recode data based on particular evaluations of existing values.
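As a rough illustration of how the join functions differ and how `case_when()` recodes values, consider the sketch below. The tables, column names, and category labels are made up for this example and are not taken from the CHDS data.

```r
library(dplyr)

# Toy tables (hypothetical values, not from the CHDS data).
incidents <- tibble(
  state  = c("MD", "TX", "CA", "PR"),
  killed = c(0, 1, 4, NA)
)
populations <- tibble(
  state      = c("MD", "TX", "CA", "NY"),
  population = c(6.0, 29.0, 39.5, 19.5)  # millions, made up for illustration
)

# left_join() keeps every row of `incidents`; unmatched states get NA.
left  <- left_join(incidents, populations, by = "state")
# inner_join() keeps only states present in both tables.
inner <- inner_join(incidents, populations, by = "state")
# full_join() keeps all states from either table.
full  <- full_join(incidents, populations, by = "state")

# case_when() recodes values using the first condition that is TRUE,
# so the NA check has to come before the comparisons.
recoded <- incidents %>%
  mutate(severity = case_when(
    is.na(killed) ~ "unknown",
    killed == 0   ~ "no deaths",
    killed == 1   ~ "one death",
    TRUE          ~ "multiple deaths"
  ))
```

Here `left` has four rows, `inner` has three, and `full` has five, which is a quick way to check which join matches the intent.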
We also cover removing `NA` values with the `drop_na()` function of the `tidyr` package, and selecting the last few variables of a tibble using the `last_col()` function. We cover using the `tidyr` functions such as `pivot_wider()` and `pivot_longer()` for reshaping data, as well as arranging levels of factors using the `forcats` package.
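The functions named above can be tried on a small table. The following is only a sketch with invented column names and values, not the wrangling code used in the case study.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Toy data (hypothetical values).
wide <- tibble(
  year    = c(2018, 2019, 2020),
  killed  = c(3, NA, 1),
  wounded = c(5, 2, 0)
)

# drop_na() removes rows that contain NA in any column (here, 2019).
complete <- drop_na(wide)

# last_col() selects the final column; last_col(1) is the one before it.
last_two <- select(wide, last_col(1):last_col())

# pivot_longer() stacks the count columns into name/value pairs ...
long <- pivot_longer(wide, cols = c(killed, wounded),
                     names_to = "outcome", values_to = "count")
# ... and pivot_wider() reverses the reshaping.
wide_again <- pivot_wider(long, names_from = outcome, values_from = count)

# fct_relevel() moves the named level(s) to the front of the level order.
long <- mutate(long, outcome = fct_relevel(factor(outcome), "wounded"))
levels(long$outcome)
```

Reordering factor levels this way controls the order in which categories appear in plot legends and axes.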
This case study also covers a few of the `stringr` functions for manipulating character strings, including `str_c()`, `str_detect()`, and `str_remove()`, as well as some of the functions of the `lubridate` package for working with dates.
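A short sketch of these string and date helpers is shown here. The example strings and the month/day/year date format are assumptions made for illustration only.

```r
library(stringr)
library(lubridate)

# Toy values (hypothetical).
city  <- c("Baltimore", "Austin")
state <- c("MD", "TX")
notes <- c("Shooter: male (age 17)", "Shooter: unknown")

# str_c() joins strings element-wise.
location <- str_c(city, state, sep = ", ")
# str_detect() returns TRUE where the pattern matches.
# \\b is a word boundary, so "female" would not match.
is_male <- str_detect(notes, "\\bmale\\b")
# str_remove() deletes the first match of the pattern.
cleaned <- str_remove(notes, "^Shooter: ")

# mdy() parses month/day/year text into Date objects.
dates <- mdy(c("1/5/1970", "6/30/2020"))
year(dates)
month(dates, label = TRUE)
```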
We also cover how to geocode data using the `ggmap` package and how to modify duplicated locations using the `sf` package so as to avoid overlapping points on a map.
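One possible way to separate points that share a location is sketched below, using `st_jitter()` from `sf`. The coordinates, the jitter amount, and the choice to jitter in longitude/latitude degrees are assumptions for this example, and the case study itself may take a different approach. The geocoding step is commented out because `ggmap::geocode()` needs a registered Google API key.

```r
library(sf)

# Geocoding (not run): requires a Google API key.
# library(ggmap)
# register_google(key = "<your-api-key>")
# geocode("Baltimore, MD")  # returns a tibble with lon and lat columns

# Hypothetical geocoded results; the first two rows share a location.
locations <- data.frame(
  school = c("School A", "School B", "School C"),
  lon    = c(-76.61, -76.61, -97.74),
  lat    = c(39.29, 39.29, 30.27)
)

# Flag every row whose coordinates occur more than once.
coords <- locations[c("lon", "lat")]
is_dup <- duplicated(coords) | duplicated(coords, fromLast = TRUE)

# Convert to an sf object in WGS 84 (EPSG:4326).
pts <- st_as_sf(locations, coords = c("lon", "lat"), crs = 4326)

# Jitter only the duplicated points; `amount` is in coordinate units
# (degrees here), so 0.01 is roughly a kilometre of latitude.
set.seed(123)
jittered  <- st_jitter(pts[is_dup, ], amount = 0.01)
pts_fixed <- rbind(pts[!is_dup, ], jittered)
st_coordinates(pts_fixed)
```

Jittering only the duplicated rows leaves every other location exactly where the geocoder placed it.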
#### Data Visualization
In this case study we show how to make faceted plots where each plot has its own y-axis label (which is actually a bit tricky). We also show how to make pie charts with `ggplot2` and how to use the `waffle` package to create a waffle plot, and we discuss why a pie chart might not be a good choice in some cases.
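One common way to give each facet its own y-axis label is to move the facet strips to the left so that they act as labels. The sketch below shows that approach together with a basic pie chart. The data are invented, and this is not necessarily the technique used in the case study.

```r
library(ggplot2)

# Toy data (hypothetical values).
counts <- data.frame(
  year    = rep(2016:2020, times = 2),
  measure = rep(c("Incidents", "Deaths"), each = 5),
  value   = c(50, 55, 110, 112, 60, 10, 16, 50, 30, 5)
)

# Moving the facet strips to the left and placing them outside the axis
# makes each strip act as that panel's y-axis label.
facet_plot <- ggplot(counts, aes(x = year, y = value)) +
  geom_line() +
  facet_wrap(~ measure, ncol = 1, scales = "free_y", strip.position = "left") +
  labs(y = NULL) +
  theme(strip.placement = "outside", strip.background = element_blank())

# A pie chart is a stacked bar chart drawn in polar coordinates.
pie_data <- data.frame(outcome = c("Fatal", "Non-fatal"), n = c(30, 70))
pie_plot <- ggplot(pie_data, aes(x = "", y = n, fill = outcome)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()
```

Printing `facet_plot` or `pie_plot` draws the corresponding figure.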
We also show how to create an interactive table with the `DT` package, as well as how to create an interactive map with the `leaflet` package.
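A minimal sketch of both widgets follows, assuming a small data frame with hypothetical `school`, `lon`, and `lat` columns. The caption text is also made up.

```r
library(DT)
library(leaflet)

# Toy data (hypothetical values).
schools <- data.frame(
  school = c("School A", "School B"),
  lon    = c(-76.61, -97.74),
  lat    = c(39.29, 30.27)
)

# Interactive table with a caption built from an htmltools tag.
tbl <- datatable(
  schools,
  rownames = FALSE,
  caption  = htmltools::tags$caption("Hypothetical school locations")
)

# Interactive map: default tiles plus one circle marker per school.
map <- leaflet(schools) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat, popup = ~school)
```

Both objects are HTML widgets, so they render in an R Markdown document or dashboard when printed.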
### Analysis
This case study does not include an analysis in the way other case studies do, but it does demonstrate how to calculate simple percentage statistics from data with missing values, as well as how to report such percentages properly.
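To illustrate why the denominator matters when values are missing, here is a small sketch with made-up data. The column name and values are hypothetical.

```r
library(dplyr)

# Toy data: two of six shooters have an unknown gender.
shooters <- tibble(
  gender = c("Male", "Male", "Female", NA, "Male", NA)
)

summary_tbl <- shooters %>%
  summarise(
    n_total      = n(),                                # all records
    n_known      = sum(!is.na(gender)),                # records with a value
    n_male       = sum(gender == "Male", na.rm = TRUE),
    pct_of_all   = 100 * n_male / n_total,             # NA counted in denominator
    pct_of_known = 100 * n_male / n_known              # NA excluded
  )
summary_tbl
```

The same count gives 50% of all records but 75% of records with a known value, so a reported percentage should state which denominator was used and how many values were missing.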
### Other notes and resources
The dashboard created in this case study can be found [here](https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/).
[RStudio](https://rstudio.com/products/rstudio/features/){target="_blank"}
[Cheatsheet on RStudio IDE](https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf){target="_blank"}
[Other RStudio cheatsheets](https://rstudio.com/resources/cheatsheets/){target="_blank"}
[RStudio projects](https://r4ds.had.co.nz/workflow-projects.html)
[Tidyverse](https://www.tidyverse.org/){target="_blank"}
[Piping in R](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html){target="_blank"}
[String manipulation cheatsheet](https://rstudio.com/resources/cheatsheets/){target="_blank"}
[Table formats](https://en.wikipedia.org/wiki/Wide_and_narrow_data){target="_blank"}
[Geocoding](https://en.wikipedia.org/wiki/Geocoding)
[Coordinate reference system (CRS)](https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems) [EPSG](https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset)
[World Geodetic System (WGS) version 84, also called EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84)
[Albers equal-area conic projection](https://en.wikipedia.org/wiki/Albers_projection#:~:text=The%20Albers%20equal%2Darea%20conic,that%20uses%20two%20standard%20parallels.&text=The%20Albers%20projection%20is%20used,the%20United%20States%20Census%20Bureau.)
[crs 102008](https://spatialreference.org/ref/esri/102008/html/)
To learn more about geospatial coordinate systems see [here](https://www.nceas.ucsb.edu/sites/default/files/2020-04/OverviewCoordinateReferenceSystems.pdf) and [here](https://guides.library.duke.edu/r-geospatial/CRS).
[`ggplot2` package](http://ggplot2.tidyverse.org){target="_blank"}
Please see [this case study](https://opencasestudies.github.io/ocs-bp-co2-emissions/) for more details on using `ggplot2`.
[grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.html){target="_blank"}
[`ggplot2` themes](https://ggplot2.tidyverse.org/reference/ggtheme.html){target="_blank"}
[Motivating article for this case study about school shootings](https://link.springer.com/content/pdf/10.1007/s11920-012-0331-6.pdf)
Also see this [article](https://siepr.stanford.edu/sites/default/files/publications/19-036.pdf) to learn more about the impacts of school shootings.
[Lightweight markup languages (LML)](https://en.wikipedia.org/wiki/Lightweight_markup_language)
[Markdown](https://en.wikipedia.org/wiki/Markdown)
[R markdown](http://rmarkdown.rstudio.com/)
[`knitr`](https://yihui.org/knitr/)
[`rmarkdown` (package)](https://cran.r-project.org/web/packages/rmarkdown/rmarkdown.pdf)
See this [book](https://bookdown.org/yihui/rmarkdown/) for more information on working with R Markdown files.
The RStudio [cheatsheet for R Markdown](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) and this [tutorial](https://ourcodingclub.github.io/tutorials/rmarkdown/) are great for getting started.
[Pandoc](https://en.wikipedia.org/wiki/Pandoc)
[YAML](https://en.wikipedia.org/wiki/YAML)
[Configuration](https://en.wikipedia.org/wiki/Configuration_file)
[flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/)
See [here](https://rstudio.com/resources/webinars/introducing-flexdashboards/) for a video about flexdashboard and [here](https://rmarkdown.rstudio.com/flexdashboard/) for more information on how to use this package.
See [here](https://rmarkdown.rstudio.com/flexdashboard/using.html#components) for a list of other packages that are useful for adding elements to dashboards created with the `flexdashboard` package.
See [here](https://www.datadreaming.org/post/r-markdown-theme-gallery/) for a list of R Markdown themes which can be used with `flexdashboard`.
See [Font Awesome](https://fontawesome.com/icons?d=gallery) for icons.
To learn more about using `shiny` with the `flexdashboard` package to create interactive dashboards, see this [tutorial](https://rmarkdown.rstudio.com/flexdashboard/shiny.html).
[leaflet (R package)](https://rstudio.github.io/leaflet/)
[Leaflet (JavaScript Library)](https://leafletjs.com/)
[shiny](https://shiny.rstudio.com/)
See [here](https://shiny.rstudio.com/gallery/) for a gallery of `shiny` examples.
See this [website](https://rstudio.github.io/shinydashboard/) to learn about a more flexible and slightly more challenging option for creating dashboards in R using a package called `shinydashboard`.
<u>**Packages used in this case study:** </u>
Package | Use in this case study
---------- |-------------
[here](https://github.com/jennybc/here_here){target="_blank"} | to easily load and save data
[readr](https://readr.tidyverse.org/) | to import the data as a csv file
[googlesheets4](https://googlesheets4.tidyverse.org/) | to import directly from Google Sheets
[tibble](https://tibble.tidyverse.org/) | to create tibbles (the tidyverse version of dataframes)
[dplyr](https://dplyr.tidyverse.org/){target="_blank"} | to filter, subset, join, add rows to, and modify the data
[stringr](https://stringr.tidyverse.org/){target="_blank"} | to manipulate character strings within the data (collapsing strings together, replace values, and detect values)
[magrittr](https://magrittr.tidyverse.org/){target="_blank"} | to pipe sequential commands
[tidyr](https://tidyr.tidyverse.org/){target="_blank"} | to change the shape or format of tibbles to wide and long, to drop rows with `NA` values, and to see the last few columns of a tibble
[ggmap](https://cran.r-project.org/web/packages/ggmap/ggmap.pdf) | to geocode the data (which means get the latitude and longitude values)
[sf](https://r-spatial.github.io/sf/) | to modify the geocoded data so that points at duplicated locations do not overlap
[lubridate](https://lubridate.tidyverse.org/) | to work with the date-time data
[DT](https://rstudio.github.io/DT/) | to create the interactive table
[htmltools](https://www.rdocumentation.org/packages/htmltools/versions/0.5.0) | to add a caption to our interactive table
[ggplot2](https://ggplot2.tidyverse.org/){target="_blank"} | to create plots
[forcats](https://forcats.tidyverse.org/){target="_blank"} | to reorder factor levels for plots
[waffle](https://github.com/hrbrmstr/waffle) | to make waffle proportion plots
[poliscidata](https://cran.r-project.org/web/packages/poliscidata/poliscidata.pdf) | to get population values for the states
[flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/) | to create the dashboard
[shiny](https://shiny.rstudio.com/){target="_blank"} | to allow our dashboard to be interactive
[leaflet](https://rstudio.github.io/leaflet/shiny.html) | to create the map for our dashboard using [Leaflet](http://leafletjs.com/), a JavaScript library for maps
#### For users
There is a [`Makefile`](Makefile) in this folder that allows you to type `make` to knit the case study contained in the `index.Rmd` to `index.html` and it will also knit the [`README.Rmd`](README.Rmd) to a markdown file (`README.md`). Note that you may need to press the "Q" key to close the documentation about flexdashboard.
Users can skip the Data Import and Data Wrangling sections to start with the Data Analysis and Visualization section if they wish. Alternatively users can also start at the Dashboard Basics or Our Dashboard sections.
#### For instructors
Instructors who only wish to demonstrate the basics of how to create a dashboard with `flexdashboard` can simply use the `Dashboard Basics` section. This would likely take only one or two class sessions to cover.
Instructors can skip the Data Import and Data Wrangling sections to start with the Data Analysis section if they wish.
#### Target audience
This case study is appropriate for those new to R programming. It is also appropriate for more advanced R users who are new to the Tidyverse. This particular case study may require some introductory knowledge of R programming.
#### Suggested homework
Create another dashboard with graphs and statistics featuring other elements within this dataset. For example, students may create graphs that explore which school events are reported to have more shootings. Students could be asked to use one of the pages of the dashboard that we created as an example.
#### Estimate of RMarkdown Compilation Time:
About 37 to 47 seconds
This compilation time was measured on a PC running Windows 10. This range should be treated only as an estimate, as compilation time will vary across machines and operating systems.