jacobkap · Sep 17, 2024
1 parent 6d3b7aa
commit 823848f

Showing 228 changed files with 1,366 additions and 2,714 deletions.
diff --git a/06_shr.Rmd b/06_shr.Rmd
@@ -72,7 +72,7 @@ ggplot(shr_offenses_known_murders, aes(x = year)) +
 
 Let us look at Chicago for another example of the differences in reporting from the SHR and the Offenses Known data. Figure \@ref(fig:chicagoSHRvsOffensesKnown) shows the annual number of homicide victims from both datasets. In most years they are pretty similar, excluding a few really odd years in the 1980s and in 1990. But what is also strange is that most years have more SHR victims than Offenses Known victims. So nationally SHR has fewer homicides than Offenses Known but that pattern is reversed in Chicago? This is one of the many quirks of SHR data. And is a warning against treating national trends as local trends; what is true nationally is not always true in your community. So when you use this data, check everything closely. And once you have done that, check it again. 
 
-```{r chicagoSHRvsOffensesKnown, fig.cap = "The annual number of homicide victims in Chicago, SHR and Offeksnes Known, 1976-2022."}
+```{r chicagoSHRvsOffensesKnown, fig.cap = "The annual number of homicide victims in Chicago, Supplementary Homicide Reports and Offenses Known and Clearances by Arrest, 1976-2022."}
 chicago_homicides <- shr %>%
   filter(ori %in% "ILCPD00") %>%
   mutate(victim_count = additional_victim_count + 1) %>%
@@ -106,24 +106,6 @@ ggplot(chicago_offenses_known_homicides, aes(x = year)) +
 
 ```
 
-Another way to visualize reporting is to see the total number of agencies that report at least one homicide, as depicted in Figure \@ref(fig:shrAnnualAgencies). Here we can see that have about 3,000 agencies reporting. Given that most agencies are small and truly do have zero homicides in a year, that may be reasonable. Agencies that do not have homicides do not submit a report saying so, they just do not submit any data. So that makes it hard to tell when an agency not reporting data is doing so because they choose to not report, or because they have nothing to report. This is most common in small agencies where many years truly have no homicides. But let us look at our biggest agencies, and see how much of an impact it would make to have them not report data.
-
-```{r, shrAnnualAgencies, fig.cap = "The annual number of agencies that report at least one homicide."}
-shr %>%
-  distinct(year,
-           ori) %>%
-  count(year) %>%
-ggplot(aes(x = year, y = n)) +
-  geom_line(size = 1.02) +
-  xlab("Year") +
-  ylab("# of Agencies") +
-  theme_crim() +
-  crimeutils::scale_color_crim() + 
-  scale_y_continuous(labels = scales::comma) +
-  labs(color = "") +
-  expand_limits(y = 0)
-```
-
 Figures \@ref(fig:shrTopAgenciesCount) and \@ref(fig:shrTopAgenciesCountPercent) attempt to get at this question by looking the number and percent of all incidents that the top 100, 50 and 10 agencies based on number of homicide incidents make up out of all homicide incidents in each year. These agencies are massively disproportionate in how many homicides they represent - though they are also generally the largest cities in the country so are a small number of agencies but a large share of this nation's population. On average, the 10 agencies with the most homicide incidents each year - which may change every year - have over 4,000 homicide incidents and make up about 1/4 of all homicide incidents reported nationally. The top 50 have about 7,500 incidents a year, accounting for 46% of incidents. The top 100 agencies have a bit under 10,000 incidents a year and make up over 55% of all homicide incidents in the United States. So excluding the largest agencies in the country would certainly undercount homicides.
 
 ```{r, shrTopAgenciesCount, fig.cap = "The annual number of homicide incidents, showing all agencies, the top 100 agencies (by number of homicide incidents), top 50, and top 10 agencies, 1976-2022."}

diff --git a/07_leoka.Rmd b/07_leoka.Rmd
@@ -21,54 +21,52 @@ Figure \@ref(fig:leokaAgencies) shows the annual number of police agencies that
 
 The decline after 2020 is part of what I have referred to as the "death and rebirth" of the SRS. 2020 was the last year that the FBI accepted SRS data - though in 2022 they began accepting SRS submissions again. As noted in previous chapters, this death and rebirth led to changes in both which agencies reported and what data was reported. In 2021 when only NIBRS was collected, no SRS agencies could report, but even once they began to accept SRS data again the damage was done. Some agencies were transitioning from SRS to NIBRS so reported neither, while others likely made the decision to stick to NIBRS only - perhaps caused by their data vendor no longer supporting SRS data. 
 
-```{r leokaAgencies, fig.cap = "The annual number of police agencies that report at least month of data that year, 1960-2022"}
-leoka %>%
+```{r leokaAgencies, fig.cap = "The annual number of police agencies reporting at least one month of data, at least one employee, and at least one assault against an officer, 1960-2022."}
+leoka_years_one_month <-
+  leoka %>%
   dplyr::filter(number_of_months_reported > 0) %>%
   dplyr::distinct(ori, year, .keep_all = TRUE) %>%
   count(year) %>%
-  ggplot(aes(x = year, y = n)) +
-  geom_line(size = 1.05) +
-  xlab("Year") +
-  ylab("# of Agencies") +
-  theme_crim() +
-  scale_y_continuous(labels = scales::comma) +
-  expand_limits(y = 0)
-```
+  rename(at_least_one_month = n)
 
-Part of the decline we see in Figure \@ref(fig:leokaAgencies) is because starting in 2018 - for reasons I am unsure of - many more agencies started reporting having zero employees. In Figure \@ref(fig:leokaAgenciesEmployees) we can see the annual number of agencies that report having at least one employee (civilian or sworn officer). Compared to Figure \@ref(fig:leokaAgencies) we see more agencies reporting since the 200s, and an earlier but less steep drop in reporting.
 
-```{r leokaAgenciesEmployees, fig.cap = "The annual number of police agencies that report having at least one employee, 1960-2022"}
+leoka_years_one_assault <-
+  leoka %>%
+  dplyr::filter(total_assaults_total > 0,
+                !is.na(total_assaults_total)) %>%
+  dplyr::distinct(ori, year, .keep_all = TRUE) %>%
+  count(year) %>%
+  rename(at_least_one_assault = n)
+
 leoka %>%
   dplyr::filter(total_employees_total > 0,
                 !is.na(total_employees_total)) %>%
   dplyr::distinct(ori, year, .keep_all = TRUE) %>%
   count(year) %>%
-  ggplot(aes(x = year, y = n)) +
-  geom_line(size = 1.05) +
+  left_join(leoka_years_one_assault, by = "year") %>%
+  left_join(leoka_years_one_month, by = "year") %>%
+  ggplot(aes(x = year, y = n, color = "\u2265 One Employee")) +
+  geom_line(size = 1.05) +
+  geom_line(size = 1.05, aes(y = at_least_one_month, color = "\u2265 One Month")) +
+  geom_line(size = 1.05, aes(y = at_least_one_assault, color = "\u2265 One Assault")) +
   xlab("Year") +
   ylab("# of Agencies") +
   theme_crim() +
   scale_y_continuous(labels = scales::comma) +
-  expand_limits(y = 0)
+  expand_limits(y = 0) +
+  scale_color_manual(values = c("\u2265 One Employee" = "#1b9e77",
+                                "\u2265 One Month" = "#d95f02",
+                                "\u2265 One Assault" = "#7570b3")) +
+  labs(color = "")
 ```
 
-I mentioned that LEOKA has two purposes: employee information and assaults on officers information. You should really think about this data as two separate datasets as agencies can report one, both, or neither part. In practice, more agencies report data on the number of employees they have than they do for assaults on officers. In Figure \@ref(fig:leokaAgenciesAssaults) we can see that in most years of data fewer than 6,000 (out of ~18k agencies in the country) report having at least one officer assaulted. The year with the most agencies reporting >1 assault was 2022 with 6,397 agencies. Most years average about 5,000 agencies reporting at least one assault on an officer. Though there is variation over time, the trend is much more settled than in the previous figures without any sharp decline in recent years. Assaults on officers is *relatively* rare, at least considering how many officer-civilian interactions occur. And many agencies are small with relatively little crime. So agencies that say they had zero assaults on officers may in fact truly have zero assaults. However, there are agencies that likely do have assaults on officers - such as large, high crime agencies which report assaults in other years - which report zero assaults in some months or years. So you will need to be careful when determining if a zero report is a true zero rather than an agency submitting incomplete data.
+Part of the decline in agencies reporting at least one month of data in Figure \@ref(fig:leokaAgencies) is because, starting in 2018 and for reasons I am unsure of, many more agencies began reporting that they had zero employees. The same figure shows the annual number of agencies that report having at least one employee (civilian or sworn officer). Compared to the line for agencies reporting at least one month of data, more agencies report having at least one employee since the 2000s, and their drop in reporting starts earlier but is less steep.
 
-```{r leokaAgenciesAssaults, fig.cap = "The annual number of police agencies that report having at least one assault against a police officer, 1960-2022"}
-leoka %>%
-  dplyr::filter(total_assaults_total > 0,
-                !is.na(total_assaults_total)) %>%
-  dplyr::distinct(ori, year, .keep_all = TRUE) %>%
-  count(year) %>%
-  ggplot(aes(x = year, y = n)) +
-  geom_line(size = 1.05) +
-  xlab("Year") +
-  ylab("# of Agencies") +
-  theme_crim() +
-  scale_y_continuous(labels = scales::comma) +
-  expand_limits(y = 0)
+I mentioned that LEOKA has two purposes: information on employees and information on assaults against officers. You should really think about this data as two separate datasets, as agencies can report one, both, or neither part. In practice, more agencies report data on the number of employees they have than on assaults on officers. In Figure \@ref(fig:leokaAgencies) we can see that in most years fewer than 6,000 agencies (out of ~18,000 in the country) report having at least one officer assaulted. The year with the most agencies reporting at least one assault was 2022, with 6,397 agencies. Most years have about 5,000 agencies reporting at least one assault on an officer. Though there is variation over time, this trend is much more stable than the other two lines in the figure, without any sharp decline in recent years. Assaults on officers are *relatively* rare, at least considering how many officer-civilian interactions occur. And many agencies are small with relatively little crime. So agencies that say they had zero assaults on officers may in fact truly have zero assaults. However, some agencies that likely do have assaults on officers, such as large, high-crime agencies that report assaults in other years, report zero assaults in some months or years. So you will need to be careful when determining whether a zero report is a true zero rather than an agency submitting incomplete data.
 
-```
 
 ## Important variables
 
@@ -223,7 +221,7 @@ This data breaks down the monthly number of assaults on officers in a few differ
 
 We can start by looking at the breakdown of assaults by injury and weapon type for officers in the Los Angeles Police Department. Figure \@ref(fig:leokaAssaultTypeInjury) shows the number of assaults from all years reported for these categories. Over the complete time period there were almost 43,000 officers assaulted with about three-quarters of these assaults - 33,000 assaults - leading to no injuries. This data shows the number of officers assaulted, not unique officers, so an officer can potentially be included in the data multiple times if they are assaulted multiple times over a year. A little under a quarter of assaults lead to officer injury with most of these from unarmed offenders. Interestingly, there are far more gun and knife assaults where the officer is not injured than where the officer is injured. There are likely cases when the offender threatens the officer with the weapon but does not shoot or stab the officer. 
 
-```{r leokaAssaultTypeInjury, fig.cap = "The total number of assaults on officers by injury sustained and offender weapon in Los Angeles, 1960-2022."}
+```{r leokaAssaultTypeInjury, fig.cap = "The total number of assaults on officers by injury sustained and offender weapon in Los Angeles, 1960-2022.", fig.height=16}
 assault_type <- leoka %>%
   filter(ori %in% "CA01942") %>%
   mutate(dummy = 1,
@@ -293,10 +291,7 @@ ggplot(assault_type %>% filter(type != "Total Assaults"),
   expand_limits(x = 0) + 
   theme_crim() +
   scale_x_continuous(labels = scales::comma) +
-  facet_wrap(~category, scales = "free", ncol = 2) +
-  facetted_pos_scales(
-    y = list(COL == 1 ~ scale_y_discrete(guide = 'none'))
-  )
+  facet_wrap(~category, scales = "free", ncol = 1) 
 
 
 ```

diff --git a/11_nibrs_general.Rmd b/11_nibrs_general.Rmd
@@ -87,89 +87,10 @@ So if this data has the same info (other than unfounded and negative crimes) as
 
 We will look here at how many agencies report at least one crime each year between 1991 - the first year of data - and 2019 - the latest year of data - as well as compare NIBRS reporting to UCR reporting. Figure \@ref(fig:agenciesReporting) shows the number of agencies each year that reported at least one incident. Keep in mind that there are about 18,000 police agencies in the United States. Only a little over 600 agencies reported in 1991. This has grown pretty linearly, adding a few hundred agencies each year though that trend accelerated in recent years. In 2019, nearly 8,200 agencies reported at least some data to NIBRS. Compared to the estimated 18,000 police agencies in the United States, however, this is still fewer than half of agencies. The data shown here is potentially an overcount, however, as it includes agencies reporting any crime that year, even if they do not report every month. 
 
-```{r agenciesReporting, fig.cap="The annual number of agencies reporting at least one incident in that year."}
-agencies_reporting %>%
-  ggplot(aes(x = year, y = agencies)) +
-  geom_line(size = 1.05) +
-  xlab("Year") +
-  ylab("# of Agencies Reporting") +
-  theme_crim() +
-  scale_y_continuous(labels = scales::comma) +
-  expand_limits(y = 0)
-```
-
 Another way to look at reporting is comparing it to reporting to UCR. Figure \@ref(fig:agenciesReportingMap) shows the number of agencies in each state that report NIBRS data in 2019. Since 2019 is the year with the most participation, this does overstate reporting for previous years. This map pretty closely follows a population map of the US. Texas had the most agencies, followed by Michigan and Ohio. The southern states have more agencies reporting than the lightly populated northern states. The issue here is that a number of states are in white, indicating that very few agencies reported. Indeed, four of the most populated states - California, New York, Florida, and New Jersey - do not have any agencies at all that report NIBRS data.
 
-```{r agenciesReportingMap, fig.cap = "The number of agencies in each state that reported at least one crime in 2022 to NIBRS."}
-admin_map <-
-  administrative %>%
-  distinct(ori, .keep_all = TRUE) %>%
-  count(state)
-
-
-admin_map %>%
-  ggplot2::ggplot(aes(map_id = state)) + 
-  ggplot2::geom_map(aes(fill = n), map = fifty_states, color = "black") + 
-  expand_limits(x = fifty_states$long, y = fifty_states$lat) +
-  coord_map() +
-  scale_x_continuous(breaks = NULL) + 
-  scale_y_continuous(breaks = NULL) +
-  labs(x = "", y = "", fill = "# of NIBRS Agencies") +
-  theme(panel.background = element_blank()) +
-  fifty_states_inset_boxes() +
-  scale_fill_gradient(low = "white", high = "red", breaks = c(0, 200, 400, 600, 800, 1000)) 
-```
-
 Since the number of agencies in a state is partially just a factor of population, Figure \@ref(fig:agenciesReportingMapPercent) shows each state as a percent of agencies in that state that report to NIBRS that also reported to the UCR Offenses Known and Clearances by Arrest (the "crime" dataset) in 2019.^[This is the UCR dataset which has the highest reporting rate.] Not all agencies in the US reported to UCR in 2019 -  and a small number reported to NIBRS but not UCR in 2019 - but this is a fairly good measure of reporting rates. Here the story looks a bit different than in the previous figure. Now we can tell that among north-western states and states along the Appalachian Mountains, nearly all agencies report. In total, 18 states have 90% or more of agencies that reported to UCR in 2019 also reporting to NIBRS. Thirteen agencies have fewer than 10% of agencies reporting to NIBRS that also reported to UCR, with 5 of these having 0% of agencies reporting. The remaining states average about 56% of agencies reporting. So when using NIBRS data, keep in mind that you have very good coverage of certain states, and very poor coverage of other states. And the low - or zero - reporting states are systematically high population states.    
 
-```{r agenciesReportingMapPercent, fig.cap = "Agencies in each state reporting at least one crime to NIBRS in 2022 as a percent of agencies that reported UCR Offenses Known and Clearances by Arrests data in 2022."}
-ucr_agencies_per_state <- 
-  ucr %>%
-  filter(year %in% 2022,
-         !last_month_reported %in% "no months reported") %>%
-  count(state) %>%
-  rename(ucr_n = n)
-
-admin_map <- left_join(admin_map, ucr_agencies_per_state, by = "state")
-admin_map$percent <- admin_map$n / admin_map$ucr_n
-admin_map$percent <- round(admin_map$percent * 100, 2)
-
-
-admin_map %>%
-  ggplot2::ggplot(aes(map_id = state)) + 
-  ggplot2::geom_map(aes(fill = percent), map = fifty_states, color = "black") + 
-  expand_limits(x = fifty_states$long, y = fifty_states$lat) +
-  coord_map() +
-  scale_x_continuous(breaks = NULL) + 
-  scale_y_continuous(breaks = NULL) +
-  labs(x = "", y = "", fill = "% Also Report to UCR") +
-  theme(panel.background = element_blank()) +
-  fifty_states_inset_boxes() +
-  scale_fill_gradient(low = "white", high = "red", 
-                      breaks = c(0, 20, 40, 60, 80, 99.64)) 
-
-```
-
-For ease of reference, Table \@ref(tab:agenciesReportingTable) shows the number of agencies in each state reporting to NIBRS and to UCR in 2019, and the percent shown in Figure \@ref(fig:agenciesReportingMapPercent). 
-
-```{r }
-admin_map$state <- capitalize_words(admin_map$state)
-admin_map       <- admin_map %>% arrange(state)
-admin_map$percent <- paste0(admin_map$percent, "\\%")
-names(admin_map) <- c("State", "NIBRS Agencies", "UCR Agencies", "\\% of UCR Agencies")
-
-kableExtra::kbl(admin_map, 
-             #format = "html",
-             digits = 2, 
-             align = c("l", "r", "r", "r"),
-             #booktabs = TRUE, 
-             longtable = TRUE,
-             escape = TRUE,
-             label = "agenciesReportingTable",
-             caption = "The number of agencies in each state reporting to NIBRS and to UCR in 2019. Also shows NIBRS reporting in each state as a percent of UCR reporting.") %>%
-  kable_styling(bootstrap_options = "striped", full_width = FALSE, latex_options = c("hold_position", "repeat_header"))
-```
-
 ```{r nibrsAnnualNumberAgencies, fig.cap = "The annual number of police agencies that report data to NIBRS."}
 batch_header_all_years$number_of_months_reported <-
   as.numeric(batch_header_all_years$number_of_months_reported)
@@ -246,7 +167,7 @@ batch_header_all_years %>%
 ```
 
 
-```{r nibrsStateParticipation2020, fig.cap = "The percent of each state's population that is covered by police agencies reporting at least one month of data to NIBRS.", fig.height=25}
+```{r nibrsStateParticipation2020, fig.cap = "The percent of each state's population that is covered by police agencies reporting at least one month of data to NIBRS.", fig.height=16}
 fips_state_codes <- c(
   "01", "02", "04", "05", "06", "08", "09", "10", "11", "12",
   "13", "15", "16", "17", "18", "19", "20", "21", "22", "23",