Add filtered datasets for "serious" incidents
h/t and thanks to @medievalmadeline for the core development of this new feature 🎉
Showing 7 changed files with 31,951 additions and 0 deletions.
@@ -0,0 +1,23 @@
# Filtered Subsets

This directory contains filtered subsets of the full incident dataset.

## `serious-incidents.csv`

This dataset contains all rows for which *any* of the following fields' values is `Yes`:

- `Serious Incident Ind`
- `Hmis Serious Bulk Release`
- `Hmis Serious Evacuations`
- `Hmis Serious Fatality`
- `Hmis Serious Flight Plan`
- `Hmis Serious Injury`
- `Hmis Serious Major Artery`
- `Hmis Serious Marine Pollutant`
- `Hmis Serious Radioactive`

The Data Liberation Project thanks volunteer Madeline Everett for developing this filter, as well as the filter described below.

## `serious-incidents-expensive.csv`

This dataset begins with the same filter as above, but adds an additional constraint: The total cost of the incident (`Total Amount Of Damages`) is $10,000 or more.
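
The two filters described in this README can be sketched on a toy table. This is a minimal illustration rather than the project's script: the sample rows are invented, and `SERIOUS_COLUMNS` and `is_serious` are hypothetical names. Only the column names come from the README.

```python
import pandas as pd

# Column names as listed in the README.
SERIOUS_COLUMNS = [
    "Serious Incident Ind",
    "Hmis Serious Bulk Release",
    "Hmis Serious Evacuations",
    "Hmis Serious Fatality",
    "Hmis Serious Flight Plan",
    "Hmis Serious Injury",
    "Hmis Serious Major Artery",
    "Hmis Serious Marine Pollutant",
    "Hmis Serious Radioactive",
]


def is_serious(df):
    """Return a boolean mask: True where any serious flag equals "Yes"."""
    return df[SERIOUS_COLUMNS].eq("Yes").any(axis=1)


# Invented sample rows, for illustration only: every flag starts as "No".
toy = pd.DataFrame({column: ["No", "No", "No"] for column in SERIOUS_COLUMNS})
toy.loc[0, "Serious Incident Ind"] = "Yes"   # serious, low cost
toy.loc[1, "Hmis Serious Fatality"] = "Yes"  # serious, high cost
toy["Total Amount Of Damages"] = [500, 25000, 90000]  # row 2: costly, no flag

# First filter: any flag is "Yes".
serious = toy[is_serious(toy)]

# Second filter: serious AND total damages of $10,000 or more.
expensive = serious[serious["Total Amount Of Damages"] >= 10_000]

print(serious.index.tolist())    # rows 0 and 1
print(expensive.index.tolist())  # row 1 only
```

Row 2 shows that a high cost alone does not qualify a row: the cost threshold only narrows the set of rows that already carry at least one `Yes` flag.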
data/processed/filtered/serious-incidents-expensive.csv
10,000 changes: 10,000 additions & 0 deletions
Large diffs are not rendered by default.
@@ -0,0 +1,46 @@
import pathlib

import pandas as pd


def filter_rows(df, cost_min=0):
    return df.loc[
        (df["Total Amount Of Damages"] >= cost_min)
        & (
            (df["Serious Incident Ind"] == "Yes")
            | (df["Hmis Serious Bulk Release"] == "Yes")
            | (df["Hmis Serious Evacuations"] == "Yes")
            | (df["Hmis Serious Fatality"] == "Yes")
            | (df["Hmis Serious Flight Plan"] == "Yes")
            | (df["Hmis Serious Injury"] == "Yes")
            | (df["Hmis Serious Major Artery"] == "Yes")
            | (df["Hmis Serious Marine Pollutant"] == "Yes")
            | (df["Hmis Serious Radioactive"] == "Yes")
        )
    ]


def read_csv(path):
    return pd.read_csv(path, dtype=str).astype({"Total Amount Of Damages": int})


def main():
    # Collect all of the CSVs in the fetched folder
    paths = sorted(pathlib.Path("data/fetched").glob("*.csv"))

    # Concatenate all of the CSV files
    all_rows = pd.concat(map(read_csv, paths), ignore_index=True)

    # Filter to "serious" incidents
    filtered_rows = filter_rows(all_rows)
    filtered_rows.to_csv("data/processed/filtered/serious-incidents.csv", index=False)

    # Filter the serious incidents to just those with $10k+ in total costs
    filtered_rows_expensive = filter_rows(filtered_rows, cost_min=10000)
    filtered_rows_expensive.to_csv(
        "data/processed/filtered/serious-incidents-expensive.csv", index=False
    )


if __name__ == "__main__":
    main()
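
The `read_csv` helper in this script casts `Total Amount Of Damages` with `astype(int)`, which raises an error if any value is missing or is not an integer string. This commit does not say whether the fetched files ever contain such values. If they did, a more tolerant reader might look like the sketch below. `read_csv_tolerant` and `COST_COLUMN` are hypothetical names, the sample data is invented, and treating an unparseable cost as 0 is an assumption rather than the project's behavior.

```python
import io

import pandas as pd

COST_COLUMN = "Total Amount Of Damages"


def read_csv_tolerant(path_or_buffer):
    """Read every column as text, then parse the cost column leniently."""
    df = pd.read_csv(path_or_buffer, dtype=str)
    # errors="coerce" turns unparseable values into NaN instead of raising.
    cost = pd.to_numeric(df[COST_COLUMN], errors="coerce")
    # Assumption: a missing or unparseable cost counts as 0.
    df[COST_COLUMN] = cost.fillna(0).astype(int)
    return df


# Invented sample: one valid cost, one empty cell, one non-numeric cell.
sample = io.StringIO(
    "Serious Incident Ind,Total Amount Of Damages\n"
    "Yes,12000\n"
    "Yes,\n"
    "No,n/a\n"
)
df = read_csv_tolerant(sample)
print(df[COST_COLUMN].tolist())
```

Counting unknown costs as 0 keeps those rows in `serious-incidents.csv` but excludes them from the $10,000 subset. Failing loudly, as the committed `astype(int)` does, is the safer choice when silent data loss would be worse than a crash.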