Code for managing large data sets in Python, mostly with pandas. The scripts chiefly merge, filter, inspect, and count records. They were developed for a charter school database of 10K+ units built from web-crawled data and federal data sources (CCD, ACS, etc.).
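As a rough illustration of the merge, filter, inspect, and count pattern these scripts follow, here is a minimal pandas sketch. The tables and column names (`school_id`, `has_website`, `state`, `enrollment`) are hypothetical toy data invented for this example, not the actual schema of the charter school database.

```python
import pandas as pd

# Hypothetical toy tables; the column names and values are illustrative only.
web = pd.DataFrame({
    "school_id": ["A1", "B2", "C3"],
    "has_website": [True, True, False],
})
federal = pd.DataFrame({
    "school_id": ["A1", "B2", "D4"],
    "state": ["CA", "TX", "CA"],
    "enrollment": [320, 150, 410],
})

# Merge: an outer join keeps unmatched rows from both sides, and
# indicator=True adds a "_merge" column recording each row's origin.
merged = web.merge(federal, on="school_id", how="outer", indicator=True)

# Inspect: how many rows matched, and how many came from one side only?
match_counts = merged["_merge"].value_counts()

# Filter: keep rows found in both sources, then those above a size threshold.
matched = merged[merged["_merge"] == "both"]
large = matched[matched["enrollment"] >= 200]

# Count: number of matched schools per state.
by_state = matched.groupby("state").size()

print(match_counts)
print(large)
print(by_state)
```

Using an outer join with `indicator=True` is one convenient way to see which records fail to match across sources before dropping anything; an inner join would hide those mismatches.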
List of data sources:
- Web data collected by author & URAP team
- Common Core of Data (CCD) Public School Universe Survey
- American Community Survey (ACS) (interactive)
- EdFacts Achievement Results
- Civil Rights Data Collection (CRDC)
- Partisan Voting Index (PVI)
Overview of data generation workflow: