This repo supports rapid analysis and development on top of existing COVID-19 data.
A Docker image runs a local Postgres instance and loads multiple datasets into tables that are easy to query for analysis. The DDL for those tables is included.
More datasets will be added over time, so that anyone who wants to explore lots of COVID-19 data has a quickstart that works from a database rather than from CSV files.
Some early notes here.
- Add COVID_DB_USER, COVID_DB_PASSWORD, and COVID_DB_VOLUME to your environment variables (putting them in your .profile, .bashrc, or .bash_profile is highly recommended). To change the Postgres database name from 'covid' to something else, or to change the Postgres port, update both the connect.py file in covid_utils and the docker-compose file.
- To load data, clone the necessary repos (see the "data" section), then look at the config/local.py file. This is where you hardcode paths for your local machine. You can be as abstract or as deliberate as you like.
- If you have a local Postgres install, make sure it isn't already using the port that the docker-compose file exposes. I ran into this problem on OSX and fixed it by stopping the Homebrew Postgres service, which kept restarting:
brew services stop postgresql
- For now, after the initial Postgres setup you'll need to run the DDL file for each datasource you want to look at. The DDL files are in the DDL folder.
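As a rough illustration of how the environment variables above could feed a connection string, here is a minimal sketch. It is not the contents of connect.py: the function names, the `localhost` host, and the default port 5432 are assumptions, and COVID_DB_VOLUME is left out because it is presumably consumed by docker-compose rather than by Python code.

```python
import os
from urllib.parse import quote


def db_settings(env=None):
    """Collect connection settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    required = ("COVID_DB_USER", "COVID_DB_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "user": env["COVID_DB_USER"],
        "password": env["COVID_DB_PASSWORD"],
        "host": "localhost",  # assumption: the container is published on localhost
        "port": 5432,         # assumption: the Postgres default port
        "dbname": "covid",    # database name mentioned in the notes above
    }


def build_dsn(settings):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    return "postgresql://{user}:{password}@{host}:{port}/{dbname}".format(
        user=quote(settings["user"], safe=""),
        password=quote(settings["password"], safe=""),
        host=settings["host"],
        port=settings["port"],
        dbname=settings["dbname"],
    )
```

If you change the database name or port, the same values would need to change here, which mirrors the note about keeping connect.py and the docker-compose file in sync.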
- NYTimes dataset
  - State and county level timeseries of cases and deaths since January 2020
- COVID Tracking Project dataset
  - Also known as the "Atlantic dataset"
  - State level timeseries of cases, deaths, tests & results, and hospitalizations since January 2020
- Static Data
  - FIPS to LatLng
    - Maps FIPS codes (called "GEOID" in US Census terminology) to the center of each county; uses Gazetteer files from the Census Bureau
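To make the FIPS to LatLng idea concrete, here is a small sketch that builds a lookup from a Gazetteer-style file. It assumes a tab-delimited file with GEOID, INTPTLAT, and INTPTLONG columns (check the header of the file you download), and the sample row uses rounded placeholder coordinates rather than values copied from a real file. The function name is hypothetical and is not part of this repo.

```python
import csv
import io

# Placeholder sample in the assumed Gazetteer layout (tab-delimited).
# The trailing spaces after INTPTLONG imitate padding that can appear in headers.
SAMPLE = (
    "USPS\tGEOID\tNAME\tINTPTLAT\tINTPTLONG   \n"
    "AL\t01001\tAutauga County\t32.5\t-86.6\n"
)


def fips_to_latlng(handle):
    """Return {five-digit FIPS/GEOID: (lat, lng)} from a tab-delimited file handle."""
    lookup = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Strip whitespace from column names and values before using them.
        clean = {key.strip(): (value or "").strip() for key, value in row.items() if key}
        geoid = clean["GEOID"].zfill(5)  # keep leading zeros, e.g. "01001"
        lookup[geoid] = (float(clean["INTPTLAT"]), float(clean["INTPTLONG"]))
    return lookup


centers = fips_to_latlng(io.StringIO(SAMPLE))
```

Keeping the GEOID as a zero-padded string matters because states such as Alabama have FIPS codes that start with 0 and would lose a digit if parsed as integers.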
docker-compose up
You can refresh every dataset at once or just a single source.
cd /path/to/covid/repo
python update/{datasource}.py
OR
python update/all.py
Load data that is updated on a regular basis.
cd /path/to/covid/repo
python load_data/{source}_data.py
Load static data (i.e. from a CSV). Put static CSVs in the "static" directory.
cd /path/to/covid/repo
python load_data/load_csv.py -l 'static/{filename}'
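For orientation, a loader invoked with `-l` might be structured roughly as follows. This is a sketch, not the actual load_csv.py: the long option name `--load`, the helper names, and the idea of generating a parameterized INSERT are all assumptions. The `%s` placeholders follow the parameter style used by drivers such as psycopg2; executing the statement is left out so the example stays standard-library only.

```python
import argparse
import csv


def parse_args(argv=None):
    """Parse the -l option; --load is a hypothetical long form."""
    parser = argparse.ArgumentParser(description="Load a static CSV (sketch).")
    parser.add_argument("-l", "--load", required=True,
                        help="path to the CSV, e.g. static/{filename}")
    return parser.parse_args(argv)


def read_csv(path):
    """Return (column names, rows as dicts) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def insert_statement(table, columns):
    """Build a parameterized INSERT; values would be passed separately to the driver."""
    quoted = ", ".join('"{}"'.format(column) for column in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return 'INSERT INTO "{}" ({}) VALUES ({})'.format(table, quoted, placeholders)
```

Passing values separately from the SQL text, rather than formatting them into the string, lets the database driver handle quoting of the CSV contents.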
The following step is handled by the "dataset refresh" scripts, but it can also be run independently if needed.
cd /path/to/covid/repo
python mvs/mvs_maker.py -s '{filename}.sql'
Adding a new flatfile to generate is straightforward.
Add the SQL query that produces the flatfile to flatfiles/queries.py.
Then add both the filepath where it should be saved and the name of the SQL query to update/{dataset}.py.
Then just run python update/{dataset}.py
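The flatfile steps above can be pictured with the sketch below: a dictionary of named queries and a function that writes one query's result to a CSV. The names `QUERIES`, `write_flatfile`, the `daily` table, and the query itself are hypothetical, and the sketch accepts any DB-API style connection whose `execute` returns a cursor (the standard-library sqlite3 module is a convenient stand-in for trying it without Postgres).

```python
import csv

# Hypothetical stand-in for flatfiles/queries.py: query name -> SQL text.
QUERIES = {
    "cases_by_state": (
        "SELECT state, SUM(cases) AS cases "
        "FROM daily GROUP BY state ORDER BY state"
    ),
}


def write_flatfile(conn, query_name, out_path):
    """Run a named query and write the result, with a header row, to a CSV file."""
    cursor = conn.execute(QUERIES[query_name])
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)  # number of data rows written
```

With sqlite3, `conn = sqlite3.connect(":memory:")` plus a small `daily` table is enough to exercise it; with Postgres you would obtain a cursor from your driver first, since not every connection object exposes `execute` directly.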