This repository has been archived by the owner on May 6, 2024. It is now read-only.

add vaccination data #1574

Closed · wants to merge 40 commits
40 commits
073250f
add VaccinationData class
maekke Jan 10, 2021
e86c1df
add vaccination DB scripts
maekke Jan 10, 2021
2487145
add run_vaccinations_scraper.sh
maekke Jan 10, 2021
b4ff0b9
add BS vaccination scraper
maekke Jan 10, 2021
f3cd588
add empty fallzahlen_kanton_BS_impfungen.csv
maekke Jan 10, 2021
12cfe18
add github workflow for vaccination scraper
maekke Jan 10, 2021
b367c0b
include vaccinations scrapers in test_scrapers.sh
maekke Jan 10, 2021
adc4d2a
add GE vaccination scraper
maekke Jan 10, 2021
0e7e2e3
add SO vaccinations scraper
maekke Jan 10, 2021
2aebaa8
add VaccinationData unit test
maekke Jan 11, 2021
d2aaaa6
add vaccination scraper for BE
maekke Jan 13, 2021
47bee18
add empty fallzahlen_kanton_BE_impfungen.csv
maekke Jan 13, 2021
ff71ad6
add BE scraper to github workflow
maekke Jan 13, 2021
ecda1bc
update GE vaccination scraper to new website
maekke Jan 14, 2021
4e83627
add BL vaccination scraper
maekke Jan 15, 2021
61fd7bb
add JU vaccination scraper
maekke Jan 17, 2021
b6db4c8
update BE vaccinations scraper to use latest website
maekke Jan 17, 2021
8927a26
update GE vaccinations scraper to latest website
maekke Jan 17, 2021
5620381
replace date with start/end-date, week and year for VaccinationData
maekke Jan 19, 2021
139c368
use start/end date for BE vaccination scraper
maekke Jan 19, 2021
0be284c
use start/end-date for BL vaccination scraper
maekke Jan 19, 2021
aafaef9
use start/end-date for BS vaccination scraper
maekke Jan 19, 2021
d0a8fc2
use week and year for GE vaccination scraper
maekke Jan 19, 2021
a8fb9cf
use start/end-date for JU vaccination scraper
maekke Jan 19, 2021
63e69c5
use start/end-date for SO vaccination scraper
maekke Jan 19, 2021
9f36926
update CSV files with new columns
maekke Jan 19, 2021
0c6ccba
update DB and runner scripts with new columns
maekke Jan 19, 2021
643727f
add missing __get_int_item function and update header of VaccinationData
maekke Jan 19, 2021
13d2f10
fix VaccinationData tests
maekke Jan 19, 2021
c7b3736
add/update columns for VaccinationData in python code
maekke Jan 20, 2021
fde36a6
update csv file columns
maekke Jan 20, 2021
37ae178
add ZG vaccination scraper
maekke Jan 20, 2021
a715819
move VS daily PDF find logic and strip function to VS common module
maekke Jan 20, 2021
bea9ce5
add VS vaccination scraper
maekke Jan 20, 2021
358736a
update BS vaccination scraper
maekke Jan 21, 2021
6120733
add TG common module and move csv url fetching to common function
maekke Jan 22, 2021
964a022
add TG vaccination scraper, empty csv and wire it up in the workflow
maekke Jan 22, 2021
a54d2f2
add AI vaccination scraper
maekke Jan 22, 2021
e9b88d5
add AR vaccination scraper
maekke Jan 22, 2021
88fabec
add VD vaccination scraper
maekke Jan 23, 2021
92 changes: 92 additions & 0 deletions .github/workflows/run_vaccinations_scraper.yml
@@ -0,0 +1,92 @@
name: Run vaccinations scrapers

on:
schedule:
- cron: '30 * * * *' # run every hour at xx:30
workflow_dispatch: ~
jobs:
run_scraper:
runs-on: ubuntu-18.04
continue-on-error: false
timeout-minutes: 10
strategy:
fail-fast: false
matrix:
canton:
- AI
- AR
- BE
- BL
- BS
- GE
- JU
- SO
- TG
- VD
- VS
- ZG

steps:
- uses: actions/checkout@v2

- name: Set up Python 3.7
uses: actions/setup-python@v1
with:
python-version: 3.7
- run: npm ci
- name: Remove broken apt repos
run: |
for apt_file in `grep -lr microsoft /etc/apt/sources.list.d/`; do sudo rm $apt_file; done
- name: Install dependencies
env:
SCRAPER_KEY: ${{ matrix.canton }}
run: |
python -m pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
sudo apt update || true # do not fail if update does not work
sudo apt-get install sqlite3

- name: Scrape new data
env:
SCRAPER_KEY: ${{ matrix.canton }}
run: |
./scrapers/run_vaccinations_scraper.sh

- name: Check if there are changes in the repo
id: changes
uses: UnicornGlobal/[email protected]

- name: Set commit message
env:
SCRAPER_KEY: ${{ matrix.canton }}
run: |
if [ "$SCRAPER_KEY" = "FL" ] ; then
echo "commit_msg=Update fallzahlen_${SCRAPER_KEY}_vaccinations.csv from scraper" >> $GITHUB_ENV
else
echo "commit_msg=Update fallzahlen_kanton_${SCRAPER_KEY}_vaccinations.csv from scraper" >> $GITHUB_ENV
fi

- name: Commit and push to repo
if: steps.changes.outputs.changed == 1 # only try to commit if there are actually changes
uses: github-actions-x/[email protected]
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
push-branch: master
name: GitHub Action Scraper
email: [email protected]
commit-message: ${{ env.commit_msg }}
rebase: 'true'

- name: Get current unix timestamp
if: always()
id: date
run: echo "::set-output name=ts::$(date +'%s')"

- name: Notify slack failure
if: ${{ failure() || cancelled() }}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
uses: pullreminders/slack-action@master
with:
args: '{\"channel\":\"C013C0UUQ4S\", \"attachments\": [{\"fallback\": \"Job failed.\", \"color\": \"danger\", \"title\": \"Run vaccinations scrapers ${{ matrix.canton }}\", \"title_link\": \"https://github.com/openZH/covid_19/actions/runs/${{ github.run_id }}?check_suite_focus=true\", \"text\": \":x: Vaccinations scraper failed\", \"footer\": \"<https://github.com/openZH/covid_19|openZH/covid_19>\", \"footer_icon\": \"https://github.com/abinoda/slack-action/raw/master/docs/app-icons/github-icon.png\", \"ts\": \"${{steps.date.outputs.ts}}\"}]}'
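The "Set commit message" step above can be mirrored in Python for clarity; a hedged sketch (the function name is illustrative, not part of the PR). Note that the workflow writes `..._vaccinations.csv` into the commit message, while the CSV files committed in this PR are named `..._impfungen.csv`.

```python
# Hedged Python mirror of the workflow's "Set commit message" shell step.
def commit_message(scraper_key: str) -> str:
    # FL (Liechtenstein) is not a canton, so its file name has no "kanton_" prefix.
    if scraper_key == "FL":
        return f"Update fallzahlen_{scraper_key}_vaccinations.csv from scraper"
    return f"Update fallzahlen_kanton_{scraper_key}_vaccinations.csv from scraper"
```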

1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_AI_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_AR_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_BE_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_BL_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_BS_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_GE_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_JU_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_SO_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_TG_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_VD_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_VS_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
1 change: 1 addition & 0 deletions fallzahlen_impfungen/fallzahlen_kanton_ZG_impfungen.csv
@@ -0,0 +1 @@
canton,start_date,end_date,week,year,doses_delivered,first_doses,second_doses,total_vaccinations,source
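The twelve empty per-canton CSVs above share a single header row. A minimal stdlib sketch of producing that header (column names are taken directly from the diffs; everything else is illustrative):

```python
import csv
import io

# The shared header of the empty per-canton CSV files.
COLUMNS = [
    'canton', 'start_date', 'end_date', 'week', 'year',
    'doses_delivered', 'first_doses', 'second_doses',
    'total_vaccinations', 'source',
]

# Write the header row into an in-memory buffer to show the exact line.
buf = io.StringIO()
csv.writer(buf).writerow(COLUMNS)
header_line = buf.getvalue().strip()
```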
109 changes: 109 additions & 0 deletions scrapers/add_vaccinations_db_entry.py
@@ -0,0 +1,109 @@
#!/usr/bin/env python3

import sys
import sqlite3
import traceback
import os

import db_common as dc
import scrape_common as sc

__location__ = dc.get_location()

input_failures = 0

try:
DATABASE_NAME = os.path.join(__location__, 'data.sqlite')
conn = sqlite3.connect(DATABASE_NAME)

i = 0
for line in sys.stdin:
vd = sc.VaccinationData()
if vd.parse(line.strip()):
c = conn.cursor()
try:
print(vd)

c.execute(
'''
INSERT INTO data (
canton,
start_date,
end_date,
week,
year,
doses_delivered,
first_doses,
second_doses,
total_vaccinations,
source
)
VALUES
(?,?,?,?,?,?,?,?,?,?)
;

''',
[
vd.canton,
vd.start_date or '',
vd.end_date or '',
vd.week or '',
vd.year or '',
vd.doses_delivered,
vd.first_doses,
vd.second_doses,
vd.total_vaccinations,
vd.url,
]
)

print("Successfully added new entry.")
except sqlite3.IntegrityError as e:
# try UPDATE if INSERT didn't work (i.e. constraint violation)
try:
c.execute(
'''
UPDATE data SET
doses_delivered = ?,
first_doses = ?,
second_doses = ?,
total_vaccinations = ?,
source = ?
WHERE canton = ?
AND start_date = ?
AND end_date = ?
AND week = ?
AND year = ?
;
''',
[
vd.doses_delivered,
vd.first_doses,
vd.second_doses,
vd.total_vaccinations,
vd.url,

vd.canton,
vd.start_date or '',
vd.end_date or '',
vd.week or '',
vd.year or '',
]
)
print("Successfully updated entry.")
except sqlite3.Error as e:
print("Error: an error occurred in sqlite3: ", e.args[0], file=sys.stderr)
conn.rollback()
input_failures += 1
finally:
conn.commit()
except Exception as e:
print("Error: %s" % e, file=sys.stderr)
print(traceback.format_exc(), file=sys.stderr)
sys.exit(1)
finally:
conn.close()

if input_failures:
print(f'input_failures: {input_failures}')
sys.exit(1)
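The INSERT-then-UPDATE fallback on `sqlite3.IntegrityError` above can be exercised in isolation. A reduced sketch against an in-memory database, with the schema cut down to part of the unique key plus one value column (not the full table from the PR):

```python
import sqlite3

def upsert(conn, canton, start_date, total):
    """Try INSERT first; on a UNIQUE-constraint violation fall back to
    UPDATE, mirroring the IntegrityError handler in the script above."""
    c = conn.cursor()
    try:
        c.execute(
            'INSERT INTO data (canton, start_date, total_vaccinations) VALUES (?, ?, ?)',
            (canton, start_date, total))
    except sqlite3.IntegrityError:
        c.execute(
            'UPDATE data SET total_vaccinations = ? WHERE canton = ? AND start_date = ?',
            (total, canton, start_date))
    conn.commit()

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE data (canton text, start_date text, '
    'total_vaccinations integer, UNIQUE(canton, start_date))')
upsert(conn, 'BE', '2021-01-13', 100)
upsert(conn, 'BE', '2021-01-13', 150)  # same key: updates the row in place
```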
52 changes: 52 additions & 0 deletions scrapers/populate_vaccinations_database.py
@@ -0,0 +1,52 @@
#!/usr/bin/env python3

# This script creates a new sqlite database from the CSV it receives as an argument.
# The sqlite database is used as an intermediate step to merge new data into existing CSVs.

import sqlite3
import traceback
import os
import sys
import db_common as dc


__location__ = dc.get_location()

try:
# load the csv to sqlite db
assert len(sys.argv) == 2, "Call script with CSV file as parameter"
columns, to_db = dc.load_csv(sys.argv[1])

# create db
DATABASE_NAME = os.path.join(__location__, 'data.sqlite')
conn = sqlite3.connect(DATABASE_NAME)
c = conn.cursor()
c.execute('DROP TABLE IF EXISTS data')
c.execute(
'''
CREATE TABLE IF NOT EXISTS data (
canton text NOT NULL,
start_date text NOT NULL,
end_date text NOT NULL,
week text NOT NULL,
year text NOT NULL,
doses_delivered integer,
first_doses integer,
second_doses integer,
total_vaccinations integer,
source text,
UNIQUE(canton, start_date, end_date, week, year)
)
'''
)

# add entries
query = dc.insert_db_query(columns)
c.executemany(query, to_db)
conn.commit()
except Exception as e:
print("Error: %s" % e, file=sys.stderr)
print(traceback.format_exc(), file=sys.stderr)
sys.exit(1)
finally:
conn.close()
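The helpers `dc.load_csv` and `dc.insert_db_query` live in `db_common.py`, which is not part of this diff. A hedged stand-in for what they plausibly do, wired to an in-memory database (the real helpers may differ):

```python
import csv
import io
import sqlite3

# Guessed equivalent of db_common.load_csv: header list plus row tuples.
def load_csv(f):
    reader = csv.reader(f)
    columns = next(reader)
    return columns, [tuple(row) for row in reader]

# Guessed equivalent of db_common.insert_db_query: parameterized INSERT.
def insert_db_query(columns, table='data'):
    placeholders = ','.join('?' * len(columns))
    return f"INSERT INTO {table} ({','.join(columns)}) VALUES ({placeholders})"

sample = io.StringIO('canton,week,total_vaccinations\nBE,2,100\n')
columns, to_db = load_csv(sample)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (canton text, week text, total_vaccinations integer)')
conn.executemany(insert_db_query(columns), to_db)
conn.commit()
```

Note that SQLite's type affinity stores the text `'100'` as the integer `100` in the `integer` column.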
39 changes: 39 additions & 0 deletions scrapers/run_vaccinations_scraper.sh
@@ -0,0 +1,39 @@
#!/bin/bash

# Script to run a single vaccinations scraper

set -e
set -o pipefail

function cleanup {
exit $?
}
trap "cleanup" EXIT

DIR="$(cd "$(dirname "$0")" && pwd)"


# SCRAPER_KEY must be set
if [ -z $SCRAPER_KEY ] ; then
echo "SCRAPER_KEY env variable must be set";
exit 1
fi

area="kanton_${SCRAPER_KEY}"
if [ "$SCRAPER_KEY" = "FL" ] ; then
area="${SCRAPER_KEY}"
fi

# 1. populate the database with the current CSV
echo "Populating database from CSV fallzahlen_${area}_impfungen.csv..."
$DIR/populate_vaccinations_database.py $DIR/../fallzahlen_impfungen/fallzahlen_${area}_impfungen.csv

# 2. run the scraper, update the db
echo "Run the vaccinations scraper..."
scrape_script="${DIR}/scrape_${SCRAPER_KEY,,}_vaccinations.py"
$scrape_script | $DIR/add_vaccinations_db_entry.py

# 3. Export the database as csv
echo "Export database to CSV..."
sqlite3 -header -csv $DIR/data.sqlite "select * from data order by canton, start_date, end_date, year, week+0 asc;" > $DIR/../fallzahlen_impfungen/fallzahlen_${area}_impfungen.csv
sed -i 's/""//g' $DIR/../fallzahlen_impfungen/fallzahlen_${area}_impfungen.csv
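The export query orders by `week+0` rather than `week` because `week` is stored as text, and text ordering puts `'10'` before `'2'`. A small demonstration of why the `+0` coercion matters:

```python
import sqlite3

# Compare lexicographic vs numeric ordering of text week numbers in SQLite.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (week text)')
conn.executemany('INSERT INTO data (week) VALUES (?)', [('10',), ('2',), ('1',)])

text_order = [r[0] for r in conn.execute('SELECT week FROM data ORDER BY week')]
numeric_order = [r[0] for r in conn.execute('SELECT week FROM data ORDER BY week+0')]
```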
15 changes: 15 additions & 0 deletions scrapers/scrape_ai_vaccinations.py
@@ -0,0 +1,15 @@
#!/usr/bin/env python3

import scrape_common as sc

url = 'https://www.ai.ch/themen/gesundheit-alter-und-soziales/gesundheitsfoerderung-und-praevention/uebertragbare-krankheiten/coronavirus/impfung'
d = sc.download(url, silent=True)

vd = sc.VaccinationData(canton='AI', url=url)
date = sc.find(r'>.*Stand (.*\s\d{4}),\s\d+\sUhr</div>', d)
date = sc.date_from_text(date)
vd.start_date = date.isoformat()
vd.end_date = date.isoformat()
vd.total_vaccinations = sc.find(r'<li>([0-9]+)\s+Personen geimpft \(kumuliert\)<\/li>', d)
assert vd
print(vd)
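`sc.find` is used throughout the scrapers but `scrape_common.py` is not part of this diff. Assuming it returns the first capture group of the first match (a guess at the real helper), the AI extraction can be tested against a fabricated snippet:

```python
import re

# Hypothetical stand-in for sc.find: first capture group of the first match, else None.
def find(pattern, text):
    m = re.search(pattern, text)
    return m.group(1) if m else None

sample = '<li>1234 Personen geimpft (kumuliert)</li>'
total = find(r'<li>([0-9]+)\s+Personen geimpft \(kumuliert\)<\/li>', sample)
```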
20 changes: 20 additions & 0 deletions scrapers/scrape_ar_vaccinations.py
@@ -0,0 +1,20 @@
#!/usr/bin/env python3

import re
import scrape_common as sc

url = 'https://www.ar.ch/verwaltung/departement-gesundheit-und-soziales/amt-fuer-gesundheit/informationsseite-coronavirus/'
d = sc.download(url, silent=True)
d = d.replace('&nbsp;', ' ')
d = re.sub(r'(\d+)\'(\d+)', r'\1\2', d)

vd = sc.VaccinationData(canton='AR', url=url)

date = sc.find(r'Impfzahlen.*Stand (\d+\.\d+\.\d{4})\)', d)
date = sc.date_from_text(date)

vd.start_date = date.isoformat()
vd.end_date = date.isoformat()
vd.doses_delivered = sc.find(r'>Bereits gelieferte Impfdosen: <strong>(\d+)</strong>', d)
vd.total_vaccinations = sc.find(r'>Bereits verimpfte Impfdosen: <strong>(\d+)</strong>', d)
print(vd)
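The AR page formats numbers with apostrophe thousands separators (e.g. `12'345`), so the scraper normalizes them before the digit regexes run. The `re.sub` step as a standalone sketch (a number with more than one separator, like `1'234'567`, would need a second pass; the PR's regex has the same limitation):

```python
import re

# Strip apostrophe thousands separators so \d+ patterns match whole numbers.
def strip_thousands(text):
    return re.sub(r"(\d+)'(\d+)", r'\1\2', text)
```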
28 changes: 28 additions & 0 deletions scrapers/scrape_be_vaccinations.py
@@ -0,0 +1,28 @@
#!/usr/bin/env python3

import re
from bs4 import BeautifulSoup
import scrape_common as sc

url = 'https://www.besondere-lage.sites.be.ch/de/start/impfen.html'
d = sc.download(url, silent=True)
d = re.sub(r'(\d+)\'(\d+)', r'\1\2', d)
soup = BeautifulSoup(d, 'html.parser')

table = soup.find('p', string=re.compile('Durchgef.hrte Impfungen im Kanton Bern')).find_next('table')
tbody = table.find_all('tbody')[0]
trs = tbody.find_all('tr')

for tr in trs[1:]:
tds = tr.find_all('td')
assert len(tds) == 3, f'expected 3 columns, but got {len(tds)} ({tds})'

vd = sc.VaccinationData(canton='BE', url=url)
date = sc.find(r'(\d+\.\d+\.\d+)', tds[0].text)
date = sc.date_from_text(date)
vd.start_date = date.isoformat()
vd.end_date = date.isoformat()
vd.total_vaccinations = sc.find(r'(\d+)\s?', tds[1].text)
vd.second_doses = sc.find(r'(\d+)\s?', tds[2].text)
if vd:
print(vd)
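The BE table walk (skip the in-body header row, then expect exactly three cells per data row) can be exercised on a fabricated snippet; the real page markup differs, and `bs4` is assumed available as in the scraper above:

```python
from bs4 import BeautifulSoup

# Fabricated stand-in for the BE vaccinations table.
html = '''
<table><tbody>
<tr><td>Datum</td><td>Impfungen</td><td>davon 2. Impfung</td></tr>
<tr><td>13.01.2021</td><td>1500</td><td>200</td></tr>
</tbody></table>
'''
soup = BeautifulSoup(html, 'html.parser')

rows = []
for tr in soup.find('tbody').find_all('tr')[1:]:  # [1:] skips the header row
    tds = tr.find_all('td')
    assert len(tds) == 3
    rows.append((tds[0].text, tds[1].text, tds[2].text))
```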