Transformation of SNIRH platform data into an accessible RESTful API.
- What is SNIRH?
- Motivation
- Structure
- Setup for development
- Setup for deployment
- Populate timeseries data
- Crawler
SNIRH (Sistema Nacional de Informação de Recursos Hídricos - National Information System for Water Resources) is a website built in the mid-90s that gives access to all sorts of water resources data across Portugal. It has had little to no updates in the last 30 years.
- The user interface is dated and makes it hard to retrieve data from multiple stations.
- Provide access to the data in an easy and standard format, through a REST API.
- On top of this API, a modern frontend application can easily be built.
This project consists of 4 main containers:
- backend - fetches the data and exposes a RESTful API for easy access.
- db - PostgreSQL database container.
- pgadmin - admin panel for PostgreSQL.
- frontend - a modern dashboard for easy access. ❗ work in progress ❗
❗ If you only need the crawler (without all this web stuff) go to this repo
Build and run for development:
docker-compose up -d --build
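This starts the four containers listed in the structure section. To check that they all came up, or to tail one container's logs (e.g. the backend):

# list the services of this compose project
docker-compose ps
# follow the backend logs
docker-compose logs -f backend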
The API server will be available at http://localhost:8000
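As a quick smoke test you can hit the API from the host. The route below is an assumption - check the backend's URL configuration for the actual endpoints:

# pretty-print the network list, assuming an /api/networks/ route exists
curl -s http://localhost:8000/api/networks/ | python3 -m json.tool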
You should populate the database with network, station, and parameter data (static data, -s):
docker exec -it backend python3 manage.py populate -s -r # -r stands for replace
⚠️ Fetching the data can take a looong time
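To verify the static data landed, you can count rows from the Django shell. The import path and model name below are assumptions - check the backend app's models for the real names:

# count the imported stations (Station and its module path are assumptions)
docker exec -it backend python3 manage.py shell -c "from backend.models import Station; print(Station.objects.count())"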
1 - Set up Traefik - follow this tutorial
2 - Edit the Traefik domain settings in docker-compose.prod.yml with your domain
3 - Add a .env file in the main directory (copy from .env.dev)
4 - Build and run for production
docker-compose -f docker-compose.prod.yml up -d --build
You should populate the database with network, station, and parameter data (static data, -s):
docker exec -it backend python3 manage.py populate -s -r # -r stands for replace
⚠️ Fetching the data can take a looong time
Currently, this functionality is not used due to the long waiting times; the data is fetched directly from SNIRH instead.
To fetch all timeseries data and populate the database, run:
docker exec -it backend python3 manage.py populate -t -r # -r stands for replace
To fetch timeseries data for the last day only:
docker exec -it backend python3 manage.py populate -t
⚠️ Fetching the data can take a looong time
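If you want the last-day fetch to run automatically, one option is a daily cron job on the host; a minimal sketch, assuming the backend container is named backend as in the commands above (note: no -it, since cron has no TTY):

# crontab entry: every day at 03:00, fetch the previous day's timeseries
0 3 * * * docker exec backend python3 manage.py populate -t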
The crawler accepts multiple commands that print the data and write it to a .json file.
❗ If you only need the crawler go to this repo
# all networks
python3 manage.py fetch networks
# all stations for a network_uid
python3 manage.py fetch stations -n {network_uid}
# all parameters of a station_uid from a network_uid
python3 manage.py fetch parameters -n {network_uid} -s {station_uid}
# data for a parameter_uid of a station_uid from tmin (yyyy-mm-dd) to tmax (yyyy-mm-dd)
python3 manage.py fetch data -s {station_uid} -p {parameter_uid} -f {tmin} -t {tmax}
Get all networks - writes it to data/networks.json
python3 manage.py fetch networks
Get all stations of the network 920123705 - writes it to data/stations-network_920123705.json
python3 manage.py fetch stations -n 920123705
Get all parameters of the station 1627758916 inside the network 920123705 - writes it to data/parameters-station_1627758916.json
python3 manage.py fetch parameters -n 920123705 -s 1627758916
Get data for parameter 1849 of the station 1627758916 between 1980-01-01 and 2020-12-31 - writes it to data/data-station_1627758916-parameter_1849-tmin_1980-01-01-tmax_2020-12-31.json
python3 manage.py fetch data -s 1627758916 -p 1849 -f 1980-01-01 -t 2020-12-31
Get data for multiple parameters (4237, 1436794570) and multiple stations (920752570, 920752670) between 1930-01-01 and 2020-12-31 - writes it to data/data-stations_920752570,920752670-parameters_4237,1436794570-tmin_1930-01-01-tmax_2020-12-31.json
python3 manage.py fetch data -s 920752570 920752670 -p 4237 1436794570 -f 1930-01-01 -t 2020-12-31
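The written files are plain JSON, so they are easy to post-process. For a quick look without extra dependencies:

# pretty-print a crawler result using only the Python standard library
python3 -m json.tool data/networks.json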