A city traffic department wants to collect traffic data from a number of locations in the city using swarm UAVs (drones), and to use the collected data to improve traffic flow and to support a number of other undisclosed projects. This project provides a scalable data warehouse that hosts the vehicle trajectory data extracted by analysing footage taken by swarm drones and static roadside cameras. The warehouse takes future needs into account and organises the data so that downstream projects can query it efficiently. It follows the Extract, Load, Transform (ELT) pattern using dbt, which lets analytics engineers in the city traffic department set up transformation workflows on an as-needed basis.
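To give a feel for the ELT approach, below is a minimal sketch of a dbt staging model. The source name (`traffic`), table name (`raw_trajectories`), and column names are assumptions for illustration, not this project's actual schema; the source would also need to be declared in a `schema.yml` file first.

```sql
-- models/staging/stg_trajectories.sql (hypothetical example)
-- Light cleaning of raw trajectory records already loaded into the
-- warehouse; dbt compiles the Jinja source() call to the concrete table.
select
    track_id,
    vehicle_type,
    cast(traveled_distance as double precision) as traveled_distance_m,
    cast(avg_speed         as double precision) as avg_speed_kmh,
    cast(lat               as double precision) as latitude,
    cast(lon               as double precision) as longitude
from {{ source('traffic', 'raw_trajectories') }}
```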
Data warehouse tech stack

The stack consists of:
- A “data warehouse” (PostgreSQL)
- An orchestration service (Airflow)
- An ELT tool (dbt)
- A reporting environment (Redash)
Setup

Everything is fully dockerized, so the whole stack can be set up and run locally. The repository is organised as follows:
- data: Contains the dataset
- dbt: Contains dbt-related files
- airflow: Contains Airflow-related files
- scripts: Contains utility scripts
- logs: Contains log files
- tests: Contains unit tests
1. Clone the repository
$ git clone https://github.com/Haylemicheal/Data-warehouse
2. Navigate to the directory
$ cd Data-warehouse
Airflow

1. Navigate to the airflow folder
$ cd airflow
2. Initialise and start Airflow
$ docker-compose up airflow-init
$ docker-compose up
3. View the Airflow dashboard at http://localhost:8080
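In this setup, Airflow is the natural place to schedule the dbt runs. Below is a minimal sketch of such a DAG; the `dag_id`, the daily schedule, and the `/opt/dbt_traffic` path are assumptions for illustration, not values taken from this repository.

```python
# dags/traffic_elt.py (hypothetical example)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="traffic_elt",            # assumed DAG name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",      # rebuild the models once a day
    catchup=False,
) as dag:
    # Run the dbt models, then the dbt tests, from the mounted project dir.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_traffic && dbt run",   # assumed mount path
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt_traffic && dbt test",
    )
    dbt_run >> dbt_test
```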
dbt

1. Navigate to the dbt folder
$ cd dbt_traffic
2. Run dbt and generate its documentation
$ dbt init
$ dbt run
$ dbt docs generate
$ dbt docs serve --port <port>
3. View the generated documentation at http://localhost:<port>
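Downstream models reference upstream ones with `ref()`, which is how dbt builds its dependency graph. Building on the hypothetical staging model sketched earlier, a reporting-layer model might look like this:

```sql
-- models/marts/vehicle_speed_summary.sql (hypothetical example)
-- Aggregates the staged trajectories per vehicle type for reporting;
-- ref() makes dbt build stg_trajectories before this model.
select
    vehicle_type,
    count(*)           as n_vehicles,
    avg(avg_speed_kmh) as mean_speed_kmh
from {{ ref('stg_trajectories') }}
group by vehicle_type
```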
Redash

1. Navigate to the redash folder
$ cd redash
2. Start Redash
$ docker-compose up
3. View the Redash dashboard at http://localhost:<port> (the port configured in the Redash docker-compose file)
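Once Redash is connected to the PostgreSQL warehouse, dashboards are built from plain SQL queries. A query backing a simple visualization might look like the following (the table name comes from the hypothetical mart model sketched above):

```sql
-- Hypothetical Redash query: average speed per vehicle type,
-- used to drive a bar-chart visualization.
select vehicle_type, mean_speed_kmh
from vehicle_speed_summary
order by mean_speed_kmh desc;
```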
Contributors

- Haylemicheal Berihun
- 10academy team