
Data warehouse tech stack with PostgreSQL, dbt, and Airflow


Overview

A city traffic department wants to collect traffic data using swarm UAVs (drones) from a number of locations in the city, and to use the collected data to improve traffic flow and to support a number of other undisclosed projects. The company is responsible for creating a scalable data warehouse that hosts the vehicle trajectory data extracted by analysing footage taken by the swarm drones and by static roadside cameras. (Project design diagram.) The data warehouse takes future needs into account and organises the data so that downstream projects can query it efficiently. The project uses the Extract Load Transform (ELT) framework with dbt; ELT lets analytics engineers in the city traffic department set up transformation workflows on an as-needed basis.

Objectives

Build a data warehouse tech stack:

  • Consisting of

    • A data warehouse (PostgreSQL)
    • An orchestration service (Airflow)
    • An ELT tool (dbt)
    • A reporting environment (Redash)
  • Set up locally, fully dockerized
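
Once the stack is running, a quick way to see whether each component is reachable locally is to probe its port. This is a sketch, and the ports are common defaults and an assumption on my part (PostgreSQL 5432, Airflow 8080, Redash 5000); the compose files in this repository may map them differently:

```shell
# Probe each service's local port using bash's /dev/tcp pseudo-device.
# Ports are common defaults and an assumption, not taken from this repository.
CHECKED=0
for svc in postgres:5432 airflow:8080 redash:5000; do
  name=${svc%%:*}
  port=${svc##*:}
  if (exec 3<>"/dev/tcp/localhost/${port}") 2>/dev/null; then
    echo "${name}: listening on port ${port}"
  else
    echo "${name}: nothing listening on port ${port} (is its container up?)"
  fi
  CHECKED=$((CHECKED + 1))
done
echo "checked ${CHECKED} services"
```

Each check only tests that something accepts a TCP connection on the port; it does not verify which service is behind it.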

Dataset

Project Structure

  • data: Contains the dataset
  • dbt: Contains dbt-related files
  • airflow: Contains Airflow-related files
  • scripts: Contains utility scripts
  • logs: Contains log files
  • tests: Contains unit test files

Installation

1. Clone the repository
  $ git clone https://github.com/Haylemicheal/Data-warehouse
2. Navigate to the directory
  $ cd Data-warehouse

Run Airflow

1. Navigate to the airflow folder
  $ cd airflow
2. Start Airflow
  $ docker-compose up airflow-init
  $ docker-compose up
3. View the dashboard at http://localhost:8080
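
After both compose commands have run, the webserver can be polled from the shell to confirm it is actually serving. This sketch assumes the default port 8080 from the steps above and the `/health` endpoint that Airflow 2.x webservers expose:

```shell
# Probe the Airflow webserver health endpoint. Port 8080 and the /health
# path are assumptions based on Airflow 2.x defaults.
AIRFLOW_HEALTH_URL="http://localhost:8080/health"
if curl -fsS --max-time 5 "$AIRFLOW_HEALTH_URL" >/dev/null 2>&1; then
  AIRFLOW_STATUS="up"
else
  AIRFLOW_STATUS="down"
fi
echo "airflow webserver is ${AIRFLOW_STATUS} at ${AIRFLOW_HEALTH_URL}"
```

The `/health` response also reports scheduler status as JSON, which is useful when the web UI loads but DAGs never run.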

Run dbt

1. Navigate to the dbt folder
  $ cd dbt_traffic
2. Run dbt
  $ dbt init
  $ dbt run
  $ dbt docs generate
  $ dbt docs serve --port=<port number>
3. View the docs at http://localhost:<port number>
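
Before `dbt run`, `dbt debug` is a standard way to verify that the profile and warehouse connection are configured correctly. A guarded sketch (it is a no-op when dbt is not installed, and `dbt debug` itself must be run inside the dbt project folder):

```shell
# Verify the dbt profile and warehouse connection before running models.
# Guarded so the sketch does nothing when dbt is not installed.
if command -v dbt >/dev/null 2>&1; then
  dbt debug
  DBT_CHECK="ran dbt debug"
else
  DBT_CHECK="skipped (dbt not installed)"
fi
echo "connection check: ${DBT_CHECK}"
```

If `dbt debug` reports a failed connection, fix `profiles.yml` before moving on; `dbt run` would fail with the same error.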

Run Redash

1. Navigate to the redash folder
  $ cd redash
2. Start Redash
  $ docker-compose up
3. View the dashboard at http://localhost:<port number>
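
On a fresh checkout, Redash's bundled Docker setup normally needs its metadata database created once before the services are started. A hedged sketch following Redash's Docker setup guide (the `server` service name and `create_db` command come from that guide, not from this repository, and the block is guarded so it is a no-op without a compose file):

```shell
# One-time Redash initialization, per Redash's Docker setup guide.
# Guarded: does nothing unless docker-compose and a compose file are present.
if command -v docker-compose >/dev/null 2>&1 && [ -f docker-compose.yml ]; then
  docker-compose run --rm server create_db   # create the Redash metadata DB
  docker-compose up -d                       # start the Redash services
  REDASH_SETUP="started"
else
  REDASH_SETUP="skipped (docker-compose or compose file missing)"
fi
echo "redash setup: ${REDASH_SETUP}"
```

Skipping the `create_db` step is a common cause of the Redash web UI failing with database errors on first launch.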

Sample visualization from Redash

  • Fastest vehicles (table)
  • Fastest vehicles (bar graph)
  • Distribution of the vehicles (table)
  • Distribution of the vehicles (pie chart)

Author
