Setup and load Fisheye data into Azure
In this case we are using Azure services. We'll use Azure Functions (serverless) to manage the load into ADLS Gen2 (as a blob), then load that blob into Postgres. Some links for reference:
- Azure Setup Instructions here to help get us started.
- ADLS into Postgres
Our initial use case is data from the EPA ECHO dataset, so we'll start there. Most of these downloads are .zip files containing multiple child files. We'll have to identify how to use Azure resources to unpack and load them, since the examples basically load from local .csv files; a rough sketch of one approach follows.
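As a starting point, here is a minimal sketch (not an established pipeline) of how a Python Azure Function could unpack a zipped ECHO export sitting in blob storage. The container name, blob name, and environment variable below are placeholders; swap in the real values from your setup.

```python
import io
import os
import zipfile

from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Assumed environment variable holding the storage connection string (see .env below)
service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
container = service.get_container_client("echo-raw")  # placeholder container name

# Pull the zipped export into memory (placeholder blob name)
zip_bytes = container.download_blob("echo_exporter.zip").readall()

# Unpack each child .csv and re-upload it as its own blob so it can be loaded into Postgres
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    for name in archive.namelist():
        if name.endswith(".csv"):
            container.upload_blob(f"unpacked/{name}", archive.read(name), overwrite=True)
```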
We are using Python virtual environments. If you are unfamiliar with venv, please reach out to the other contributors. Please install the packages from requirements.txt into your environment.
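For example, a typical setup (assuming Python 3 and an environment directory named .venv) looks like this:
> python3 -m venv .venv
> source .venv/bin/activate
> pip install -r requirements.txt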
Make sure you have a file called .env that resembles the file .env.sample. Copy the sample file and replace the values with your own.
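If you want to confirm the values are being picked up, here is a minimal sketch using python-dotenv (which may or may not be what this repo's scripts use); the key name DATABASE_URL is just a placeholder, use whatever keys .env.sample actually defines.

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Reads key=value pairs from .env into the process environment
load_dotenv()

# Placeholder key; check .env.sample for the real variable names
print(os.getenv("DATABASE_URL"))
```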
For local dev, please install Postgres and set up a database called "fisheye_dev". We'll create a schema called "epa" until we have a full architecture review. If you are using Windows Subsystem for Linux (WSL2) you can follow these instructions for PostgreSQL. You will need to install the database locally until we have established a public SQL server.
Remember to log in to Postgres as the postgres user and create a user for the application. These credentials should be unique and will only apply to your local database. Here is an example:
> sudo -u postgres createuser buddha
> sudo -u postgres createdb fisheye_dev
Log in to the db:
> sudo -u postgres psql
psql (12.11 (Ubuntu 12.11-0ubuntu0.20.04.1))
Type "help" for help.
> alter user buddha with encrypted password 'epa4you';
> grant all privileges on database fisheye_dev to buddha;
At this point, you're still the postgres user, so log out and re-enter the db as the application user:
> psql -U buddha -d fisheye_dev
There are no tables yet, but let's create the epa schema:
> create schema epa;
All right! Now you should have some structure and you can either a) join the ETL team or b) go back to Python.
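If you'd like to confirm the local database and schema are reachable from Python, here is a minimal sketch using psycopg2; the credentials match the example above, so substitute your own local values.

```python
import psycopg2  # pip install psycopg2-binary

# Credentials from the example above; use your own local values
conn = psycopg2.connect(
    host="localhost",
    dbname="fisheye_dev",
    user="buddha",
    password="epa4you",
)

with conn, conn.cursor() as cur:
    # Confirm the epa schema exists
    cur.execute(
        "select schema_name from information_schema.schemata where schema_name = 'epa'"
    )
    print(cur.fetchone())

conn.close()
```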
You may have to run the load script several times to fully populate the database. It can run out of memory, but if you run it again it will continue with the tables that have not been loaded yet.
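Longer term, one way to avoid the out-of-memory issue is to stream each CSV in chunks rather than reading it whole. This is only a sketch, not the current script: the file name, table name, and connection string below are placeholders.

```python
import pandas as pd  # pip install pandas sqlalchemy psycopg2-binary
from sqlalchemy import create_engine

# Placeholder connection string; in practice build it from the values in .env
engine = create_engine("postgresql+psycopg2://buddha:epa4you@localhost/fisheye_dev")

# Read the (placeholder) ECHO export 50,000 rows at a time instead of all at once
for chunk in pd.read_csv("ECHO_EXPORTER.csv", chunksize=50_000):
    # Append each chunk into the epa schema so an interrupted run can be resumed
    chunk.to_sql("echo_exporter", engine, schema="epa", if_exists="append", index=False)
```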