ML pipeline to expose API on Heroku
Set up the pip environment
git clone <github HTTPS filepath>
virtualenv venv
source venv/bin/activate
# Install all dependencies listed in requirements.txt.
pip3 install -r requirements.txt
Set up git and dvc
- Install dvc
pip install 'dvc[s3]'
- Create a directory for the project and initialize git and dvc.
git init
dvc init
ls -a # check that the .git and .dvc directories were created
- As you work on the code, continually commit changes. Generated models you want to keep must be committed to dvc.
mkdir ../local_remote_dir
dvc remote add -d local_remote_dir ../local_remote_dir
dvc remote list
Connect your local git repo to GitHub.
Set up GitHub Actions on your repo. You can use one of the pre-made GitHub Actions, provided that at a minimum it runs pytest and flake8 on push and requires both to pass without error.
Make sure you set up the GitHub Action to have the same version of Python as you used in development.
Set up remote storage for dvc. My S3 bucket name is youheekil.
dvc remote add -d storage s3://youheekil
git add .dvc/config
git commit -m "Configure remote storage"
- Send data to the default remote:
dvc push
- Retrieve the data:
dvc pull
- Download census.csv and commit it to dvc.
dvc add ./data/raw/census.csv
git add ./data/raw/.gitignore ./data/raw/census.csv.dvc
dvc push
- The raw data is messy, so it was cleaned as follows:
- Removed whitespace from each column
- Replaced '?' values with NA
- Dropped rows containing NA
python src/
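The cleaning steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual script: `clean_rows` and `clean_csv` are hypothetical names, the sample column names are made up, and treating empty cells as missing (in addition to '?') is an assumption.

```python
import csv
import io

MISSING = "?"  # marker used for missing values in the raw data


def clean_rows(reader):
    """Yield the header and every complete row, with whitespace stripped."""
    yield [name.strip() for name in next(reader)]
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not cells:
            continue  # skip blank lines
        if any(cell in ("", MISSING) for cell in cells):
            continue  # drop rows that contain a missing value
        yield cells


def clean_csv(src_path, dst_path):
    """Read a raw CSV file and write the cleaned rows to dst_path."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(clean_rows(csv.reader(src)))


# In-memory usage example (illustrative column names).
sample = "age, workclass, salary\n39, State-gov, <=50K\n54, ?, >50K\n"
for row in clean_rows(csv.reader(io.StringIO(sample))):
    print(row)
# ['age', 'workclass', 'salary']
# ['39', 'State-gov', '<=50K']
```

With the paths used in this README, the call would look like `clean_csv("data/raw/census.csv", "data/processed/processed_census.csv")`.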
- Commit this modified data to dvc.
- The raw data stays untouched, while the processed ("cooked") version can keep being updated.
dvc add ./data/processed/processed_census.csv
git add ./data/processed/.gitignore ./data/processed/processed_census.csv.dvc
dvc push
- Train a machine learning model on the data, save and load the model and any categorical encoders, run model inference, and determine the classification metrics.
python src/
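The save/load and metrics parts of this step can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's training code: `save_artifact`, `load_artifact`, and `classification_metrics` are hypothetical names, `pickle` stands in for whichever serializer the project uses, and precision, recall, and F1 are assumed to be the metrics of interest.

```python
import pickle
from pathlib import Path


def save_artifact(obj, path):
    """Serialize a fitted model or encoder to disk, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)


def load_artifact(path):
    """Load a previously saved model or encoder."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Keeping persistence and metrics in small pure functions like these makes them easy to cover with unit tests.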
- Unit tests for 3 functions in the model code.
pytest src/
dvc add ./model/xgboost.pkl
git add ./model/.gitignore ./model/xgboost.pkl.dvc && git commit -m "model file added"
dvc push
dvc add ./model/encoder.joblib
git add ./model/.gitignore ./model/encoder.joblib.dvc && git commit -m "encoder file added"
dvc push
- Details of the model can be found in a model card (document/
- GET on the root giving a welcome message.
- POST that does model inference. The request body model should contain an example.
- Write 3 unit tests to test the API (one for the GET and two for the POST, one for each possible prediction).
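The three API requirements above can be illustrated with a framework-free sketch. The web framework is not named here, so this example uses a bare WSGI callable from the standard library only. The `/predict` route, the welcome text, the `education_num` field, and the rule inside `predict` are all assumptions made for illustration; a real app would call the trained model instead.

```python
import io
import json


def predict(record):
    """Hypothetical stand-in for model inference; the rule is made up."""
    return ">50K" if record.get("education_num", 0) >= 13 else "<=50K"


def app(environ, start_response):
    """Minimal WSGI app: GET / returns a greeting, POST /predict runs inference."""
    method, path = environ["REQUEST_METHOD"], environ.get("PATH_INFO", "/")
    if method == "GET" and path == "/":
        status, payload = "200 OK", {"message": "Welcome to the income prediction API"}
    elif method == "POST" and path == "/predict":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        record = json.loads(environ["wsgi.input"].read(length) or b"{}")
        status, payload = "200 OK", {"prediction": predict(record)}
    else:
        status, payload = "404 Not Found", {"detail": "Not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(method, path, data=None):
    """Test helper: invoke the WSGI app in-process and decode the JSON reply."""
    raw = json.dumps(data).encode("utf-8") if data is not None else b""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(raw)), "wsgi.input": io.BytesIO(raw)}
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


def test_get_root():
    status, body = call("GET", "/")
    assert status == "200 OK"
    assert "message" in body


def test_post_predicts_high_income():
    status, body = call("POST", "/predict", {"education_num": 14})
    assert status == "200 OK" and body["prediction"] == ">50K"


def test_post_predicts_low_income():
    status, body = call("POST", "/predict", {"education_num": 9})
    assert status == "200 OK" and body["prediction"] == "<=50K"
```

Saved as a test module, the three `test_` functions are collected by pytest; one covers the GET and the other two cover each prediction outcome of the POST.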
Create Procfile. The Procfile (no file extension) tells Heroku which command to run.
Create runtime.txt. runtime.txt specifies which Python version the app runs on.
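One way to keep runtime.txt in line with the Python version used in development is to generate it from the running interpreter. The sketch below assumes the `python-<major>.<minor>.<patch>` format that Heroku reads from runtime.txt; `runtime_line` and `write_runtime_file` are hypothetical helper names, and Heroku may not support every patch version, so check the result against its list of supported runtimes.

```python
import sys
from pathlib import Path


def runtime_line(version_info=sys.version_info):
    """Format a version tuple the way runtime.txt expects, e.g. python-3.10.4."""
    major, minor, micro = version_info[:3]
    return f"python-{major}.{minor}.{micro}"


def write_runtime_file(directory="."):
    """Write runtime.txt for the interpreter that runs this script."""
    path = Path(directory) / "runtime.txt"
    path.write_text(runtime_line() + "\n")
    return path
```

Calling `write_runtime_file()` from the project root inside the activated virtual environment writes the file next to the Procfile.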
- heroku
> heroku create
> heroku apps
> heroku create <app-name> --buildpack heroku/python
> heroku buildpacks --app <app-name>
- git
> git status
> git add .
> git commit -m "heroku setup"
> git branch # check branch of git
> git push heroku main
- shell
> heroku run bash --app mlops-income-pred
# the commands below run inside the Heroku dyno
> pwd # check the current working directory
> ls
> exit # leave the Heroku shell