ML pipeline to expose API on Heroku
Set up the pip environment
git clone <github HTTPS filepath>
virtualenv venv
source venv/bin/activate
# Install all dependencies listed in requirements.txt.
pip3 install -r requirements.txt
Set up git and dvc
- Install dvc
pip install 'dvc[s3]'
- Create a directory for the project and initialize git and dvc.
git init
dvc init
ls -a # check that the .git and .dvc directories were created
- As you work on the code, continually commit changes. Generated models you want to keep must be committed to dvc.
mkdir ../local_remote_dir
dvc remote add -d local_remote ../local_remote_dir # arguments: <remote name> <path>
dvc remote list
- Connect your local git repo to GitHub.
- Set up GitHub Actions on your repo. You can use one of the pre-made GitHub Actions, provided that at a minimum it runs pytest and flake8 on push and requires both to pass without error.
- Make sure the GitHub Action uses the same version of Python as you used in development.
- Set up a remote repository for dvc. My S3 bucket name is youheekil.
dvc remote add -d storage s3://youheekil
git add .dvc/config
git commit -m "Configure remote storage"
- Send data to the remote:
dvc push
- Retrieve the data:
dvc pull
- Download census.csv and commit it to dvc.
dvc add ./data/raw/census.csv
git add ./data/raw/.gitignore ./data/raw/census.csv.dvc
dvc push
- The raw data is messy, so src/clean_data.py cleans it as follows:
- Remove the spaces in each column
- Replace '?' in the data with NA
- Drop the rows that contain NA
python src/clean_data.py
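The cleaning steps above can be sketched as follows. This is only an illustration using the standard library, not the contents of src/clean_data.py; the function names `clean_rows` and `clean_csv_text` and the sample rows are hypothetical, and treating an empty cell as missing is an assumption.

```python
import csv
import io


def clean_rows(reader):
    """Yield the stripped header, then every data row that has no missing value."""
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to clean
    yield [name.strip() for name in header]
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [value.strip() for value in row]
        # '?' marks a missing value; an empty cell is also treated as missing (assumption).
        if any(value in ("?", "") for value in values):
            continue
        yield values


def clean_csv_text(raw_text):
    """Return the cleaned CSV content as a string."""
    reader = csv.reader(io.StringIO(raw_text))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(clean_rows(reader))
    return out.getvalue()


# Hypothetical sample: the second data row contains '?' and is dropped.
sample = "age, workclass, salary\n39, State-gov, <=50K\n50, ?, >50K\n"
print(clean_csv_text(sample))
```

Stripping before comparing against '?' matters here, because a value such as " ?" with a leading space would otherwise not be recognised as missing.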
- Commit this modified data to dvc.
- The raw data stays untouched, so the processed ("cooked") version can keep being updated.
dvc add ./data/processed/processed_census.csv
git add ./data/processed/.gitignore ./data/processed/processed_census.csv.dvc
dvc push
- Train a machine learning model on the data, save and load the model and any categorical encoders, run model inference, and determine the classification metrics.
python src/model.py
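The source does not say which classification metrics src/model.py reports. As a rough sketch, assuming the usual choice for a binary classifier (precision, recall, and F-beta), they could be computed like this. The function name `classification_metrics` and the sample labels are hypothetical, and returning 0.0 when a ratio is undefined is a convention chosen for this example.

```python
def classification_metrics(y_true, y_pred, beta=1.0):
    """Return (precision, recall, fbeta) for binary labels, with 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Undefined ratios (zero denominator) are reported as 0.0 in this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
precision, recall, f1 = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A small pure function like this is also easy to cover with one of the unit tests, since it needs no trained model.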
- Run the unit tests for 3 functions in the model code.
pytest src/model_test.py
- Commit the trained model and the encoder to dvc.
dvc add ./model/xgboost.pkl
git add ./model/.gitignore ./model/xgboost.pkl.dvc && git commit -m "model file added"
dvc push
dvc add ./model/encoder.joblib
git add ./model/.gitignore ./model/encoder.joblib.dvc && git commit -m "encoder file added"
dvc push
- Details of the model can be found in a model card (document/model_card_template.md)
- The API has a GET on the root that returns a welcome message.
- The API has a POST that does model inference. The request body model should contain an example.
- Write 3 unit tests to test the API (one for the GET and two for the POST, one for each possible prediction).
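The source does not name the web framework, so the sketch below uses only the standard library's WSGI interface to show the shape of the two endpoints and how they can be exercised in-process. Everything specific here is an assumption for illustration: the `/predict` path, the welcome text, the `education_num` field, and the `predict` rule, which is a stand-in and not the trained model.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def predict(features):
    """Stand-in for the real model: a hypothetical rule, not the trained classifier."""
    return ">50K" if features.get("education_num", 0) >= 13 else "<=50K"


def app(environ, start_response):
    """WSGI application with GET / (welcome) and POST /predict (inference)."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")
    if method == "GET" and path == "/":
        status, payload = "200 OK", {"message": "Welcome to the income prediction API"}
    elif method == "POST" and path == "/predict":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        try:
            features = json.loads(environ["wsgi.input"].read(length) or b"{}")
            status, payload = "200 OK", {"prediction": predict(features)}
        except (ValueError, AttributeError, TypeError):
            status, payload = "400 Bad Request", {"error": "body must be a JSON object"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def call(method, path, body=None):
    """Call the app in-process, as a unit test would; returns (status, parsed JSON)."""
    raw = json.dumps(body).encode("utf-8") if body is not None else b""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


# One call for the GET and one POST per possible prediction, mirroring the 3 tests.
print(call("GET", "/"))
print(call("POST", "/predict", {"education_num": 16}))
print(call("POST", "/predict", {"education_num": 9}))
```

The three calls at the end correspond to the three unit tests described above; in a pytest file each would become an assertion on the returned status and JSON.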
- Create Procfile. The Procfile (a file with no extension) tells Heroku which command to run.
- Create runtime.txt. runtime.txt specifies which Python version you are running.
- shell
> heroku
> heroku create
> heroku apps
> heroku create <app-name> --buildpack heroku/python
> heroku buildpacks --app <app-name>
- git
> git status
> git add *
> git commit -m "heroku setup"
> git branch # check branch of git
> git push heroku main
- shell
> heroku run bash --app mlops-income-pred
# now inside the Heroku dyno
> pwd # check the current working directory
> ls
> exit # leave the Heroku shell