2024 MLOps Zoomcamp Customer Satisfaction at Restaurants Prediction Project

Description

Welcome to the 2024 MLOps Zoomcamp Customer Satisfaction Prediction Project.

In this project, we aim to predict customer satisfaction at restaurants using a machine learning pipeline, and to build an MLOps pipeline at Microsoft MLOps Maturity Level 2 with automated training orchestrated by Mage AI. We use the Predict Restaurant Customer Satisfaction dataset from Kaggle to build and deploy our models. The project covers the end-to-end MLOps lifecycle, including data preprocessing, model training, hyperparameter tuning, and model deployment.

The services are deployed on AWS using CloudFormation (with AWS CDK), and the project demonstrates the integration of various AWS services such as SageMaker, S3, Lambda, and more.

Key components of this project include:

  • Deploying an MLflow Server: Using AWS CDK to deploy and manage an MLflow server for experiment tracking and model registry.
  • Model Training: Training a Random Forest Classifier and XGBoost Classifier in Amazon SageMaker Studio, with the models and artifacts stored in an S3 bucket.
  • Hyperparameter Tuning: Performing hyperparameter tuning on the XGBoost model and registering the best model parameters in MLflow.
  • MLOps Orchestration: Using Mage AI to automate the training pipeline and model deployment.
  • Model Deployment: Deploying the trained model as an AWS Lambda function using AWS CDK for serverless inference.
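
To make the deployment component concrete, here is a minimal sketch of what a Lambda inference handler can look like. The `predict` function is a placeholder (the real function would load and call the trained XGBoost model bundled with the deployment artifact); only the handler's request/response contract is shown:

```python
import json

def predict(features):
    # Placeholder scoring logic; the deployed function would invoke
    # the trained XGBoost booster here instead.
    return 1 if sum(features) > 0 else 0

def lambda_handler(event, context):
    """Entry point AWS Lambda invokes: parse the request body,
    score the features, and return a JSON response."""
    body = json.loads(event.get("body", "{}"))
    features = body.get("features", [])
    prediction = predict(features)
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction}),
    }
```

The handler signature and the `statusCode`/`body` response shape follow the standard AWS Lambda Python contract for proxy integrations.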

References:

  • Manage your machine learning lifecycle with MLflow and Amazon SageMaker
  • Mage AI Docker documentation
  • Microsoft MLOps Maturity Model

Overview

I break this project down into the following components; all services are deployed to AWS using CloudFormation (via AWS CDK):

  1. Deploy an MLflow server through CDK
  2. Train a Random Forest Classifier and an XGBoost Classifier in SageMaker Studio and store the model artifacts in an S3 bucket
  3. Tune the XGBoost model's hyperparameters and register the best parameters in MLflow
  4. Productionize the training pipeline in Mage to orchestrate the flow
  5. Deploy the model through CDK as a Lambda function
  6. CI enabled through GitHub Actions
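Step 3 evaluates combinations of XGBoost hyperparameters and registers the best run in MLflow. A minimal sketch of enumerating such a search grid (the parameter names are typical XGBoost ones; the values are illustrative, not necessarily those used in the notebooks):

```python
from itertools import product

# Illustrative search space; the actual grid and ranges used in the
# project's notebooks may differ.
search_space = {
    "max_depth": [3, 5, 7],
    "eta": [0.1, 0.3],
    "subsample": [0.8, 1.0],
}

def candidate_params(space):
    """Yield one parameter dict per combination in the grid."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

# 3 * 2 * 2 = 12 combinations to train, evaluate, and log to MLflow.
candidates = list(candidate_params(search_space))
```

Each candidate dict can then be passed to the XGBoost trainer and logged as one MLflow run, with the best run promoted to the model registry.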


Prerequisites

  1. AWS Account: You need an AWS account with sufficient permissions to create and manage resources such as S3 buckets, Lambda functions, and SageMaker instances.

  2. AWS CLI: Install the AWS Command Line Interface (CLI) to interact with AWS services, then configure it with your credentials:

aws configure

  3. Python and Pipenv: Ensure you have Python 3.10 or later and the latest Pipenv installed.

  4. Docker: Install Docker to build and deploy containerized applications.
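The Python version requirement above can be verified with a quick snippet:

```python
import sys

# Check that the interpreter meets the project's Python >= 3.10 requirement.
def meets_requirement(version_info=sys.version_info, minimum=(3, 10)):
    return version_info[:2] >= minimum

print(meets_requirement())
```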

Installation

  1. Clone the repository:
git clone https://github.com/ruqianq/aws-xgboost-mlops-project
  2. Install the required packages and activate the virtual environment with Pipenv:
pipenv install
pipenv shell
  3. Set up the environment variables (note there must be no spaces around `=` in shell exports):
export AWS_ACCESS_KEY_ID=[YOUR AWS ACCESS KEY ID]
export AWS_SECRET_ACCESS_KEY=[YOUR AWS SECRET ACCESS KEY]
export AWS_REGION=[YOUR AWS REGION]
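A quick sanity check that the required variables are set (variable names as in the exports above):

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")

def missing_aws_vars(environ=os.environ):
    """Return the names of any required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

missing = missing_aws_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```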

Usage

  1. Deploy an MLflow server for experiment tracking in AWS

    1. Navigate to the MLflow directory:
    cd experiment-tracking
    
    2. Deploy MLflow using AWS CDK:
    ACCOUNT_ID=$(aws sts get-caller-identity --query Account | tr -d '"')
    AWS_REGION=$(aws configure get region)
    cdk bootstrap aws://${ACCOUNT_ID}/${AWS_REGION}
    cdk deploy --parameters ProjectName=mlflow --require-approval never
    
    3. Navigate to your CloudFormation output to get the URL of your remote MLflow server, then set it as the tracking URI:
    import mlflow
    mlflow.set_tracking_uri('<YOUR LOAD BALANCER URI>')
    

    NOTE: this example is adapted from this repo from AWS
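If you prefer to script this step, the load balancer URL can also be pulled from the stack outputs returned by `aws cloudformation describe-stacks --output json`. The output key `LoadBalancerDNSName` below is an assumption for illustration — check your stack's actual output keys:

```python
import json

def find_output(describe_stacks_json, key):
    """Search `aws cloudformation describe-stacks` JSON for an output value."""
    data = json.loads(describe_stacks_json)
    for stack in data.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output.get("OutputValue")
    return None

# Example shape of the CLI response (values are illustrative):
sample = json.dumps({
    "Stacks": [{
        "Outputs": [
            {"OutputKey": "LoadBalancerDNSName",
             "OutputValue": "http://mlflow-123.us-east-1.elb.amazonaws.com"}
        ]
    }]
})
uri = find_output(sample, "LoadBalancerDNSName")
```

The returned URI can then be passed straight to `mlflow.set_tracking_uri`.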

  2. Train and register models in SageMaker

    1. Open SageMaker Studio in AWS and create a domain.
    2. Navigate to the notebooks directory:
    cd notebooks
    
    3. Upload the notebooks and run all the cells; at the end you should be able to register the model and see the model artifacts saved in the S3 bucket.
  3. Train and register models in Mage

    1. Navigate to the orchestration directory:
    cd orchestration
    
    2. Run Docker Compose:
    docker-compose up
    
    3. Navigate to localhost in your browser.
    4. Run the pipeline.
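The Compose file ships in the `orchestration` directory; for orientation, a minimal Mage service definition generally looks like the sketch below (the image name, `mage start` command, and default port 6789 follow Mage's public Docker documentation; the project name is illustrative and the project's actual file may add volumes and environment variables):

```yaml
services:
  magic:
    image: mageai/mageai:latest
    command: mage start my_project   # project name is illustrative
    ports:
      - "6789:6789"                  # Mage's default UI port
    volumes:
      - .:/home/src                  # mount the project directory
```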

Future Enhancements

  • Model monitoring
  • CD release pipeline
  • Apache Flink on AWS for real-time ML inference
