Welcome to the 2024 MLOps Zoomcamp Customer Satisfaction Prediction Project.
In this project, we aim to predict customer satisfaction at restaurants using a machine learning pipeline and create a MLops pipeline with Microsoft MLops Maturity level 2 with automated training using Mage ai. We utilize a dataset from Kaggle, the Predict Restaurant Customer Satisfaction Dataset, to build and deploy our models. The project covers the end-to-end MLOps lifecycle, including data preprocessing, model training, hyperparameter tuning, and model deployment.
The services are deployed on AWS using CloudFormation (with AWS CDK), and the project demonstrates the integration of various AWS services such as SageMaker, S3, Lambda, and more.
Key components of this project include:
- Deploying an MLflow Server: Using AWS CDK to deploy and manage an MLflow server for experiment tracking and model registry.
- Model Training: Training a Random Forest Classifier and XGBoost Classifier in Amazon SageMaker Studio, with the models and artifacts stored in an S3 bucket.
- Hyperparameter Tuning: Performing hyperparameter tuning on the XGBoost model and registering the best model parameters in MLflow.
- MLops Orchestraction: Using Mage AI automate the training pipeline and model deployment
- Model Deployment: Deploying the trained model as an AWS Lambda function using AWS CDK for serverless inference.
References: Manage your machine learning lifecycle with MLflow and Amazon SageMaker Mage AI docker Microsoft MLops Maturity Level
This project I break it down into different compounent, the services are all deployed to AWS and using CloudFormation(CDK) to deploy.
- Deploy a MLflow server through CDK
- Train a Random Forest Classifier and XGBoost Classifier in SageMaker studio and store model artifast in S3 bucket
- Hyperparameter tuning XGBoost model and register the best parameter in MLflow
- Productionize the training pipeline in Mage to orchestrate the flow
- Deploy model through CDK as a Lambda function
- CI enabled through GitHub Action
-
AWS Account You need an AWS account with sufficient permissions to create and manage resources such as S3 buckets, Lambda functions, and SageMaker instances.
-
AWS CLI Install and configure the AWS Command Line Interface (CLI) to interact with AWS services. Install AWS CLI Configure AWS CLI with your credentials
aws configure
-
Python and Pipenv Ensure you have at least 3.10 python installed and latest pipenv
-
Docker Install Docker to build and deploy containerized applications.
- Clone the Repository
git clone https://github.com/ruqianq/aws-xgboost-mlops-project
- Install Required Packages and activate the virtual environment Install required package through pipenv
pipenv install
pipenv shell
- Set Up Environment
export AWS_ACCESS_KEY_ID = [YOUR AWS ACCESS KEY ID]
export AWS_SECRET_ACCESS_KEY = [YOUR AWS SECRET ACCESS KEY]
export AWS_REGION=[YOUR AWS REGION]
-
Deploying MLflow Server as experiment tracking in AWS
- Navigate to the MLflow Directory
cd experiment-tracking
- Deploy MLflow using AWS CDK
ACCOUNT_ID=$(aws sts get-caller-identity --query Account | tr -d '"') AWS_REGION=$(aws configure get region) cdk bootstrap aws://${ACCOUNT_ID}/${AWS_REGION} cdk deploy --parameters ProjectName=mlflow --require-approval never
- Then you can navigate to your CloudFormation output and get the URL of your remote MLflow server
import mlflow mlflow.set_tracking_uri('<YOUR LOAD BALANCER URI>')
NOTE: this example is from this repo from AWS
-
Training and register model in SageMaker
- Open SageMaker Studio in AWS and Create a domain
- Navigate to notebooks
cd notebooks
-
Training and register model in Mage
- Navigate to orchestration
cd orchestration
- Run the docker compose
docker-compose up
- Navigate to localhost
- Run the pipeline:
- Model Monitoring
- CD release pipeline
- AWS Flink for real time ML inference