change: adding reinvent notebook examples #941

Merged
merged 45 commits into from
Dec 4, 2019
Changes from all commits
45 commits
77188d4
Sagemaker debugger example notebook (#38)
NRauschmayr Nov 28, 2019
8c0cb28
Notebook to enable monitoring for existing Endpoints (#43)
abnagara Nov 30, 2019
2053d0a
MNIST training with TensorFlow and CloudWatch monitoring of Rule jobs…
jigsaw004 Dec 1, 2019
665100b
Model monitor end to end notebook (#53)
abnagara Dec 1, 2019
60fee1c
Fixing minor issues (#51)
abnagara Dec 1, 2019
de2e3eb
SMDebugger notebook for plotting tensors in realtime (#39)
ddavydenko Dec 1, 2019
3628d86
First version or README for SM Debugger (#56)
andreaolgiati Dec 1, 2019
16144a1
Model monitoring visualization example (#52)
sojiadeshina Dec 1, 2019
4d607c3
Debugger readme brief edit (#60)
john-andrilla Dec 1, 2019
ed3831f
Adding example of MNIST TensorFlow analysis with Rules and reacting o…
jigsaw004 Dec 1, 2019
2bce366
fixed example (#61)
NRauschmayr Dec 2, 2019
7de4fed
Modified realtime analysis notebook for MEAD/Loosleaf (#58)
ddavydenko Dec 2, 2019
ef54e6d
Notebook demonstrating how to enable spot training with sagemaker deb…
leleamol Dec 2, 2019
ab215e6
Restructure notebooks, and update the rules notebooks (#57)
rahul003 Dec 2, 2019
87f96f0
Add SageMaker Debugger XGBoost Rules notebook (#54)
Dec 2, 2019
1430a5f
Add SageMaker Debugger XGBoost realtime analysis notebook (#50)
Dec 2, 2019
4478edf
Add an example notebook for data preprocessing using SageMaker Proces…
apacker Dec 2, 2019
c694c28
Add processing sklearn example (#63)
andremoeller Dec 2, 2019
9c51a0f
tf-mnist-custom-rule.ipynb editorial review (#65)
john-andrilla Dec 2, 2019
87d2161
XGBoost realtime analysis text edit (#64)
Dec 2, 2019
de2accb
Changes to catch up with SDK (#66)
abnagara Dec 2, 2019
6a19252
Add keras example, and rename folders to make it clear that they are …
rahul003 Dec 2, 2019
939fa83
Doc update and bug fixes in TF MNIST stop training job example (#70)
Dec 2, 2019
2ecceea
Updated the text in notebook from Sagemaker-dbugger to Amazon SageMak…
leleamol Dec 2, 2019
67de4b0
TF MNIST buildin rule doc fix and bug fix (#69)
Dec 2, 2019
87760e9
Fix typos (#72)
jarednielsen Dec 2, 2019
c7c99b2
Name Change (#73)
anirudhacharya Dec 2, 2019
903eca3
Update links in readme (#74)
rahul003 Dec 2, 2019
ecbe79b
add sagemaker experiment management sample notebook (#40)
jerrypeng7773 Dec 2, 2019
50ced59
Removed unnencessary imports and used the right pip command to instal…
leleamol Dec 2, 2019
24c4e29
Clear outputs except plots in xgboost debugger notebooks (#75)
Dec 2, 2019
907cadf
SMDebugger Example with BYOC (#45)
vandanavk Dec 3, 2019
40361b4
Add Deep Graph Library Amazon examples for SageMaker (#48)
classicsong Dec 3, 2019
8e4d412
Add sample notebook for AP (#62)
pokiripayal Dec 3, 2019
43fca85
rebase staging repo with public repo (#78)
chuyang-deng Dec 3, 2019
c6f51e8
Add readme for dgl_kge example (#79)
classicsong Dec 3, 2019
784f22e
Add sagemaker experiments LL sample notebook. (#80)
jerrypeng7773 Dec 3, 2019
1e8fbbe
rebase staging with public repo (#81)
chuyang-deng Dec 3, 2019
b90ea80
Adding a new sample notebook demonstrating using data from AWS Data E…
kwwaikar Nov 22, 2019
f0ee53f
Renamed sample notebook file (#932)
kwwaikar Nov 22, 2019
280e4cf
Add batch RL example notebook (#926)
yijiezh Nov 27, 2019
1494092
adding mxnet embedding serving notebook (#882)
la-cruche Nov 27, 2019
dedf1b8
Embedding demo fix (#938)
la-cruche Nov 29, 2019
a04d311
Add README for RL directory; Typo fix in network compression README f…
annaluo676 Dec 2, 2019
5793fa4
resolve conflicts
Dec 3, 2019
26 changes: 26 additions & 0 deletions README.md
@@ -91,6 +91,19 @@ These examples provide more thorough mathematical treatment on a select group of
- [Latent Dirichlet Allocation (LDA)](scientific_details_of_algorithms/lda_topic_modeling) dives into Amazon SageMaker's spectral decomposition approach to LDA.
- [Linear Learner features](scientific_details_of_algorithms/linear_learner_class_weights_loss_functions) shows how to use the class weights and loss functions features of the SageMaker Linear Learner algorithm to improve performance on a credit card fraud prediction task

### Amazon SageMaker Debugger
These examples provide an introduction to SageMaker Debugger, which offers debugging and monitoring capabilities for training machine learning and deep learning models. Note that although these notebooks focus on a specific framework, the same approach works with all the frameworks that Amazon SageMaker Debugger supports. The notebooks below are listed in the order in which we recommend you review them; a minimal sketch of attaching a built-in rule appears after the list.

- [Using a built-in rule with TensorFlow](sagemaker-debugger/tensorflow_builtin_rule/)
- [Using a custom rule with TensorFlow Keras](sagemaker-debugger/tensorflow_keras_custom_rule/)
- [Interactive tensor analysis in notebook with MXNet](sagemaker-debugger/mnist_tensor_analysis/)
- [Real-time analysis in notebook with MXNet](sagemaker-debugger/mxnet_realtime_analysis/)
- [Using a built-in rule with XGBoost](sagemaker-debugger/xgboost_builtin_rules/)
- [Real-time analysis in notebook with XGBoost](sagemaker-debugger/xgboost_realtime_analysis/)
- [Using SageMaker Debugger with Managed Spot Training and MXNet](sagemaker-debugger/mxnet_spot_training/)
- [Reacting to CloudWatch Events from Rules to take an action based on status with TensorFlow](sagemaker-debugger/tensorflow_action_on_rule/)
- [Using SageMaker Debugger with a custom PyTorch container](sagemaker-debugger/pytorch_custom_container/)
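
A minimal sketch (not part of this PR's notebooks) of attaching a built-in Debugger rule to a TensorFlow estimator with the SageMaker Python SDK; the entry point script, instance settings, and rule choice here are illustrative assumptions:

```python
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import Rule, rule_configs

# Attach a built-in rule to the training job; Debugger evaluates the rule
# in a separate rule job against the tensors emitted during training.
estimator = TensorFlow(
    entry_point="mnist.py",  # hypothetical training script
    role=sagemaker.get_execution_role(),
    train_instance_count=1,
    train_instance_type="ml.m5.xlarge",
    framework_version="1.15",
    py_version="py3",
    rules=[Rule.sagemaker(rule_configs.vanishing_gradient())],
)
estimator.fit()

# Rule job status is also surfaced through CloudWatch Events, which the
# "Reacting to CloudWatch Events from Rules" notebook builds on.
print(estimator.latest_training_job.rule_job_summary())
```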

### Advanced Amazon SageMaker Functionality

These examples showcase unique functionality available in Amazon SageMaker. They cover a broad range of topics and utilize a variety of methods, but aim to provide the user with sufficient insight or inspiration to develop within Amazon SageMaker.
@@ -109,6 +122,9 @@ These examples showcase unique functionality available in Amazon SageMaker.
- [Inference Pipeline with SparkML and XGBoost](advanced_functionality/inference_pipeline_sparkml_xgboost_abalone) shows how to deploy an Inference Pipeline with SparkML for data pre-processing and XGBoost for training on the Abalone dataset. The pre-processing code is written once and used between training and inference.
- [Inference Pipeline with SparkML and BlazingText](advanced_functionality/inference_pipeline_sparkml_blazingtext_dbpedia) shows how to deploy an Inference Pipeline with SparkML for data pre-processing and BlazingText for training on the DBPedia dataset. The pre-processing code is written once and used between training and inference.
- [Experiment Management Capabilities with Search](advanced_functionality/search) shows how to organize Training Jobs into projects, and track relationships between Models, Endpoints, and Training Jobs.
- [Host Multiple Models with Your Own Algorithm](advanced_functionality/multi_model_bring_your_own) shows how to deploy multiple models to a real-time hosted endpoint with your own custom algorithm.
- [Host Multiple Models with XGBoost](advanced_functionality/multi_model_xgboost_home_value) shows how to deploy multiple models to a real-time hosted endpoint using a multi-model enabled XGBoost container.
- [Host Multiple Models with SKLearn](advanced_functionality/multi_model_sklearn_home_value) shows how to deploy multiple models to a real-time hosted endpoint using a multi-model enabled SKLearn container. (A sketch of invoking a multi-model endpoint follows this list.)
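
A minimal sketch, assuming a hypothetical endpoint name, model artifact, and sample image, of invoking a multi-model endpoint with boto3; `TargetModel` names the artifact, relative to the S3 prefix the endpoint was created with, that the container loads on first use:

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

with open("kitten.jpg", "rb") as f:  # hypothetical sample image
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName="my-multi-model-endpoint",  # hypothetical endpoint
    ContentType="application/x-image",
    TargetModel="resnet_18.tar.gz",          # hypothetical artifact under the endpoint's S3 prefix
    Body=payload,
)
print(response["Body"].read())
```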

### Amazon SageMaker Neo Compilation Jobs

@@ -120,6 +136,13 @@ These examples provide an introduction to how to use Neo to optimize deep l
- [Distributed TensorFlow](sagemaker_neo_compilation_jobs/tensorflow_distributed_mnist) adapts from [tensorflow mnist](sagemaker-python-sdk/tensorflow_distributed_mnist), adding the Neo API and a comparison against the baseline
- [Predicting Customer Churn](sagemaker_neo_compilation_jobs/xgboost_customer_churn) adapts from [xgboost customer churn](introduction_to_applying_machine_learning/xgboost_customer_churn), adding the Neo API and a comparison against the baseline

### Amazon SageMaker Processing

These examples show you how to use SageMaker Processing jobs to run data processing workloads; a minimal sketch follows the list below.

- [Scikit-Learn Data Processing and Model Evaluation](sagemaker_processing/scikit_learn_data_processing_and_model_evaluation) shows how to use SageMaker Processing and the Scikit-Learn container to run data preprocessing and model evaluation workloads.
- [Feature transformation with Amazon SageMaker Processing and SparkML](sagemaker_processing/feature_transformation_with_sagemaker_processing) shows how to use SageMaker Processing to run data processing workloads using SparkML prior to training.
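
A minimal sketch, under assumed script and S3 paths, of running a scikit-learn processing job with the SageMaker Python SDK:

```python
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

sklearn_processor = SKLearnProcessor(
    framework_version="0.20.0",
    role=sagemaker.get_execution_role(),
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# The job stages the S3 input into the container, runs the script, and
# uploads anything written under the declared output paths back to S3.
sklearn_processor.run(
    code="preprocessing.py",  # hypothetical preprocessing script
    inputs=[ProcessingInput(source="s3://my-bucket/raw-data",  # hypothetical input
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output/train"),
             ProcessingOutput(source="/opt/ml/processing/output/test")],
)
```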

### Amazon SageMaker Pre-Built Framework Containers and the Python SDK

#### Pre-Built Deep Learning Framework Containers
@@ -173,6 +196,9 @@ These examples show you how to use model-packages and algorithms from AWS Market
- [Using models for extracting vehicle metadata](aws_marketplace/using_model_packages/auto_insurance) provides a detailed walkthrough on how to use pre-trained models from AWS Marketplace for extracting metadata for a sample use-case of auto-insurance claim processing.
- [Using models for identifying non-compliance at a workplace](aws_marketplace/using_model_packages/improving_industrial_workplace_safety) provides a detailed walkthrough on how to use pre-trained models from AWS Marketplace for extracting metadata for a sample use-case of generating summary reports for identifying non-compliance at a construction/industrial workplace.

- [Using Data](aws_marketplace/using_data)
- [Using data and algorithm from AWS Marketplace for training a model](aws_marketplace/using_data/using_data_from_aws_data_exchange_to_predict_product_popularity) provides a detailed walkthrough on how to use data from AWS Marketplace for training a model that predicts popularity of a bath product.

### Under Development

These Amazon SageMaker examples fully illustrate a concept, but may require some additional configuration on the user's part to complete.
43 changes: 43 additions & 0 deletions Dockerfile
@@ -0,0 +1,43 @@
FROM ubuntu:16.04

# Set a docker label to advertise multi-model support on the container
LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
# Set a docker label to enable container to use SAGEMAKER_BIND_TO_PORT environment variable if present
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true

# Install necessary dependencies for MMS and SageMaker Inference Toolkit
RUN apt-get update && \
    apt-get -y install --no-install-recommends \
    build-essential \
    ca-certificates \
    openjdk-8-jdk-headless \
    python3-dev \
    curl \
    vim \
    && rm -rf /var/lib/apt/lists/* \
    && curl -O https://bootstrap.pypa.io/get-pip.py \
    && python3 get-pip.py

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

# Install MXNet, MMS, and SageMaker Inference Toolkit to set up MMS
RUN pip3 --no-cache-dir install mxnet \
    multi-model-server \
    sagemaker-inference \
    retrying

# Copy entrypoint script to the image
COPY dockerd-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
RUN chmod +x /usr/local/bin/dockerd-entrypoint.py

RUN mkdir -p /home/model-server/

# Copy the default custom service file to handle incoming data and inference requests
COPY model_handler.py /home/model-server/model_handler.py

# Define an entrypoint script for the docker image
ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]

# Define command to be passed to the entrypoint
CMD ["serve"]
29 changes: 29 additions & 0 deletions dockerd-entrypoint.py
@@ -0,0 +1,29 @@
import subprocess
import sys
import shlex
import os
from retrying import retry
from subprocess import CalledProcessError
from sagemaker_inference import model_server


def _retry_if_error(exception):
    # Retry only on process or OS errors raised while starting the server.
    return isinstance(exception, (CalledProcessError, OSError))


@retry(stop_max_delay=1000 * 50,
       retry_on_exception=_retry_if_error)
def _start_mms():
    # By default the number of workers per model is 1, but it can be
    # configured through the environment variable below if desired.
    # os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2'
    model_server.start_model_server(handler_service='/home/model-server/model_handler.py:handle')


def main():
    if sys.argv[1] == 'serve':
        _start_mms()
    else:
        subprocess.check_call(shlex.split(' '.join(sys.argv[1:])))

    # prevent docker exit
    subprocess.call(['tail', '-f', '/dev/null'])


main()
167 changes: 167 additions & 0 deletions model_handler.py
@@ -0,0 +1,167 @@
"""
ModelHandler defines an example model handler for load and inference requests for MXNet CPU models
"""
from collections import namedtuple
import glob
import json
import logging
import os
import re

import mxnet as mx
import numpy as np

class ModelHandler(object):
"""
A sample Model handler implementation.
"""

def __init__(self):
self.initialized = False
self.mx_model = None
self.shapes = None

def get_model_files_prefix(self, model_dir):
"""
Get the model prefix name for the model artifacts (symbol and parameter file).
This assume model artifact directory contains a symbol file, parameter file,
model shapes file and a synset file defining the labels

:param model_dir: Path to the directory with model artifacts
:return: prefix string for model artifact files
"""
sym_file_suffix = "-symbol.json"
checkpoint_prefix_regex = "{}/*{}".format(model_dir, sym_file_suffix) # Ex output: /opt/ml/models/resnet-18/model/*-symbol.json
checkpoint_prefix_filename = glob.glob(checkpoint_prefix_regex)[0] # Ex output: /opt/ml/models/resnet-18/model/resnet18-symbol.json
checkpoint_prefix = os.path.basename(checkpoint_prefix_filename).split(sym_file_suffix)[0] # Ex output: resnet18
logging.info("Prefix for the model artifacts: {}".format(checkpoint_prefix))
return checkpoint_prefix

def get_input_data_shapes(self, model_dir, checkpoint_prefix):
"""
Get the model input data shapes and return the list

:param model_dir: Path to the directory with model artifacts
:param checkpoint_prefix: Model files prefix name
:return: prefix string for model artifact files
"""
shapes_file_path = os.path.join(model_dir, "{}-{}".format(checkpoint_prefix, "shapes.json"))
if not os.path.isfile(shapes_file_path):
raise RuntimeError("Missing {} file.".format(shapes_file_path))

with open(shapes_file_path) as f:
self.shapes = json.load(f)

data_shapes = []

for input_data in self.shapes:
data_name = input_data["name"]
data_shape = input_data["shape"]
data_shapes.append((data_name, tuple(data_shape)))

return data_shapes

def initialize(self, context):
"""
Initialize model. This will be called during model loading time
:param context: Initial context contains model server system properties.
:return:
"""
self.initialized = True
properties = context.system_properties
# Contains the url parameter passed to the load request
model_dir = properties.get("model_dir")
gpu_id = properties.get("gpu_id")

checkpoint_prefix = self.get_model_files_prefix(model_dir)

# Read the model input data shapes
data_shapes = self.get_input_data_shapes(model_dir, checkpoint_prefix)

# Load MXNet model
try:
ctx = mx.cpu() # Set the context on CPU
sym, arg_params, aux_params = mx.model.load_checkpoint(checkpoint_prefix, 0) # epoch set to 0
self.mx_model = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
self.mx_model.bind(for_training=False, data_shapes=data_shapes,
label_shapes=self.mx_model._label_shapes)
self.mx_model.set_params(arg_params, aux_params, allow_missing=True)
with open("synset.txt", 'r') as f:
self.labels = [l.rstrip() for l in f]
except (mx.base.MXNetError, RuntimeError) as memerr:
if re.search('Failed to allocate (.*) Memory', str(memerr), re.IGNORECASE):
logging.error("Memory allocation exception: {}".format(memerr))
raise MemoryError
raise

def preprocess(self, request):
"""
Transform raw input into model input data.
:param request: list of raw requests
:return: list of preprocessed model input data
"""
# Take the input data and pre-process it make it inference ready

img_list = []
for idx, data in enumerate(request):
# Read the bytearray of the image from the input
img_arr = data.get('body')

# Input image is in bytearray, convert it to MXNet NDArray
img = mx.img.imdecode(img_arr)
if img is None:
return None

# convert into format (batch, RGB, width, height)
img = mx.image.imresize(img, 224, 224) # resize
img = img.transpose((2, 0, 1)) # Channel first
img = img.expand_dims(axis=0) # batchify
img_list.append(img)

return img_list

def inference(self, model_input):
"""
Internal inference methods
:param model_input: transformed model input data list
:return: list of inference output in NDArray
"""
# Do some inference call to engine here and return output
Batch = namedtuple('Batch', ['data'])
self.mx_model.forward(Batch(model_input))
prob = self.mx_model.get_outputs()[0].asnumpy()
return prob

def postprocess(self, inference_output):
"""
Return predict result in as list.
:param inference_output: list of inference output
:return: list of predict results
"""
# Take output from network and post-process to desired format
prob = np.squeeze(inference_output)
a = np.argsort(prob)[::-1]
return [['probability=%f, class=%s' %(prob[i], self.labels[i]) for i in a[0:5]]]

def handle(self, data, context):
"""
Call preprocess, inference and post-process functions
:param data: input data
:param context: mms context
"""

model_input = self.preprocess(data)
model_out = self.inference(model_input)
return self.postprocess(model_out)

_service = ModelHandler()


def handle(data, context):
if not _service.initialized:
_service.initialize(context)

if data is None:
return None

return _service.handle(data, context)
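
To show where this handler fits, here is a hedged sketch of registering the container for multi-model hosting with boto3; every name, ARN, image URI, and S3 prefix below is a hypothetical placeholder. `Mode="MultiModel"` tells SageMaker to treat `ModelDataUrl` as an S3 prefix holding many model `.tar.gz` archives that the container loads on demand.

```python
import boto3

sm = boto3.client("sagemaker")

# Register the container image built from the Dockerfile above.
sm.create_model(
    ModelName="mxnet-multi-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/multi-model:latest",  # hypothetical image
        "ModelDataUrl": "s3://my-bucket/multi-model-artifacts/",  # prefix of model .tar.gz archives
        "Mode": "MultiModel",
    },
)

sm.create_endpoint_config(
    EndpointConfigName="mxnet-multi-model-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "mxnet-multi-model",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
)

sm.create_endpoint(
    EndpointName="my-multi-model-endpoint",
    EndpointConfigName="mxnet-multi-model-config",
)
```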