- Pull the image

$ docker pull paddlepaddle/triton_paddle:21.10

Note: only the Triton Inference Server 21.10 image is supported.
The model repository is the directory where you place the models that you want Triton to serve. An example model repository is included in the examples directory. Before using the repository, you must fetch the models with the following script.
$ cd examples
$ ./fetch_models.sh
$ cd .. # back to root of paddle_backend
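Each entry in the repository follows the standard Triton layout: one directory per model containing a config.pbtxt and at least one numbered version subdirectory with the model files. The sketch below is only illustrative; the model name ResNet50 and the Paddle file names model.pdmodel / model.pdiparams are assumptions and may not match what fetch_models.sh actually downloads.

models/
└── ResNet50/
    ├── config.pbtxt
    └── 1/
        ├── model.pdmodel
        └── model.pdiparams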
- Launch a container from the image
$ docker run --gpus=all --rm -it --name triton_server --net=host -e CUDA_VISIBLE_DEVICES=0 \
-v `pwd`/examples/models:/workspace/models \
paddlepaddle/triton_paddle:21.10 /bin/bash
- Launch the Triton Inference Server
/opt/tritonserver/bin/tritonserver --model-repository=/workspace/models
Note: run /opt/tritonserver/bin/tritonserver --help to list all available parameters.
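By default Triton serves HTTP on port 8000, gRPC on port 8001, and metrics on port 8002. Since the container runs with --net=host, these can be set explicitly if the host ports are taken; the sketch below simply restates the defaults to make the mapping visible:

/opt/tritonserver/bin/tritonserver --model-repository=/workspace/models \
    --http-port=8000 --grpc-port=8001 --metrics-port=8002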
Use Triton’s ready endpoint to verify that the server and the models are ready for inference. From the host system, use curl to access the HTTP endpoint that indicates server status.
$ curl -v localhost:8000/v2/health/ready
...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
The HTTP request returns status 200 if Triton is ready and non-200 if it is not ready.
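In scripts it is often handier to wait for readiness than to check it once. A minimal sketch that polls the same endpoint until it returns 200, assuming the default HTTP port 8000:

$ until [ "$(curl -s -o /dev/null -w '%{http_code}' localhost:8000/v2/health/ready)" = "200" ]; do sleep 1; done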
Before running the examples, please make sure the Triton server is running correctly.
Change the working directory to examples
$ cd examples
ERNIE-2.0 is a pre-training framework for language understanding.
Steps to run the benchmark on ERNIE
$ bash perf_ernie.sh
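The benchmark tables below (sequences/second and latency percentiles) are the kind of figures Triton's perf_analyzer client reports. To take a single measurement by hand instead of using the script, an invocation would look roughly like the sketch below; the model name ERNIE, the batch size, and the concurrency range are assumptions and may differ from what perf_ernie.sh actually uses:

$ perf_analyzer -m ERNIE -b 1 --concurrency-range 1:4 -u localhost:8000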
ResNet50-v1.5 is a modified version of the original ResNet50 v1 model.
Steps to run the benchmark on ResNet50-v1.5
$ bash perf_resnet50_v1.5.sh
Steps to run the inference on ResNet50-v1.5

- Prepare processed images following DeepLearningExamples and place the imagenet folder under the examples directory.
- Run the inference

$ bash infer_resnet_v1.5.sh imagenet/<id>
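If the preprocessed samples are laid out one per numbered directory, the script can be run over several of them in a loop; the ids below are purely illustrative placeholders for whatever <id> values exist in your imagenet folder:

$ for id in 0 1 2; do bash infer_resnet_v1.5.sh imagenet/${id}; done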
Precision | Backend Accelerator | Client Batch Size | Sequences/second | P90 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) | Avg Latency (ms) |
---|---|---|---|---|---|---|---|
FP16 | TensorRT | 1 | 270.0 | 3.813 | 3.846 | 4.007 | 3.692 |
FP16 | TensorRT | 2 | 500.4 | 4.282 | 4.332 | 4.709 | 3.980 |
FP16 | TensorRT | 4 | 831.2 | 5.141 | 5.242 | 5.569 | 4.797 |
FP16 | TensorRT | 8 | 1128.0 | 7.788 | 7.949 | 8.255 | 7.089 |
FP16 | TensorRT | 16 | 1363.2 | 12.702 | 12.993 | 13.507 | 11.738 |
FP16 | TensorRT | 32 | 1529.6 | 22.495 | 22.817 | 24.634 | 20.901 |

Precision | Backend Accelerator | Client Batch Size | Sequences/second | P90 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) | Avg Latency (ms) |
---|---|---|---|---|---|---|---|
FP16 | TensorRT | 1 | 288.8 | 3.494 | 3.524 | 3.608 | 3.462 |
FP16 | TensorRT | 2 | 494.0 | 4.083 | 4.110 | 4.208 | 4.047 |
FP16 | TensorRT | 4 | 758.4 | 5.327 | 5.359 | 5.460 | 5.273 |
FP16 | TensorRT | 8 | 1044.8 | 7.728 | 7.770 | 7.949 | 7.658 |
FP16 | TensorRT | 16 | 1267.2 | 12.742 | 12.810 | 13.883 | 12.647 |
FP16 | TensorRT | 32 | 1113.6 | 28.840 | 29.044 | 30.357 | 28.641 |
FP16 | TensorRT | 64 | 1100.8 | 58.512 | 58.642 | 59.967 | 58.251 |
FP16 | TensorRT | 128 | 1049.6 | 121.371 | 121.834 | 123.371 | 119.991 |

Precision | Backend Accelerator | Client Batch Size | Sequences/second | P90 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) | Avg Latency (ms) |
---|---|---|---|---|---|---|---|
FP16 | TensorRT | 1 | 291.8 | 3.471 | 3.489 | 3.531 | 3.427 |
FP16 | TensorRT | 2 | 466.0 | 4.323 | 4.336 | 4.382 | 4.288 |
FP16 | TensorRT | 4 | 665.6 | 6.031 | 6.071 | 6.142 | 6.011 |
FP16 | TensorRT | 8 | 833.6 | 9.662 | 9.684 | 9.767 | 9.609 |
FP16 | TensorRT | 16 | 899.2 | 18.061 | 18.208 | 18.899 | 17.748 |
FP16 | TensorRT | 32 | 761.6 | 42.333 | 43.456 | 44.167 | 41.740 |
FP16 | TensorRT | 64 | 793.6 | 79.860 | 80.410 | 80.807 | 79.680 |
FP16 | TensorRT | 128 | 793.6 | 158.207 | 158.278 | 158.643 | 157.543 |