From 06a54c9ac8fdfbac311a240ab7557b7778561782 Mon Sep 17 00:00:00 2001
From: David Goodwin
Date: Thu, 29 Nov 2018 14:41:26 -0800
Subject: [PATCH] Add README reference to release notes for #3

---
 README.rst | 49 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/README.rst b/README.rst
index b8a516266d..f3ff0e05b5 100644
--- a/README.rst
+++ b/README.rst
@@ -30,7 +30,6 @@
 NVIDIA TensorRT Inference Server
 ================================
 
-
 **NOTE: You are currently on the master branch which tracks
 under-development progress towards the next release. The latest
 release of the TensorRT Inference Server is 0.8.0 beta and is
@@ -45,26 +44,37 @@ inference service via an HTTP or gRPC endpoint, allowing remote
 clients to request inferencing for any model being managed by the
 server. TRTIS provides the following features:
 
-* `Multiple framework support `_. The server can manage any number and mix of
-  models (limited by system disk and memory resources). Supports
-  TensorRT, TensorFlow GraphDef, TensorFlow SavedModel and Caffe2
-  NetDef model formats. Also supports TensorFlow-TensorRT integrated
-  models.
+* `Multiple framework support
+  `_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel and Caffe2 NetDef model formats. Also supports
+  TensorFlow-TensorRT integrated models.
 * Multi-GPU support. The server can distribute inferencing across all
   system GPUs.
-* `Concurrent model execution support `_. Multiple models (or multiple instances of the
-  same model) can run simultaneously on the same GPU.
+* `Concurrent model execution support
+  `_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
 * Batching support. For models that support batching, the server can
   accept requests for a batch of inputs and respond with the
   corresponding batch of outputs. The server also supports `dynamic
-  batching `_ where individual inference requests are dynamically
-  combined together to improve inference throughput. Dynamic batching
-  is transparent to the client requesting inference.
-* `Model repositories `_ may reside on a locally accessible file system (e.g. NFS) or
-  in Google Cloud Storage.
-* Readiness and liveness `health endpoints `_ suitable for any orchestration or deployment framework, such as Kubernetes.
-* `Metrics `_ indicating GPU utiliization, server throughput, and server
-  latency.
+  batching
+  `_
+  where individual inference requests are dynamically combined
+  together to improve inference throughput. Dynamic batching is
+  transparent to the client requesting inference.
+* `Model repositories
+  `_
+  may reside on a locally accessible file system (e.g. NFS) or in
+  Google Cloud Storage.
+* Readiness and liveness `health endpoints
+  `_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+* `Metrics
+  `_
+  indicating GPU utilization, server throughput, and server latency.
 
 .. overview-end-marker-do-not-remove
 
@@ -82,6 +92,13 @@ You can also view the documentation for the `master branch
 and for `earlier releases
 `_.
 
+The `Release Notes
+`_
+and `Support Matrix
+`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by TRTIS.
+
 Contributing
 ------------
 
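The feature list in the patch above mentions the HTTP/gRPC endpoints, the readiness and liveness health endpoints, and the GPU-utilization/throughput/latency metrics only in prose. As a rough illustration, here is a minimal Python sketch of probing those endpoints against a locally running server. The port numbers (8000 for HTTP, 8002 for metrics) and the ``/api/health/live``, ``/api/health/ready``, and ``/metrics`` paths are assumptions based on the defaults of this era's HTTP API; they are not specified anywhere in this patch.

.. code-block:: python

   # Probe a locally running TRTIS instance. The ports and URL paths below
   # are assumptions (this era's default HTTP API), not taken from the patch.
   import sys

   import requests

   HTTP_BASE = "http://localhost:8000"     # assumed HTTP service endpoint
   METRICS_BASE = "http://localhost:8002"  # assumed Prometheus metrics endpoint

   def is_ok(url):
       """Return True if the endpoint answers HTTP 200."""
       try:
           return requests.get(url, timeout=2.0).status_code == 200
       except requests.RequestException:
           return False

   if __name__ == "__main__":
       live = is_ok(HTTP_BASE + "/api/health/live")
       ready = is_ok(HTTP_BASE + "/api/health/ready")
       print("live:", live, "ready:", ready)

       # List the metric families exposed for GPU utilization, throughput,
       # and latency by scraping the Prometheus text format.
       if is_ok(METRICS_BASE + "/metrics"):
           text = requests.get(METRICS_BASE + "/metrics", timeout=2.0).text
           families = sorted({line.split("{")[0].split(" ")[0]
                              for line in text.splitlines()
                              if line and not line.startswith("#")})
           print("metric families:", families)

       # Non-zero exit makes this usable as a coarse readiness check.
       sys.exit(0 if (live and ready) else 1)

In a Kubernetes deployment, the same two health URLs are the natural targets for a pod's liveness and readiness probes, which is what the "Readiness and liveness health endpoints" bullet is getting at.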
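Similarly, the "Model repositories" bullet says only where a repository may live (a locally accessible file system or Google Cloud Storage). The sketch below walks a local repository and reports what the server would see. The layout it assumes, one directory per model with numeric version subdirectories and an optional ``config.pbtxt``, reflects the model-repository documentation of this period rather than anything stated in the patch, and the ``/models`` path is purely hypothetical.

.. code-block:: python

   # Inspect a local TRTIS model repository. The directory layout assumed
   # here (model/version/... plus config.pbtxt) is not defined by this patch.
   from pathlib import Path

   def describe_repository(root):
       repo = Path(root)
       if not repo.is_dir():
           print(f"{root} is not a directory")
           return
       for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
           # Version subdirectories are plain integers, e.g. 1/, 2/, 3/.
           versions = sorted(int(v.name) for v in model_dir.iterdir()
                             if v.is_dir() and v.name.isdigit())
           has_config = (model_dir / "config.pbtxt").is_file()
           print(f"{model_dir.name}: versions={versions}, "
                 f"config.pbtxt={'present' if has_config else 'missing'}")

   if __name__ == "__main__":
       describe_repository("/models")  # hypothetical local repository path

Per the bullet above, only the repository root changes when the same layout is served out of Google Cloud Storage instead of a local file system.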