docs: Re-structure User Guides for Discoverability (#7807)
Co-authored-by: Meenakshi Sharma <[email protected]>
Co-authored-by: Kyle McGill <[email protected]>
3 people authored Dec 18, 2024
1 parent 838966a commit 0194c3d
Showing 44 changed files with 6,114 additions and 976 deletions.
9 changes: 8 additions & 1 deletion docs/Dockerfile.docs
@@ -1,4 +1,4 @@
-# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -59,6 +59,7 @@ RUN pip3 install \
     breathe \
     docutils \
     exhale \
+    httplib2 \
     ipython \
     myst-nb \
     nbclient \
@@ -73,6 +74,12 @@ RUN pip3 install \
     sphinx-tabs \
     sphinxcontrib-bibtex
 
+
+# install nvidia-sphinx-theme
+RUN pip3 install \
+    --index-url https://urm.nvidia.com/artifactory/api/pypi/ct-omniverse-pypi/simple/ \
+    nvidia-sphinx-theme
+
 # Set visitor script to be included on every HTML page
 ENV VISITS_COUNTING_SCRIPT="//assets.adobedtm.com/b92787824f2e0e9b68dc2e993f9bd995339fe417/satelliteLib-7ba51e58dc61bcb0e9311aadd02a0108ab24cc6c.js"
 
6 changes: 3 additions & 3 deletions docs/README.md
@@ -124,9 +124,9 @@ Triton supports batching individual inference requests to improve compute resour
 - [Queuing Policies](user_guide/model_configuration.md#queue-policy)
 - [Ragged Batching](user_guide/ragged_batching.md)
 - [Sequence Batcher](user_guide/model_configuration.md#sequence-batcher)
-- [Stateful Models](user_guide/architecture.md#stateful-models)
-- [Control Inputs](user_guide/architecture.md#control-inputs)
-- [Implicit State - Stateful Inference Using a Stateless Model](user_guide/architecture.md#implicit-state-management)
+- [Stateful Models](user_guide/model_execution.md#stateful-models)
+- [Control Inputs](user_guide/model_execution.md#control-inputs)
+- [Implicit State - Stateful Inference Using a Stateless Model](user_guide/implicit_state_management.md#implicit-state-management)
 - [Sequence Scheduling Strategies](user_guide/architecture.md#scheduling-strategies)
 - [Direct](user_guide/architecture.md#direct)
 - [Oldest](user_guide/architecture.md#oldest)
11 changes: 11 additions & 0 deletions docs/backend_guide/vllm.rst
@@ -0,0 +1,11 @@
########
vLLM
########

.. toctree::
   :hidden:
   :caption: vLLM
   :maxdepth: 2

   ../vllm_backend/README
   Multi-LoRA <../vllm_backend/docs/llama_multi_lora_tutorial>
10 changes: 10 additions & 0 deletions docs/client_guide/api_reference.rst
@@ -0,0 +1,10 @@
#############
API Reference
#############

.. toctree::
   :maxdepth: 1
   :hidden:

   OpenAI API <openai_readme.md>
   kserve
39 changes: 39 additions & 0 deletions docs/client_guide/in_process.rst
@@ -0,0 +1,39 @@
############################
In-Process Triton Server API
############################


The Triton Inference Server provides a backwards-compatible C API, with
Python and Java bindings, that allows Triton to be linked directly into
a C/C++, Python, or Java application. This API is called the "Triton
Server API" or just "Server API" for short. The API is implemented in
the Triton shared library, which is built from source contained in the
`core repository <https://github.com/triton-inference-server/core>`__.
On Linux this library is libtritonserver.so and on Windows it is
tritonserver.dll. In the Triton Docker image the shared library is
found in /opt/tritonserver/lib. The header file that defines and
documents the Server API is
`tritonserver.h <https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritonserver.h>`__.
`Java bindings for the In-Process Triton Server API <../customization_guide/inprocess_java_api.html#java-bindings-for-in-process-triton-server-api>`__
are built on top of `tritonserver.h` and can be used by Java
applications that need to run Triton Server in-process.

All capabilities of Triton Server are encapsulated in the shared
library and are exposed via the Server API. The `tritonserver`
executable implements HTTP/REST and GRPC endpoints and uses the Server
API to communicate with core Triton logic. The primary source files
for the endpoints are `grpc_server.cc <https://github.com/triton-inference-server/server/blob/main/src/grpc/grpc_server.cc>`__ and
`http_server.cc <https://github.com/triton-inference-server/server/blob/main/src/http_server.cc>`__. In these source files you can
see the Server API being used.

You can use the Server API in your own application as well. A simple
example using the Server API can be found in
`simple.cc <https://github.com/triton-inference-server/server/blob/main/src/simple.cc>`__.
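
To give a sense of how the Server API feels from the language bindings,
the sketch below runs in-process inference through the ``tritonserver``
Python package. It is illustrative only: the repository path
``/models``, the model name ``simple``, and the tensor names ``INPUT0``
and ``OUTPUT0`` are assumptions for the example, and the exact binding
surface is documented in the Python guide linked below.

.. code-block:: python

   # Hedged sketch of in-process inference via the tritonserver Python
   # bindings; the /models repository and "simple" model are assumed.
   import numpy
   import tritonserver

   # Start Triton inside this process; no HTTP/GRPC endpoint is involved.
   server = tritonserver.Server(model_repository="/models")
   server.start(wait_until_ready=True)

   # Look up a loaded model and run inference on it directly.
   model = server.model("simple")
   responses = model.infer(
       inputs={"INPUT0": numpy.zeros((1, 16), dtype=numpy.int32)})
   for response in responses:
       print(numpy.from_dlpack(response.outputs["OUTPUT0"]))

   server.stop()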

.. toctree::
   :maxdepth: 1
   :hidden:

   C/C++ <../customization_guide/inprocess_c_api.md>
   python
   Java <../customization_guide/inprocess_java_api.md>
15 changes: 15 additions & 0 deletions docs/client_guide/kserve.rst
@@ -0,0 +1,15 @@
##########
KServe API
##########


Triton uses the
`KServe community standard inference protocols <https://github.com/kserve/kserve/tree/master/docs/predict-api/v2>`__
to define HTTP/REST and GRPC APIs plus several extensions.
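
As an illustration of the protocol in action, the sketch below sends a
KServe-protocol inference request with the ``tritonclient`` Python
package, which speaks the HTTP/REST protocol under the hood. The server
address and the ``simple`` model (INT32 input ``INPUT0``, output
``OUTPUT0``) are assumptions for the example, not part of the protocol.

.. code-block:: python

   # Sketch of a KServe-protocol inference request over HTTP/REST using
   # tritonclient (pip install tritonclient[http]). Model and tensor
   # names below are assumptions for illustration.
   import numpy as np
   import tritonclient.http as httpclient

   client = httpclient.InferenceServerClient(url="localhost:8000")

   # Describe the input tensor and attach data; the client maps this
   # onto the KServe v2 "infer" request schema.
   inputs = [httpclient.InferInput("INPUT0", [1, 16], "INT32")]
   inputs[0].set_data_from_numpy(np.arange(16, dtype=np.int32).reshape(1, 16))

   result = client.infer(model_name="simple", inputs=inputs)
   print(result.as_numpy("OUTPUT0"))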

.. toctree::
   :maxdepth: 1
   :hidden:

   HTTP/REST and GRPC Protocol <../customization_guide/inference_protocols.md>
   kserve_extension
24 changes: 24 additions & 0 deletions docs/client_guide/kserve_extension.rst
@@ -0,0 +1,24 @@
##########
Extensions
##########

To fully enable all capabilities, Triton also implements `HTTP/REST and
GRPC extensions <https://github.com/triton-inference-server/server/tree/main/docs/protocol>`__
to the KServe inference protocol.
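
For a concrete taste of one extension, the sketch below drives the
model repository extension through the same ``tritonclient`` package.
It assumes a server started with ``--model-control-mode=explicit`` and
a model named ``simple`` in its repository.

.. code-block:: python

   # Sketch of the model repository extension endpoints via
   # tritonclient; assumes tritonserver was started with
   # --model-control-mode=explicit so models can be loaded on demand.
   import tritonclient.http as httpclient

   client = httpclient.InferenceServerClient(url="localhost:8000")

   # List everything the repository knows about, then load and unload
   # a model through the extension's control endpoints.
   print(client.get_model_repository_index())
   client.load_model("simple")
   assert client.is_model_ready("simple")
   client.unload_model("simple")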

.. toctree::
   :maxdepth: 1
   :hidden:

   Binary tensor data extension <../protocol/extension_binary_data.md>
   Classification extension <../protocol/extension_classification.md>
   Schedule policy extension <../protocol/extension_schedule_policy.md>
   Sequence extension <../protocol/extension_sequence.md>
   Shared-memory extension <../protocol/extension_shared_memory.md>
   Model configuration extension <../protocol/extension_model_configuration.md>
   Model repository extension <../protocol/extension_model_repository.md>
   Statistics extension <../protocol/extension_statistics.md>
   Trace extension <../protocol/extension_trace.md>
   Logging extension <../protocol/extension_logging.md>
   Parameters extension <../protocol/extension_parameters.md>
1 change: 1 addition & 0 deletions docs/client_guide/openai_readme.md
12 changes: 12 additions & 0 deletions docs/client_guide/python.rst
@@ -0,0 +1,12 @@
######
Python
######

.. include:: python_readme.rst

.. toctree::
   :maxdepth: 1
   :hidden:

   Kafka I/O <../tutorials/Triton_Inference_Server_Python_API/examples/kafka-io/README.md>
   Rayserve <../tutorials/Triton_Inference_Server_Python_API/examples/rayserve/README.md>
