feat: Python Deployment of Triton Inference Server #7501

Merged
merged 98 commits into from
Aug 30, 2024
Changes from 9 commits
c9de538
Basic Interface and Bindings
KrishnanPrash Jul 3, 2024
a5bb38f
testing
KrishnanPrash Jul 8, 2024
2215e58
Adding stuff
KrishnanPrash Jul 13, 2024
c450c0d
Working MVP for wheel
KrishnanPrash Jul 20, 2024
aab5e82
Remove _deps
KrishnanPrash Jul 20, 2024
6c2d9b2
Working MVP (No linking issues)
KrishnanPrash Jul 22, 2024
819262f
Working server HTTP bindings
KrishnanPrash Jul 23, 2024
7a141b7
Not working pybind->dict stuff
KrishnanPrash Jul 25, 2024
62ce8db
Get logic not working
KrishnanPrash Jul 25, 2024
d388614
Update
KrishnanPrash Jul 29, 2024
47bde5b
GRPC basic working
KrishnanPrash Jul 30, 2024
ff2c919
Working http and grpc frontends
KrishnanPrash Aug 3, 2024
a7d17bd
Removing unecessary changes
KrishnanPrash Aug 5, 2024
06147cd
Adding testing and cleaning up code
KrishnanPrash Aug 5, 2024
d35dce4
Adding/Updating copyright
KrishnanPrash Aug 5, 2024
fc40f50
removing extra space
KrishnanPrash Aug 5, 2024
ca29a19
spacing
KrishnanPrash Aug 5, 2024
44c81b7
Adding back CMake compile_feature
KrishnanPrash Aug 5, 2024
745ff6a
modify cmake
KrishnanPrash Aug 5, 2024
bf6f871
Removing print statements
KrishnanPrash Aug 5, 2024
b30ced3
Extra spacing
KrishnanPrash Aug 5, 2024
30067a8
Formatting
KrishnanPrash Aug 5, 2024
33afc53
Running pre-commit
KrishnanPrash Aug 5, 2024
ef1f602
Updates
KrishnanPrash Aug 5, 2024
554ed4a
Standardizing function names
KrishnanPrash Aug 5, 2024
2de88e2
Fixing potential file handling error
KrishnanPrash Aug 5, 2024
de734bd
fixing order of output arguments
KrishnanPrash Aug 6, 2024
c412a60
Removing unecessary code/Standardizing
KrishnanPrash Aug 7, 2024
95ceb33
Changing shared_ptr deleter name
KrishnanPrash Aug 7, 2024
e1548cb
Cleaning up CMake
KrishnanPrash Aug 7, 2024
ff2a6e9
Removing unecessary code / fixing function names
KrishnanPrash Aug 7, 2024
22cc785
Working Triton-specfic Error Handling
KrishnanPrash Aug 7, 2024
ccabc08
Removing unused import
KrishnanPrash Aug 7, 2024
879d3da
Migrating from Custom Validation to Pydantic
KrishnanPrash Aug 8, 2024
f3f3a1d
Error checking added to get_value<T>()
KrishnanPrash Aug 8, 2024
d538f13
Working test suite and Error Handling
KrishnanPrash Aug 12, 2024
89a59ca
Moved CMake instructions
KrishnanPrash Aug 12, 2024
a76cc19
Removed unused import
KrishnanPrash Aug 12, 2024
42d34ec
Working logging
KrishnanPrash Aug 13, 2024
1971215
Update src/common.h
KrishnanPrash Aug 13, 2024
70b80a3
Consistent returns
KrishnanPrash Aug 13, 2024
21bb999
streamlined client
KrishnanPrash Aug 13, 2024
fc9a7f4
Adding const
KrishnanPrash Aug 13, 2024
c14febd
Broken CMake
KrishnanPrash Aug 15, 2024
c25162b
Merge branch 'kprashanth-python-deployment' of https://github.com/tri…
KrishnanPrash Aug 15, 2024
a704589
Working CMake
KrishnanPrash Aug 16, 2024
45089b3
Working variant
KrishnanPrash Aug 16, 2024
47e207f
updates to CMake and core bindings
KrishnanPrash Aug 16, 2024
25cf153
Merge remote-tracking branch 'origin/main' into kprashanth-python-dep…
KrishnanPrash Aug 16, 2024
0c8d5d4
Cleaning up Repo
KrishnanPrash Aug 18, 2024
1a830d2
moved test cases
KrishnanPrash Aug 18, 2024
c90255f
change log verbose
KrishnanPrash Aug 19, 2024
fc273eb
revisions
KrishnanPrash Aug 19, 2024
52c6b61
Updated `README.md`
KrishnanPrash Aug 19, 2024
76e199a
Conditional inclusion of tracer.cc
KrishnanPrash Aug 20, 2024
2c57e71
Removed redundancy to CMakeLists.txt
KrishnanPrash Aug 20, 2024
3fac830
Fixing spelling mistake
KrishnanPrash Aug 20, 2024
ca8f717
Merge branch 'kprashanth-python-deployment' of https://github.com/tri…
KrishnanPrash Aug 20, 2024
c02ee75
Reverting unnecessary changes
KrishnanPrash Aug 20, 2024
35b606f
Completed test suite
KrishnanPrash Aug 20, 2024
448e6b4
Add check for HTTP and gRPC endpoint flags
KrishnanPrash Aug 21, 2024
0f765e8
Documentation and comments
KrishnanPrash Aug 21, 2024
d7b5c6b
Update src/python/README.md
KrishnanPrash Aug 21, 2024
83c4ee8
removing unecessary comments
KrishnanPrash Aug 21, 2024
a16d23e
Adding build instructions
KrishnanPrash Aug 21, 2024
8fb3cae
Using CMake generator expressions and adding LICENSE.txt to python whl
KrishnanPrash Aug 21, 2024
cff480d
Making examples and README.md more user-friendly
KrishnanPrash Aug 21, 2024
5247dbc
Removing unused import
KrishnanPrash Aug 21, 2024
10dafdf
tritonfrontend stubs generated
KrishnanPrash Aug 21, 2024
a4c5bd3
Removing comment
KrishnanPrash Aug 21, 2024
5a83f7c
Update CMakeLists.txt
KrishnanPrash Aug 21, 2024
6bf3c10
Update qa/L0_python_api/test_kserve_frontend.py
KrishnanPrash Aug 21, 2024
d762af1
Update src/common.h
KrishnanPrash Aug 21, 2024
4c74272
updates
KrishnanPrash Aug 25, 2024
8b33c1c
Update Dockerfile.QA
KrishnanPrash Aug 26, 2024
1d07314
fixing
KrishnanPrash Aug 26, 2024
9e540ec
updated CMake messaging
KrishnanPrash Aug 26, 2024
609c151
updated testing
KrishnanPrash Aug 27, 2024
a8ab882
Revamped error handling and test suite
KrishnanPrash Aug 28, 2024
3025f1a
cleaning up error handling
KrishnanPrash Aug 28, 2024
0cedd48
Minor changes
KrishnanPrash Aug 28, 2024
c904300
Adding static decorator
KrishnanPrash Aug 28, 2024
a7a7f58
Attempting to fix faulty pip install for tritonfrontend wheel
KrishnanPrash Aug 28, 2024
ef3197e
Reverting to previous pip install command
KrishnanPrash Aug 28, 2024
0ead5fc
Adding comments
KrishnanPrash Aug 28, 2024
f92e270
comments and copyright
KrishnanPrash Aug 28, 2024
5c7cd0d
cleaning up
KrishnanPrash Aug 28, 2024
86ba0fd
updating filename in bash script
KrishnanPrash Aug 28, 2024
57d441e
updating filename in bash script
KrishnanPrash Aug 28, 2024
79e67db
refactoring tests and moving docs
KrishnanPrash Aug 28, 2024
74e8193
Making function name consistent
KrishnanPrash Aug 28, 2024
94062b1
Making parameter names more descriptive
KrishnanPrash Aug 28, 2024
4625935
Removing redundant code
KrishnanPrash Aug 28, 2024
e87930f
Removing unnecessary use of
KrishnanPrash Aug 28, 2024
4ca394b
Merge branch 'kprashanth-python-deployment' of https://github.com/tri…
KrishnanPrash Aug 28, 2024
69bd5d3
updates to Readme.md
KrishnanPrash Aug 28, 2024
0d34f60
Merge branch 'main' into kprashanth-python-deployment
KrishnanPrash Aug 29, 2024
d95527a
update parameter names
KrishnanPrash Aug 29, 2024
@@ -8,7 +8,7 @@ Let us walk through a simple example:
import tritonserver

# Constructing path to Model Repository
model_path = f"/server/src/python/examples/example_model_repository"
model_path = f"server/src/python/examples/example_model_repository"

server_options = tritonserver.Options(
server_id="ExampleServer",
@@ -39,7 +39,7 @@ grpc_service.start()
```python
import tritonclient.http as httpclient
import numpy as np # Use version numpy < 2
model_name = "identity"
model_name = "identity" # output == input
url = "localhost:8000"

# Create a Triton client
@@ -61,7 +61,6 @@ output_data = results.as_numpy("OUTPUT0")

# Print results
print("[INFERENCE RESULTS]")
print("Input data:", input_data)
print("Output data:", output_data)

# Stop respective services and server.
@@ -96,20 +95,19 @@ with KServeHttp.Server(server) as http_service:
output_data = results.as_numpy("OUTPUT0")
# Print results
print("[INFERENCE RESULTS]")
print("Input data:", input_data)
print("Output data:", output_data)

server.stop()
```
With this workflow, you can avoid having to stop each service after client requests has terminated.
With this workflow, you can avoid having to stop each service after client requests have terminated.
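
As a rough illustration of why the `with` form removes the explicit stop call, here is a minimal sketch built around a hypothetical `DemoService` class (it is not part of `tritonfrontend`). It assumes, as the example above suggests, that entering the block starts the service and leaving it stops the service, even when the body raises.

```python
class DemoService:
    """Hypothetical stand-in for a frontend service; not the tritonfrontend API."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def __enter__(self):
        # Entering the `with` block starts the service.
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block stops the service, whether or not the body raised.
        self.stop()
        return False  # do not swallow exceptions raised in the body


service = DemoService()
with service:
    assert service.running  # the service is up inside the block
# After the block, the service has been stopped without an explicit stop() call.
```

The cleanup lives in `__exit__`, so an exception in the client code cannot leave the service running by accident.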


## Known Issues
- The following features are not currently supported when launching the Triton frontend services through the python bindings:
- [Tracing](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/trace.md)
- [Shared Memory](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_shared_memory.md)
- [Metrics](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md)
- [Restricted Protocols](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#limit-endpoint-access-beta)
- VertexAI
- Sagemaker
- [Restricted Protocols](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#limit-endpoint-access-beta)
- After a running server has been stopped, if the client sends an inference request, it will result in a Segmentation Fault.
- After a running server has been stopped, if the client sends an inference request, a Segmentation Fault will occur.
126 changes: 72 additions & 54 deletions qa/L0_python_api/test_kserve.py
@@ -6,7 +6,15 @@
import tritonclient.grpc as grpcclient
import tritonclient.http as httpclient
import tritonserver
from testing_utils import TestingUtils
from testing_utils import (
send_and_test_inference_identity,
setup_client,
setup_server,
setup_service,
teardown_client,
teardown_server,
teardown_service,
Review comment (Contributor): Just FYI, I am not in favor of this pattern of importing names out of their namespace; it makes the origin of the functions hard to distinguish and can cause shadowing.
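
To make the shadowing concern concrete, here is a small standard-library sketch. The local `join` helper is hypothetical and unrelated to this PR. A name brought in with `from ... import` can be silently replaced by a later definition, while the module-qualified form keeps its origin visible.

```python
import os.path
from os.path import join  # pulls the name out of its namespace


def join(parts):
    # A later definition with the same name silently shadows the import above.
    return "-".join(parts)


shadowed = join(["a", "b"])         # calls the local helper, not os.path.join
qualified = os.path.join("a", "b")  # the origin of the function is unambiguous
```

With `import testing_utils` and calls such as `testing_utils.setup_server()`, the second style would apply to the helpers in this diff; that spelling is only an illustration, not something the PR adopts.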

)
from tritonclient.utils import InferenceServerException
from tritonfrontend import KServeGrpc, KServeHttp

@@ -59,31 +67,33 @@ def test_wrong_grpc_parameters(self):
class TestKServe:
@pytest.mark.parametrize("frontend, client_type, url", [HTTP_ARGS, GRPC_ARGS])
def test_server_ready(self, frontend, client_type, url):
server = TestingUtils.setup_server()
service = TestingUtils.setup_service(server, frontend)
client = TestingUtils.setup_client(client_type, url=url)
server = setup_server()
service = setup_service(server, frontend)
client = setup_client(client_type, url=url)

assert client.is_server_ready()

TestingUtils.teardown_client(client)
TestingUtils.teardown_service(service)
TestingUtils.teardown_server(server)
teardown_client(client)
teardown_service(service)
teardown_server(server)

@pytest.mark.parametrize("frontend", [HTTP_ARGS[0], GRPC_ARGS[0]])
def test_already_exists_error(self, frontend):
server = TestingUtils.setup_server()
def test_service_double_start(self, frontend):
server = setup_server()
# setup_service() performs service.start()
service = TestingUtils.setup_service(server, frontend)
service = setup_service(server, frontend)

with pytest.raises(tritonserver.AlreadyExistsError):
with pytest.raises(
tritonserver.AlreadyExistsError, match="server is already running."
):
service.start()

TestingUtils.teardown_server(server)
TestingUtils.teardown_service(service)
teardown_server(server)
teardown_service(service)

@pytest.mark.parametrize("frontend", [HTTP_ARGS[0], GRPC_ARGS[0]])
def test_invalid_options(self, frontend):
server = TestingUtils.setup_server()
server = setup_server()
# Current flow is KServeHttp.Options or KServeGrpc.Options have to be
# provided to ensure type and range validation occurs.
with pytest.raises(
@@ -92,81 +102,86 @@ def test_invalid_options(self, frontend):
):
frontend.Server(server, {"port": 8001})

TestingUtils.teardown_server(server)
teardown_server(server)

@pytest.mark.parametrize("frontend", [HTTP_ARGS[0], GRPC_ARGS[0]])
def test_server_service_order(self, frontend):
server = TestingUtils.setup_server()
service = TestingUtils.setup_service(server, frontend)
server = setup_server()
service = setup_service(server, frontend)

TestingUtils.teardown_server(server)
TestingUtils.teardown_service(service)
teardown_server(server)
teardown_service(service)

@pytest.mark.parametrize("frontend, client_type", [HTTP_ARGS[:2], GRPC_ARGS[:2]])
def test_service_custom_port(self, frontend, client_type):
server = TestingUtils.setup_server()
server = setup_server()
options = frontend.Options(port=8005)
service = TestingUtils.setup_service(server, frontend, options)
client = TestingUtils.setup_client(client_type, url="localhost:8005")
service = setup_service(server, frontend, options)
client = setup_client(client_type, url="localhost:8005")

# Confirms that service starts at port 8005
client.is_server_ready()

TestingUtils.teardown_client(client)
TestingUtils.teardown_service(service)
TestingUtils.teardown_server(server)
teardown_client(client)
teardown_service(service)
teardown_server(server)

@pytest.mark.parametrize("frontend, client_type, url", [HTTP_ARGS, GRPC_ARGS])
def test_inference(self, frontend, client_type, url):
server = TestingUtils.setup_server()
service = TestingUtils.setup_service(server, frontend)
server = setup_server()
service = setup_service(server, frontend)

# TODO: use common/test_infer
assert TestingUtils.send_and_test_inference_identity(client_type, url=url)
assert send_and_test_inference_identity(client_type, url=url)

TestingUtils.teardown_service(service)
TestingUtils.teardown_server(server)
teardown_service(service)
teardown_server(server)

@pytest.mark.parametrize("frontend, client_type, url", [HTTP_ARGS])
def test_http_req_during_shutdown(self, frontend, client_type, url):
server = TestingUtils.setup_server()
http_service = TestingUtils.setup_service(server, frontend)
http_client = client_type.InferenceServerClient(url=url)
server = setup_server()
http_service = setup_service(server, frontend)
http_client = httpclient.InferenceServerClient(url="localhost:8000")
model_name = "delayed_identity"
delay = 2 # seconds
input_data0 = np.array([[delay]], dtype=np.float32)

input0 = client_type.InferInput("INPUT0", input_data0.shape, "FP32")
input0 = httpclient.InferInput("INPUT0", input_data0.shape, "FP32")
input0.set_data_from_numpy(input_data0)

inputs = [input0]
outputs = [client_type.InferRequestedOutput("OUTPUT0")]
outputs = [httpclient.InferRequestedOutput("OUTPUT0")]

async_request = http_client.async_infer(
model_name=model_name, inputs=inputs, outputs=outputs
)
# http_service.stop() does not use graceful shutdown
TestingUtils.teardown_service(http_service)
teardown_service(http_service)

# So, inference request will fail as http endpoints have been stopped.
with pytest.raises(InferenceServerException):
async_request.get_result(block=True, timeout=2)
with pytest.raises(
InferenceServerException, match="failed to obtain inference response"
):
async_request.get_result(block=True, timeout=delay)

# http_client.close() calls join() to terminate pool of greenlets
# However, due to an unsuccessful get_result(), async_request is still
# an active thread. Hence, join stalls until greenlet timeouts.
# Does not throw an exception, but displays error in logs.
TestingUtils.teardown_client(http_client)
teardown_client(http_client)

# delayed_identity will still be an active model
# Hence, server.stop() causes InternalError: Timeout.
with pytest.raises(tritonserver.InternalError):
TestingUtils.teardown_server(server)
with pytest.raises(
tritonserver.InternalError,
match="Exit timeout expired. Exiting immediately.",
):
teardown_server(server)

@pytest.mark.parametrize("frontend, client_type, url", [GRPC_ARGS])
def test_grpc_req_during_shutdown(self, frontend, client_type, url):
server = TestingUtils.setup_server()
grpc_service = TestingUtils.setup_service(server, frontend)
server = setup_server()
grpc_service = setup_service(server, frontend)
grpc_client = grpcclient.InferenceServerClient(url=url)
user_data = []

@@ -193,31 +208,34 @@ def callback(user_data, result, error):
callback=partial(callback, user_data),
)

TestingUtils.teardown_service(grpc_service)
teardown_service(grpc_service)

time_out = delay + 1
while (len(user_data) == 0) and time_out > 0:
time_out = time_out - 1
time.sleep(1)

assert len(user_data) == 1 and isinstance(
user_data[0], InferenceServerException
assert (
len(user_data) == 1
and isinstance(user_data[0], InferenceServerException)
and "[StatusCode.UNAVAILABLE] failed to connect to all addresses"
in str(user_data[0])
)

TestingUtils.teardown_client(grpc_client)
TestingUtils.teardown_server(server)
teardown_client(grpc_client)
teardown_server(server)
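
The polling loop in the test above wakes once per second to check `user_data`. One possible alternative, sketched here with only the standard library, is to have the callback set a `threading.Event` so the waiter returns as soon as the callback fires. The `wait_for_callback` helper, the `trigger` argument, and `fake_async_request` are hypothetical names used for illustration; this is not how the PR's test is written.

```python
import threading
from functools import partial


def callback(user_data, done, result, error):
    # Record the error if there is one, otherwise the result, then wake the waiter.
    user_data.append(error if error is not None else result)
    done.set()


def wait_for_callback(trigger, timeout):
    """Call trigger(cb) and wait up to `timeout` seconds for cb to be invoked."""
    user_data = []
    done = threading.Event()
    trigger(partial(callback, user_data, done))
    done.wait(timeout)  # returns early once the callback has fired
    return user_data


def fake_async_request(cb):
    # Stand-in for an asynchronous request that fails shortly after being issued.
    failure = RuntimeError("unavailable")
    threading.Timer(0.05, cb, kwargs={"result": None, "error": failure}).start()


responses = wait_for_callback(fake_async_request, timeout=2)
```

If the callback never fires, `wait_for_callback` returns an empty list after the timeout, which mirrors the `len(user_data) == 0` case in the loop.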

# KNOWN ISSUE: CAUSES SEGFAULT
# Created [DLIS-7231] to address at future date
# Once the server has been stopped, the underlying TRITONSERVER_Server instance
# is deleted. However, the frontend does not know the server instance
# is no longer valid.
# def test_inference_after_server_stop(self):
# server = TestingUtils.setup_server()
# http_service = TestingUtils.setup_service(server, KServeHttp)
# http_client = TestingUtils.setup_client(httpclient, url="localhost:8000")
# server = setup_server()
# http_service = setup_service(server, KServeHttp)
# http_client = setup_client(httpclient, url="localhost:8000")

# TestingUtils.teardown_server(server) # Server has been stopped
# teardown_server(server) # Server has been stopped

# model_name = "identity"
# input_data = np.array([["testing"]], dtype=object)
@@ -230,5 +248,5 @@ def callback(user_data, result, error):

# results = http_client.infer(model_name, inputs=inputs, outputs=outputs)

# TestingUtils.teardown_client(http_client)
# TestingUtils.teardown_service(http_service)
# teardown_client(http_client)
# teardown_service(http_service)
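
One detail worth keeping in mind when reading the assertions in this file: the `match` argument of `pytest.raises` is treated as a regular expression and applied with `re.search` to the string form of the exception. The sketch below uses only `re` to show the effect on two of the messages from the tests. It suggests why a message containing square brackets may be easier to check with a substring test or with `re.escape`.

```python
import re

grpc_text = "[StatusCode.UNAVAILABLE] failed to connect to all addresses"

# Used as a raw pattern, the bracketed prefix is parsed as a character class,
# so the pattern does not match the literal message.
raw_match = re.search(grpc_text, grpc_text)

# Escaping the text, or using a plain substring test, matches as intended.
escaped_match = re.search(re.escape(grpc_text), grpc_text)
substring_match = grpc_text in grpc_text

# An unescaped trailing "." matches any character, so the check is slightly loose.
loose_match = re.search("server is already running.", "server is already running!")
strict_match = re.search(re.escape("server is already running."), "server is already running!")
```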