feat: Python Deployment of Triton Inference Server #7501
Conversation
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
With follow-up tickets in mind, this LGTM other than a few minor comments remaining.
qa/L0_python_api/test_kserve.py
Outdated
# delayed_identity will still be an active model
# Hence, server.stop() causes InternalError: Timeout.
with pytest.raises(tritonserver.InternalError):
    TestingUtils.teardown_server(server)
Is this true? Why does shutting down the frontend cause the model to still be marked as "in use"? CC @Tabrizian @kthui
Is the delay longer than the shutdown timeout (in seconds)?
Delay was set to 2 seconds. Shutdown timeout is set to 30 seconds by default.
Is it reproducible outside of the In-Process Python API?
Is this true? Why does shutting down the frontend cause the model to still be marked as "in use"? CC @Tabrizian @kthui
Maybe some handshake, like calling InferenceRequestDelete,
or a callback (request release, response complete, etc.) done in the frontend would be missed if the frontend is shut down before responses are complete, causing the core to think the request is still active (and the model still in use, held by the request)?
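To make the suspected sequence concrete, here is a minimal sketch (not the PR's actual test) of how the timeout could be reproduced. It assumes pytest fixtures server and http_service wrapping the in-process tritonserver.Server and a tritonfrontend KServe HTTP frontend on localhost:8000, and it assumes delayed_identity takes a single FP32 input named "INPUT0" and delays its response by about 2 seconds; the exact names and signatures may differ.

import numpy as np
import pytest
import tritonclient.http as httpclient
import tritonserver


def test_stop_times_out_with_inflight_request(server, http_service):
    client = httpclient.InferenceServerClient(url="localhost:8000")
    inputs = [httpclient.InferInput("INPUT0", [1], "FP32")]
    inputs[0].set_data_from_numpy(np.array([1.0], dtype=np.float32))

    # Fire the request but do not wait for the delayed response.
    client.async_infer(model_name="delayed_identity", inputs=inputs)

    # Tear the frontend down while the response is still pending. If the
    # frontend exits without finishing the release/response-complete
    # callbacks, the core still sees the request (and the model) as active.
    http_service.stop()

    # server.stop() then waits on the "in use" model and eventually times out.
    with pytest.raises(tritonserver.InternalError):
        server.stop()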
Co-authored-by: GuanLuo <[email protected]>
Co-authored-by: GuanLuo <[email protected]>
Co-authored-by: GuanLuo <[email protected]>
…ton-inference-server/server into kprashanth-python-deployment
setup_service,
teardown_client,
teardown_server,
teardown_service,
Just FYI, I am not in favor of this pattern of importing names out of their namespace; it makes it hard to distinguish the origin of the functions and can cause shadowing.
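For illustration, a namespaced alternative would look something like the sketch below; the module name testing_utils and the setup_* counterparts are assumptions for the example, not the PR's actual code.

# Namespaced import (module name assumed): the origin of each helper stays
# visible at the call site and the names cannot shadow, or be shadowed by,
# local definitions.
import testing_utils as utils


def run_lifecycle():
    server = utils.setup_server()    # assumed counterpart of teardown_server
    service = utils.setup_service(server)
    client = utils.setup_client()    # assumed counterpart of teardown_client
    utils.teardown_client(client)
    utils.teardown_service(service)
    utils.teardown_server(server)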
[celebrate] Neelay Shah reacted to your message:
Merged #7501 into main.
Hello @KrishnanPrash, I want to use this feature, but I don't know how to set up TP and PP (which determine the MPI world size).
What does the PR do?
This PR is for the tritonfrontend Python package, containing bindings to the HTTPAPIServer and grpc::Server classes in the frontend. This allows Triton users to start their respective frontends from within Python.
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
server/src/python/tritonfrontend/_c/tritonfrontend_pybind.cc
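For orientation, the Python surface those bindings are expected to expose looks roughly like the sketch below; the class and option names are taken from the PR description and may differ from the final tritonfrontend API.

import tritonserver
from tritonfrontend import KServeHttp  # binding over the C++ HTTPAPIServer class

# Start the in-process Triton core, then attach an HTTP frontend to it.
server = tritonserver.Server(model_repository="/models")
server.start()

http_options = KServeHttp.Options(port=8000)  # option names assumed
http_service = KServeHttp(server, http_options)
http_service.start()

# ... serve KServe HTTP/REST traffic on port 8000 ...

http_service.stop()
server.stop()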
Test plan:
Working on adding tests, which can be run through pytest and added as a single command to the existing L0_python_api bash script.
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)