feat: Python Deployment of Triton Inference Server #7501
Conversation
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
With follow-up tickets in mind, this LGTM other than a few minor comments remaining.
qa/L0_python_api/test_kserve.py
Outdated
# delayed_identity will still be an active model
# Hence, server.stop() causes InternalError: Timeout.
with pytest.raises(tritonserver.InternalError):
    TestingUtils.teardown_server(server)
Is this true? Why does shutting down the frontend cause the model to still be marked as "in use"? CC @Tabrizian @kthui
Is the delay longer than the shutdown timeout (in seconds)?
Delay was set to 2 seconds. Shutdown timeout is set to 30 seconds by default.
Is it reproducible outside of the In-Process Python API?
Is this true? Why does shutting down the frontend cause the model to still be marked as "in use"? CC @Tabrizian @kthui
Maybe some handshake, like calling InferenceRequestDelete,
or a callback (request release, response complete, etc.) done in the frontend would be missed if the frontend is shut down before responses are complete, causing the core to think the request is still active (and the model still in use, held by the request)?
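To make the suspected sequence concrete, here is a minimal sketch (not the PR's actual test) of how the timeout could be reproduced. It assumes pytest fixtures server and http_service wrapping the in-process tritonserver.Server and a tritonfrontend KServe HTTP frontend on localhost:8000, and it assumes delayed_identity takes a single FP32 input named "INPUT0" and delays its response by about 2 seconds; the exact names and signatures may differ.

import numpy as np
import pytest
import tritonclient.http as httpclient
import tritonserver


def test_stop_times_out_with_inflight_request(server, http_service):
    client = httpclient.InferenceServerClient(url="localhost:8000")
    inputs = [httpclient.InferInput("INPUT0", [1], "FP32")]
    inputs[0].set_data_from_numpy(np.array([1.0], dtype=np.float32))

    # Fire the request but do not wait for the delayed response.
    client.async_infer(model_name="delayed_identity", inputs=inputs)

    # Tear the frontend down while the response is still pending. If the
    # frontend exits without finishing the release/response-complete
    # callbacks, the core still sees the request (and the model) as active.
    http_service.stop()

    # server.stop() then waits on the "in use" model and eventually times out.
    with pytest.raises(tritonserver.InternalError):
        server.stop()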
Co-authored-by: GuanLuo <[email protected]>
Co-authored-by: GuanLuo <[email protected]>
Co-authored-by: GuanLuo <[email protected]>
…ton-inference-server/server into kprashanth-python-deployment
setup_service,
teardown_client,
teardown_server,
teardown_service,
Just FYI, I am not in favor of this pattern of importing names out of their namespace; it makes it hard to distinguish the origin of the functions and can cause shadowing.
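For illustration, a namespaced alternative would look something like the sketch below; the module name testing_utils and the setup_* counterparts are assumptions for the example, not the PR's actual code.

# Namespaced import (module name assumed): the origin of each helper stays
# visible at the call site and the names cannot shadow, or be shadowed by,
# local definitions.
import testing_utils as utils


def run_lifecycle():
    server = utils.setup_server()    # assumed counterpart of teardown_server
    service = utils.setup_service(server)
    client = utils.setup_client()    # assumed counterpart of teardown_client
    utils.teardown_client(client)
    utils.teardown_service(service)
    utils.teardown_server(server)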
[celebrate] Neelay Shah reacted to your message:
Merged #7501 into main.
Hello @KrishnanPrash, I want to use this feature, but I don't know how to set up TP and PP (which determine the MPI world size).
What does the PR do?
This PR is for the tritonfrontend Python package, containing bindings to the HTTPAPIServer and grpc::Server classes in the frontend. This allows Triton users to start their respective frontends from within Python.
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
server/src/python/tritonfrontend/_c/tritonfrontend_pybind.cc
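For orientation, the Python surface those bindings are expected to expose looks roughly like the sketch below; the class and option names are taken from the PR description and may differ from the final tritonfrontend API.

import tritonserver
from tritonfrontend import KServeHttp  # binding over the C++ HTTPAPIServer class

# Start the in-process Triton core, then attach an HTTP frontend to it.
server = tritonserver.Server(model_repository="/models")
server.start()

http_options = KServeHttp.Options(port=8000)  # option names assumed
http_service = KServeHttp(server, http_options)
http_service.start()

# ... serve KServe HTTP/REST traffic on port 8000 ...

http_service.stop()
server.stop()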
Test plan:
Working on adding tests, which can be run through pytest and added as a single command to the existing L0_python_api bash script.
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)