feat: TorchServe support #250
Conversation
Motivation

The Triton runtime can be used with model-mesh to serve PyTorch TorchScript models, but it does not support arbitrary PyTorch models, i.e. eager mode. KServe "classic" has an integration with TorchServe, but it would be good to have one for model-mesh too, so that these kinds of models can be used in distributed multi-model serving contexts.

Modifications

The bulk of the required changes are to the adapter image, covered by PR kserve/modelmesh-runtime-adapter#34. This PR contains the minimal controller changes needed to enable the support:
- TorchServe ServingRuntime spec
- Add "torchserve" to the list of supported built-in runtime types
- Add an "ID extraction" entry for TorchServe's gRPC Predictions RPC so that model-mesh will automatically extract the model name from corresponding request messages

Note the supported model format is advertised as "pytorch-mar" to distinguish it from the existing "pytorch" format, which refers to raw TorchScript .pt files as supported by Triton.

Result

TorchServe can be used seamlessly with ModelMesh Serving to serve PyTorch models, including eager mode.

Resolves #63

Signed-off-by: Nick Hill <[email protected]>
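To illustrate, a minimal sketch of how a model in the new format might be deployed with ModelMesh Serving: the InferenceService below advertises its model as "pytorch-mar" so it is routed to the TorchServe runtime. The metadata name, storage key, and path are placeholder assumptions, not values from this PR.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: pytorch-mar-example      # placeholder name
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch-mar        # routes to the TorchServe ServingRuntime
      storage:
        key: localMinIO          # assumed storage-config secret key
        path: pytorch/mnist.mar  # assumed path to a .mar archive
```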
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: njhill. The full list of commands accepted by this bot can be found here. The pull request process is described here.
This PR should be merged after kserve/modelmesh-runtime-adapter#34.
One follow-on task here would be to add a torchserve-based test to the FVTs.
Force-pushed from 65ba358 to a89172a.

Signed-off-by: Nick Hill <[email protected]>
Hi, thank you so much for your work on bringing TorchServe to ModelMesh. I am really interested in this feature :) Do you have any plans to continue working on this? Though it seems there isn't much work left to do since kserve/modelmesh-runtime-adapter#34 is merged. If you need any help, I would be glad to do some of the work too.
Thanks @kbumsik. This should now be finished and fully functional; I have tested it manually. The reason for the delay in merging the PR is that I also wanted to include an extension to our functional verification tests to exercise the TorchServe integration, but I'm pretty busy with other things, so I'm not sure how soon I'll get a chance. I'm OK with getting this merged and labeling the TorchServe support as "beta" in the meantime. I'll aim to get that done in the next couple of days.
Thank you for your quick and kind response. I will start testing on my own then 👍
I've opened a new issue to cover the FVT additions: #280 |
Tested it locally and looks good @njhill!
/lgtm
FYI @kbumsik this has now been merged.
#### Motivation

Support for TorchServe was added in #250 and kserve/modelmesh-runtime-adapter#34. A test should be added for it as well.

#### Modifications

- Adds a basic FVT for load/inference with a TorchServe MAR model using the native TorchServe gRPC API
- Disables the OVMS runtime and its tests, due to resource constraints, to allow TorchServe to be tested

#### Result

Closes #280

Signed-off-by: Rafael Vasquez <[email protected]>
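The distinction between the "pytorch" and "pytorch-mar" formats comes down to what artifact is produced from the model. A minimal sketch, using a made-up `TinyNet` module (not from this PR), of the two PyTorch artifacts involved:

```python
import torch
import torch.nn as nn

# Hypothetical toy model used only for illustration.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyNet().eval()

# "pytorch" format: a compiled TorchScript graph saved as a raw .pt file,
# which the Triton runtime can serve directly.
torch.jit.script(model).save("model.pt")

# Eager mode: only the weights are saved here; for TorchServe these are
# packaged together with handler code into a .mar archive (via the
# torch-model-archiver tool), which is what "pytorch-mar" refers to.
torch.save(model.state_dict(), "tinynet_weights.pth")
```

The eager-mode path is what the Triton-based "pytorch" format cannot cover, and what this TorchServe integration enables.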