-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix gRPC streaming non-decoupled segfault if sending response and final flag separately #7265
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we target main
and only include the related commits for this change? It looks like it is independent of other Python be changes.
03130e0
to
8a10ba4
Compare
All non-related commits are removed and the PR is now targeting the main branch. |
c08292b
to
8799aec
Compare
8799aec
to
addab53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks for elaborations!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -79,7 +80,7 @@ def _prepare_inputs_and_outputs(self, kind): | |||
self.outputs_.append(grpcclient.InferRequestedOutput("OUT")) | |||
self.outputs_.append(grpcclient.InferRequestedOutput("IDX")) | |||
self.requested_outputs_ = self.outputs_ | |||
elif kind == "simple" or kind == "streaming": | |||
elif kind in ("simple", "streaming"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, what is streaming
compared to decoupled_streaming
and non_decoupled_streaming
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decoupled_streaming
and non_decoupled_streaming
are for the repeat_int32
and repeat_int32_non_decoupled
models on repeat backend. The simple
and streaming
are for the custom_zero_1_float32
model on identity backend.
I think we could do some refactoring on how the inputs are prepared, i.e. for the repeat backend, we can have a function that returns a inputs
that can be used directly on the client.async_stream_infer()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one question: https://github.com/triton-inference-server/server/pull/7265/files#r1628592664. Doesn't need to be fixed/addressed here, just curious.
When using gRPC streaming for non-decoupled models, the frontend does not properly handle the case which the response and final flag are sent separately. The gRPC server will release the state as soon as the response is received, resulting in a possible segfault when the final flag is later received.
The fix is to defer sending the response if a final flag is not coupled with the response, and leave the gRPC step in WRITEREADY. Then, when the final flag is received, it continue sending the response and move to step WRITTEN.
This PR should merge before #7311