
Allow non-decoupled model to send response and FINAL flag separately #6017

Merged
merged 6 commits into main on Mar 8, 2024

Conversation

@GuanLuo (Contributor) commented Jul 4, 2023

Follow up on triton-inference-server/core#229
For a custom backend, one may send a response in the following style, even for a "non-decoupled" model:

TRITONBACKEND_ResponseSend(response, 0, nullptr /* success */);
TRITONBACKEND_ResponseFactorySendFlags(factory, TRITONSERVER_RESPONSE_COMPLETE_FINAL);

From the client's perspective this yields a single response for the request, and the client would expect a successful inference through HTTP / non-streaming GRPC since the model is "non-decoupled". The previous implementation was restrictive: it simply assumed a decoupled use case whenever the response complete callback was invoked multiple times. This change relaxes that restriction and collapses the two calls above into a single response to the client.

Review comment on src/http_server.h:

// Defer sending the response until the FINAL flag is seen or
// there is an error
if ((err == nullptr) && (flags & TRITONSERVER_RESPONSE_COMPLETE_FINAL) == 0) {
A member commented:

What would happen if there is an error when calling TRITONBACKEND_ResponseSend()? Would the error be returned to the client, and would no information then be sent to the client on TRITONBACKEND_ResponseFactorySendFlags?

@GuanLuo (Contributor, Author) replied:

I made the change to not send the error until the FINAL flag is seen; otherwise the backend may run into issues by calling the response send function again after the userp has been released. Currently the only assumption is that the backend will not call the function again once the final flag is sent.

@GuanLuo force-pushed the gluo-response branch 2 times, most recently from bcfa971 to 5f21ff1 on August 7, 2023 19:28
@rmccorm4 (Contributor) commented Feb 9, 2024:

@GuanLuo do we still plan to implement this to help unify decoupled/non-decoupled logic in backends like RIVA/TRTLLM etc?

@rmccorm4 added the investigating label ("The development team is investigating this issue") on Feb 9, 2024
@GuanLuo force-pushed the gluo-response branch 2 times, most recently from 2f787bd to d9831d9 on February 21, 2024 00:21
}

state->context_->EraseInflightState(state);

#ifdef TRITON_ENABLE_TRACING
A contributor commented:

I have a feeling that the INFER_RESPONSE_COMPLETE multiple-capture issue will happen here, since deferring the response send now happens later.

@GuanLuo (Contributor, Author) replied:

Delayed

@oandreeva-nv (Contributor) left a review comment:

LGTM

@rmccorm4 (Contributor) left a review comment:

I think it would be good to have some kind of test that asserts the traces come out correctly, to avoid future breakage, but I won't block the PR on it.

@oandreeva-nv (Contributor) replied:

@rmccorm4 , I've created a ticket for this: DLIS-6308

@GuanLuo GuanLuo merged commit 0a8dbaf into main Mar 8, 2024
3 checks passed
@GuanLuo GuanLuo deleted the gluo-response branch March 8, 2024 20:31
4 participants