Allow non-decoupled model to send response and FINAL flag separately #6017
Conversation
src/http_server.cc
Outdated
// Defer sending the response until FINAL flag is seen or
// there is error
if ((err == nullptr) && (flags & TRITONSERVER_RESPONSE_COMPLETE_FINAL) == 0) {
What would happen if there is an error when calling TRITONBACKEND_ResponseSend()? Would the error be returned to the client, with no information sent to the client on TRITONBACKEND_ResponseFactorySendFlags()?
I made the change to not send the error until the FINAL flag is seen; otherwise there can be an issue with the backend trying to call the response send function again after the userp has been released. Currently the only assumption is that the backend will not call the function again once the FINAL flag is sent.
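To make that deferral behavior concrete, here is a simplified, hypothetical sketch (not the actual http_server.cc code; the struct and function names are invented for illustration): both the response and any error are held until the FINAL flag arrives, and only then is anything completed toward the client.

#include "triton/core/tritonserver.h"  // TRITONSERVER_* types (assumed include path)

// Hypothetical illustration only; simplified stand-ins, not Triton internals.
struct PendingReply {
  TRITONSERVER_InferenceResponse* response = nullptr;  // last response seen
  TRITONSERVER_Error* error = nullptr;                 // first error seen
};

// Called from the response-complete callback. Returns true only when the
// accumulated result (response and/or error) should be sent to the client.
bool
DeferUntilFinal(
    PendingReply& pending, TRITONSERVER_InferenceResponse* response,
    TRITONSERVER_Error* err, const uint32_t flags)
{
  if (response != nullptr) {
    pending.response = response;
  }
  if ((err != nullptr) && (pending.error == nullptr)) {
    pending.error = err;  // remember the error, but do not send it yet
  }
  // Defer everything, including errors, until FINAL is seen so the backend
  // may still call into the response factory before signaling completion.
  return (flags & TRITONSERVER_RESPONSE_COMPLETE_FINAL) != 0;
}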
@GuanLuo do we still plan to implement this to help unify decoupled/non-decoupled logic in backends like RIVA/TRTLLM etc?
src/grpc/infer_handler.cc
Outdated
}

state->context_->EraseInflightState(state);

#ifdef TRITON_ENABLE_TRACING
I have a feeling that the INFER_RESPONSE_COMPLETE multiple-capture issue will show up here, since the deferred response send now happens later.
Delayed …with the meaning.
LGTM
I think it would be good to have some kind of test that asserts the traces come out correctly to avoid future breakage, but I won't block the PR on it.
@rmccorm4, I've created a ticket for this: DLIS-6308
Follow-up to triton-inference-server/core#229.
For a custom backend, one may send responses in the following style, even for a "non-decoupled" model:
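A minimal sketch of that calling pattern, assuming the standard TRITONBACKEND response-factory API (the function name SendResponseThenFinal and the surrounding scaffolding are illustrative, not taken from the PR):

#include "triton/core/tritonbackend.h"

// Send the single response without the FINAL flag, then signal FINAL
// separately through the response factory. Error checking, output
// population, and request release are omitted for brevity.
void
SendResponseThenFinal(TRITONBACKEND_Request* request)
{
  TRITONBACKEND_ResponseFactory* factory = nullptr;
  TRITONBACKEND_ResponseFactoryNew(&factory, request);

  TRITONBACKEND_Response* response = nullptr;
  TRITONBACKEND_ResponseNewFromFactory(&response, factory);
  // ... add output tensors to 'response' here ...

  // No flags: the response is sent but the request is not yet complete.
  TRITONBACKEND_ResponseSend(response, 0 /* send_flags */, nullptr /* error */);

  // Later (possibly from another thread), mark the request complete.
  TRITONBACKEND_ResponseFactorySendFlags(
      factory, TRITONSERVER_RESPONSE_COMPLETE_FINAL);
}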
From the client's perspective this yields a single response for the request, so a successful inference would be expected through HTTP / non-streaming GRPC since the model is "non-decoupled". The previous implementation was more restrictive: it simply assumed the decoupled use case whenever the response complete callback was invoked multiple times. This change relaxes that restriction and collapses the above into a single response to the client.