-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the pending_request_count to release the failed requests #286
Conversation
src/infer_request.cc
Outdated
auto status = request->model_raw_->Enqueue(request); | ||
if (!status.IsOk()) { | ||
LOG_STATUS_ERROR( | ||
request->SetState(InferenceRequest::State::RELEASED), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the request release callback be called internally on failure? Calling request release should set the state and update accordingly I see in our contract that we don't take ownership of the request if InferAsync -> Run -> Enqueue fails, so it's expected that the user would release the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned offline, I don't love the idea of setting the state to RELEASED
when the request hasn't actually had it's release callbacks called. The state should represent that it has actually been released. Is there another way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, the request should be set to pending state iff the Enqueue call was a success. However, there is an asynchronous interaction between the enqueue thread and and the backend threads that schedules the requests and moves it out from the queue.
We should not invoke callbacks for the failed requests as the failed requests are owned by the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new state for tracking the failed requests. Discussed offline that the logic might need some clean-up.
Also, existing tests can be found here: https://github.com/triton-inference-server/server/blob/main/qa/L0_metrics/metrics_queue_size_test.py Feel free to rename test script from queue size to pending request count or something if it makes it easier to find |
15397d8
to
7b21c19
Compare
Before
After sending a 136 requests that will fail in Enqueue:
After fix
After sending a 136 requests that will fail in Enqueue:
I couldn't find any tests for nv_inference_pending_request_count metrics. Need to add testing of these features to avoid regression. Will open another ticket to enhance the testing.