
Fix the pending_request_count to release the failed requests #286

Merged: 3 commits into main, Nov 8, 2023

Conversation

tanmayv25 (Contributor)

Before

After sending 136 requests that will fail in Enqueue:

curl -s localhost:8002/metrics | grep pending
# HELP nv_inference_pending_request_count Instantaneous number of pending requests awaiting execution per-model.
# TYPE nv_inference_pending_request_count gauge
nv_inference_pending_request_count{model="test_model",version="1"} 136

After the fix

After sending 136 requests that will fail in Enqueue:

curl -s localhost:8002/metrics | grep pending
# HELP nv_inference_pending_request_count Instantaneous number of pending requests awaiting execution per-model.
# TYPE nv_inference_pending_request_count gauge
nv_inference_pending_request_count{model="test_model",version="1"} 0

I couldn't find any tests for the nv_inference_pending_request_count metric. We need to add testing of this behavior to avoid regressions. Will open another ticket to enhance the testing.
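A regression check along these lines could be added to the QA suite. This is a hypothetical sketch, not the actual test: a real test would scrape a live server (as with `curl -s localhost:8002/metrics` above); here the exposition text is inlined, and `parse_pending_count` is an assumed helper name.

```python
# Hypothetical sketch of a regression check for the pending-count gauge.
# A real QA test would scrape a running server's /metrics endpoint;
# here the Prometheus exposition text is inlined for illustration.

def parse_pending_count(metrics_text, model, version):
    """Return the nv_inference_pending_request_count gauge value for a model."""
    target = (
        f'nv_inference_pending_request_count{{model="{model}",version="{version}"}}'
    )
    for line in metrics_text.splitlines():
        if line.startswith(target):
            return float(line.split()[-1])
    raise ValueError(f"metric not found for {model} v{version}")


metrics = """\
# HELP nv_inference_pending_request_count Instantaneous number of pending requests awaiting execution per-model.
# TYPE nv_inference_pending_request_count gauge
nv_inference_pending_request_count{model="test_model",version="1"} 0
"""

# After the fix, requests that fail in Enqueue should leave the gauge at zero.
assert parse_pending_count(metrics, "test_model", "1") == 0
```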

auto status = request->model_raw_->Enqueue(request);
if (!status.IsOk()) {
  LOG_STATUS_ERROR(
      request->SetState(InferenceRequest::State::RELEASED),
      "..." /* message elided in this excerpt */);
}
@rmccorm4 (Contributor) commented Nov 4, 2023

Should the request release callback be called internally on failure? Calling request release should set the state and update accordingly. That said, I see in our contract that we don't take ownership of the request if InferAsync -> Run -> Enqueue fails, so it's expected that the user would release the request.

@rmccorm4 (Contributor) commented Nov 7, 2023

As mentioned offline, I don't love the idea of setting the state to RELEASED when the request hasn't actually had its release callbacks called. The state should represent that the request has actually been released. Is there another way?

tanmayv25 (Contributor, Author)

Ideally, the request should be set to the pending state iff the Enqueue call was a success. However, there is an asynchronous interaction between the enqueue thread and the backend threads that schedule the requests and move them out of the queue.
We should not invoke callbacks for the failed requests, as failed requests are owned by the caller.

tanmayv25 (Contributor, Author)

Added a new state for tracking the failed requests. Discussed offline that the logic might need some clean-up.
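The behavior under discussion can be modeled as a toy state machine. This is a sketch, not actual Triton code: the `State` names, `Model`, and `Request` classes are illustrative stand-ins for the discussion above, where the pending gauge is rolled back on a failed Enqueue without invoking the caller-owned release callback.

```python
# Toy model (not actual Triton code) of the fix discussed above: the
# pending-request gauge is incremented when a request enters the queue
# and is decremented again if Enqueue fails, WITHOUT invoking the
# release callback, since failed requests remain owned by the caller.

from enum import Enum, auto


class State(Enum):
    INITIALIZED = auto()
    PENDING = auto()         # counted by the pending-request gauge
    ENQUEUE_FAILED = auto()  # hypothetical failure state; caller still owns request
    RELEASED = auto()        # release callback has actually run


class Request:
    def __init__(self):
        self.state = State.INITIALIZED
        self.release_callback_ran = False


class Model:
    def __init__(self, fail=False):
        self.pending_count = 0  # the gauge
        self._fail = fail

    def enqueue(self, request):
        request.state = State.PENDING
        self.pending_count += 1
        if self._fail:
            # Roll back the gauge; do NOT run the release callback,
            # because the failed request is owned by the caller.
            self.pending_count -= 1
            request.state = State.ENQUEUE_FAILED
            return False
        return True


model = Model(fail=True)
req = Request()
ok = model.enqueue(req)
assert not ok
assert model.pending_count == 0           # gauge no longer leaks on failure
assert req.state is State.ENQUEUE_FAILED  # not RELEASED: no callback ran
assert not req.release_callback_ran
```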

@rmccorm4 (Contributor) commented Nov 4, 2023

Also, existing tests can be found here: https://github.com/triton-inference-server/server/blob/main/qa/L0_metrics/metrics_queue_size_test.py

Feel free to rename the test script from queue size to pending request count or something similar if that makes it easier to find.

@tanmayv25 tanmayv25 merged commit 1dfb6de into main Nov 8, 2023
@tanmayv25 tanmayv25 deleted the tanmayv-metrics branch May 31, 2024 00:50