Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for request rescheduling #319

Merged
merged 7 commits into from
Nov 9, 2023
Merged

Conversation

krishung5
Copy link
Contributor

@krishung5 krishung5 commented Nov 2, 2023

src/infer_request.h Outdated Show resolved Hide resolved
src/pb_stub.cc Show resolved Hide resolved
@krishung5 krishung5 force-pushed the krish-request-reschedule branch from aa2b6c4 to 5c007f8 Compare November 7, 2023 02:27
@krishung5 krishung5 requested review from nnshah1 and GuanLuo November 7, 2023 02:37
README.md Show resolved Hide resolved
src/pb_stub.cc Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
src/python_be.h Show resolved Hide resolved
@krishung5 krishung5 requested a review from Tabrizian November 8, 2023 20:40
Copy link
Member

@Tabrizian Tabrizian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes look good to me. I was wondering do you think there could be a memory leak in non-decoupled case with these changes? For non-decoupled we create N responses in the beginning:

auto err = TRITONBACKEND_ResponseNew(&response, requests[i]);

How is the None responses going to be treated by server? I didn't see any flags adjustments for non-decoupled mode for responses.

@krishung5
Copy link
Contributor Author

krishung5 commented Nov 8, 2023

The code changes look good to me. I was wondering do you think there could be a memory leak in non-decoupled case with these changes? For non-decoupled we create N responses in the beginning:

auto err = TRITONBACKEND_ResponseNew(&response, requests[i]);

How is the None responses going to be treated by server? I didn't see any flags adjustments for non-decoupled mode for responses.

The backend will clean up the response if the associated request is rescheduled. Please see here:

if (pb_infer_requests[r]->ReleaseFlags() ==
TRITONSERVER_REQUEST_RELEASE_RESCHEDULE) {
// For rescheduled requests, we do not need to send a response.
LOG_IF_ERROR(
TRITONBACKEND_ResponseDelete((*responses)[r]),
"failed to delete response");
(*responses)[r] = nullptr;
continue;
}

@krishung5 krishung5 merged commit 889585c into main Nov 9, 2023
@krishung5 krishung5 deleted the krish-request-reschedule branch November 9, 2023 19:58
krishung5 added a commit that referenced this pull request Nov 9, 2023
* Add support for request rescheduling

* Address comment

* Add documentation

* Fix up for doc

* Revert response sender changes

* Address comment
krishung5 added a commit that referenced this pull request Nov 9, 2023
* Add support for request rescheduling

* Address comment

* Add documentation

* Fix up for doc

* Revert response sender changes

* Address comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants