Add support for request rescheduling #319
aa2b6c4 to 5c007f8
…hon_backend into krish-request-reschedule
The code changes look good to me. I was wondering: do you think there could be a memory leak in the non-decoupled case with these changes? For non-decoupled models, we create N responses at the beginning:
python_backend/src/python_be.cc
Line 1380 in 60a9091
auto err = TRITONBACKEND_ResponseNew(&response, requests[i]);
How are the None responses going to be treated by the server? I didn't see any flag adjustments for responses in non-decoupled mode.
The backend will clean up the response if the associated request is rescheduled. Please see here: python_backend/src/python_be.cc, lines 1577 to 1585 in 8b01823.
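To make the cleanup path discussed above concrete, here is a minimal, self-contained Python simulation of the idea. It is only a sketch and not the code in python_be.cc: `FakeRequest`, `FakeResponse`, `process_batch`, and the `RELEASE_*` values are hypothetical stand-ins, and the real backend works with Triton's C API objects instead. It assumes that a rescheduled request is paired with a None output from the model, and shows that every pre-created response ends up either sent or deleted, so none is left allocated.

```python
from dataclasses import dataclass

# Hypothetical flag values standing in for the server's request release flags.
RELEASE_ALL = 1
RELEASE_RESCHEDULE = 2


@dataclass
class FakeRequest:
    """Stand-in for an inference request (not a Triton type)."""
    request_id: str
    release_flags: int = RELEASE_ALL


@dataclass
class FakeResponse:
    """Stand-in for a pre-created response (not a Triton type)."""
    request_id: str
    state: str = "allocated"  # becomes "sent" or "deleted"


def process_batch(requests, model_outputs):
    """Pre-create one response per request, then send or delete each one."""
    if len(requests) != len(model_outputs):
        raise ValueError("expected one model output per request")

    # Non-decoupled flow: N responses are created before the outputs are handled.
    responses = [FakeResponse(r.request_id) for r in requests]

    for request, response, output in zip(requests, responses, model_outputs):
        if request.release_flags == RELEASE_RESCHEDULE:
            # Assumption: a rescheduled request carries a None output.
            if output is not None:
                raise ValueError("a rescheduled request must return None")
            # Clean up the pre-created response instead of sending it.
            response.state = "deleted"
        else:
            response.state = "sent"
    return responses


if __name__ == "__main__":
    batch = [FakeRequest("a"), FakeRequest("b", RELEASE_RESCHEDULE)]
    for resp in process_batch(batch, ["output-a", None]):
        print(resp.request_id, resp.state)
```

The point of the sketch is the invariant rather than the mechanics: after the loop, no response remains in the "allocated" state, which is the property the leak question above is asking about.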
* Add support for request rescheduling
* Address comment
* Add documentation
* Fix up for doc
* Revert response sender changes
* Address comment
Testing: triton-inference-server/server#6509