fix: Add reference count tracking for shared memory regions #7567
…ton-inference-server/server into spolisetty_oob_dos_issue_fix
I think this change should take a different approach so that shared memory details do not leak into Triton core. Some background on the current state of the code: Triton core (and the request) only needs to know the data pointers of the inputs and outputs; the conversion from shared memory handle to address is encapsulated within the frontend. This can stay true for reference counting the shared memory regions.
- The SharedMemoryManager will still be the centralized location for reference counting, but it doesn't expose inc/dec functions directly. Instead, it manages the count internally, using shared_ptr to maintain the ref count:
  - on register, it creates a shared_ptr with a deleter that cleans up the shared memory region when the shared_ptr's ref count goes to 0
  - on unregister, it releases the shared_ptr object that it is holding, which decrements the ref count
  - (SharedMemoryManager API change) on GetMemoryInfo, it returns a copy of the shared_ptr it holds, which automatically increments the ref count.
- Then it is up to the frontend to keep the shared_ptr returned from GetMemoryInfo valid until it is done with the shared memory. Both frontends already do bookkeeping of the shared memory regions used for the request / response, so the shared_ptr can live alongside that bookkeeping information and be released in the corresponding request / output / response release callbacks. This way, the reference count still shares a similar life cycle with the request, but you don't need to inject it into the Triton (core) request object.
There will be corner cases where the user unregisters and re-registers the same shared memory region during an inference, but these can be addressed through careful design of the shared_ptr deleter and SharedMemoryManager modifications.
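To make the shared_ptr proposal above concrete, here is a minimal sketch of how such a manager could behave. It is not Triton's actual SharedMemoryManager: the names `SketchShmManager`, `RegionInfo`, and the `Cleanup` callback are hypothetical, and the real unmapping logic is replaced by a caller-supplied function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the bookkeeping of one mapped region.
struct RegionInfo {
  std::string name;
  void* base_addr;
  std::size_t byte_size;
};

// Hypothetical, simplified manager (assumption: not the real Triton class).
class SketchShmManager {
 public:
  using Cleanup = std::function<void(RegionInfo*)>;

  // Register: the manager holds one shared_ptr whose deleter runs the
  // cleanup (e.g. unmapping) once the last holder lets go.
  bool Register(
      const std::string& name, void* addr, std::size_t size, Cleanup unmap)
  {
    std::lock_guard<std::mutex> lk(mu_);
    if (regions_.count(name) != 0) {
      return false;  // already registered under this name
    }
    regions_[name] = std::shared_ptr<RegionInfo>(
        new RegionInfo{name, addr, size}, [unmap](RegionInfo* r) {
          if (unmap) {
            unmap(r);
          }
          delete r;
        });
    return true;
  }

  // Unregister: drop only the manager's reference. The region is cleaned
  // up immediately if nobody else holds it, otherwise when the last
  // in-flight holder releases its copy.
  void Unregister(const std::string& name)
  {
    std::shared_ptr<RegionInfo> dropped;
    {
      std::lock_guard<std::mutex> lk(mu_);
      auto it = regions_.find(name);
      if (it == regions_.end()) {
        return;
      }
      dropped = std::move(it->second);
      regions_.erase(it);
    }
    // 'dropped' goes out of scope here, outside the lock.
  }

  // GetMemoryInfo: returning a copy increments the ref count.
  std::shared_ptr<const RegionInfo> GetMemoryInfo(const std::string& name)
  {
    std::lock_guard<std::mutex> lk(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) {
      return nullptr;
    }
    return it->second;
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::shared_ptr<RegionInfo>> regions_;
};
```

In this sketch a frontend would store the pointer returned by `GetMemoryInfo` next to its existing per-request bookkeeping and reset it in the release callback. Re-registering the same name while an old copy is still held simply creates a second, independent entry, which is one possible way to handle the corner case mentioned above.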
std::string(
    "Unable to find system shared memory region: '" + name + "'")
    .c_str());
}
We should try to minimize repeated "if/else" blocks for error reporting. When I have time, I will think more about it and give more actionable feedback, but this is something you should keep in mind.
How about writing a simple function SharedMemoryTypeString(TRITONSERVER_MemoryType memtype) which returns "system" or "CUDA"? Then we can write the error message only once. We already have TRITONSERVER_MemoryTypeString(), but it is not helpful for shared memory related messages.
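A helper along the lines suggested here might look like the sketch below. The enum is a stand-in declared locally so the snippet compiles without the Triton headers, the mapping of every non-GPU type to "system" is an assumption, and `RegionNotFoundMessage` is a hypothetical name used only to show the single shared error string.

```cpp
#include <cassert>
#include <string>

// Stand-in for the enum declared in tritonserver.h, repeated here only so
// the sketch is self-contained.
enum TRITONSERVER_MemoryType {
  TRITONSERVER_MEMORY_CPU,
  TRITONSERVER_MEMORY_CPU_PINNED,
  TRITONSERVER_MEMORY_GPU
};

// Suggested helper: names the kind of shared memory for error messages.
// Assumption: anything that is not GPU memory is reported as "system".
inline const char*
SharedMemoryTypeString(TRITONSERVER_MemoryType memtype)
{
  return (memtype == TRITONSERVER_MEMORY_GPU) ? "CUDA" : "system";
}

// Hypothetical helper: builds the "not found" message once for both
// system and CUDA regions instead of repeating it in if/else branches.
inline std::string
RegionNotFoundMessage(TRITONSERVER_MemoryType memtype, const std::string& name)
{
  return std::string("Unable to find ") + SharedMemoryTypeString(memtype) +
         " shared memory region: '" + name + "'";
}
```

With something like this, each call site can pass the memory type it already knows instead of duplicating the message text per branch.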
Co-authored-by: GuanLuo <[email protected]>
…ons (#7567) (#7612) Co-authored-by: GuanLuo <[email protected]>
What does the PR do?
This pull request is intended to address the following issue.
Currently, we do not track whether a shared memory (shm) region is being used by any inference request, and we allow the shm region to be unregistered at any time. If the user unregisters the shm region while an inference is in progress, the server still attempts to read or write data in that region, resulting in a segmentation fault that crashes the server.
To address this issue, we have made the following changes:
- Added a ref_count_ counter to ShareMemory, which represents the count of inference requests currently using it. We increment the counter when parsing the request into the InferenceRequest object, and keep the unique names of the referenced shm regions in the InferenceRequest object. Upon response completion or error return, we decrement this counter.
- A shm region can be unregistered only if its ref_count_ is 0 at that moment. This ensures that only unused shm regions can be unregistered. Users can also check the number of inference requests using the shm region (ref_count) by querying the shm region status.
If a user tries to unregister a shm region that is in use, we return the following error:
Single shm: "Cannot unregister shared memory region 'input0_data', it is currently in use by 1 requests."
All shm: "Failed to unregister the following system shared memory regions: input0_data, output0_data, "
- … ref_count_ and ensure that all shm regions are unregistered.
Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the github PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
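The explicit counter described in the PR summary (increment when a request starts using a region, decrement on completion, refuse to unregister while the count is non-zero) can be sketched as follows. This is an illustration under assumptions, not the code in the PR: the class `RefCountedRegions` and its method names are hypothetical, and only the "in use" error text is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry that tracks how many requests use each region.
class RefCountedRegions {
 public:
  void Register(const std::string& name)
  {
    std::lock_guard<std::mutex> lk(mu_);
    ref_counts_.emplace(name, 0);
  }

  // Called when a request that references the region is parsed.
  bool Acquire(const std::string& name)
  {
    std::lock_guard<std::mutex> lk(mu_);
    auto it = ref_counts_.find(name);
    if (it == ref_counts_.end()) {
      return false;
    }
    ++it->second;
    return true;
  }

  // Called on response completion or error return.
  void Release(const std::string& name)
  {
    std::lock_guard<std::mutex> lk(mu_);
    auto it = ref_counts_.find(name);
    if (it != ref_counts_.end() && it->second > 0) {
      --it->second;
    }
  }

  // Returns an empty string on success, otherwise an error message.
  std::string Unregister(const std::string& name)
  {
    std::lock_guard<std::mutex> lk(mu_);
    auto it = ref_counts_.find(name);
    if (it == ref_counts_.end()) {
      return "Unable to find system shared memory region: '" + name + "'";
    }
    if (it->second > 0) {
      return "Cannot unregister shared memory region '" + name +
             "', it is currently in use by " + std::to_string(it->second) +
             " requests.";
    }
    ref_counts_.erase(it);
    return "";
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::size_t> ref_counts_;
};
```

Compared with the shared_ptr approach proposed in the review, this variant rejects the unregister call instead of deferring cleanup, which is the user-visible behavior the error messages above describe.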