Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add reference count tracking for shared memory regions #7567

Merged
merged 26 commits into from
Sep 11, 2024

Conversation

pskiran1
Copy link
Member

@pskiran1 pskiran1 commented Aug 26, 2024

What does the PR do?

This pull request is intended to address the following issue.
Currently, we do not track whether the shared memory (shm) region is being used by any inference request, and we allow unregistering of the shm region at any time. When performing inference, if the user unregisters the shm region, the server attempts to read or write data in the shm region, resulting in a segmentation fault and causing the server to crash.

To address this issue, we have made the following changes:

  • Added a new attribute ref_count_ counter to ShareMemory, which represents the count of inference requests currently using it. We increment the counter when parsing the request to the InferenceRequest object, and maintain the unique shm region names in the InferenceRequest object referencing. Upon response completion or error return, we decrement this counter.
  • We now allow unregistering any system/CUDA shm region only if the ref_count_ is 0 at that moment. This ensures that only unused shm regions can be unregistered. Users can also check the number of inference requests using the shm region ref_count by querying the shm region status.
    If a user tries to unregister a shm region that is in use, we return the following error:
    Single shm: "Cannot unregister shared memory region 'input0_data', it is currently in use by 1 requests."
    All shm: "Failed to unregister the following system shared memory regions: input0_data, output0_data, "
  • During the server shutdown, we will disregard the ref_count_ and ensure that all shm regions are unregistered.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID: 18081701

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@pskiran1 pskiran1 changed the title Add reference count tracking for shared memory regions fix: Add reference count tracking for shared memory regions Aug 26, 2024
Copy link
Contributor

@GuanLuo GuanLuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the change should be addressed in a different approach so that the shared memory detail is not leaked into Triton core, some background on the current state of the code is that Triton core (/ request) only need to know data pointer of the input and output, the conversion from shared memory handle to address is encapsulated within the frontend. And this can still stay true for reference counting the shared memory regions.

  1. The SharedMemoryManager will still be the centralized location for reference counting, but it doesn't expose inc/dec function directly. Instead, it manages it internally, let's use shared_ptr to be the method to maintain ref count:
  • on register, it creates a shared_ptr with deleter which will clean up the shared memory region when shared_ptr's ref count goes to 0
  • on unregister, it release the shared_ptr object that it's holding, then ref count --
  • (SharedMemoryManager API change) on GetMemoryInfo, it returns a copy of the holding shared_ptr, which automatically increase the ref_count.
  1. Then it is up to the frontend to keep the shared_ptr returned from GetMemoryInfo valid until it is done with the shared memory, noticing that both frontends already have done book keeping of the shared memory region used for the request / response, so the shared_ptr can live in the same way as the book keeping information and to be released on corresponding request / output / response release callbacks. In such a way, the reference counting still share similar life cycle as the request but you don't need to inject it into the Triton (core) request object.

There will be corner cases where the use will unregister and re-register the same shared memory region during an inference, but can be addressed through careful design of shared_ptr deleter and SharedMemoryManger modification.

docs/protocol/extension_shared_memory.md Outdated Show resolved Hide resolved
@pskiran1 pskiran1 requested a review from GuanLuo August 29, 2024 19:18
std::string(
"Unable to find system shared memory region: '" + name + "'")
.c_str());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should try to minimize repeated "if/else" for error reporting. When I have time, will think more about it and give more actionable feedback, but this should be something you keep in mind.

Copy link
Member Author

@pskiran1 pskiran1 Sep 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about writing a simple function SharedMemoryTypeString(TRITONSERVER_MemoryType memtype) which returns "system" or "CUDA"?. So we can write error message only once. We already have TRITONSERVER_MemoryTypeString(), but it is not helpful for shared memory related.

src/grpc/infer_handler.cc Outdated Show resolved Hide resolved
src/grpc/infer_handler.cc Show resolved Hide resolved
src/grpc/stream_infer_handler.h Outdated Show resolved Hide resolved
qa/L0_trt_shape_tensors/test.sh Show resolved Hide resolved
@nnshah1 nnshah1 self-requested a review September 2, 2024 19:15
src/grpc/infer_handler.h Outdated Show resolved Hide resolved
@pskiran1 pskiran1 marked this pull request as ready for review September 3, 2024 07:35
@pskiran1 pskiran1 requested a review from GuanLuo September 3, 2024 07:38
@pskiran1 pskiran1 added the PR: fix A bug fix label Sep 3, 2024
GuanLuo
GuanLuo previously approved these changes Sep 5, 2024
src/grpc/infer_handler.cc Outdated Show resolved Hide resolved
src/grpc/infer_handler.h Outdated Show resolved Hide resolved
tanmayv25
tanmayv25 previously approved these changes Sep 5, 2024
@pskiran1 pskiran1 dismissed stale reviews from tanmayv25 and GuanLuo via cfb50cb September 6, 2024 06:58
@pskiran1 pskiran1 merged commit edd0ac1 into main Sep 11, 2024
3 checks passed
@pskiran1 pskiran1 deleted the spolisetty_oob_dos_issue_fix branch September 11, 2024 03:38
pskiran1 added a commit that referenced this pull request Sep 11, 2024
pskiran1 added a commit that referenced this pull request Sep 11, 2024
pvijayakrish pushed a commit that referenced this pull request Sep 17, 2024
pvijayakrish pushed a commit that referenced this pull request Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: fix A bug fix
Development

Successfully merging this pull request may close these issues.

4 participants