fix: Add reference count tracking for shared memory regions #7567

pskiran1 · 2024-08-26T13:26:00Z

What does the PR do?

This pull request addresses the following issue.
Currently, the server does not track whether a shared memory (shm) region is in use by an inference request, so a region can be unregistered at any time. If a user unregisters a shm region while an inference is in flight, the server still attempts to read or write data in that region, which results in a segmentation fault and crashes the server.

To address this issue, we have made the following changes:

Added a new counter attribute, ref_count_, to ShareMemory that holds the number of inference requests currently using the region. We increment the counter when parsing the request into the InferenceRequest object, and the InferenceRequest object keeps the unique names of the shm regions it references. Upon response completion or error return, we decrement the counter.
We now allow unregistering a system/CUDA shm region only if its ref_count_ is 0 at that moment. This ensures that only unused shm regions can be unregistered. Users can also check how many inference requests are using a shm region (its ref_count) by querying the shm region status.
If a user tries to unregister a shm region that is in use, we return the following error:
Single shm: "Cannot unregister shared memory region 'input0_data', it is currently in use by 1 requests."
All shm: "Failed to unregister the following system shared memory regions: input0_data, output0_data, "
During server shutdown, we disregard ref_count_ and ensure that all shm regions are unregistered.
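
The counter-based scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ShmRegion`, `ShmRegistry`, `Acquire`, `Release` and `UnregisterAllForShutdown` are hypothetical names, and the region record holds nothing except the counter. Only the two error strings are taken from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a registered region; only the counter matters here.
struct ShmRegion {
  std::size_t ref_count_ = 0;  // inference requests currently using the region
};

class ShmRegistry {
 public:
  void Register(const std::string& name) {
    std::lock_guard<std::mutex> lk(mu_);
    regions_.emplace(name, ShmRegion{});
  }

  // Called while parsing a request that references `name`.
  bool Acquire(const std::string& name) {
    std::lock_guard<std::mutex> lk(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) return false;
    ++it->second.ref_count_;
    return true;
  }

  // Called on response completion or error return.
  void Release(const std::string& name) {
    std::lock_guard<std::mutex> lk(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) return;  // e.g. already dropped at shutdown
    assert(it->second.ref_count_ > 0 && "Release without matching Acquire");
    --it->second.ref_count_;
  }

  // Returns an empty string on success, otherwise an error message.
  std::string Unregister(const std::string& name) {
    std::lock_guard<std::mutex> lk(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) {
      return "Unable to find system shared memory region: '" + name + "'";
    }
    if (it->second.ref_count_ > 0) {
      return "Cannot unregister shared memory region '" + name +
             "', it is currently in use by " +
             std::to_string(it->second.ref_count_) + " requests.";
    }
    regions_.erase(it);
    return "";
  }

  // Shutdown path: counters are ignored and every region is dropped.
  void UnregisterAllForShutdown() {
    std::lock_guard<std::mutex> lk(mu_);
    regions_.clear();
  }

 private:
  std::mutex mu_;
  std::map<std::string, ShmRegion> regions_;
};
```

In this sketch an unregister call made between `Acquire` and `Release` is rejected with the "in use" message, and the same call succeeds once the request has released the region.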

Checklist

Commit Type:

Check the conventional commit type box here and add the label to the github PR.

Related PRs:

Where should the reviewer start?

Test plan:

CI Pipeline ID: 18081701

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx


GuanLuo

I think the change should take a different approach so that shared memory details are not leaked into Triton core. Some background on the current state of the code: Triton core (the request) only needs to know the data pointers of the inputs and outputs; the conversion from a shared memory handle to an address is encapsulated within the frontend. This can still hold true when reference counting the shared memory regions.

The SharedMemoryManager will still be the centralized location for reference counting, but it doesn't expose inc/dec functions directly. Instead, it manages the count internally. Let's use shared_ptr to maintain the ref count:

on register, it creates a shared_ptr with a deleter that cleans up the shared memory region when the shared_ptr's ref count goes to 0
on unregister, it releases the shared_ptr object that it is holding, which decrements the ref count
(SharedMemoryManager API change) on GetMemoryInfo, it returns a copy of the shared_ptr it holds, which automatically increases the ref count.

It is then up to the frontend to keep the shared_ptr returned from GetMemoryInfo alive until it is done with the shared memory. Note that both frontends already do bookkeeping of the shared memory regions used for the request / response, so the shared_ptr can live alongside that bookkeeping information and be released in the corresponding request / output / response release callbacks. This way, the reference count still shares a life cycle similar to the request's, but you don't need to inject it into the Triton (core) request object.

There will be corner cases where the user unregisters and re-registers the same shared memory region during an inference, but these can be addressed through careful design of the shared_ptr deleter and SharedMemoryManager modifications.
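
To make the shared_ptr proposal concrete, here is a rough sketch under stated assumptions. `ShmInfo` and `ShmManager` are hypothetical stand-ins rather than the real SharedMemoryManager types, and the `cleanup` callback stands in for whatever unmapping or closing the real deleter would do. The unregister/re-register corner case mentioned above is not handled beyond rejecting a name that is still registered.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical region record; a real one would carry addresses, sizes, etc.
struct ShmInfo {
  std::string name;
};

class ShmManager {
 public:
  // On register: create a shared_ptr whose deleter cleans up the region
  // once the last owner (manager or in-flight request) lets go of it.
  bool Register(const std::string& name,
                std::function<void(const std::string&)> cleanup) {
    std::lock_guard<std::mutex> lk(mu_);
    if (regions_.count(name) != 0) return false;  // name already registered
    regions_[name] = std::shared_ptr<ShmInfo>(
        new ShmInfo{name}, [cleanup = std::move(cleanup)](ShmInfo* p) {
          if (cleanup) cleanup(p->name);
          delete p;
        });
    return true;
  }

  // On unregister: drop only the manager's reference.
  void Unregister(const std::string& name) {
    std::shared_ptr<ShmInfo> dropped;
    {
      std::lock_guard<std::mutex> lk(mu_);
      auto it = regions_.find(name);
      if (it == regions_.end()) return;
      dropped = std::move(it->second);
      regions_.erase(it);
    }
    // `dropped` is destroyed here, outside the lock. Cleanup runs now only
    // if no request still holds a copy obtained from GetMemoryInfo.
  }

  // On lookup: hand out a copy, which bumps the use count automatically.
  std::shared_ptr<const ShmInfo> GetMemoryInfo(const std::string& name) {
    std::lock_guard<std::mutex> lk(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) return nullptr;
    return it->second;
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::shared_ptr<ShmInfo>> regions_;
};
```

A frontend would keep the returned pointer next to its existing per-request bookkeeping and reset it in the release callback. Until then, unregistering only removes the name from the manager and the region itself stays valid.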

docs/protocol/extension_shared_memory.md


qa/L0_shared_memory/shared_memory_test.py

src/shared_memory_manager.h

GuanLuo · 2024-08-30T20:28:54Z

src/shared_memory_manager.cc

+            std::string(
+                "Unable to find system shared memory region: '" + name + "'")
+                .c_str());
+      }


We should try to minimize repeated "if/else" for error reporting. When I have time, I will think more about it and give more actionable feedback, but this is something to keep in mind.

pskiran1 · 2024-09-01 (edited)

How about writing a simple function SharedMemoryTypeString(TRITONSERVER_MemoryType memtype) that returns "system" or "CUDA"? That way we write each error message only once. We already have TRITONSERVER_MemoryTypeString(), but it is not helpful for shared memory related messages.
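
A minimal sketch of the suggested helper might look like this. The enum is redeclared locally as a stand-in for the one in Triton's C API header so the snippet compiles on its own, and `RegionNotFoundMessage` is a hypothetical caller added only to show how a single message template could replace the per-type branches. Treating every non-GPU memory type as "system" is an assumption of this sketch.

```cpp
#include <string>

// Stand-in for TRITONSERVER_MemoryType from Triton's C API header,
// redeclared here only to keep the sketch self-contained.
enum TRITONSERVER_MemoryType {
  TRITONSERVER_MEMORY_CPU,
  TRITONSERVER_MEMORY_CPU_PINNED,
  TRITONSERVER_MEMORY_GPU
};

// Maps a memory type to the wording used in shared memory error messages.
// Assumption: anything that is not GPU memory is reported as "system".
const char* SharedMemoryTypeString(TRITONSERVER_MemoryType memtype) {
  return (memtype == TRITONSERVER_MEMORY_GPU) ? "CUDA" : "system";
}

// Hypothetical caller: one message template instead of an if/else per type.
std::string RegionNotFoundMessage(
    const std::string& name, TRITONSERVER_MemoryType memtype) {
  return std::string("Unable to find ") + SharedMemoryTypeString(memtype) +
         " shared memory region: '" + name + "'";
}
```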

src/grpc/infer_handler.cc

src/grpc/stream_infer_handler.h

qa/L0_trt_shape_tensors/test.sh


src/grpc/infer_handler.h


pskiran1 added 4 commits August 26, 2024 13:25

Add shm reference counter support

7dad71b

Update

8fd831c

Update copyright

58aa7fe

Fix pre-commit errors

aa0b282

pskiran1 requested review from Tabrizian, tanmayv25 and GuanLuo August 26, 2024 16:12

pskiran1 changed the title ~~Add reference count tracking for shared memory regions~~ fix: Add reference count tracking for shared memory regions Aug 26, 2024

pskiran1 added 2 commits August 28, 2024 02:28

Merge branch 'main' of https://github.com/triton-inference-server/server into spolisetty_oob_dos_issue_fix

7b057d2

Merge branch 'spolisetty_oob_dos_issue_fix' of https://github.com/triton-inference-server/server into spolisetty_oob_dos_issue_fix

20edd69

GuanLuo reviewed Aug 28, 2024

docs/protocol/extension_shared_memory.md (outdated, resolved)

pskiran1 added 2 commits August 29, 2024 23:04

Merge branch 'main' of https://github.com/triton-inference-server/server into spolisetty_oob_dos_issue_fix

6c4424b

Enhancements

0d7bf1a

github-advanced-security bot found potential problems Aug 29, 2024

qa/L0_shared_memory/shared_memory_test.py (fixed)

pskiran1 added 3 commits August 30, 2024 00:08

Fix pre-commit errors

bf9e87c

Fix alert

862d01d

Fix pre-commit errors

8290e87

pskiran1 requested a review from GuanLuo August 29, 2024 19:18

pskiran1 added 5 commits August 30, 2024 20:29

Update

d849132

Update

8266aed

Update

c56725f

Undo formatting

481b2ff

Fix errors

d0fb4b9

GuanLuo reviewed Aug 30, 2024

src/shared_memory_manager.h (outdated, resolved)

GuanLuo reviewed Aug 30, 2024


pskiran1 added 4 commits August 31, 2024 16:06

Add copyright

ba4a1bc

Merge branch 'main' of https://github.com/triton-inference-server/server into spolisetty_oob_dos_issue_fix

cc1ce14

Enhancements

094d846

Fix pre-commit

2215904

nnshah1 self-requested a review September 2, 2024 19:15

nnshah1 reviewed Sep 2, 2024

src/grpc/infer_handler.h (outdated, resolved)

Update names

474f344

pskiran1 marked this pull request as ready for review September 3, 2024 07:35

pskiran1 requested a review from GuanLuo September 3, 2024 07:38

pskiran1 added the "PR: fix" (A bug fix) label Sep 3, 2024

Merge branch 'main' of https://github.com/triton-inference-server/server into spolisetty_oob_dos_issue_fix

813e3c9

GuanLuo previously approved these changes Sep 5, 2024

src/grpc/infer_handler.cc (outdated, resolved)

src/grpc/infer_handler.h (outdated, resolved)

tanmayv25 previously approved these changes Sep 5, 2024


pskiran1 and others added 2 commits September 6, 2024 12:27

Merge branch 'main' of https://github.com/triton-inference-server/server into spolisetty_oob_dos_issue_fix

8669859

Update src/grpc/infer_handler.h

cfb50cb

Co-authored-by: GuanLuo <[email protected]>

pskiran1 dismissed stale reviews from tanmayv25 and GuanLuo via cfb50cb September 6, 2024 06:58

pskiran1 added 2 commits September 6, 2024 12:28

Merge branch 'spolisetty_oob_dos_issue_fix' of https://github.com/triton-inference-server/server into spolisetty_oob_dos_issue_fix

f42368f

Update

0be4137

tanmayv25 approved these changes Sep 6, 2024


pskiran1 merged commit edd0ac1 into main Sep 11, 2024
3 checks passed

pskiran1 deleted the spolisetty_oob_dos_issue_fix branch September 11, 2024 03:38

pskiran1 added a commit that referenced this pull request Sep 11, 2024

fix: Add reference count tracking for shared memory regions (#7567)

da52f1c

Co-authored-by: GuanLuo <[email protected]>

pskiran1 added a commit that referenced this pull request Sep 11, 2024

fix: Add reference count tracking for shared memory regions (#7567)

dbf7eae

Co-authored-by: GuanLuo <[email protected]>

pskiran1 mentioned this pull request Sep 11, 2024

Cherry-pick: fix: Add reference count tracking for shared memory regions (#7567) #7612

Merged

pvijayakrish pushed a commit that referenced this pull request Sep 17, 2024

Cherry-pick: fix: Add reference count tracking for shared memory regions (#7567) (#7612) Co-authored-by: GuanLuo <[email protected]>

29c7a28

hcho3 mentioned this pull request Oct 8, 2024

Possible bug in reference counting with shared memory regions #7688

Open

pskiran1 mentioned this pull request Oct 25, 2024

feat: Enable deferred unregistering of shared memory regions after inference #7743

Merged

20 tasks

pvijayakrish pushed a commit that referenced this pull request Jan 15, 2025

Cherry-pick: fix: Add reference count tracking for shared memory regions (#7567) (#7612) Co-authored-by: GuanLuo <[email protected]>

ae8ffbf
