-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
return queue is full error if sized_channel is full #11063
return queue is full error if sized_channel is full #11063
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #11063 +/- ##
==========================================
- Coverage 92.23% 92.21% -0.02%
==========================================
Files 405 414 +9
Lines 19089 19728 +639
==========================================
+ Hits 17606 18193 +587
- Misses 1123 1165 +42
- Partials 360 370 +10 ☔ View full report in Codecov by Sentry. |
498c230
to
678ef6a
Compare
678ef6a
to
09ce9d3
Compare
cc @sfc-gh-sili as well as snowflake partner is helping out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…1063) #### Description This change fixes a potential deadlock bug for persistent queue. There is a race condition in persistent queue that caused `used` in `sizedChannel` to become out of sync with `ch` len. This causes `Offer` to be deadlocked in specific race condition. For example: 1. Multiple consumers are calling Consume 2. Multiple producers are calling Offer to insert into the queue a. All elements are taken from consumers. ch is empty 3. One consumer completes consume, calls onProcessingFinished a. Inside sizedChannel, syncSize is invoked, used is reset to 0 when other consumers are still waiting for lock to consume 4. More Offer is called inserting elements -> used and ch len should equal 5. As step 3a consumers completes, used is decreased -> used is lower than ch len a. More Offer is called inserting since used is below capacity. however, ch is full. b. goroutine calling offer is holding the mutex but can’t release it as ch is full. c. no consumer can acquire mutex to complete previous onProcessingFinished This change returns an error if channel is full instead of waiting for it to unblock. #### Link to tracking issue Fixes # open-telemetry#11015 #### Testing - Added concurrent test in persistent queue that can reproduce the problem(note: need to re-run it 100 times as the race condition is not consistent). - Added unit test for sizedChannel #### Documentation Added comment in the block explaining it --------- Co-authored-by: Dmitrii Anoshin <[email protected]>
…1063) #### Description This change fixes a potential deadlock bug for persistent queue. There is a race condition in persistent queue that caused `used` in `sizedChannel` to become out of sync with `ch` len. This causes `Offer` to be deadlocked in specific race condition. For example: 1. Multiple consumers are calling Consume 2. Multiple producers are calling Offer to insert into the queue a. All elements are taken from consumers. ch is empty 3. One consumer completes consume, calls onProcessingFinished a. Inside sizedChannel, syncSize is invoked, used is reset to 0 when other consumers are still waiting for lock to consume 4. More Offer is called inserting elements -> used and ch len should equal 5. As step 3a consumers completes, used is decreased -> used is lower than ch len a. More Offer is called inserting since used is below capacity. however, ch is full. b. goroutine calling offer is holding the mutex but can’t release it as ch is full. c. no consumer can acquire mutex to complete previous onProcessingFinished This change returns an error if channel is full instead of waiting for it to unblock. #### Link to tracking issue Fixes # open-telemetry#11015 #### Testing - Added concurrent test in persistent queue that can reproduce the problem(note: need to re-run it 100 times as the race condition is not consistent). - Added unit test for sizedChannel #### Documentation Added comment in the block explaining it --------- Co-authored-by: Dmitrii Anoshin <[email protected]>
Description
This change fixes a potential deadlock bug for persistent queue.
There is a race condition in persistent queue that caused
used
insizedChannel
to become out of sync withch
len. This causesOffer
to be deadlocked in specific race condition. For example:a. All elements are taken from consumers. ch is empty
a. Inside sizedChannel, syncSize is invoked, used is reset to 0 when other consumers are still waiting for lock to consume
a. More Offer is called inserting since used is below capacity. however, ch is full.
b. goroutine calling offer is holding the mutex but can’t release it as ch is full.
c. no consumer can acquire mutex to complete previous onProcessingFinished
This change returns an error if channel is full instead of waiting for it to unblock.
Link to tracking issue
Fixes # #11015
Testing
Documentation
Added comment in the block explaining it