Exporter/Internal/queue size_channel.pop() is blocked even when queue is full #11015
Comments
adding test to replicate: main...timannguyen:opentelemetry-collector:test-pq-concurrency
doesn't work; still trying to recreate the issue in tests.
updated main...timannguyen:opentelemetry-collector:test-pq-concurrency, issue where:
let me know if there is any issue with the test.
@timannguyen I noticed that in your test,
#### Description
This change fixes a potential deadlock bug in the persistent queue. A race condition in the persistent queue causes `used` in `sizedChannel` to fall out of sync with `len(ch)`, which can leave `Offer` deadlocked under a specific interleaving. For example:
1. Multiple consumers are calling Consume.
2. Multiple producers are calling Offer to insert into the queue.
   a. All elements are taken by consumers; `ch` is empty.
3. One consumer completes Consume and calls onProcessingFinished.
   a. Inside sizedChannel, syncSize is invoked and `used` is reset to 0 while other consumers are still waiting on the lock to consume.
4. More Offer calls insert elements; at this point `used` and `len(ch)` should be equal.
5. As the consumers from step 3a complete, `used` is decreased, so `used` becomes lower than `len(ch)`.
   a. Another Offer call tries to insert, since `used` is below capacity; however, `ch` is full.
   b. The goroutine calling Offer holds the mutex but cannot release it, because the send on `ch` blocks.
   c. No consumer can acquire the mutex to complete the previous onProcessingFinished.

This change returns an error if the channel is full instead of waiting for it to unblock.

#### Link to tracking issue
Fixes #11015

#### Testing
- Added a concurrent test for the persistent queue that can reproduce the problem (note: it needs to be re-run about 100 times, as the race condition is not consistent).
- Added a unit test for sizedChannel.

#### Documentation
Added a comment in the block explaining it.

---------
Co-authored-by: Dmitrii Anoshin <[email protected]>
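For illustration, here is a minimal, self-contained Go sketch of the approach the fix describes (this is not the collector's actual `sizedChannel` code): the tracked size is checked under a mutex, and the send into the buffered channel is non-blocking, so a desynchronized `used` results in an error rather than a deadlock. The names `push`, `pop`, `errQueueFull`, and the `int` element type are simplifications chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errQueueFull is a hypothetical sentinel error used only in this sketch.
var errQueueFull = errors.New("sized channel is full")

type sizedChannel struct {
	mu   sync.Mutex
	used int // tracked size; can drift from len(ch) under the race described above
	cap  int
	ch   chan int
}

func newSizedChannel(capacity int) *sizedChannel {
	return &sizedChannel{cap: capacity, ch: make(chan int, capacity)}
}

// push admits el if the tracked size allows it, but never blocks on the
// underlying channel: if ch is unexpectedly full, it rolls back and errors out
// instead of deadlocking while the mutex is held.
func (s *sizedChannel) push(el int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.used+1 > s.cap {
		return errQueueFull
	}
	s.used++
	select {
	case s.ch <- el:
		return nil
	default:
		// used and len(ch) are out of sync; fail rather than block.
		s.used--
		return errQueueFull
	}
}

// pop blocks until an element is available, then updates the tracked size.
func (s *sizedChannel) pop() (int, bool) {
	el, ok := <-s.ch
	if ok {
		s.mu.Lock()
		s.used--
		s.mu.Unlock()
	}
	return el, ok
}

func main() {
	q := newSizedChannel(1)
	fmt.Println(q.push(1)) // <nil>
	q.used = 0             // simulate the drift described in the race above
	fmt.Println(q.push(2)) // sized channel is full (error instead of a blocked send)
}
```

The key design choice, as stated in the change description, is that a full channel produces an error for the caller of Offer rather than a blocking send inside the critical section.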
Describe the bug
size_channel.pop() blocks even when the queue is full. This appears to happen when there is a high number of connections and high throughput.
Steps to reproduce
TBA. A test to replicate the issue will be written.
What did you expect to see?
size_channel.pop() should receive data and return when the queue is not empty.
What did you see instead?
size_channel.pop() is blocked indefinitely even when the queue is full.
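For illustration, the following is a minimal, runnable Go sketch (not the collector's code) of the blocking pattern the linked fix describes: a producer holds a mutex while performing a blocking send into a channel that is already full, and the consumer must take the same mutex before it can drain the channel, so neither side can make progress.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	ch := make(chan int, 1)
	ch <- 0 // the channel is already full, but the tracked size says there is room

	// Producer: the blocking send inside the critical section is the problem.
	go func() {
		mu.Lock()
		ch <- 1 // blocks forever: the buffer is full and the mutex is never released
		mu.Unlock()
	}()

	// Consumer: needs the mutex before it can drain the channel.
	go func() {
		time.Sleep(10 * time.Millisecond)
		mu.Lock() // blocks forever: the producer still holds the mutex
		<-ch
		mu.Unlock()
	}()

	time.Sleep(200 * time.Millisecond)
	fmt.Println("both goroutines are still blocked: deadlock")
}
```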
What version did you use?
v0.103.0
What config did you use?
Environment
Linux
go 1.21.0
Additional context