[fix][broker] ManagedCursor: mark delete no callback when create meta-ledger fail #16841
Conversation
@315157973 @codelipenghui @rdhabalia Could you take a look.
We have synchronized (pendingMarkDeleteOps) for both create cursor failure and
According to your tips, I found a new problem: see Flow-2 above. And I have modified the @codelipenghui @Technoboy- @mattisonchao Could you review this PR again?
Nice catch!
I also encountered a similar scenario here. The pendingMarkDeleteOps has accumulated tens of thousands of requests, and the cursor status has always been
5d22abb to 9b86fb7
Nice catch!
/pulsarbot rerun-failure-checks
/pulsarbot rerun-failure-checks
Thank you for the information. I have opened issue #16859; we can discuss it there.
…-ledger fail (apache#16841) (cherry picked from commit 5faac76) (cherry picked from commit 86ceb3f)
Fixes: #16711
Motivation
If the meta-ledger fails to be initialized while a mark delete is executing, the mark-delete callback is never invoked. This happens with a probability of roughly 1 in 1000. You can reproduce it by running the unit test ManagedCursorTest.markDeleteWithZKErrors 1000 times. The problem also makes the unit test ManagedCursorTest.markDeleteWithZKErrors flaky (#16711).
When the problem occurs, the actual execution process is as follows:
Flow-1. The process we expect
[Table: threads "cursor mark deleted" and "meta thread"; the steps operate on pendingMarkDeleteOps, and one step is highlighted.]
Flow-2. The process we did not expect. The threads "cursor mark deleted" and "meta thread" may be assigned to the same thread, in which case the actual execution process may be as follows:
[Table: a single shared thread "cursor mark deleted / meta thread"; the steps operate on pendingMarkDeleteOps. Highlighted: the lock is re-entered by the same thread, which we did not expect.]
step-4: If the ledger fails to be created, a "fail back" is triggered for the pending requests, and requests that have not yet been queued are ignored.
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Lines 2570 to 2577 in c217b8f
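The "fail back" behaviour described in step-4 can be sketched as follows. This is a minimal illustration, not the code at the referenced lines: the class PendingQueueSketch, its Callback interface, and the methods enqueue and failPending are hypothetical names chosen for the example. It only shows that a drain notifies the requests queued at that moment, so a request that arrives afterwards stays in the queue without being notified.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the cursor's pending mark-delete queue (not Pulsar's actual code).
class PendingQueueSketch {
    interface Callback {
        void failed(String reason);
    }

    private final ArrayDeque<Callback> pendingMarkDeleteOps = new ArrayDeque<>();

    void enqueue(Callback callback) {
        synchronized (pendingMarkDeleteOps) {
            pendingMarkDeleteOps.add(callback);
        }
    }

    // "Fail back": remove and notify every request that is queued right now.
    int failPending(String reason) {
        List<Callback> drained;
        synchronized (pendingMarkDeleteOps) {
            drained = new ArrayList<>(pendingMarkDeleteOps);
            pendingMarkDeleteOps.clear();
        }
        for (Callback callback : drained) {
            callback.failed(reason);
        }
        return drained.size();
    }

    int size() {
        synchronized (pendingMarkDeleteOps) {
            return pendingMarkDeleteOps.size();
        }
    }

    public static void main(String[] args) {
        PendingQueueSketch queue = new PendingQueueSketch();
        List<String> notified = new ArrayList<>();
        queue.enqueue(reason -> notified.add("early: " + reason));
        queue.failPending("create ledger failed");
        // This request arrives after the drain, so nothing notifies it.
        queue.enqueue(reason -> notified.add("late: " + reason));
        System.out.println(notified + " stillQueued=" + queue.size());
    }
}
```

In this sketch only the "early" request is notified; the "late" one remains queued until something else drains it.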
step-5 (highlight): If the meta ledger needs to be created, ledger creation is triggered first and the current request is then put into the pending requests queue. The "create ledger fail" handling may have completed before the request is put into the queue, in which case this request never gets its callback (see Flow-2 above).
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Lines 1870 to 1876 in c217b8f
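The ordering problem in step-5 can be reproduced deterministically in a small simulation. The sketch below is not ManagedCursorImpl: the class MarkDeleteOrderingSketch and all of its methods are hypothetical. It assumes the ledger-creation failure is reported synchronously on the calling thread, which is the Flow-2 situation. Java monitors are reentrant, so the failure handler enters the synchronized block a second time, drains an empty queue, and the request added afterwards is never notified. Queueing the request before starting creation avoids this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical simulation of step-5; names are illustrative, not ManagedCursorImpl's API.
class MarkDeleteOrderingSketch {
    private final ArrayDeque<Consumer<String>> pendingMarkDeleteOps = new ArrayDeque<>();

    // Simulated ledger creation that always fails and runs the failure
    // handling inline, on the caller's thread (assumption matching Flow-2).
    private void createLedgerThatFails() {
        List<Consumer<String>> drained;
        synchronized (pendingMarkDeleteOps) { // re-entered if the caller already holds the lock
            drained = new ArrayList<>(pendingMarkDeleteOps);
            pendingMarkDeleteOps.clear();
        }
        drained.forEach(callback -> callback.accept("create ledger failed"));
    }

    // Order described in step-5: start creation, then queue the request.
    void markDeleteCreateFirst(Consumer<String> callback) {
        synchronized (pendingMarkDeleteOps) {
            createLedgerThatFails();            // drains an empty queue
            pendingMarkDeleteOps.add(callback); // queued too late to be notified
        }
    }

    // Reversed order: queue the request, then start creation.
    void markDeleteQueueFirst(Consumer<String> callback) {
        synchronized (pendingMarkDeleteOps) {
            pendingMarkDeleteOps.add(callback);
            createLedgerThatFails();            // the drain now sees the request
        }
    }

    // Returns whether the callback was invoked for the chosen ordering.
    static boolean callbackFired(boolean queueFirst) {
        MarkDeleteOrderingSketch cursor = new MarkDeleteOrderingSketch();
        boolean[] fired = {false};
        Consumer<String> callback = reason -> fired[0] = true;
        if (queueFirst) {
            cursor.markDeleteQueueFirst(callback);
        } else {
            cursor.markDeleteCreateFirst(callback);
        }
        return fired[0];
    }

    public static void main(String[] args) {
        System.out.println("create first: " + callbackFired(false));
        System.out.println("queue first: " + callbackFired(true));
    }
}
```

Under these assumptions the create-first ordering reports false (the callback is lost) and the queue-first ordering reports true. The synchronized block does not help in the create-first case because the same thread already owns the monitor.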
Modifications
Selected plan (highlight)
Make "put request into the pending queue" execute before "create ledger".
Rejected plan
Make "put request into the pending queue" and "create ledger" execute serially. Why was this plan rejected? Because the selected plan requires fewer changes.
Documentation
Check the box below or label this PR directly.
Need to update docs?
doc-required (Your PR needs to update docs and you will update later)
doc-not-needed (Please explain why)
doc (Your PR contains doc changes)
doc-complete (Docs have been already added)