[broker] Cursor status has always been SwitchingLedger and pendingMarkDeleteOps has accumulated tens of thousands of requests #16859
Could you please provide the version you are using? If possible, upload the dump file (redacting anything that may leak company data). Could you also show the property …
@poorbarcode My version is 2.9.2. The dump file is very large, and it is difficult to upload it from the intranet for security reasons. The values of the ManagedCursorMXBean MBean are as follows:
[screenshot of ManagedCursorMXBean values]
I now suspect that something is wrong with the …
Thanks, then it is not …
@poorbarcode
Then the disk space increases, and some partitions cannot be reclaimed.
Could you show more details from the error log?
@poorbarcode I found thousands of tasks in the "BookKeeperClientWorker-OrderedExecutor-59-%d" thread group, and I don't know what caused it. SafeRunnable should catch the exception, but I don't know why this happens.
@hangc0276 Is it normal to have so many tasks in the queue?
@poorbarcode …
@poorbarcode The stacks of the three threads in this thread group are as follows:
[thread stack screenshot]
It seems that one topic is busy while the others are not, but this does not seem to be related to the …
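For context on why one thread can back up while the rest of the pool stays idle: the thread name above comes from BookKeeper's OrderedExecutor, which hashes each ordering key (typically a ledger or topic) to a single thread, so one stuck task backlogs everything behind the same key. Here is a minimal sketch assuming the `org.apache.bookkeeper.common.util.OrderedExecutor` API; the keys and counts are illustrative, not from this issue:

```java
import static org.apache.bookkeeper.common.util.SafeRunnable.safeRun;

import java.util.concurrent.CountDownLatch;
import org.apache.bookkeeper.common.util.OrderedExecutor;

public class OrderedBacklogSketch {
    public static void main(String[] args) throws Exception {
        OrderedExecutor executor = OrderedExecutor.newBuilder()
                .name("BookKeeperClientWorker")
                .numThreads(4)
                .build();

        CountDownLatch blocked = new CountDownLatch(1);

        // A task for one ledger that never completes (e.g. a write stuck
        // on an unresponsive bookie). It occupies the single thread that
        // this ordering key hashes to.
        executor.executeOrdered("ledger-A", safeRun(() -> {
            try {
                blocked.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        // Everything else submitted with the same key piles up behind it,
        // mirroring the thousands of queued tasks seen on one thread.
        for (int i = 0; i < 10_000; i++) {
            executor.executeOrdered("ledger-A", safeRun(() -> { }));
        }

        // A different key hashes to another thread and still runs promptly.
        executor.executeOrdered("ledger-B", safeRun(() ->
                System.out.println("ledger-B ran on " + Thread.currentThread().getName())));

        Thread.sleep(1000);
        blocked.countDown();
        executor.shutdown();
    }
}
```

This would explain the observation that one OrderedExecutor thread is busy with a huge queue while the other threads in the same pool are idle.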
@poorbarcode I suspect that when the cursor is in SwitchingLedger, some tasks are blocked in the queue, which makes it impossible for them to complete.
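For background on why mark-deletes pile up in this state: the managed cursor parks mark-delete requests while it is switching its metadata ledger and only drains them once the switch completes. The following is a simplified, hypothetical model of that pattern, not the actual ManagedCursorImpl code; all names besides pendingMarkDeleteOps and SwitchingLedger are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sketch of a cursor that parks mark-delete requests while
// its metadata ledger is being switched. If the switch callback never
// fires, the pending queue grows without bound -- the symptom in this
// issue.
class CursorSketch {
    enum State { Open, SwitchingLedger }

    private State state = State.Open;
    private final Deque<Long> pendingMarkDeleteOps = new ArrayDeque<>();

    synchronized void asyncMarkDelete(long position) {
        if (state == State.SwitchingLedger) {
            // Parked until the ledger switch completes.
            pendingMarkDeleteOps.add(position);
            return;
        }
        persistMarkDelete(position);
    }

    synchronized void startLedgerSwitch() {
        state = State.SwitchingLedger;
        // ... asynchronously create a new metadata ledger, then call
        // ledgerSwitchCompleted(). If that callback is lost (e.g. stuck
        // behind a backlogged executor thread), the state never leaves
        // SwitchingLedger.
    }

    synchronized void ledgerSwitchCompleted() {
        state = State.Open;
        // Drain everything that accumulated during the switch.
        while (!pendingMarkDeleteOps.isEmpty()) {
            persistMarkDelete(pendingMarkDeleteOps.poll());
        }
    }

    private void persistMarkDelete(long position) {
        // Write the mark-delete position to the cursor ledger.
    }
}
```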
Could you provide the BK configuration?
E.g. with Ensemble size = 3 and Write quorum size = 3, when any bookie server goes down, the first write request will time out and the other requests will back up.
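To make that scenario concrete, here is a hedged sketch using the BookKeeper client API; the ZooKeeper address, password, and quorum values are placeholders chosen to match the example above, not values from this issue:

```java
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.BookKeeper.DigestType;
import org.apache.bookkeeper.client.LedgerHandle;

public class QuorumExample {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper connection string.
        BookKeeper bk = new BookKeeper("zk1:2181");

        // Ensemble = 3, write quorum = 3, ack quorum = 3: every entry is
        // written to all three bookies and must be acknowledged by all
        // three, so a single unresponsive bookie stalls the first write
        // until it times out, while later writes queue up behind it.
        LedgerHandle lh = bk.createLedger(
                3 /* ensemble */, 3 /* write quorum */, 3 /* ack quorum */,
                DigestType.CRC32, "password".getBytes());

        lh.addEntry("hello".getBytes());

        lh.close();
        bk.close();
    }
}
```

Lowering the ack quorum below the write quorum would let writes complete with one bookie down, at the cost of weaker durability on acknowledgment.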
If a bookie is not working, shouldn't all thread pools be backlogged, not just that one thread pool?
The issue had no activity for 30 days, mark with Stale label. |
@poorbarcode There is a new development on this question, but I'm not sure if it's the same problem: …
The issue had no activity for 30 days, mark with Stale label. |
This issue might be fixed by #17971 |
Describe the bug
The pendingMarkDeleteOps queue has accumulated tens of thousands of requests, the cursor status has always been SwitchingLedger, and retention cannot be executed, so the disk space cannot be reclaimed.

Screenshots

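For anyone checking whether they are affected: the cursor state is exposed in the topic's internal stats (`pulsar-admin topics stats-internal <topic>` prints a `state` field per cursor). Below is a minimal sketch using the Java admin client; the service URL and topic name are placeholders:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;

public class CheckCursorState {
    public static void main(String[] args) throws Exception {
        // Placeholder admin service URL.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // Placeholder topic name.
        PersistentTopicInternalStats stats =
                admin.topics().getInternalStats("persistent://tenant/ns/topic");

        // A healthy cursor normally reports "Open"; a cursor stuck in
        // "SwitchingLedger" matches the symptom reported in this issue.
        stats.cursors.forEach((name, cursor) ->
                System.out.println(name + " -> state=" + cursor.state
                        + ", markDeletePosition=" + cursor.markDeletePosition));

        admin.close();
    }
}
```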