Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pulsar-broker] cursor: safe guard to avoid cursor-znode after cursor is closed #3556

Merged
merged 1 commit into from
Feb 9, 2019

Conversation

rdhabalia
Copy link
Contributor

@rdhabalia rdhabalia commented Feb 8, 2019

Motivation

Sometimes while shutting down the broker, unknown uncompleted callback tries to update managed-cursor even if managed-cursor is closed and because of that topic which is already loaded by new broker sees below exception while updating managed-cursor-metadata and cursor is not able to update mark-delete position until it unloaded again.

09:22:02.641 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper - [sample/global/ns1/persistent/TopicBadVersion] [pulsar.repl.us-east] Updating cursor info ledgerId=234567890 mark-delete=12345678900:5306
00:00:05.642 [bookkeeper-ml-workers-OrderedExecutor-14-0] WARN  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [sample/global/ns1/persistent/TopicBadVersion] Failed to update consumer pulsar.repl.us-east
org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[pulsar-broker-2.2.5-.jar:2.2.5-]
        at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$103(MetaStoreImplZookeeper.java:287) ~[managed-ledger-original-2.2.5-.jar:2.2.5-]
        at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.2.5-.jar:2.2.5-]
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.7.2.jar:4.7.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.22.Final.jar:4.1.22.Final]

It seems it always happens for replicator-cursor and not sure exact step to reproduce but it might happen when bookies might be in unstable state (not sure about the root cause). one of the example:

Behavior:
broker-1 unloads the topic by 09:21:59.451 and broker-2 loads the same topic at 09:22:00.135. But broker-1 has log of updating metadata-znode at 09:22:05.207. So, broker-2 fails to update cursor-metadata with exception BadVersionException

Previous-broker: broker-1
09:21:58.855 [shutdown-thread-49-1] INFO  org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: sample/global/ns1/0x2aaaaaa8_0x35555552
:
:
09:21:59.668 [shutdown-thread-49-1] INFO  org.apache.pulsar.broker.namespace.OwnedBundle - Unloading sample/global/ns1/0x2aaaaaa8_0x35555552 namespace-bundle with 345 topics completed in 812.0 ms
:
09:21:58.930 [shutdown-thread-49-1] INFO  org.apache.pulsar.broker.service.BrokerService - [persistent://sample/global/ns1/TopicBadVersion] Unloading topic
:
09:21:59.451 [bookkeeper-ml-workers-OrderedExecutor-14-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [sample/global/ns1/persistent/TopicBadVersion][pulsar.repl.us-east] Closed cursor at md-position=12345678900:5306
09:21:59.451 [bookkeeper-ml-workers-OrderedExecutor-14-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://sample/global/ns1/TopicBadVersion] Topic closed
:
09:22:05.207 [BookKeeperClientWorker-OrderedExecutor-2-0] WARN  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [sample/global/ns1/persistent/TopicBadVersion] Error updating cursor pulsar.repl.us-east position 12345678900:5306 in meta-ledger 14451380450: BookKeeper client is closed
09:22:05.207 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper - [sample/global/ns1/persistent/TopicBadVersion] [pulsar.repl.us-east] Updating cursor info ledgerId=-1 mark-delete=12345678900:5306
Current-broker: broker-2
09:22:00.135 [pulsar-ordered-OrderedExecutor-7-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger sample/global/ns1/persistent/TopicBadVersion

Modification

It happens when someone tries to update cursor metadata by calling persistPositionMetaStore(..) so, it should have safe guard and should not update metadata if cursor is already closed.

Result

It avoids any unexpected behavior while updating cursor-metadata in closed cursor state.

@rdhabalia rdhabalia added this to the 2.3.0 milestone Feb 8, 2019
@rdhabalia rdhabalia self-assigned this Feb 8, 2019
@rdhabalia rdhabalia force-pushed the cursor_update branch 2 times, most recently from ce18072 to 7857e90 Compare February 8, 2019 23:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants