-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] The topic might reference a closed ledger #22739
Conversation
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Show resolved
Hide resolved
Great work! |
@@ -1434,13 +1432,6 @@ public void testCleanupTopic() throws Exception { | |||
// Ok | |||
} | |||
|
|||
final CompletableFuture<Optional<Topic>> timedOutTopicFuture = topicFuture; | |||
// timeout topic future should be removed from cache | |||
retryStrategically((test) -> pulsar1.getBrokerService().getTopic(topicName, false) != timedOutTopicFuture, 5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unit test assumes a manager ledger that can never be completed to mock topic create timeout, which does not make sense in a production scenario.
Remove this assertion, and add an assertion afterward: Once ml is completed, the topic should be removed due to timeout.
&& FutureUtil.getException(topicFuture).get().equals(FAILED_TO_LOAD_TOPIC_TIMEOUT_EXCEPTION)) { | ||
return FutureUtil.failedFuture(FAILED_TO_LOAD_TOPIC_TIMEOUT_EXCEPTION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to suggest to follow the solution from this PR #22283
- The broker should clean up all the topicFutures for any exceptions finally
- The new get topic operation should not remove the topicFuture that created by previous get topic operation
- The topicFuture should only be created if there is no previous topicFuture. If the previous topicFuture is not removed from the map yet, the broker should always use the existing topicFuture (waiting for completion or return error to the client side directly)
The existing code has mixed them which will introduce contention for the get topic operation. Now, we checked the topic load timeout exception, but don't know how many others we also need to check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with these points. Actually, I tried do that: https://github.com/shibd/pulsar/pull/38/files
But, I got some tests that not pass, So, I used this short-term solution.
I'm going to continue to look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new solution refer to this PR: #22860
Close with this comments: #22739 (comment) |
Motivation
We observe that a
normal topic
might reference aclosed ledger
and it never auto recover. will cause the producer and customer stuck.The root cause is related to
topic create timeout
. Assuming we have two concurrent requests to create a topic:firstCreate(Thread-1)
,secondCreate(Thread-2)
old ledger
being referenced tonew topic
and that stats is closeWhen the
firstCreate
topic timeout, will calltopic.close
. it will close the ledger, and remove it from the ledger cache.pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Lines 1745 to 1752 in f07b3a0
If the
secondCreate
request successfully creates a topic before theold ledger closes
, the reference will be made to theold ledger
.Modifications
Verifying this change
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: shibd#37