[improve][broker] Do not close the socket if lookup failed due to LockBusyException #21993
### Motivation

When a broker restarts, the following case can occur in `NamespaceService#findBrokerServiceUrl`:

1. `ownershipCache.getOwnerAsync(bundle)` returns empty data, so `searchForCandidateBroker` is called.
2. The broker itself is elected as the candidate broker.
3. Meanwhile, another broker has acquired the distributed lock of the bundle, so `ownershipCache.tryAcquiringOwnership` fails with

```java
lookupFuture.completeExceptionally(new PulsarServerException(
        "Failed to acquire ownership for namespace bundle " + bundle, exception));
```

See apache/pulsar-client-cpp#390 for the real world case.

Then in `TopicLookupBase#handleLookupError`, this exception is wrapped into a `ServiceNotReady` error that is returned to the client.

This case happens very frequently in our production environment when a broker restarts. If a `PulsarClient` has many producers or consumers, the connection is closed, which results in many reconnections and puts heavy pressure on the cluster.

### Modifications

In `handleLookupError`, check the `PulsarServerException` and unwrap the `CompletionException`. If the unwrapped exception is a `MetadataStoreException`, return `MetadataError` to avoid closing the connection on the client side.

Add `testLookupConnectionNotCloseIfFailedToAcquireOwnershipOfBundle` to simulate the case and verify the socket won't be closed.
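The unwrapping described under Modifications can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class `LookupErrorSketch`, the methods `unwrap` and `classify`, and the nested exception and enum types are hypothetical stand-ins that only mirror the names mentioned above, so the example compiles without the Pulsar code base.

```java
import java.util.concurrent.CompletionException;

// Hypothetical sketch: every type here is a local stand-in, not a Pulsar class.
final class LookupErrorSketch {
    // Stand-in for the two error codes discussed in this PR.
    enum ServerError { ServiceNotReady, MetadataError }

    // Stand-ins for the exception types named in the description.
    static class MetadataStoreException extends Exception {
        MetadataStoreException(String msg) { super(msg); }
    }
    static class LockBusyException extends MetadataStoreException {
        LockBusyException(String msg) { super(msg); }
    }
    static class PulsarServerException extends Exception {
        PulsarServerException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Strip CompletionException wrappers added by CompletableFuture stages.
    static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    // Map a lookup failure to the error code that would be sent to the client.
    static ServerError classify(Throwable error) {
        Throwable t = unwrap(error);
        if (t instanceof PulsarServerException) {
            // Look through the server exception to the underlying cause.
            t = unwrap(t.getCause());
        }
        if (t instanceof MetadataStoreException) {
            // Assumed retriable: the client keeps its connection open.
            return ServerError.MetadataError;
        }
        // Everything else keeps the previous behavior.
        return ServerError.ServiceNotReady;
    }
}
```

Under these assumptions, `classify(new PulsarServerException("Failed to acquire ownership", new CompletionException(new LockBusyException("busy"))))` yields `MetadataError`, while a `PulsarServerException` with an unrelated cause still yields `ServiceNotReady`.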
@BewareMyPower Looks like a good proposal, I'll review later in more detail. One observation is that this problem is really common when the leader broker election is broken, like with the bug #21897 which was fixed by #21894.
Codecov Report

All modified and coverable lines are covered by tests ✅

Coverage diff, master vs. #21993:

| Metric | master | #21993 | +/- |
|---|---|---|---|
| Coverage | 73.63% | 73.63% | -0.01% |
| Complexity | 32473 | 32494 | +21 |
| Files | 1863 | 1863 | |
| Lines | 138786 | 138814 | +28 |
| Branches | 15207 | 15216 | +9 |
| Hits | 102201 | 102216 | +15 |
| Misses | 28687 | 28695 | +8 |
| Partials | 7898 | 7903 | +5 |
…kBusyException (apache#21993) (cherry picked from commit bf5639f)