Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add connection based FailureDetector #8491

Merged
merged 3 commits into from
Apr 16, 2022

Conversation

Jackie-Jiang
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang commented Apr 7, 2022

For issue #8490

Description

  • Add FailureDetector module to the broker
  • Add QueryResponse interface
  • Add ConnectionFailureDetector to detect server connection failures

Release Notes

Added the following failure detector related configs to the broker config:

  • pinot.broker.failure.detector.type: can be NO_OP (default), CONNECTION, CUSTOM
  • pinot.broker.failure.detector.class: for CUSTOM type
  • pinot.broker.failure.detector.retry.initial.delay.ms: 5000 by default
  • pinot.broker.failure.detector.retry.delay.factor: 2 by default
  • pinot.broker.failure.detector.max.retries: 10 by default

@Jackie-Jiang Jackie-Jiang added the release-notes Referenced by PRs that need attention when compiling the next release notes label Apr 7, 2022
@Jackie-Jiang Jackie-Jiang requested a review from xiangfu0 April 7, 2022 22:11
@Jackie-Jiang Jackie-Jiang force-pushed the failure_detector branch 2 times, most recently from d4426b0 to 2e82aeb Compare April 7, 2022 22:59
@codecov-commenter
Copy link

codecov-commenter commented Apr 7, 2022

Codecov Report

Merging #8491 (5b639bc) into master (58ffe94) will decrease coverage by 43.43%.
The diff coverage is 68.02%.

❗ Current head 5b639bc differs from pull request most recent head 86d6469. Consider uploading reports for the commit 86d6469 to get more accurate results

@@              Coverage Diff              @@
##             master    #8491       +/-   ##
=============================================
- Coverage     70.53%   27.10%   -43.44%     
=============================================
  Files          1681     1674        -7     
  Lines         87913    87842       -71     
  Branches      13309    13305        -4     
=============================================
- Hits          62009    23809    -38200     
- Misses        21601    61817    +40216     
+ Partials       4303     2216     -2087     
Flag Coverage Δ
integration1 27.10% <68.02%> (+0.30%) ⬆️
integration2 ?
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...roker/requesthandler/GrpcBrokerRequestHandler.java 0.00% <ø> (-78.38%) ⬇️
...ker/routing/instanceselector/InstanceSelector.java 100.00% <ø> (ø)
...rg/apache/pinot/core/transport/ServerResponse.java 100.00% <ø> (+3.84%) ⬆️
...ment/creator/impl/SegmentColumnarIndexCreator.java 0.00% <0.00%> (-86.72%) ⬇️
...ot/segment/spi/creator/SegmentGeneratorConfig.java 0.00% <0.00%> (-82.65%) ⬇️
.../apache/pinot/spi/config/table/IndexingConfig.java 0.00% <0.00%> (-92.21%) ⬇️
...va/org/apache/pinot/spi/utils/CommonConstants.java 0.00% <0.00%> (-23.64%) ⬇️
...rg/apache/pinot/core/transport/ServerInstance.java 26.00% <22.22%> (-27.20%) ⬇️
...a/org/apache/pinot/core/transport/QueryRouter.java 71.08% <40.00%> (-14.63%) ⬇️
...broker/failuredetector/FailureDetectorFactory.java 40.90% <40.90%> (ø)
... and 1248 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58ffe94...86d6469. Read the comment docs.

@Jackie-Jiang Jackie-Jiang force-pushed the failure_detector branch 5 times, most recently from 9e050a3 to 8a69cc8 Compare April 11, 2022 21:20
*/
@ThreadSafe
public abstract class BaseExponentialBackoffRetryFailureDetector implements FailureDetector {
protected final Logger _logger = LoggerFactory.getLogger(getClass());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why this (and the _name variable) aren't static. Is there a convention in the Pinot code base? Just FYI, I found 437 instances of static final Logger LOGGER, and 14 of final Logger _logger (a few of which were actually static).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a base class, and we want to log the actual class name that extends this base class, so it cannot be static. E.g. I want the logger to log under ConnectionFailureDetector instead of BaseExponentialBackoffRetryFailureDetector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying...but why do you want classes that extend BaseExponentialBackoffRetryFailureDetector to use its logger? Is there some issue with the concrete class having its own logger?

Just asking because I see 25+ instances of abstract classes with private static loggers, but only one other class BaseSegmentJob that does the same thing as here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseSegmentFetcher is also following this pattern, but you are right the majority of the abstract classes use the static logger. IMO it doesn't matter because there is only one single failure detector instance running, so no extra overhead, but I'm okay changing it to a static one for consistency

- Add FailureDetector module to the broker
- Add QueryResponse interface
- Add ConnectionFailureDetector to detect server connection failures
Copy link
Member

@richardstartin richardstartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@vvivekiyer
Copy link
Contributor

LGTM.

@Jackie-Jiang Jackie-Jiang merged commit ce0a08a into apache:master Apr 16, 2022
@Jackie-Jiang Jackie-Jiang deleted the failure_detector branch April 16, 2022 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants