Fixing TraceContext NPE issue #8126

Merged
merged 1 commit into from
Feb 4, 2022

@xiangfu0 xiangfu0 commented Feb 4, 2022

Description

When a query times out, a worker thread may try to register its trace after the request has already been unregistered, which causes a NullPointerException. This PR adds a null check before putting the trace.

Sample stacktrace:

2022/02/04 00:01:06.604 ERROR [ServerQueryExecutorV1Impl] [pqr-7] Exception processing requestId 1193
java.lang.RuntimeException: Caught exception while running CombinePlanNode.
        at org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:164) 
        at org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:41) 
        at org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45) 
        at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:290) 
        at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:192) 
        at org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60) 
        at org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:154) 
        at org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:138) 
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
        at shaded.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
        at shaded.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
        at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:?]
        at org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:154) 
        ... 15 more
Caused by: java.lang.NullPointerException
        at org.apache.pinot.core.util.trace.TraceContext.registerThreadToRequest(TraceContext.java:140) 
        at org.apache.pinot.core.util.trace.TraceCallable.call(TraceCallable.java:41) 
        ... 8 more
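
The race described above can be illustrated with a minimal sketch. This is a simplified, hypothetical model rather than the actual Pinot `TraceContext` code: the class `TraceRegistrySketch`, its methods, and the use of `String` as the trace type are all assumptions made for illustration. It shows only the general pattern of looking up the per-request collection once and skipping registration when the request has already been removed.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of a per-request trace registry.
// Names and types are illustrative assumptions, not Pinot's real API.
class TraceRegistrySketch {
  private static final ConcurrentMap<Long, Queue<String>> REQUEST_TO_TRACES =
      new ConcurrentHashMap<>();

  // Called when a request starts being traced.
  static void register(long requestId) {
    REQUEST_TO_TRACES.putIfAbsent(requestId, new ConcurrentLinkedQueue<>());
  }

  // Called when the request finishes or times out.
  static void unregister(long requestId) {
    REQUEST_TO_TRACES.remove(requestId);
  }

  // Called from a worker thread. Returns false instead of throwing
  // when the request was unregistered before the worker got here.
  static boolean registerThread(long requestId, String trace) {
    Queue<String> traces = REQUEST_TO_TRACES.get(requestId);
    if (traces == null) {
      // Request already unregistered (e.g. after a timeout): skip the put.
      return false;
    }
    traces.add(trace);
    return true;
  }
}
```

In this sketch, calling `registerThread` after `unregister` for the same request id returns `false` rather than dereferencing a missing entry. Reading the map once into a local variable matters: checking `containsKey` and then calling `get` separately would leave the same window open between the two calls.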

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

codecov-commenter commented Feb 4, 2022

Codecov Report

Merging #8126 (e994021) into master (7d74c8c) will increase coverage by 0.08%.
The diff coverage is 66.66%.

@@             Coverage Diff              @@
##             master    #8126      +/-   ##
============================================
+ Coverage     71.33%   71.41%   +0.08%     
  Complexity     4302     4302              
============================================
  Files          1624     1624              
  Lines         84140    84142       +2     
  Branches      12596    12597       +1     
============================================
+ Hits          60021    60094      +73     
+ Misses        20013    19947      -66     
+ Partials       4106     4101       -5     
Flag Coverage Δ
integration1 29.01% <0.00%> (+0.20%) ⬆️
integration2 27.56% <0.00%> (-0.05%) ⬇️
unittests1 67.94% <66.66%> (-0.01%) ⬇️
unittests2 14.18% <0.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...org/apache/pinot/core/util/trace/TraceContext.java 73.58% <66.66%> (-0.93%) ⬇️
...a/org/apache/pinot/common/utils/ServiceStatus.java 60.00% <0.00%> (-7.15%) ⬇️
...he/pinot/segment/local/segment/store/IndexKey.java 70.00% <0.00%> (-5.00%) ⬇️
...core/query/pruner/SelectionQuerySegmentPruner.java 86.36% <0.00%> (-2.28%) ⬇️
...core/startree/operator/StarTreeFilterOperator.java 85.31% <0.00%> (-2.10%) ⬇️
.../pinot/server/starter/helix/BaseServerStarter.java 57.98% <0.00%> (-1.97%) ⬇️
...apache/pinot/controller/api/upload/ZKOperator.java 74.40% <0.00%> (-1.60%) ⬇️
...e/pinot/broker/broker/helix/BaseBrokerStarter.java 75.67% <0.00%> (-1.09%) ⬇️
.../aggregation/function/ModeAggregationFunction.java 88.10% <0.00%> (-0.55%) ⬇️
...va/org/apache/pinot/controller/ControllerConf.java 58.53% <0.00%> (-0.41%) ⬇️
... and 22 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7d74c8c...e994021.

@xiangfu0 xiangfu0 force-pushed the fixing_trace_context_npe branch from 9c7ab22 to e994021 Compare February 4, 2022 05:58
@xiangfu0 xiangfu0 merged commit e59730a into apache:master Feb 4, 2022
@xiangfu0 xiangfu0 deleted the fixing_trace_context_npe branch February 4, 2022 07:25