Describe the bug
The WAGED pipeline fails with an NPE during BestPossibleStateCalcStage because it calls checkAndReduceInstanceCapacity on an instance that is not in WagedInstanceCapacity's _instanceCapacityMap. This occurs when the WagedInstanceCapacity is calculated at point A, a new instance is added at point B, and then at point C the CurrentStateComputationStage does not refresh the WagedInstanceCapacity to include this instance. The specific circumstances are detailed below.
To Reproduce
Add at least one WAGED-enabled resource to a cluster and rebalance so that assignments are made.
Drop all resources from the cluster.
Add a new instance ("new_instance") to the cluster.
Add one WAGED-enabled resource to the cluster.
The NPE will occur.
This occurs because "new_instance" is an assignable instance and appears in the newly calculated preference list, so checkAndReduceInstanceCapacity is called on it. However, WagedInstanceCapacity's _instanceCapacityMap has not been updated and therefore holds a stale view that does not include "new_instance".
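The failure mode can be illustrated with a minimal, self-contained sketch. StaleCapacityMapSketch below is a simplified stand-in, not the Helix WagedInstanceCapacity class; its field layout, constructor, and the "CU" capacity key are assumptions made for illustration only. It shows how looking up an instance that was added after the map was built returns null and then fails on the first dereference.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for WagedInstanceCapacity (assumption: NOT the Helix class).
class StaleCapacityMapSketch {
  // instance name -> (capacity key -> remaining capacity), built once from a snapshot
  private final Map<String, Map<String, Integer>> instanceCapacityMap = new HashMap<>();

  StaleCapacityMapSketch(List<String> instancesAtBuildTime, int capacity) {
    for (String instance : instancesAtBuildTime) {
      Map<String, Integer> remaining = new HashMap<>();
      remaining.put("CU", capacity); // "CU" is a hypothetical capacity key
      instanceCapacityMap.put(instance, remaining);
    }
  }

  boolean checkAndReduceInstanceCapacity(String instance, String key, int weight) {
    // Returns null for an instance that joined after the snapshot was taken.
    Map<String, Integer> remaining = instanceCapacityMap.get(instance);
    int available = remaining.get(key); // throws NullPointerException when remaining is null
    if (available < weight) {
      return false;
    }
    remaining.put(key, available - weight);
    return true;
  }

  // Helper that reports whether the call fails with an NPE for the given instance.
  static boolean throwsNpeFor(StaleCapacityMapSketch capacity, String instance) {
    try {
      capacity.checkAndReduceInstanceCapacity(instance, "CU", 1);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    StaleCapacityMapSketch capacity =
        new StaleCapacityMapSketch(Arrays.asList("old_instance"), 10);
    System.out.println(capacity.checkAndReduceInstanceCapacity("old_instance", "CU", 4));
    System.out.println(throwsNpeFor(capacity, "new_instance"));
  }
}
```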
This is because the skipCapacityCalculation method (a very effective optimization) causes the CurrentStateComputationStage to not refresh the cache if there are no resources in the resourceMap. However, the resourceMap is built from the IdealStates in the cluster, and the new resource's IdealState does not exist yet at this point. When a resource is added, a ResourceConfigChange event fires first, followed by an IdealStateChange. So when a new resource is added, the CurrentStateComputationStage does not recalculate the WagedInstanceCapacity on the ResourceConfigChange because the resourceMap is still empty, and it does not recalculate on the subsequent IdealStateChange either.
Adding a WAGED resource to a new cluster does not trigger this NPE because no WagedInstanceCapacity exists yet, so
if (Objects.isNull(cache.getWagedInstanceCapacity())) {
  return false;
}
forces it to be calculated.
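The event sequence described above can be modelled in a few lines. This is a hedged sketch built only from the description in this issue, not the actual Helix implementation: the class name, the boolean capacityCached flag, the order of the checks, and the assumption that ResourceConfigChange triggers recalculation while IdealStateChange does not are all simplifications for illustration.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical model of the skip decision (assumption: NOT the Helix code).
class SkipCapacitySketch {
  enum EventType { ResourceConfigChange, IdealStateChange }

  static boolean skipCapacityCalculation(
      boolean capacityCached, Map<String, String> resourceMap, EventType eventType) {
    if (!capacityCached) {
      return false; // nothing cached yet: never skip
    }
    if (resourceMap.isEmpty()) {
      return true; // no resources: skip the refresh
    }
    // Assumed: ResourceConfigChange triggers a refresh, IdealStateChange does not.
    return eventType != EventType.ResourceConfigChange;
  }

  // Replays the two events fired when a resource is added and reports
  // whether the capacity was recalculated by either of them.
  static boolean capacityRefreshedWhenResourceAdded(boolean capacityCachedAtStart) {
    boolean cached = capacityCachedAtStart;
    boolean refreshed = false;

    // 1. ResourceConfigChange: the IdealState does not exist yet, so resourceMap is empty.
    Map<String, String> empty = Collections.emptyMap();
    if (!skipCapacityCalculation(cached, empty, EventType.ResourceConfigChange)) {
      refreshed = true;
      cached = true;
    }

    // 2. IdealStateChange: the resource is now visible in resourceMap.
    Map<String, String> withResource = Collections.singletonMap("new_resource", "idealState");
    if (!skipCapacityCalculation(cached, withResource, EventType.IdealStateChange)) {
      refreshed = true;
    }
    return refreshed;
  }

  public static void main(String[] args) {
    // Existing cluster with a cached (stale) capacity view.
    System.out.println(capacityRefreshedWhenResourceAdded(true));
    // Brand-new cluster with no cached capacity.
    System.out.println(capacityRefreshedWhenResourceAdded(false));
  }
}
```

Under these assumptions, the cached case never refreshes across the two events, which matches the stale view described above, while the new-cluster case refreshes because nothing is cached.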
GrantPSpencer#32
The test case in this draft PR follows the steps outlined above and fails on master.
Expected behavior
When a new resource is added, WagedInstanceCapacity should be recalculated before the BestPossibleStateCalcStage runs.
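One hypothetical way to express this expectation is a staleness check that compares the instances covered by the cached capacity view against the currently assignable instances. The helper below is an illustrative sketch only; needsRecalculation and its parameters are invented names, and this is not a claim about how Helix does or should implement the fix.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical staleness check (assumption: not part of Helix).
class CapacityStalenessCheckSketch {
  // True when there is no cached view, or the cached view lacks an assignable instance.
  static boolean needsRecalculation(Set<String> cachedInstances, Set<String> assignableInstances) {
    return cachedInstances == null || !cachedInstances.containsAll(assignableInstances);
  }

  public static void main(String[] args) {
    Set<String> cached = new HashSet<>(Arrays.asList("old_instance"));
    Set<String> assignable = new HashSet<>(Arrays.asList("old_instance", "new_instance"));
    System.out.println(needsRecalculation(cached, assignable));
  }
}
```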
Additional context
10539 [HelixController-pipeline-default-TestWagedNPE_cluster-(45df0f8d_DEFAULT)] ERROR org.apache.helix.controller.GenericHelixController [] - Exception while executing DEFAULT pipeline for cluster TestWagedNPE_cluster. Will not continue to next pipeline
java.lang.NullPointerException: null
at org.apache.helix.controller.rebalancer.waged.WagedInstanceCapacity.checkAndReduceInstanceCapacity(WagedInstanceCapacity.java:206) ~[classes/:?]
at org.apache.helix.controller.dataproviders.ResourceControllerDataProvider.checkAndReduceCapacity(ResourceControllerDataProvider.java:535) ~[classes/:?]
at org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossibleStateForPartition(DelayedAutoRebalancer.java:377) ~[classes/:?]
at org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossiblePartitionState(DelayedAutoRebalancer.java:271) ~[classes/:?]
at org.apache.helix.controller.rebalancer.DelayedAutoRebalancer.computeBestPossiblePartitionState(DelayedAutoRebalancer.java:54) ~[classes/:?]
at org.apache.helix.controller.rebalancer.waged.WagedRebalancer.lambda$computeNewIdealStates$0(WagedRebalancer.java:281) ~[classes/:?]
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1692) ~[?:?]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290) ~[?:?]
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746) ~[?:?]
at java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:290) ~[?:?]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java) ~[?:?]
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408) ~[?:?]
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736) ~[?:?]
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159) ~[?:?]
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173) ~[?:?]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) ~[?:?]
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661) ~[?:?]
at org.apache.helix.controller.rebalancer.waged.WagedRebalancer.computeNewIdealStates(WagedRebalancer.java:277) ~[classes/:?]
at org.apache.helix.controller.stages.BestPossibleStateCalcStage.computeResourceBestPossibleStateWithWagedRebalancer(BestPossibleStateCalcStage.java:445) ~[classes/:?]
at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:289) ~[classes/:?]
at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:94) ~[classes/:?]
at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:75) ~[classes/:?]
at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:903) [classes/:?]
at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:1554) [classes/:?]