Introduce adaptive tasklist scaler #6506
Conversation
```go
	a.underLoad = true
	a.underLoadStartTime = a.timeSource.Now()
} else if a.timeSource.Now().Sub(a.underLoadStartTime) > a.config.PartitionDownscaleSustainedDuration() {
	numWritePartitions = getNumberOfPartitions(partitionConfig.NumWritePartitions, qps, upscaleThreshold) // NOTE: this has to be upscaleThreshold
```
I think this approach can make some counterintuitive scaling decisions. Any time (upscaleThreshold - downscaleThreshold) * numWritePartitions > (2 * upscaleThreshold), we're only able to scale down by multiple partitions at once.
For example, if we have thresholds of (500, 1000) with 10,000 global traffic, then we would end up with 10 partitions and we would never scale down until global traffic drops below 5,000, at which point we dramatically scale from 10 partitions to 5. If traffic goes back to 5,001 then we'd scale up to 6 partitions, but we'd only scale down again if traffic drops below 3,000.
This approach can also end up in kind of strange scenarios where we're continually underLoad but never change the number of partitions. If we have thresholds of (500, 600) with 1,800 global traffic then we would have 3 partitions. When traffic drops to 1,499 we're considered underLoad and would try to update numWritePartitions, but it won't actually change the value until traffic drops below 1,200.
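To make the arithmetic in this comment concrete, here is a minimal sketch that assumes getNumberOfPartitions is essentially a ceiling division of global QPS by the per-partition threshold; the actual function in this PR also takes the current partition count and may differ in detail:

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical model of the partition count calculation: ceiling
// division of global QPS by the per-partition threshold.
func getNumberOfPartitions(qps, threshold float64) int {
	n := int(math.Ceil(qps / threshold))
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// Thresholds (downscale=500, upscale=1000) from the example above.
	fmt.Println(getNumberOfPartitions(10000, 1000)) // 10 partitions
	// Downscale only triggers once qps < 500 * 10 = 5000, and then the
	// count is recomputed with the upscale threshold: a jump from 10 to 5.
	fmt.Println(getNumberOfPartitions(4999, 1000)) // 5 partitions
}
```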
After thinking about this, I think to avoid fluctuation the thresholds need to satisfy this inequality:
downscaleThreshold <= upscaleThreshold * N / (N + 1) (for all N)
Let's take your (500, 600) thresholds as an example: if the global qps is 1,801, then we would have 4 partitions. But if we have 4 partitions, the load of each partition will be around 450, which is less than 500, so we will need to downscale. When the downscale operation is triggered, we have to recalculate the number of partitions based on the qps of the root partition. The estimation is around 450, so it could be 451 or 449. If the estimation is larger than 450, then we still have 4 partitions, so no change. But if the estimation is 449, then downscale will be triggered.
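To spell out where that inequality comes from (a sketch, writing T_up and T_down for the upscale and downscale thresholds):

```latex
% Just after upscaling from N to N+1 partitions, the total QPS is
% q \approx T_{up} N, so the per-partition load is roughly
\frac{q}{N+1} \approx \frac{T_{up}\, N}{N+1}
% To avoid immediately re-entering the low-load state, this must stay
% at or above the downscale threshold:
\frac{T_{up}\, N}{N+1} \ge T_{down}
\quad\Longleftrightarrow\quad
T_{down} \le T_{up} \cdot \frac{N}{N+1} \quad \text{for all } N \ge 1
```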
```go
		a.overLoad = false
	}
} else {
	a.overLoad = false
```
This implementation requires consecutive overloaded windows to scale up. If qps drops momentarily for some reason (rate limit, qps tracker calculation issue, etc.) then we will not be able to scale up.
Have you considered doing this calculation more often (every 1s instead of every 15s) and generating a series of overloaded/not-overloaded results? After a minute you would have 60 data points representing whether it was overloaded for that particular second. Then determine whether scale up is needed based on whether it was overloaded more than half of the time.
I just thought of this idea so it may not be the ideal solution, but something to address the consecutiveness requirement would be needed IMO.
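As a sketch of what that sampling approach might look like (the type and method names here are hypothetical, not part of this PR):

```go
package main

import "fmt"

// overloadWindow records one overloaded/not-overloaded sample per tick
// (e.g. every second) and votes on scale-up once the window is full.
type overloadWindow struct {
	samples []bool
	next    int
	filled  bool
}

func newOverloadWindow(size int) *overloadWindow {
	return &overloadWindow{samples: make([]bool, size)}
}

// Add records one sample, overwriting the oldest once the window is full.
func (w *overloadWindow) Add(overloaded bool) {
	w.samples[w.next] = overloaded
	w.next = (w.next + 1) % len(w.samples)
	if w.next == 0 {
		w.filled = true
	}
}

// ShouldUpscale reports whether more than half of the samples in a full
// window were overloaded, tolerating momentary QPS dips.
func (w *overloadWindow) ShouldUpscale() bool {
	if !w.filled {
		return false
	}
	count := 0
	for _, s := range w.samples {
		if s {
			count++
		}
	}
	return count > len(w.samples)/2
}

func main() {
	w := newOverloadWindow(60)
	for i := 0; i < 60; i++ {
		w.Add(i%3 != 0) // overloaded 2/3 of the time, with periodic dips
	}
	fmt.Println(w.ShouldUpscale()) // true: majority overloaded despite dips
}
```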
We can discuss this offline, but I think with downscaleFactor we can handle fluctuation. The default factor is 0.75, which means the number of partitions won't change unless the traffic drops by 25%. (For example, with an upscale RPS of 1,000 and 4 write partitions, downscale only triggers when traffic falls below 1,000 * 3 * 0.75 = 2,250, i.e. 25% below the 3,000 that triggered the upscale to 4 partitions.)
I think we need a matching simulator to validate the different options for the scale formula. I assume you will update the simulation next and iterate on this. If so, this looks like a good start. Can you confirm?
It doesn't fit the simulation framework because the output of the simulation tests assumes that the number of partitions doesn't change. I can run bench tests instead.
The simulation framework can be enhanced to support this. The feedback loop is faster in simulations and also more repeatable than benchmarking in a dev environment, so let's invest in this.
This PR introduces a new component, AdaptiveScaler, to Matching's Task List Manager. The component only runs in the root partition of a normal task list and is turned on only if two dynamic config properties are set to true.
It monitors the add-task QPS of the root partition of a task list and decides whether the task list needs more partitions or whether the number of partitions can be decreased. It's based on the assumption that add-task QPS is evenly distributed among all task list partitions.
When the adaptive scaler decides to increase the number of partitions, it increases the number of read partitions and write partitions at the same time. When it decreases the number of partitions, it decreases the number of write partitions first; the number of read partitions is decreased only after all the backlog of the read partitions is drained.
The component is configured by the following 5 dynamic config properties:
MatchingAdaptiveScalerUpdateInterval configures how often it checks the QPS.
MatchingPartitionUpscaleSustainedDuration determines the minimum duration a high load must be sustained on a matching task list before triggering the operation to increase the number of partitions.
MatchingPartitionDownscaleSustainedDuration determines the minimum duration a low load must be sustained on a matching task list before triggering the operation to decrease the number of partitions.
MatchingPartitionUpscaleRPS sets the per-write-partition QPS threshold used in the load definitions below.
MatchingPartitionDownscaleFactor shrinks the downscale threshold relative to the upscale threshold to add hysteresis.
High load definition:
total QPS > MatchingPartitionUpscaleRPS * Number of Write Partitions
Low load definition:
total QPS < MatchingPartitionUpscaleRPS * (Number of Write Partitions - 1) * MatchingPartitionDownscaleFactor
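For illustration, here is a condensed, self-contained sketch of the decision logic described above, assuming the dynamic config values are passed in as plain parameters; the function and type names are hypothetical, not the PR's actual code:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// scalerState mirrors the over/under-load tracking visible in the diff.
type scalerState struct {
	overLoad           bool
	overLoadStartTime  time.Time
	underLoad          bool
	underLoadStartTime time.Time
}

// desiredWritePartitions applies the high/low load definitions above and
// only changes the partition count once the condition has been sustained.
func desiredWritePartitions(s *scalerState, now time.Time, qps float64,
	numWrite int, upscaleRPS, downscaleFactor float64,
	upSustained, downSustained time.Duration) int {

	n := float64(numWrite)
	switch {
	case qps > upscaleRPS*n: // high load
		s.underLoad = false
		if !s.overLoad {
			s.overLoad, s.overLoadStartTime = true, now
		} else if now.Sub(s.overLoadStartTime) > upSustained {
			// Read and write partitions are increased together.
			return int(math.Ceil(qps / upscaleRPS))
		}
	case qps < upscaleRPS*(n-1)*downscaleFactor: // low load
		s.overLoad = false
		if !s.underLoad {
			s.underLoad, s.underLoadStartTime = true, now
		} else if now.Sub(s.underLoadStartTime) > downSustained {
			// Write partitions shrink first; read partitions follow
			// once their backlog is drained (not modeled here).
			return int(math.Ceil(qps / upscaleRPS))
		}
	default:
		s.overLoad, s.underLoad = false, false
	}
	return numWrite
}

func main() {
	s := &scalerState{}
	// 3 write partitions, 200 RPS per-partition threshold, 650 total QPS:
	// high load, but the count stays at 3 until the load is sustained.
	fmt.Println(desiredWritePartitions(s, time.Now(), 650, 3, 200, 0.75,
		time.Minute, time.Minute)) // 3
}
```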
Other minor changes:
When a task list's partition config is nil and we want to update the number of partitions to 1, it should be a no-op.