[Concurrent Segment Search] Perform buildAggregation in parallel #11673
Labels: enhancement, Search:Performance, v2.14.0, v3.0.0
Is your feature request related to a problem? Please describe
In the current concurrent search paradigm we create `slice_count` times as many collectors as in non-concurrent search, and `collect` is called on these collectors concurrently. Afterwards, `reduce` is called sequentially on the `search` threadpool for all of these collectors.

There are 2 main scenarios where this sequential behavior can be problematic: [1] when the aggregators are nested and `buildAggregation` needs to BFS/DFS through the collector tree, and [2] when `buildAggregation` itself is an expensive operation for the given collector.

One such example is `keyword-terms-numeric-terms`, which is a nested terms aggregation. Even in the non-concurrent search case, nested terms aggregations suffer from a combinatorial explosion of buckets with each additional nested layer, and for concurrent search this problem is essentially multiplied by `slice_count` because the bucket creation during `buildAggregation` is done sequentially. The combinatorial explosion was partially addressed by #11585; however, the sequential work is still a bottleneck.

Here is a basic query profiler breakdown to further illustrate this point:
We can see that for the `NumericTermsAggregator`, `build_aggregation` takes 3-4x as long and happens sequentially, and the combinatorial explosion shows up in the `collect_count`.
Describe the solution you'd like
The additional combinatorial explosion is due to `slice_size` > `shard_size`, which is something we can revisit (i.e. does `slice_size` really need to be `1.5 * shard_size + 10`, or can `slice_size == shard_size`?). However, the long pole is actually `build_aggregation` as a whole taking a long time because it happens sequentially. I propose that we move the `buildAggregation` steps to `processPostCollection` so that they can happen in parallel on the `index_searcher` threads.

I will follow up with a PR with this change, as well as some benchmarking data for further discussion.
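To make the shape of the proposal concrete, here is a minimal, self-contained sketch. It is not the OpenSearch implementation: `SliceAggregator`, `buildSequentially`, and `buildInParallel` are hypothetical names invented for illustration, and a plain fixed-size `ExecutorService` stands in for the `index_searcher` threadpool. It only shows the idea of running the expensive per-slice build step on worker threads and leaving just the cheap merge on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelBuildSketch {

    /** Hypothetical stand-in for one slice's aggregator; NOT an OpenSearch class. */
    static final class SliceAggregator {
        private final List<String> collectedTerms;

        SliceAggregator(List<String> collectedTerms) {
            this.collectedTerms = collectedTerms;
        }

        /** Stand-in for the expensive build step: turn collected terms into bucket counts. */
        Map<String, Long> buildAggregation() {
            Map<String, Long> buckets = new TreeMap<>();
            for (String term : collectedTerms) {
                buckets.merge(term, 1L, Long::sum);
            }
            return buckets;
        }
    }

    /** Sequential shape: every slice is built on the calling thread, one after another. */
    static Map<String, Long> buildSequentially(List<SliceAggregator> slices) {
        Map<String, Long> reduced = new TreeMap<>();
        for (SliceAggregator slice : slices) {
            slice.buildAggregation().forEach((term, count) -> reduced.merge(term, count, Long::sum));
        }
        return reduced;
    }

    /** Proposed shape: build each slice on a worker thread; only the merge stays sequential. */
    static Map<String, Long> buildInParallel(List<SliceAggregator> slices, ExecutorService pool) {
        List<Future<Map<String, Long>>> pending = new ArrayList<>();
        for (SliceAggregator slice : slices) {
            Callable<Map<String, Long>> task = slice::buildAggregation;
            pending.add(pool.submit(task));
        }
        Map<String, Long> reduced = new TreeMap<>();
        try {
            for (Future<Map<String, Long>> future : pending) {
                future.get().forEach((term, count) -> reduced.merge(term, count, Long::sum));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a slice", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("building a slice failed", e.getCause());
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Two slices, as if slice_count were 2 (assumed value for the demo).
        List<SliceAggregator> slices = List.of(
            new SliceAggregator(List.of("a", "b", "a")),
            new SliceAggregator(List.of("b", "c")));
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println("sequential: " + buildSequentially(slices));
            System.out.println("parallel:   " + buildInParallel(slices, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods produce the same merged buckets; the difference is only where the per-slice build work runs. Whether this translates into a real speedup depends on how expensive the build step is relative to the merge, which is what the benchmarking data would need to show.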
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response