FILTER Clauses for Aggregates #7916

atris · 2021-12-16T12:34:39Z

This PR implements support for FILTER clauses in aggregations:

SELECT SUM(COL1) FILTER(WHERE COL2 > 300), AVG(COL2) FILTER (WHERE COL2 < 50) FROM MyTable WHERE COL1 > 50;

The approach implements the swim lane design highlighted in the design document by splitting at the filter operator. The implementation gets the filter block for the main predicate and for each filter predicate, ANDs them together, and returns a combined filter operator.

The main predicate is scanned only once and reused for all filter clauses.
The implementation allows each filter swim lane to use any available indices independently.

If two or more filter clauses have the same predicate, the result will be computed only once and fed to each of the aggregates.
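The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SwimLaneSketch`, `scan`, `lane`, `sum`, and `laneScans` are hypothetical names, `java.util.BitSet` stands in for Pinot's doc id sets, and a plain loop stands in for index-backed filter evaluation. The main predicate is evaluated once in the constructor, each filter clause is ANDed with it, and clauses sharing the same predicate text reuse one cached result.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Illustrative only: every name here is hypothetical, none is a Pinot class.
class SwimLaneSketch {
  private final int numDocs;
  private final BitSet mainMatches;
  // Cache keyed by predicate text, so identical FILTER clauses share one result.
  private final Map<String, BitSet> lanes = new HashMap<>();
  // Counts how many filter clauses were actually evaluated.
  int laneScans = 0;

  SwimLaneSketch(int numDocs, IntPredicate mainPredicate) {
    this.numDocs = numDocs;
    // The main (WHERE) predicate is evaluated exactly once.
    this.mainMatches = scan(numDocs, mainPredicate);
  }

  // Evaluates a predicate against every doc id and returns the matching set.
  static BitSet scan(int numDocs, IntPredicate predicate) {
    BitSet matches = new BitSet(numDocs);
    for (int docId = 0; docId < numDocs; docId++) {
      if (predicate.test(docId)) {
        matches.set(docId);
      }
    }
    return matches;
  }

  // Returns the docs matching both the main predicate and the given filter clause.
  BitSet lane(String predicateKey, IntPredicate filterPredicate) {
    return lanes.computeIfAbsent(predicateKey, key -> {
      laneScans++;
      BitSet combined = scan(numDocs, filterPredicate);
      combined.and(mainMatches);
      return combined;
    });
  }

  // Sums a column over the doc ids in one swim lane.
  static long sum(long[] values, BitSet lane) {
    long total = 0;
    for (int docId = lane.nextSetBit(0); docId >= 0; docId = lane.nextSetBit(docId + 1)) {
      total += values[docId];
    }
    return total;
  }

  public static void main(String[] args) {
    long[] col1 = {10, 60, 70, 80, 90};
    long[] col2 = {400, 350, 20, 500, 40};
    // WHERE COL1 > 50
    SwimLaneSketch sketch = new SwimLaneSketch(col1.length, docId -> col1[docId] > 50);
    // SUM(COL1) FILTER(WHERE COL2 > 300)
    BitSet high = sketch.lane("COL2 > 300", docId -> col2[docId] > 300);
    // AVG(COL2) FILTER(WHERE COL2 < 50)
    BitSet low = sketch.lane("COL2 < 50", docId -> col2[docId] < 50);
    System.out.println(sum(col1, high));
    System.out.println((double) sum(col2, low) / low.cardinality());
  }
}
```

Keying the cache by predicate text is only one possible way to detect identical clauses; the PR may compare filter contexts differently.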

Design document: https://docs.google.com/document/d/1ZM-2c0jJkbeJ61m8sJF0qj19t5UYLhnTFvIAz-HCJmk/edit?usp=sharing

Performance benchmark:

3 warm-up iterations per run, 5 runs in total. Data set size: 1.5 million documents. Hardware: Apple M1 Pro, 32 GB RAM.

The X axis represents the number of iterations and the Y axis represents latency in ms.

FILTER query, compared to its equivalent CASE query, is 120-140% faster on average.

GROUP BY is not supported yet and will be added in a follow-up PR.

pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/CombinedTransformBlock.java

pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/CombinedFilterBlock.java

pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java

...t-core/src/main/java/org/apache/pinot/core/operator/transform/CombinedTransformOperator.java

richardstartin

I think the code could be more concise in places and this would make the logic easier to follow.

pinot-core/src/main/java/org/apache/pinot/core/operator/SwimLaneDocIdSetOperator.java

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkFilteredAggregations.java

pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java

codecov-commenter · 2021-12-16T18:14:50Z

Codecov Report

Merging #7916 (bd19945) into master (0fe7ef8) will decrease coverage by 7.09%.
The diff coverage is 88.16%.

@@             Coverage Diff              @@
##             master    #7916      +/-   ##
============================================
- Coverage     71.24%   64.15%   -7.10%     
+ Complexity     4262     4180      -82     
============================================
  Files          1607     1602       -5     
  Lines         83409    83206     -203     
  Branches      12458    12441      -17     
============================================
- Hits          59426    53380    -6046     
- Misses        19941    25979    +6038     
+ Partials       4042     3847     -195

Flag	Coverage Δ
integration1	`28.97% <21.30%> (+<0.01%)`	⬆️
integration2	`27.63% <21.30%> (-0.04%)`	⬇️
unittests1	`67.97% <88.16%> (+0.10%)`	⬆️
unittests2	`?`


Impacted Files	Coverage Δ
...apache/pinot/core/operator/blocks/FilterBlock.java	`50.00% <50.00%> (-7.15%)`	⬇️
.../pinot/core/operator/docidsets/BitmapDocIdSet.java	`62.50% <60.00%> (-37.50%)`	⬇️
...t/core/operator/filter/CombinedFilterOperator.java	`70.00% <70.00%> (ø)`
...t/core/operator/docidsets/FilterBlockDocIdSet.java	`72.72% <72.72%> (ø)`
...inot/core/query/reduce/PostAggregationHandler.java	`91.86% <87.50%> (-0.45%)`	⬇️
...rg/apache/pinot/core/plan/AggregationPlanNode.java	`90.90% <88.09%> (-2.08%)`	⬇️
...re/operator/query/FilteredAggregationOperator.java	`90.00% <90.00%> (ø)`
...pinot/core/query/request/context/QueryContext.java	`97.58% <97.77%> (-0.33%)`	⬇️
...re/operator/docidsets/RangelessBitmapDocIdSet.java	`100.00% <100.00%> (ø)`
...ava/org/apache/pinot/core/plan/FilterPlanNode.java	`89.21% <100.00%> (+2.34%)`	⬆️
... and 225 more


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/ProjectionBlock.java

amrishlal · 2021-12-17T00:55:39Z

pinot-core/src/main/java/org/apache/pinot/core/startree/plan/StarTreeDocIdSetPlanNode.java

@@ -31,16 +32,29 @@

 public class StarTreeDocIdSetPlanNode implements PlanNode {


From what I can see, when the filterOperator is present the existing implementation is completely overridden (a new constructor and an if statement in the run method), to the point that the old code isn't touched at all. Would it be better to create a new class (for example StarTreeFilteredDocIdSetPlanNode)? Doing that would also avoid the null checks.

This class is pretty brief, and I would honestly avoid adding new plan nodes unless there is a major functionality that is different.

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

pinot-core/src/main/java/org/apache/pinot/core/operator/SwimLaneDocIdSetOperator.java

...ain/java/org/apache/pinot/core/query/aggregation/function/FilterableAggregationFunction.java

MrNeocore · 2021-12-17T19:14:13Z

Thank you, @atris!

atris · 2021-12-24T07:18:52Z

@Jackie-Jiang is working on a parallel implementation, so closing this PR to avoid conflict

atris · 2022-01-04T07:12:37Z

Discussed with @Jackie-Jiang and we will be converging on this PR itself, so reviving it. Sorry for the confusion.

Jackie-Jiang

Overall logic looks good. Let's try to further clean up the code

pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/FilterBlock.java

pinot-core/src/main/java/org/apache/pinot/core/operator/ProjectionOperator.java

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationGroupByOrderByPlanNode.java

...re/src/test/java/org/apache/pinot/queries/InnerSegmentAggregationSingleValueQueriesTest.java

...ore/src/test/java/org/apache/pinot/queries/InterSegmentAggregationMultiValueQueriesTest.java

.../apache/pinot/core/query/request/context/utils/BrokerRequestToQueryContextConverterTest.java

pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java

pinot-core/src/main/java/org/apache/pinot/core/operator/query/FilteredAggregationOperator.java

Jackie-Jiang

Mostly good. Please reformat the changes using the Pinot Style (you might want to checkout master to import the latest checkstyle settings)

pinot-core/src/main/java/org/apache/pinot/core/startree/plan/StarTreeProjectionPlanNode.java

Jackie-Jiang · 2022-01-20T06:12:08Z

pinot-core/src/main/java/org/apache/pinot/core/startree/plan/StarTreeTransformPlanNode.java

@@ -57,6 +57,7 @@ public StarTreeTransformPlanNode(StarTreeV2 starTreeV2,
      _groupByExpressions = Collections.emptyList();
      groupByColumns = null;
    }
+


Let's revert this file since it is not relevant

Jackie-Jiang · 2022-01-20T06:13:27Z

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

+    boolean hasFilteredPredicates = _queryContext.isHasFilteredAggregations();
+    BaseOperator<IntermediateResultsBlock> aggOperator;
+    if (hasFilteredPredicates) {
+      aggOperator = buildFilteredAggOperator();


(minor) This part can be more concise by returning directly instead of putting the operator in a local variable.
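As a rough illustration of the suggestion, the two shapes look like this. The sketch is hypothetical: `DirectReturnSketch` is not a Pinot class, strings stand in for `BaseOperator<IntermediateResultsBlock>`, and the else branch is assumed, since the diff excerpt above only shows the filtered branch.

```java
// Hypothetical stand-in: strings replace the real operator type.
class DirectReturnSketch {
  static String buildFilteredAggOperator() {
    return "filtered";
  }

  static String buildNonFilteredAggOperator() {
    return "non-filtered";
  }

  // Shape under review: the operator is stored in a local variable first.
  static String runWithLocal(boolean hasFilteredAggregations) {
    String aggOperator;
    if (hasFilteredAggregations) {
      aggOperator = buildFilteredAggOperator();
    } else {
      // Assumed branch, not visible in the quoted diff.
      aggOperator = buildNonFilteredAggOperator();
    }
    return aggOperator;
  }

  // Suggested shape: return the built operator directly.
  static String runDirect(boolean hasFilteredAggregations) {
    return hasFilteredAggregations ? buildFilteredAggOperator() : buildNonFilteredAggOperator();
  }
}
```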

Jackie-Jiang · 2022-01-20T06:14:03Z

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

+   * @param numTotalDocs Number of total docs
+   */
+  private BaseOperator<IntermediateResultsBlock> buildOperatorForFilteredAggregations(
+      BaseFilterOperator mainPredicateFilterOperator,


The format still doesn't align with the pinot style

Jackie-Jiang · 2022-01-20T06:16:33Z

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

+  /**
+   * Builds the operator to be used for non filtered aggregations
+   */
+  private BaseOperator<IntermediateResultsBlock> buildNonFilteredAggOperator() {


What I meant is to move the current code into this method and implement buildFilteredAggOperator() separately. The reasons are:

The metadata/dictionary-based operators and star-tree do not apply to filtered aggregations

Sharing the buildOperators() method can bring extra overhead to non-filtered aggregations
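One way to picture the requested split is sketched below. It is an assumption-laden illustration rather than the PR's code: `SplitBuilderSketch` and the boolean and int parameters are hypothetical, strings stand in for operators, and the condition for the metadata fast path is simplified to "no WHERE clause and answerable from metadata".

```java
// Hypothetical stand-ins: strings take the place of real operator objects.
class SplitBuilderSketch {
  // Non-filtered path: keeps the cheap special cases that existed before the change.
  static String buildNonFilteredAggOperator(boolean hasWhereClause, boolean answerableFromMetadata) {
    if (!hasWhereClause && answerableFromMetadata) {
      // Simplified assumption about when the metadata/dictionary fast path applies.
      return "metadata-based";
    }
    return "scan-based";
  }

  // Filtered path: never consults the fast paths, and the non-filtered
  // path never pays for per-filter operator construction.
  static String buildFilteredAggOperator(int numFilterClauses) {
    return "filtered over " + numFilterClauses + " filter clause(s)";
  }
}
```

Keeping the two builders independent means neither path has to branch on concerns that only matter to the other.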

pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java

Jackie-Jiang · 2022-01-20T06:24:31Z

pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java

@@ -119,11 +122,13 @@ private QueryContext(String tableName, List<ExpressionContext> selectExpressions
      @Nullable FilterContext filter, @Nullable List<ExpressionContext> groupByExpressions,
      @Nullable FilterContext havingFilter, @Nullable List<OrderByExpressionContext> orderByExpressions, int limit,
      int offset, Map<String, String> queryOptions, @Nullable Map<String, String> debugOptions,
-      BrokerRequest brokerRequest) {
+      BrokerRequest brokerRequest, boolean hasFilteredAggregations,


hasFilteredAggregations should not be set through the constructor. It is updated in generateAggregationFunctions()

Good catch, fixed.
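The point of the comment can be sketched as follows, assuming a heavily simplified stand-in for QueryContext. `FlagDerivationSketch` and its nested `Aggregation` class are hypothetical; only the names `generateAggregationFunctions` and `isHasFilteredAggregations` are taken from the diff excerpts in this thread. The flag is derived while the aggregation functions are generated, so no caller can pass a value that disagrees with the query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for QueryContext; not the Pinot class.
class FlagDerivationSketch {
  // One SELECT-clause aggregation; filter is null when there is no FILTER clause.
  static final class Aggregation {
    final String function;
    final String filter;

    Aggregation(String function, String filter) {
      this.function = function;
      this.filter = filter;
    }
  }

  private final List<Aggregation> aggregations = new ArrayList<>();
  // Deliberately not a constructor parameter.
  private boolean hasFilteredAggregations;

  FlagDerivationSketch(List<Aggregation> selectAggregations) {
    generateAggregationFunctions(selectAggregations);
  }

  // Sets the flag as a side effect of walking the aggregations.
  private void generateAggregationFunctions(List<Aggregation> selectAggregations) {
    for (Aggregation aggregation : selectAggregations) {
      aggregations.add(aggregation);
      if (aggregation.filter != null) {
        hasFilteredAggregations = true;
      }
    }
  }

  boolean isHasFilteredAggregations() {
    return hasFilteredAggregations;
  }
}
```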

Jackie-Jiang · 2022-01-20T06:25:06Z

pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java

@@ -350,6 +381,7 @@ public String toString() {
    private List<ExpressionContext> _selectExpressions;
    private List<String> _aliasList;
    private FilterContext _filter;
+    private ExpressionContext _filterExpression;


The change in the Builder is not necessary

Jackie-Jiang · 2022-01-20T06:25:23Z

pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java

@@ -441,76 +480,106 @@ public QueryContext build() {
     */
    private void generateAggregationFunctions(QueryContext queryContext) {
      List<AggregationFunction> aggregationFunctions = new ArrayList<>();
-      List<Pair<AggregationFunction, FilterContext>> filteredAggregationFunctions = new ArrayList<>();
+      List<Pair<FilterContext, AggregationFunction>> aggregationFunctionsWithMetadata = new ArrayList<>();


Please rename the variable

Jackie-Jiang · 2022-01-20T06:26:21Z

pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java


      // Add aggregation functions in the SELECT clause
      // NOTE: DO NOT deduplicate the aggregation functions in the SELECT clause because that involves protocol change.
-      List<FunctionContext> aggregationsInSelect = new ArrayList<>();
-      List<Pair<FunctionContext, FilterContext>> filteredAggregations = new ArrayList<>();
+      List<Pair<Pair<FilterContext, ExpressionContext>, FunctionContext>> aggregationsInSelect = new ArrayList<>();


I don't think we need to keep ExpressionContext here. List<Pair<FunctionContext, FilterContext>> should be enough for the following computations

amrishlal · 2022-01-20T07:35:41Z

pinot-core/src/main/java/org/apache/pinot/core/operator/query/FilteredAggregationOperator.java

+@SuppressWarnings("rawtypes")
+public class FilteredAggregationOperator extends BaseOperator<IntermediateResultsBlock> {
+  private static final String OPERATOR_NAME = "FilteredAggregationOperator";
+  private static final String EXPLAIN_NAME = "FILTERED_AGGREGATE";


To match the naming pattern in AggregateGroupByOperator (and other Aggregation*Operator classes), the EXPLAIN_NAME should be set to AGGREGATE_FILTERED.

amrishlal · 2022-01-20T07:37:44Z

pinot-core/src/main/java/org/apache/pinot/core/operator/query/FilteredAggregationOperator.java

+
+  @Override
+  public List<Operator> getChildOperators() {
+    return _aggFunctionsWithTransformOperator.stream().map(Pair::getRight).collect(Collectors.toList());


Unless this has recently changed, I think stream api usage is not consistent with Pinot coding convention.

I haven't seen a coding convention mentioning the same, yet. Is this documented somewhere?

I'm unaware of such a convention and have recently seen plenty of code using the streams API for non-performance-critical operations (like this one).

@Jackie-Jiang Can you please clarify if stream api usage applies?

We should avoid using the stream API for performance-critical operations. This one is on the query path, but might not be that performance critical (it is only called once). Using the regular API could give slightly better performance, but IMO both ways are okay

I agree with @richardstartin -- we only need to worry about streams when the code is invoked in a tight loop for multiple iterations -- none of which is applicable in this specific case
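For reference, the two styles under discussion are functionally equivalent. The sketch below is self-contained and hypothetical: `ChildOperatorsSketch` and its tiny `Pair` class are stand-ins (the PR uses a real Pair type and Operator objects), and strings replace operators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in; strings replace aggregation functions and operators.
class ChildOperatorsSketch {
  // Minimal pair type so the example has no external dependencies.
  static final class Pair<L, R> {
    private final L left;
    private final R right;

    Pair(L left, R right) {
      this.left = left;
      this.right = right;
    }

    L getLeft() {
      return left;
    }

    R getRight() {
      return right;
    }
  }

  // Stream version, mirroring the line under review.
  static List<String> withStream(List<Pair<String, String>> pairs) {
    return pairs.stream().map(Pair::getRight).collect(Collectors.toList());
  }

  // Plain loop version, avoiding stream setup overhead.
  static List<String> withLoop(List<Pair<String, String>> pairs) {
    List<String> operators = new ArrayList<>(pairs.size());
    for (Pair<String, String> pair : pairs) {
      operators.add(pair.getRight());
    }
    return operators;
  }
}
```

Since the method is called once per query rather than per document, the choice here is mostly stylistic, as the reviewers conclude.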


Jackie-Jiang

LGTM. Only minor and code format comments. Please apply the pinot format and auto-reformat the changed files

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

Jackie-Jiang · 2022-01-28T21:39:24Z

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

+   * @param numTotalDocs Number of total docs
+   */
+  private BaseOperator<IntermediateResultsBlock> buildOperatorForFilteredAggregations(
+      BaseFilterOperator mainPredicateFilterOperator,


(code format) Please apply the pinot code format and use it to auto-reformat this file. Several changes do not comply with the format

pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java

Jackie-Jiang · 2022-03-10T00:57:15Z

@atris This is a great feature. Could you please add some release notes to the PR description which we can refer to when cutting the next release?

MrNeocore · 2022-04-07T14:12:18Z

Is IN_SUBQUERY inside FILTER supported?

For example:

SELECT SUM(value) FILTER(WHERE IN_SUBQUERY(entityId, 'SELECT ID_SET(entityId) FROM other_table WHERE cond = <thing>') = 1) FROM table

I can't seem to make it work, but maybe I'm mistyping something
-> Unsupported function: insubquery not found

Jackie-Jiang · 2022-04-08T18:52:44Z

No, it is not supported. For the example you gave, SELECT SUM(value) FROM table WHERE IN_SUBQUERY(entityId, 'SELECT ID_SET(entityId) FROM other_table WHERE cond = <thing>') = 1 should work

MrNeocore · 2022-04-09T15:24:09Z

Thanks for the confirmation @Jackie-Jiang

Yes that's what we're using right now, but we've got a use case where we may have hundreds of sub query entities, which are currently translated into hundreds of Pinot queries so I was looking for a way to improve that :)

atris · 2022-04-09T15:53:12Z

Let me see if I can take a crack at this next week


MrNeocore · 2022-04-10T19:47:23Z

Thanks for giving it a shot @atris

kishoreg · 2022-11-23T08:12:09Z

@atris did we add this to our docs?

atris mentioned this pull request Dec 16, 2021

Implement Multiple Predicate Execution (FILTER Clauses) #7830

Closed

atris requested review from Jackie-Jiang, mayankshriv and siddharthteotia and removed request for Jackie-Jiang December 16, 2021 12:36