-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FILTER Clauses for Aggregates #7916
Conversation
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/CombinedTransformBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/CombinedTransformBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/CombinedFilterBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java
Outdated
Show resolved
Hide resolved
...t-core/src/main/java/org/apache/pinot/core/operator/transform/CombinedTransformOperator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code could be more concise in places and this would make the logic easier to follow.
pinot-core/src/main/java/org/apache/pinot/core/operator/SwimLaneDocIdSetOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkFilteredAggregations.java
Outdated
Show resolved
Hide resolved
pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkFilteredAggregations.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java
Outdated
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #7916 +/- ##
============================================
- Coverage 71.24% 64.15% -7.10%
+ Complexity 4262 4180 -82
============================================
Files 1607 1602 -5
Lines 83409 83206 -203
Branches 12458 12441 -17
============================================
- Hits 59426 53380 -6046
- Misses 19941 25979 +6038
+ Partials 4042 3847 -195
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/ProjectionBlock.java
Outdated
Show resolved
Hide resolved
@@ -31,16 +32,29 @@ | |||
|
|||
public class StarTreeDocIdSetPlanNode implements PlanNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I am seeing if the filterOperator
is present, the existing implementation is completely overridden (new constructor and if statement in run
method) to the point that the old code isn't being touched at all. I am wondering if it will be better to create a new class (for example StarTreeFilteredDocIdSetPlanNode
) and doing that will also avoid the null checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is pretty brief, and I would honestly avoid adding new plan nodes unless there is a major functionality that is different.
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/SwimLaneDocIdSetOperator.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/pinot/core/query/aggregation/function/FilterableAggregationFunction.java
Outdated
Show resolved
Hide resolved
Thanks you @atris ! |
@Jackie-Jiang is working on a parallel implementation, so closing this PR to avoid conflict |
Discussed with @Jackie-Jiang and we will be convening on this PR itself, so reviving it. Sorry for the confusion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall logic looks good. Let's try to further clean up the code
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/FilterBlock.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/ProjectionOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationGroupByOrderByPlanNode.java
Outdated
Show resolved
Hide resolved
...re/src/test/java/org/apache/pinot/queries/InnerSegmentAggregationSingleValueQueriesTest.java
Outdated
Show resolved
Hide resolved
...ore/src/test/java/org/apache/pinot/queries/InterSegmentAggregationMultiValueQueriesTest.java
Show resolved
Hide resolved
.../apache/pinot/core/query/request/context/utils/BrokerRequestToQueryContextConverterTest.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/CombinedFilterOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/query/FilteredAggregationOperator.java
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/query/FilteredAggregationOperator.java
Outdated
Show resolved
Hide resolved
6150f87
to
7d85659
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly good. Please reformat the changes using the Pinot Style (you might want to checkout master to import the latest checkstyle settings)
pinot-core/src/main/java/org/apache/pinot/core/startree/plan/StarTreeProjectionPlanNode.java
Outdated
Show resolved
Hide resolved
@@ -57,6 +57,7 @@ public StarTreeTransformPlanNode(StarTreeV2 starTreeV2, | |||
_groupByExpressions = Collections.emptyList(); | |||
groupByColumns = null; | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's revert this file since it is not relevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
boolean hasFilteredPredicates = _queryContext.isHasFilteredAggregations(); | ||
BaseOperator<IntermediateResultsBlock> aggOperator; | ||
if (hasFilteredPredicates) { | ||
aggOperator = buildFilteredAggOperator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) this part can be more concise by directly return instead of putting the operator in an local variable
* @param numTotalDocs Number of total docs | ||
*/ | ||
private BaseOperator<IntermediateResultsBlock> buildOperatorForFilteredAggregations( | ||
BaseFilterOperator mainPredicateFilterOperator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format still doesn't align with the pinot style
/** | ||
* Builds the operator to be used for non filtered aggregations | ||
*/ | ||
private BaseOperator<IntermediateResultsBlock> buildNonFilteredAggOperator() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is to move the current code into this method, and implement buildFilteredAggOperator()
separately. The reason being:
- The metadata/dictionary based operator and star-tree does not apply to the filtered aggregation
- Sharing
buildOperators()
method can bring extra overhead to non-filtered aggregations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Outdated
Show resolved
Hide resolved
@@ -119,11 +122,13 @@ private QueryContext(String tableName, List<ExpressionContext> selectExpressions | |||
@Nullable FilterContext filter, @Nullable List<ExpressionContext> groupByExpressions, | |||
@Nullable FilterContext havingFilter, @Nullable List<OrderByExpressionContext> orderByExpressions, int limit, | |||
int offset, Map<String, String> queryOptions, @Nullable Map<String, String> debugOptions, | |||
BrokerRequest brokerRequest) { | |||
BrokerRequest brokerRequest, boolean hasFilteredAggregations, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasFilteredAggregations
should not be set through the constructor. It is updated in generateAggregationFunctions()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed.
@@ -350,6 +381,7 @@ public String toString() { | |||
private List<ExpressionContext> _selectExpressions; | |||
private List<String> _aliasList; | |||
private FilterContext _filter; | |||
private ExpressionContext _filterExpression; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in the Builder
is not necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -441,76 +480,106 @@ public QueryContext build() { | |||
*/ | |||
private void generateAggregationFunctions(QueryContext queryContext) { | |||
List<AggregationFunction> aggregationFunctions = new ArrayList<>(); | |||
List<Pair<AggregationFunction, FilterContext>> filteredAggregationFunctions = new ArrayList<>(); | |||
List<Pair<FilterContext, AggregationFunction>> aggregationFunctionsWithMetadata = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename the variable
|
||
// Add aggregation functions in the SELECT clause | ||
// NOTE: DO NOT deduplicate the aggregation functions in the SELECT clause because that involves protocol change. | ||
List<FunctionContext> aggregationsInSelect = new ArrayList<>(); | ||
List<Pair<FunctionContext, FilterContext>> filteredAggregations = new ArrayList<>(); | ||
List<Pair<Pair<FilterContext, ExpressionContext>, FunctionContext>> aggregationsInSelect = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to keep ExpressionContext
here. List<FunctionContext, FilterContext>
should be enough for the following computations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@SuppressWarnings("rawtypes") | ||
public class FilteredAggregationOperator extends BaseOperator<IntermediateResultsBlock> { | ||
private static final String OPERATOR_NAME = "FilteredAggregationOperator"; | ||
private static final String EXPLAIN_NAME = "FILTERED_AGGREGATE"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To match the naming pattern in AggregateGroupByOperator (and other Aggregation*Operator classes), the EXPLAIN_NAME should be set to AGGREGATE_FILTERED.
|
||
@Override | ||
public List<Operator> getChildOperators() { | ||
return _aggFunctionsWithTransformOperator.stream().map(Pair::getRight).collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless this has recently changed, I think stream api usage is not consistent with Pinot coding convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen a coding convention mentioning the same, yet. Is this documented somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unaware of such a convention and have seen plenty of code using the streams API for performance non-critical operations (like this one) recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang Can you please clarify if stream api usage applies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid using stream api for performance critical operations. This one is at query path, but might not be that performance critical (only called once). Using regular api could give slightly better performance, but IMO both way is okay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @richardstartin -- we only need to worry about streams when the code is invoked in a tight loop for multiple iterations -- none of which is applicable in this specific case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Only minor and code format comments. Please apply the pinot format and auto-reformat the changed files
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
* @param numTotalDocs Number of total docs | ||
*/ | ||
private BaseOperator<IntermediateResultsBlock> buildOperatorForFilteredAggregations( | ||
BaseFilterOperator mainPredicateFilterOperator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(code format) Please apply the pinot code format and use it to auto-reformat this file. Several changes do not comply with the format
pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/request/context/QueryContext.java
Outdated
Show resolved
Hide resolved
@atris This is a great feature. Could you please add some release notes to the PR description which we can refer to when cutting the next release? |
Is For example:
I can't seem to make it work, but maybe I'm mistyping something |
No it is not supported. For the example you give, |
Thanks for the confirmation @Jackie-Jiang Yes that's what we're using right now, but we've got a use case where we may have hundreds of sub query entities, which are currently translated into hundreds of Pinot queries so I was looking for a way to improve that :) |
Let me see if I can take a crack on this next week
…On Sat, 9 Apr 2022, 20:54 MrNeocore, ***@***.***> wrote:
Thanks for the confirmation @Jackie-Jiang
<https://github.com/Jackie-Jiang>
Yes that's what we're using right now, but we've got a use case where we
may have hundreds of sub sub query entities, which are currently translated
into hundreds of Pinot queries so I was looking for a way to improve that :)
—
Reply to this email directly, view it on GitHub
<#7916 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANE5Y2RJWTAR3BY2G4S7S3VEGOKHANCNFSM5KGGHAHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Thanks for giving it a shot @atris |
@atris did we add this to our docs? |
This PR implements support for FILTER clauses in aggregations:
SELECT SUM(COL1) FILTER(WHERE COL2 > 300), AVG(COL2) FILTER (WHERE COL2 < 50) FROM MyTable WHERE COL1 > 50;
The approach implements the swim lane design highlighted in the design document by splitting at the filter operator. The implementation gets the filter block for main predicate and each filter predicate, ANDs them together and returns a combined filter operator.
The main predicate is scanned only once and reused for all filter clauses.
The implementation allows each filter swim lane to use any available indices independently.
If two or more filter clauses have the same predicate, the result will be computed only once and fed to each of the aggregates.
https://docs.google.com/document/d/1ZM-2c0jJkbeJ61m8sJF0qj19t5UYLhnTFvIAz-HCJmk/edit?usp=sharing
Performance benchmark:
3 warm up iterations per run, 5 runs in total. Data set size -- 1.5 million documents. Apple M1 Pro, 32GB RAM
X axis represents number of iterations and Y axis represents latency in MS.
FILTER query, compared to its equivalent CASE query, is 120-140% faster on average.
GROUP BY is not supported yet and will be done in a follow up PR