-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue 7519] Adds support for multiple filtered/unfiltered aggregations with GROUP BY #10000
[Issue 7519] Adds support for multiple filtered/unfiltered aggregations with GROUP BY #10000
Conversation
Will work on the checkstyle violations. Has there ever been consideration of using Spotless[1] which can both check as well as autofix linting issues? I think pinot-style specifically is not supported, but perhaps support could be added? Auto-fixing would be a huge win. |
db20331
to
67adb07
Compare
a7e6143
to
0327312
Compare
Codecov Report
@@ Coverage Diff @@
## master #10000 +/- ##
=============================================
+ Coverage 15.86% 70.40% +54.53%
- Complexity 176 5693 +5517
=============================================
Files 1931 1996 +65
Lines 104306 107719 +3413
Branches 15901 16376 +475
=============================================
+ Hits 16551 75842 +59291
+ Misses 86531 26577 -59954
- Partials 1224 5300 +4076
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@Jackie-Jiang this is ready for review 😊 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well done! Mostly minor comments
private GroupKeyGenerator _groupKeyGenerator = null; | ||
|
||
public FilteredGroupByOperator( | ||
@Nullable AggregationFunction[] aggregationFunctions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can be null. Actually if it is null, line 83 will throw NPE
private TableResizer _tableResizer; | ||
private GroupKeyGenerator _groupKeyGenerator = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) These 2 member variables can be converted to local
|
||
// Perform aggregation group-by on all the blocks | ||
DefaultGroupByExecutor groupByExecutor; | ||
if (_groupKeyGenerator == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very smart to share the same group-key generator. Let's add some comments about why we need to do so
@@ -130,7 +129,7 @@ void testGroupByTrim(QueryContext queryContext, int minSegmentGroupTrimSize, int | |||
// Extract the execution result | |||
List<Pair<Double, Double>> extractedResult = extractTestResult(resultsBlock.getTable()); | |||
|
|||
assertEquals(extractedResult, expectedResult); | |||
Assert.assertEquals(extractedResult, expectedResult); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) We usually use static import for Assert
in test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had changed this due to checkstyle violations:
[WARNING] (imports) AvoidStaticImport: Using a static member import should be avoided
I can revert back so long as checkstyle still passes
String nonFilterQuery = | ||
"SELECT SUM(INT_COL), SUM(CASE WHEN INT_COL > 25000 THEN INT_COL ELSE 0 END) AS total_sum FROM MyTable GROUP " | ||
+ "BY INT_COL"; | ||
testQuery(filterQuery, nonFilterQuery); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some more tests to guarantee that it works as expected
return nonFilteredGroupByPlan(); | ||
} | ||
|
||
private FilteredGroupByOperator filteredGroupByPlan() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Rename to buildFilteredGroupByPlan()
and buildNonFilteredGroupByPlan()
public static Set<ExpressionContext> collectExpressionsToTransform(AggregationFunction[] aggregationFunctions, | ||
@Nullable ExpressionContext[] groupByExpressions) { | ||
public static Set<ExpressionContext> collectExpressionsToTransform( | ||
@Nullable AggregationFunction[] aggregationFunctions, @Nullable ExpressionContext[] groupByExpressions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregationFunctions
should always be non-null
protected GroupKeyGenerator _groupKeyGenerator; | ||
protected GroupByResultHolder[] _groupByResultHolders; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 2 variables can still be final
|
||
public DefaultGroupByExecutor(QueryContext queryContext, ExpressionContext[] groupByExpressions, | ||
TransformOperator transformOperator) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Remove empty line
_aggregationFunctions = queryContext.getAggregationFunctions(); | ||
public DefaultGroupByExecutor(QueryContext queryContext, AggregationFunction[] aggregationFunctions, | ||
ExpressionContext[] groupByExpressions, TransformOperator transformOperator, | ||
GroupKeyGenerator groupKeyGenerator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annotate groupKeyGenerator
as Nullable
// Perform aggregation group-by on all the blocks | ||
DefaultGroupByExecutor groupByExecutor; | ||
if (groupKeyGenerator == null) { | ||
// The group key generator should be shared across all AggregationFunctions so that agg results can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang please let me know where I may have misstated the reasoning
@Jackie-Jiang added more tests in aeebef0, let me know if there are other specific cases to cover. This PR is ready for re-review |
Oops, somehow clicking |
@Jackie-Jiang the integration test failure looks unrelated to me as the same tests are failing in the same way on other PRs / the failures don't seem to point to the changed areas of code. Can you confirm? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks @egalpin for your contribution. Please add docs when you get a chance if its not done already |
Will do @kishoreg 👍 |
@egalpin : can you update documentation as well? https://docs.pinot.apache.org/users/user-guide-query/query-syntax/supported-aggregations#filter-clause-in-aggregation This still says the following:
|
Thanks! pinot-contrib/pinot-docs#250 |
Following @atris's awesome work[1] to support FILTER expressions[2][3], this PR aims to introduce support for FILTER expressions simultaneously with GROUP BY. The same swim-lane pattern established in the initial FILTER expression PR[1] is re-used/shared here by groupBy processing.
Fixes #7519
[1] #7916
[2] #7519
[3] https://docs.google.com/document/d/1ZM-2c0jJkbeJ61m8sJF0qj19t5UYLhnTFvIAz-HCJmk