[Issue 7519] Adds support for multiple filtered/unfiltered aggregations with GROUP BY #10000

Merged
Jackie-Jiang merged 14 commits into apache:master from the egalpin/filter-with-groupby branch on Jan 5, 2023


egalpin
Member

@egalpin egalpin commented Dec 17, 2022

Following @atris's awesome work[1] to support FILTER expressions[2][3], this PR aims to introduce support for FILTER expressions simultaneously with GROUP BY. The same swim-lane pattern established in the initial FILTER expression PR[1] is re-used/shared here by groupBy processing.

[Screenshot: Screen Shot 2022-12-16 at 17 19 04]

Fixes #7519

[1] #7916
[2] #7519
[3] https://docs.google.com/document/d/1ZM-2c0jJkbeJ61m8sJF0qj19t5UYLhnTFvIAz-HCJmk

@egalpin
Member Author

egalpin commented Dec 17, 2022

Will work on the checkstyle violations. Has there ever been consideration of using Spotless[1], which can both check and auto-fix linting issues? I think pinot-style specifically is not supported, but perhaps support could be added? Auto-fixing would be a huge win.

[1] https://github.com/diffplug/spotless

@egalpin egalpin force-pushed the egalpin/filter-with-groupby branch from db20331 to 67adb07 on December 17, 2022 01:47
@egalpin egalpin force-pushed the egalpin/filter-with-groupby branch from a7e6143 to 0327312 on December 19, 2022 17:44
@codecov-commenter

codecov-commenter commented Dec 19, 2022

Codecov Report

Merging #10000 (751dc18) into master (d3ea8dc) will increase coverage by 54.53%.
The diff coverage is 82.87%.

@@              Coverage Diff              @@
##             master   #10000       +/-   ##
=============================================
+ Coverage     15.86%   70.40%   +54.53%     
- Complexity      176     5693     +5517     
=============================================
  Files          1931     1996       +65     
  Lines        104306   107719     +3413     
  Branches      15901    16376      +475     
=============================================
+ Hits          16551    75842    +59291     
+ Misses        86531    26577    -59954     
- Partials       1224     5300     +4076     
| Flag | Coverage Δ |
|---|---|
| integration1 | 25.10% <4.97%> (?) |
| integration2 | 24.17% <4.97%> (?) |
| unittests1 | 67.91% <82.87%> (?) |
| unittests2 | 13.55% <0.00%> (-2.32%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| .../apache/pinot/common/exception/QueryException.java | 94.44% <ø> (+94.44%) ⬆️ |
| ...pinot/core/query/request/context/QueryContext.java | 99.06% <ø> (+99.06%) ⬆️ |
| ...va/org/apache/pinot/query/runtime/QueryRunner.java | 84.26% <ø> (+0.17%) ⬆️ |
| ...t/core/operator/query/FilteredGroupByOperator.java | 70.00% <70.00%> (ø) |
| ...inot/segment/local/customobject/VarianceTuple.java | 64.28% <75.00%> (+64.28%) ⬆️ |
| ...aggregation/function/AggregationFunctionUtils.java | 76.66% <86.04%> (+76.66%) ⬆️ |
| .../apache/pinot/common/datablock/DataBlockUtils.java | 90.43% <100.00%> (+90.43%) ⬆️ |
| ...rg/apache/pinot/core/plan/AggregationPlanNode.java | 90.78% <100.00%> (+90.78%) ⬆️ |
| ...va/org/apache/pinot/core/plan/GroupByPlanNode.java | 89.58% <100.00%> (+89.58%) ⬆️ |
| ...ry/aggregation/groupby/DefaultGroupByExecutor.java | 96.92% <100.00%> (+96.92%) ⬆️ |
| ... and 1542 more | |


@egalpin
Member Author

egalpin commented Dec 19, 2022

@Jackie-Jiang this is ready for review 😊

@egalpin egalpin changed the title from "[Issue 7519] Adds support for multiple filtered/unfiltered with GROUP BY" to "[Issue 7519] Adds support for multiple filtered/unfiltered aggregate with GROUP BY" on Dec 20, 2022
@egalpin egalpin changed the title from "[Issue 7519] Adds support for multiple filtered/unfiltered aggregate with GROUP BY" to "[Issue 7519] Adds support for multiple filtered/unfiltered aggregations with GROUP BY" on Dec 20, 2022
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Very well done! Mostly minor comments

private GroupKeyGenerator _groupKeyGenerator = null;

public FilteredGroupByOperator(
@Nullable AggregationFunction[] aggregationFunctions,
Contributor

I don't think it can be null. Actually if it is null, line 83 will throw NPE

Comment on lines 65 to 66
private TableResizer _tableResizer;
private GroupKeyGenerator _groupKeyGenerator = null;
Contributor

(minor) These 2 member variables can be converted to local


// Perform aggregation group-by on all the blocks
DefaultGroupByExecutor groupByExecutor;
if (_groupKeyGenerator == null) {
Contributor

This is very smart to share the same group-key generator. Let's add some comments about why we need to do so
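
To illustrate why sharing matters, here is a minimal sketch that is not Pinot's implementation. `GroupIds` and `sumByGroupId` are hypothetical stand-ins (not Pinot classes) for a group-key generator and a per-aggregation group-by pass. The assumption is that each aggregation stores its results by a dense group id, so the ids are only comparable across filtered and unfiltered aggregations if they all come from one generator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

final class SharedGroupKeySketch {
  /** Hypothetical stand-in for a group-key generator: assigns dense ids in first-seen order. */
  static final class GroupIds {
    private final Map<Integer, Integer> _idsByKey = new HashMap<>();
    private final List<Integer> _keys = new ArrayList<>();

    int idFor(int key) {
      Integer id = _idsByKey.get(key);
      if (id == null) {
        id = _keys.size();
        _idsByKey.put(key, id);
        _keys.add(key);
      }
      return id;
    }

    int size() {
      return _keys.size();
    }
  }

  /** Sums the values of rows passing the filter, keyed by the group id from the shared generator. */
  static Map<Integer, Long> sumByGroupId(int[] keys, int[] values, IntPredicate filter, GroupIds groupIds) {
    Map<Integer, Long> sums = new HashMap<>();
    for (int i = 0; i < keys.length; i++) {
      if (filter.test(values[i])) {
        sums.merge(groupIds.idFor(keys[i]), (long) values[i], Long::sum);
      }
    }
    return sums;
  }
}
```

In this sketch, running a filtered pass and then an unfiltered pass against the same `GroupIds` instance means a given id refers to the same group key in both result maps. With one generator per pass, the first-seen order could differ between passes (the filter may skip the rows that would otherwise be seen first), so the same id could point at different keys.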

@@ -130,7 +129,7 @@ void testGroupByTrim(QueryContext queryContext, int minSegmentGroupTrimSize, int
 // Extract the execution result
 List<Pair<Double, Double>> extractedResult = extractTestResult(resultsBlock.getTable());

-assertEquals(extractedResult, expectedResult);
+Assert.assertEquals(extractedResult, expectedResult);
Contributor

(nit) We usually use static import for Assert in test

Member Author

I had changed this due to checkstyle violations:

[WARNING] (imports) AvoidStaticImport: Using a static member import should be avoided

I can revert back so long as checkstyle still passes

String nonFilterQuery =
"SELECT SUM(INT_COL), SUM(CASE WHEN INT_COL > 25000 THEN INT_COL ELSE 0 END) AS total_sum FROM MyTable GROUP "
+ "BY INT_COL";
testQuery(filterQuery, nonFilterQuery);
Contributor

Let's add some more tests to guarantee that it works as expected
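
The equivalence that `testQuery(filterQuery, nonFilterQuery)` relies on can be pictured outside Pinot with a small sketch. The `filterQuery` text is not shown in this excerpt; the sketch assumes it uses the `SUM(INT_COL) FILTER(WHERE INT_COL > 25000)` form. Both method names below are hypothetical, and the sketch assumes that a group with no rows passing the filter reports a sum of 0, mirroring the `ELSE 0` branch.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntPredicate;

final class FilterVsCaseWhenSketch {
  /** Models SUM(value) FILTER(WHERE predicate) per group; groups with no passing rows keep 0 (assumption). */
  static Map<Integer, Long> sumWithFilter(int[] keys, int[] values, IntPredicate predicate) {
    Map<Integer, Long> sums = new TreeMap<>();
    for (int i = 0; i < keys.length; i++) {
      // Register the group even if none of its rows pass the filter.
      sums.putIfAbsent(keys[i], 0L);
      if (predicate.test(values[i])) {
        sums.merge(keys[i], (long) values[i], Long::sum);
      }
    }
    return sums;
  }

  /** Models SUM(CASE WHEN predicate THEN value ELSE 0 END) per group. */
  static Map<Integer, Long> sumWithCaseWhen(int[] keys, int[] values, IntPredicate predicate) {
    Map<Integer, Long> sums = new TreeMap<>();
    for (int i = 0; i < keys.length; i++) {
      long contribution = predicate.test(values[i]) ? values[i] : 0L;
      sums.merge(keys[i], contribution, Long::sum);
    }
    return sums;
  }
}
```

Additional test cases along these lines might vary the predicate, combine several filtered and unfiltered aggregations in one query, and include groups where no row matches the filter.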

return nonFilteredGroupByPlan();
}

private FilteredGroupByOperator filteredGroupByPlan() {
Contributor

(minor) Rename to buildFilteredGroupByPlan() and buildNonFilteredGroupByPlan()

-public static Set<ExpressionContext> collectExpressionsToTransform(AggregationFunction[] aggregationFunctions,
-@Nullable ExpressionContext[] groupByExpressions) {
+public static Set<ExpressionContext> collectExpressionsToTransform(
+@Nullable AggregationFunction[] aggregationFunctions, @Nullable ExpressionContext[] groupByExpressions) {
Contributor

aggregationFunctions should always be non-null

Comment on lines 58 to 59
protected GroupKeyGenerator _groupKeyGenerator;
protected GroupByResultHolder[] _groupByResultHolders;
Contributor

These 2 variables can still be final


public DefaultGroupByExecutor(QueryContext queryContext, ExpressionContext[] groupByExpressions,
TransformOperator transformOperator) {

Contributor

(nit) Remove empty line

_aggregationFunctions = queryContext.getAggregationFunctions();
public DefaultGroupByExecutor(QueryContext queryContext, AggregationFunction[] aggregationFunctions,
ExpressionContext[] groupByExpressions, TransformOperator transformOperator,
GroupKeyGenerator groupKeyGenerator) {
Contributor

Annotate groupKeyGenerator as Nullable

// Perform aggregation group-by on all the blocks
DefaultGroupByExecutor groupByExecutor;
if (groupKeyGenerator == null) {
// The group key generator should be shared across all AggregationFunctions so that agg results can be
Member Author

@Jackie-Jiang please let me know where I may have misstated the reasoning

@egalpin
Member Author

egalpin commented Jan 3, 2023

@Jackie-Jiang added more tests in aeebef0, let me know if there are other specific cases to cover.

This PR is ready for re-review

@egalpin egalpin requested review from Jackie-Jiang and removed request for mayankshriv January 3, 2023 23:33
@egalpin
Member Author

egalpin commented Jan 3, 2023

Oops, somehow clicking request re-review from Jackie also removed Mayank? Removal was unintentional

@egalpin
Member Author

egalpin commented Jan 4, 2023

@Jackie-Jiang the integration test failure looks unrelated to me: the same tests are failing in the same way on other PRs, and the failures don't seem to point to the changed areas of code. Can you confirm?

Contributor

@Jackie-Jiang Jackie-Jiang left a comment

LGTM!

@Jackie-Jiang Jackie-Jiang merged commit 303b1a7 into apache:master Jan 5, 2023
@Jackie-Jiang Jackie-Jiang added the release-notes label (Referenced by PRs that need attention when compiling the next release notes) on Jan 5, 2023
@kishoreg
Member

kishoreg commented Jan 5, 2023

Thanks @egalpin for your contribution. Please add docs when you get a chance if it's not done already.

@egalpin
Member Author

egalpin commented Jan 5, 2023

Will do @kishoreg 👍

@ankitsultana
Contributor

@egalpin : can you update documentation as well? https://docs.pinot.apache.org/users/user-guide-query/query-syntax/supported-aggregations#filter-clause-in-aggregation

This still says the following:

NOTE: The FILTER clause is currently supported for aggregation-only queries, i.e., GROUP BY is not supported.

@egalpin
Member Author

egalpin commented Oct 13, 2023

Thanks! pinot-contrib/pinot-docs#250

Labels
feature, release-notes (Referenced by PRs that need attention when compiling the next release notes)
Development

Successfully merging this pull request may close these issues.

FILTER Clause Support
5 participants