Fixes filtered agg result column naming and filtered agg order-by compat #10092

egalpin · 2023-01-10T19:05:59Z

This PR follows up on #7916 and #10000 (relates to #7519). It fixes two bugs with filtered aggregations: the result column name omitted the filter clause, so it was unclear which filter a column belonged to, and ordering by a filtered aggregation resulted in an NPE.

Current behaviour:
SELECT max(ArrDelay) filter (where DaysSinceEpoch >= 16090) would have a result column named:

max(ArrDelay)

After this PR:
SELECT max(ArrDelay) filter (where DaysSinceEpoch >= 16090) would have a result column named:

max(ArrDelay) filter (where DaysSinceEpoch >= 16090)
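As a rough illustration of the naming rule above, here is a minimal, self-contained sketch. The class `FilteredAggNamingSketch` and its `resultColumnName` signature are hypothetical and only for illustration; they are not Pinot's actual `AggregationFunctionUtils` API, and the real code may render the filter expression differently.

```java
/** Illustrative sketch only; not the Pinot implementation. */
final class FilteredAggNamingSketch {
  private FilteredAggNamingSketch() {
  }

  /**
   * Builds a result column name for an aggregation.
   *
   * @param baseName   the unfiltered aggregation name, e.g. "max(ArrDelay)"
   * @param filterExpr the filter predicate as text, or null when the aggregation has no filter
   *                   (assumption: the filter is already available as a string)
   */
  static String resultColumnName(String baseName, String filterExpr) {
    if (filterExpr == null) {
      // Unfiltered aggregations keep their plain name.
      return baseName;
    }
    // Filtered aggregations carry the filter clause in the column name.
    return baseName + " filter (where " + filterExpr + ")";
  }

  public static void main(String[] args) {
    System.out.println(resultColumnName("max(ArrDelay)", null));
    System.out.println(resultColumnName("max(ArrDelay)", "DaysSinceEpoch >= 16090"));
  }
}
```

With this rule, two filtered aggregations over the same column but with different filters get distinct result column names.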

Examples:
Filtered agg name correction without grouping/ordering:

Filtered agg name correction with grouping/ordering:

egalpin · 2023-01-10T19:06:14Z

cc @Jackie-Jiang

egalpin · 2023-01-10T22:02:54Z

...ore/src/main/java/org/apache/pinot/core/operator/blocks/results/AggregationResultsBlock.java

@@ -69,7 +84,14 @@ public DataSchema getDataSchema(QueryContext queryContext) {
    ColumnDataType[] columnDataTypes = new ColumnDataType[numColumns];
    for (int i = 0; i < numColumns; i++) {
      AggregationFunction aggregationFunction = _aggregationFunctions[i];
-      columnNames[i] = aggregationFunction.getColumnName();
+      String columnName = aggregationFunction.getResultColumnName();


I want to call out this change in particular, as I likely lack the context to understand all of its implications. At least one unit test initially failed because of it: the result column was previously named count_star and is now named count(*). This may be an undesirable change.

I checked and we are not using the column name returned from the server in AggregationDataTableReducer. You can verify that by reverting the changes in this class and seeing whether all the new tests still pass. If they do, we can consider dropping the changes in this class to simplify the PR.

I had mistakenly assumed that this change was needed to ensure that non-group-by filtered aggs had their result column names accurately reflected. That appears not to be the case: I've confirmed that the tests still pass after removing these changes.

Is it at all problematic that this section of code would be "out of sync" with other sections of code as it pertains to column naming?

egalpin · 2023-01-10T22:45:50Z

I ran the previously failing tests locally, and they pass. Are the tests in org.apache.pinot.segment.local.segment.index.loader.ForwardIndexHandlerTest known to be flaky?

codecov-commenter · 2023-01-11T00:20:14Z

Codecov Report

Merging #10092 (04ee3d5) into master (04d09e5) will increase coverage by 40.74%.
The diff coverage is 97.72%.

@@              Coverage Diff              @@
##             master   #10092       +/-   ##
=============================================
+ Coverage     27.87%   68.62%   +40.74%     
- Complexity       53     5737     +5684     
=============================================
  Files          1979     1995       +16     
  Lines        107281   108088      +807     
  Branches      16323    16424      +101     
=============================================
+ Hits          29909    74176    +44267     
+ Misses        74417    28649    -45768     
- Partials       2955     5263     +2308

Flag	Coverage Δ
integration1	`24.65% <50.00%> (+0.01%)`	⬆️
integration2	`?`
unittests1	`68.02% <97.72%> (?)`
unittests2	`13.71% <0.00%> (?)`

Impacted Files	Coverage Δ
...org/apache/pinot/core/data/table/TableResizer.java	`89.71% <92.85%> (+22.71%)`	⬆️
...ore/operator/blocks/results/ResultsBlockUtils.java	`91.52% <100.00%> (+2.63%)`	⬆️
...t/core/operator/query/FilteredGroupByOperator.java	`71.08% <100.00%> (+71.08%)`	⬆️
...va/org/apache/pinot/core/plan/GroupByPlanNode.java	`89.79% <100.00%> (+29.37%)`	⬆️
...aggregation/function/AggregationFunctionUtils.java	`75.53% <100.00%> (+39.97%)`	⬆️
...core/query/reduce/AggregationDataTableReducer.java	`68.42% <100.00%> (-1.58%)`	⬇️
...pinot/core/data/manager/realtime/TimerService.java	`0.00% <0.00%> (-100.00%)`	⬇️
...t/core/plan/StreamingInstanceResponsePlanNode.java	`0.00% <0.00%> (-100.00%)`	⬇️
...ore/operator/streaming/StreamingResponseUtils.java	`0.00% <0.00%> (-100.00%)`	⬇️
...ager/realtime/PeerSchemeSplitSegmentCommitter.java	`0.00% <0.00%> (-100.00%)`	⬇️
... and 1478 more

Jackie-Jiang

Mostly good. Thanks for adding the tests!

pinot-common/src/main/java/org/apache/pinot/common/request/context/FilterContext.java

pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java

Jackie-Jiang · 2023-01-11T23:31:56Z

pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/results/ResultsBlockUtils.java

Jackie-Jiang · 2023-01-11T23:46:29Z

pinot-core/src/main/java/org/apache/pinot/core/operator/query/FilteredGroupByOperator.java

@@ -60,8 +61,10 @@ public class FilteredGroupByOperator extends BaseOperator<GroupByResultsBlock> {
  private long _numEntriesScannedPostFilter;
  private final DataSchema _dataSchema;
  private final QueryContext _queryContext;
+  private final IdentityHashMap<AggregationFunction, Integer> _resultHolderIndexMap;


We can probably keep it as a local variable. I don't see it being used anywhere else.

My thought on this change was to avoid the need to recompute this map on every call to getNextBlock and instead compute once on instantiation and reuse the map across calls to getNextBlock.

Thoughts?

For this operator, getNextBlock can only be called once, so keeping it local should be fine. For the same reason, FilteredAggregationOperator also keeps it local.
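To make the thread above concrete, here is a minimal sketch of building an identity-keyed index map as a local variable. `ResultHolderIndexSketch`, `Agg`, and `buildIndexMap` are hypothetical stand-ins, not the actual `FilteredGroupByOperator` code; `Agg` deliberately overrides `equals`/`hashCode` to show why identity semantics can matter when two function instances compare equal.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

/** Illustrative sketch only; not the Pinot implementation. */
final class ResultHolderIndexSketch {
  /** Hypothetical stand-in for an aggregation function, with value-based equality. */
  static final class Agg {
    private final String _name;

    Agg(String name) {
      _name = name;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Agg && ((Agg) o)._name.equals(_name);
    }

    @Override
    public int hashCode() {
      return _name.hashCode();
    }
  }

  /** Maps each function instance (by reference, not by equals) to its position in the array. */
  static Map<Agg, Integer> buildIndexMap(Agg[] functions) {
    Map<Agg, Integer> indexMap = new IdentityHashMap<>();
    for (int i = 0; i < functions.length; i++) {
      indexMap.put(functions[i], i);
    }
    return indexMap;
  }

  public static void main(String[] args) {
    // Two distinct instances that compare equal, e.g. the same function under different filters.
    Agg first = new Agg("max(ArrDelay)");
    Agg second = new Agg("max(ArrDelay)");
    Agg[] functions = {first, second};

    // Built once as a local; sufficient if the enclosing method runs only once.
    Map<Agg, Integer> byIdentity = buildIndexMap(functions);

    Map<Agg, Integer> byEquals = new HashMap<>();
    for (int i = 0; i < functions.length; i++) {
      byEquals.put(functions[i], i);
    }

    System.out.println(byIdentity.size()); // 2: each instance keeps its own index
    System.out.println(byEquals.size());   // 1: the second put overwrote the first
  }
}
```

If the method that builds the map is only ever invoked once per operator instance, caching the map in a field saves nothing, which is the point made in the reply above.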

Jackie-Jiang · 2023-01-11T23:47:29Z

pinot-core/src/main/java/org/apache/pinot/core/plan/GroupByPlanNode.java

@@ -59,6 +59,7 @@ public Operator<GroupByResultsBlock> run() {
    assert _queryContext.getGroupByExpressions() != null;

    if (_queryContext.hasFilteredAggregations()) {
+      assert _queryContext.getFilteredAggregationFunctions() != null;


For context, this is never null, even if there is no filter on any aggregation.

I've updated the QueryContext annotation for this field to Nonnull rather than Nullable, as this has caught me out a few times.

We cannot annotate it as Nonnull because it is non-null only for aggregation queries. Basically, _queryContext.getAggregationFunctions() and _queryContext.getFilteredAggregationFunctions() will either both be null or both be non-null.
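The invariant described above (both getters null for non-aggregation queries, both non-null for aggregation queries) can be sketched as follows. `QueryContextSketch` is a hypothetical, heavily simplified stand-in for Pinot's `QueryContext`; the `List<String>` element type and the constructor check are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the Pinot QueryContext. */
final class QueryContextSketch {
  // Both fields are null for non-aggregation queries and non-null for aggregation queries.
  private final List<String> _aggregationFunctions;
  private final List<String> _filteredAggregationFunctions;

  QueryContextSketch(List<String> aggregationFunctions, List<String> filteredAggregationFunctions) {
    // Enforce the "both null or both non-null" invariant up front.
    if ((aggregationFunctions == null) != (filteredAggregationFunctions == null)) {
      throw new IllegalArgumentException("aggregation lists must both be null or both be non-null");
    }
    _aggregationFunctions = aggregationFunctions;
    _filteredAggregationFunctions = filteredAggregationFunctions;
  }

  /** May return null: only aggregation queries carry aggregation functions. */
  List<String> getAggregationFunctions() {
    return _aggregationFunctions;
  }

  /** May return null, but only when getAggregationFunctions() is also null. */
  List<String> getFilteredAggregationFunctions() {
    return _filteredAggregationFunctions;
  }

  boolean isAggregationQuery() {
    return _aggregationFunctions != null;
  }

  public static void main(String[] args) {
    QueryContextSketch selection = new QueryContextSketch(null, null);
    QueryContextSketch aggregation =
        new QueryContextSketch(Arrays.asList("max(ArrDelay)"), Arrays.asList("max(ArrDelay)"));
    System.out.println(selection.isAggregationQuery());
    System.out.println(aggregation.isAggregationQuery());
  }
}
```

Under this invariant, a caller that has already established that the query is an aggregation query can treat the filtered list as non-null, while the getter itself still has to be declared nullable.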

egalpin · 2023-01-12T18:29:03Z

@Jackie-Jiang thanks for the review! I believe I've made all the updates, so this is ready to be reviewed again.

Jackie-Jiang

LGTM

egalpin added 3 commits January 6, 2023 11:55

Fixes column naming for filtered group aggs

47dd147

Fixes column naming for filtered aggs operator

9f98018

Ensures that a filtered agg function will be used properly in order by expression

288e2ed

egalpin added 2 commits January 10, 2023 14:03

Fixes expected result column names for filtered agg tests

b46133d

Removes unused method

9bf0f72

Fixes filtered agg tests in InterSegmentAggregationMultiValueQueriesTest and InterSegmentAggregationMultiValueRawQueriesTest

5a996fb

Adds order-by test for filtered aggs

562fe1e

Jackie-Jiang reviewed Jan 11, 2023

egalpin added 4 commits January 12, 2023 09:25

Adds getResultColumnName to AggregationFunctionUtils.java for filtered agg name handling

f64cd7e

Reverts filtered agg additions from AggregationResultsBlock.java

a01f624

Various PR review clean up

a076228

Removes code after other code removals render them useless

10f4529

egalpin added 4 commits January 12, 2023 11:58

Reverts to prior expected test results

a0937a1

Annotates filtered aggs as nullable again

9780446

Makes resultHolderIndexMap local var again

43f51a7

Formatting

04ee3d5

Jackie-Jiang approved these changes Jan 13, 2023

Jackie-Jiang added the bugfix label Jan 13, 2023

Jackie-Jiang merged commit 36307cb into apache:master Jan 13, 2023
