Fixes filtered agg result column naming and filtered agg order-by compat #10092
@@ -69,7 +84,14 @@ public DataSchema getDataSchema(QueryContext queryContext) {
ColumnDataType[] columnDataTypes = new ColumnDataType[numColumns];
for (int i = 0; i < numColumns; i++) {
AggregationFunction aggregationFunction = _aggregationFunctions[i];
columnNames[i] = aggregationFunction.getColumnName();
String columnName = aggregationFunction.getResultColumnName();
I feel this is a very important change to call out, as I'm likely lacking the context to understand all the implications. I know that at least one unit test failed initially due to this change: the resulting columns were previously named count_star and are now named count(*). This may be an undesirable change.
I checked and we are not using the column name returned from the server in AggregationDataTableReducer. You can verify that by reverting the changes in this class and checking whether all the new tests still pass. If they do, we can also consider dropping the changes in this class to simplify the PR.
I had mistakenly assumed that this change was needed to ensure that non-group-by filtered aggs had their result column names accurately reflected. That appears not to be the case: I confirmed that the tests still pass after removing these changes.
Is it at all problematic that this section of code would be "out of sync" with other sections of code when it comes to column naming?
…est and InterSegmentAggregationMultiValueRawQueriesTest
Codecov Report
@@ Coverage Diff @@
##             master   #10092       +/-   ##
=============================================
+ Coverage     27.87%   68.62%   +40.74%
- Complexity       53     5737     +5684
=============================================
  Files          1979     1995       +16
  Lines        107281   108088      +807
  Branches      16323    16424      +101
=============================================
+ Hits          29909    74176    +44267
+ Misses        74417    28649    -45768
- Partials       2955     5263     +2308
Mostly good. Thanks for adding the tests!
pinot-common/src/main/java/org/apache/pinot/common/request/context/FilterContext.java
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/results/ResultsBlockUtils.java
@@ -60,8 +61,10 @@ public class FilteredGroupByOperator extends BaseOperator<GroupByResultsBlock> {
private long _numEntriesScannedPostFilter;
private final DataSchema _dataSchema;
private final QueryContext _queryContext;
private final IdentityHashMap<AggregationFunction, Integer> _resultHolderIndexMap;
We can probably keep it as a local variable. I don't see it being used anywhere else.
My thought on this change was to avoid recomputing this map on every call to getNextBlock, and instead compute it once on instantiation and reuse it across calls to getNextBlock.
Thoughts?
For this operator, getNextBlock can only be called once, so it should be fine. For the same reason, FilteredAggregationOperator also keeps it local.
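To make the point about the index map concrete, here is a minimal sketch; it is not the operator's actual code. `Agg` is a hypothetical stand-in for `AggregationFunction`, and `buildIndexMap` is an invented helper. It illustrates why an `IdentityHashMap` fits here (two functions that compare equal still get separate slots), and that building the map inside a method that runs only once costs the same as caching it in a field.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class ResultHolderIndexSketch {
  // Hypothetical stand-in for AggregationFunction. equals() is value-based on purpose,
  // to show that IdentityHashMap ignores it and compares keys by reference.
  static final class Agg {
    final String name;

    Agg(String name) {
      this.name = name;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof Agg && ((Agg) other).name.equals(name);
    }

    @Override
    public int hashCode() {
      return name.hashCode();
    }
  }

  // Maps each function instance to its position in the list (its result-holder index).
  static Map<Agg, Integer> buildIndexMap(List<Agg> functions) {
    Map<Agg, Integer> indexMap = new IdentityHashMap<>();
    for (int i = 0; i < functions.size(); i++) {
      indexMap.put(functions.get(i), i);
    }
    return indexMap;
  }

  public static void main(String[] args) {
    Agg first = new Agg("count(*)");
    Agg second = new Agg("count(*)"); // equal by value, distinct by identity
    // Built locally: if the calling method runs once, a field would save nothing.
    Map<Agg, Integer> indexMap = buildIndexMap(Arrays.asList(first, second));
    System.out.println(indexMap.get(first) + " " + indexMap.get(second)); // prints "0 1"
  }
}
```

With a regular `HashMap`, the second `put` would overwrite the first because the two keys are equal, and both lookups would return 1.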
@@ -59,6 +59,7 @@ public Operator<GroupByResultsBlock> run() {
assert _queryContext.getGroupByExpressions() != null;

if (_queryContext.hasFilteredAggregations()) {
assert _queryContext.getFilteredAggregationFunctions() != null;
For context, this is never null, even if there is no filter on the aggregation.
I've updated the QueryContext annotation for this field to be Nonnull rather than Nullable, as this has caught me a few times.
We cannot annotate it as Nonnull because it is non-null only for aggregation queries. Basically, _queryContext.getAggregationFunctions() and _queryContext.getFilteredAggregationFunctions() will either both be null or both be non-null.
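A rough sketch of that invariant, assuming a heavily simplified stand-in for `QueryContext` (the `Context` class and its fields are hypothetical, and plain strings replace the real function types). Both lists derive from the same input, so a selection-only query leaves both null, which is why a blanket Nonnull annotation would be wrong.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AggregationNullabilitySketch {
  // Hypothetical, simplified stand-in for QueryContext.
  static final class Context {
    final List<String> aggregations;         // null for non-aggregation queries
    final List<String> filteredAggregations; // null exactly when 'aggregations' is null

    Context(List<String> aggregationsOrNull) {
      aggregations = aggregationsOrNull;
      // Assumption for this sketch: an aggregation without a FILTER clause still gets
      // an entry here, so the list is populated for every aggregation query.
      filteredAggregations = aggregationsOrNull == null ? null : new ArrayList<>(aggregationsOrNull);
    }

    // True when both lists agree on nullness.
    boolean nullnessIsConsistent() {
      return (aggregations == null) == (filteredAggregations == null);
    }
  }

  public static void main(String[] args) {
    Context selectionQuery = new Context(null);
    Context aggregationQuery = new Context(Arrays.asList("count(*)"));
    System.out.println(selectionQuery.filteredAggregations == null);   // prints "true"
    System.out.println(aggregationQuery.filteredAggregations != null); // prints "true"
  }
}
```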
@Jackie-Jiang thanks for the review! I believe I've made all the updates, so this is ready to be reviewed again.
LGTM
This PR follows up on #7916 and #10000 (relates to #7519). It fixes two bugs with Filtered Aggregations: the result column name did not include the filter clause, so it was unclear which filter a column belonged to, and ordering by a filtered aggregation resulted in an NPE. Both are patched in this PR.
Current behaviour:
SELECT max(ArrDelay) filter (where DaysSinceEpoch >= 16090)
would have a result column named: max(ArrDelay)
After this PR:
SELECT max(ArrDelay) filter (where DaysSinceEpoch >= 16090)
would have a result column named: max(ArrDelay) filter (where DaysSinceEpoch >= 16090)
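The naming change above can be illustrated with a small sketch. `resultColumnName` is a hypothetical helper written for this example, not a Pinot API, and it takes the filter predicate as plain text; the real implementation presumably derives the name from the parsed expression.

```java
class FilteredAggNameSketch {
  // Hypothetical helper (not a Pinot API): builds the result column name for an aggregation.
  // 'filterOrNull' is the text of the FILTER predicate, or null for an unfiltered aggregation.
  static String resultColumnName(String aggregation, String filterOrNull) {
    if (filterOrNull == null) {
      return aggregation;
    }
    return aggregation + " filter (where " + filterOrNull + ")";
  }

  public static void main(String[] args) {
    // prints "max(ArrDelay)"
    System.out.println(resultColumnName("max(ArrDelay)", null));
    // prints "max(ArrDelay) filter (where DaysSinceEpoch >= 16090)"
    System.out.println(resultColumnName("max(ArrDelay)", "DaysSinceEpoch >= 16090"));
  }
}
```

Including the filter in the name keeps two aggregations over the same column but with different filters distinguishable in the result set.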
Examples (screenshots "Screen Shot 2023-01-10 at 10 41 48" and "Screen Shot 2023-01-09 at 16 00 09" not reproduced):
Filtered agg name correction without grouping/ordering
Filtered agg name correction with grouping/ordering