-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugfix: just allow having use expr in groupby
or aggr
#4579
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jackwener
I am not sure I would call this an "API break" -- more like a bugfix were DataFusion was previously allowing malformed SQL. 😆
I had one negative case I think we may need to check but otherwise the code looks good to me 👍
@@ -194,7 +194,7 @@ async fn csv_query_group_by_and_having_and_where() -> Result<()> { | |||
async fn csv_query_having_without_group_by() -> Result<()> { | |||
let ctx = SessionContext::new(); | |||
register_aggregate_csv(&ctx).await?; | |||
let sql = "SELECT c1, c2, c3 FROM aggregate_test_100 HAVING c2 >= 4 AND c3 > 90"; | |||
let sql = "SELECT c1, c2, c3 FROM aggregate_test_100 where c2 >= 4 AND c3 > 90"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the whole test as we definitely have coverage for selecting with predicates in the where clause and the name of the test doesn't make sense csv_query_having_without_group_by
\n Filter: person.age > Int64(100) AND person.age < Int64(200)\ | ||
\n TableScan: person"; | ||
quick_test(sql, expected); | ||
let err = logical_plan(sql).expect_err("query should have failed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend one test (that should also fail) where there is a GROUP clause but the having clause refers to an invalid column, such as
SELECT id, MAX(age)
FROM PERSON
GROUP BY id
-- first_name is a valid column but does not appear in the grouping output (group column nor aggregate)
HAVING first_name = 'M'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice advice❤️
Added it.
.is_empty() | ||
|| !aggr_exprs.is_empty() | ||
{ | ||
self.aggregate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to check here that the having expr only uses columns in the logical aggregation phase (group by columns or agg exprs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been checked by check_columns_satisfy_exprs()
.
groupby
or aggr
groupby
or aggr
Thanks @alamb review, very helpful advice❤️. |
f3a2890
to
05baf1f
Compare
HAVING first_name = 'M'"; | ||
let err = logical_plan(sql).expect_err("query should have failed"); | ||
assert_eq!( | ||
"Plan(\"HAVING clause references non-aggregate values: Expression person.first_name could not be resolved from available columns: person.id, MAX(person.age)\")", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
Thanks @jackwener |
@jackwener Could you please add a UT to cover this case ?
Looks like this is allowed in PG and SparkSQL. |
Which issue does this PR close?
Part of #4556 .
having just use expr in groupby/aggr
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?