Fix return type for sum(REAL) Spark aggregate #9818
@mbasmanova Could you help review this? Thanks.
@mbasmanova This is a quick bug fix for the window operator. The bug causes Gluten failures for some users.
Thanks.
@@ -449,5 +449,16 @@ TEST_F(SumAggregationTest, decimalRangeOverflow) {
      {expected},
      {});
}

TEST_F(SumAggregationTest, sumFloat) {
  auto data = makeRowVector({makeFlatVector<float>({2.00, 1.00})});
Did this test fail before the change?
Yes
@FelixYBW I wonder what the failure was. I tried running this test on 'main' and it passed.
:( @JkSelf Did you run the test? It looks like the UT can't detect the type mismatch.
The PR does solve the customer issue, but that issue is caused by window function validation.
When running the sum(float) aggregate as a window function with Gluten, the following error occurs:
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Unexpected return type for window function sum(REAL). Expected REAL. Got DOUBLE.
Retriable: False
This is due to a mismatch between the function signatures in Spark and Velox. In Spark, sum(real) is expected to return double, whereas Velox registers the same function to return real, so the two are incompatible. This exception is hard to reproduce in Velox alone. I have modified the unit test so that it triggers an overflow error if the current patch is not applied. Please review again. Thanks.
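The overflow that the modified test relies on can be illustrated outside Velox. The snippet below is only a sketch under stated assumptions: `sumAsReal` and `sumAsDouble` are hypothetical helpers, not Velox functions, and the inputs are chosen for illustration rather than taken from the actual test. It shows why accumulating REAL inputs in a float can overflow while a double accumulator does not.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper (not a Velox API): accumulates REAL inputs in a float,
// mirroring a sum(REAL) -> REAL signature.
inline float sumAsReal(const std::vector<float>& values) {
  float acc = 0.0f;
  for (float v : values) {
    acc += v;
  }
  return acc;
}

// Hypothetical helper (not a Velox API): accumulates REAL inputs in a double,
// mirroring the sum(REAL) -> DOUBLE signature that Spark expects.
inline double sumAsDouble(const std::vector<float>& values) {
  double acc = 0.0;
  for (float v : values) {
    acc += static_cast<double>(v);
  }
  return acc;
}

// True when the float accumulator overflows to infinity but the double
// accumulator still holds a finite result for the same inputs.
inline bool onlyRealOverflows(const std::vector<float>& values) {
  return std::isinf(sumAsReal(values)) && std::isfinite(sumAsDouble(values));
}
```

With two inputs equal to the largest finite float, `onlyRealOverflows` returns true, whereas small inputs such as 2.0 and 1.0 sum to the same value in either accumulator, which may be why the first version of the test passed on 'main'.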
@@ -449,5 +449,17 @@ TEST_F(SumAggregationTest, decimalRangeOverflow) {
      {expected},
      {});
}

TEST_F(SumAggregationTest, sumFloat) {
  auto data =
Does this test fail without the change? It looks the same as the Presto test, whose result type is float.
I wonder whether we need to port SumTest to sparksql: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/tests/SumTest.cpp#L78
@jinchengchenghh
Without this patch, the current test will throw an overflow error.
I believe there is no need to test all SumTests again, as the only differences between SumAggregate in Spark SQL and Presto now are the decimal type and the conversion of sum(real) -> double in this PR. The registration of other functions is the same. Of course, if deemed necessary, we can open another PR later to conduct separate tests.
@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Conbench analyzed the 1 benchmark run on this commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Summary: The return type of `sum(real)` in Spark SQL should be `double`, not `real`: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L81.
Pull Request resolved: facebookincubator#9818
Reviewed By: Yuhta
Differential Revision: D57584541
Pulled By: pedroerp
fbshipit-source-id: 4e750122ce49fc4b446fd5d126d6c213c0ddd553
The return type of `sum(real)` in Spark SQL should be `double`, not `real`: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L81.
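The type mismatch itself can be sketched as a small return-type check. This is an illustrative model only, not Velox or Gluten code: `TypeKind`, `sparkSumResultType`, and `checkSumReturnType` are hypothetical names, decimal and interval inputs are left out, and the message format simply imitates the error quoted in the conversation.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified type model (not the Velox TypeKind enum).
enum class TypeKind { INTEGER, BIGINT, REAL, DOUBLE };

inline std::string toString(TypeKind kind) {
  switch (kind) {
    case TypeKind::INTEGER:
      return "INTEGER";
    case TypeKind::BIGINT:
      return "BIGINT";
    case TypeKind::REAL:
      return "REAL";
    case TypeKind::DOUBLE:
      return "DOUBLE";
  }
  return "UNKNOWN";
}

// Spark-style result type of sum() for the modeled inputs: integral inputs
// widen to BIGINT, floating-point inputs (including REAL) widen to DOUBLE.
inline TypeKind sparkSumResultType(TypeKind input) {
  switch (input) {
    case TypeKind::INTEGER:
    case TypeKind::BIGINT:
      return TypeKind::BIGINT;
    case TypeKind::REAL:
    case TypeKind::DOUBLE:
      return TypeKind::DOUBLE;
  }
  return TypeKind::DOUBLE;
}

// Compares the return type the engine registered for sum(input) with the
// type the query plan asks for. Returns an error message on mismatch.
inline std::optional<std::string> checkSumReturnType(
    TypeKind input, TypeKind registered, TypeKind planned) {
  if (registered == planned) {
    return std::nullopt;
  }
  return "Unexpected return type for window function sum(" + toString(input) +
      "). Expected " + toString(registered) + ". Got " + toString(planned) +
      ".";
}
```

In this model, registering sum(REAL) as REAL while the Spark plan requests DOUBLE reproduces the message above; registering it with `sparkSumResultType(TypeKind::REAL)` makes the check pass.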