Fix return type for sum(REAL) Spark aggregate #9818

Closed
wants to merge 2 commits into from

Conversation

JkSelf
Collaborator

@JkSelf JkSelf commented May 15, 2024

@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) May 15, 2024

netlify bot commented May 15, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit ea6a159
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66457e1c3a01320008d18972

@JkSelf
Collaborator Author

JkSelf commented May 15, 2024

@mbasmanova Could you help review this? Thanks.

@FelixYBW

@mbasmanova This is a quick bug fix for the window operator. The bug causes Gluten failures for some users.

@mbasmanova mbasmanova changed the title from "Fix the wrong return type for sum(Real) in spark sql" to "Fix return type for sum(REAL) Spark aggregate" May 15, 2024
Contributor

@mbasmanova mbasmanova left a comment

Thanks.

@@ -449,5 +449,16 @@ TEST_F(SumAggregationTest, decimalRangeOverflow) {
{expected},
{});
}

TEST_F(SumAggregationTest, sumFloat) {
auto data = makeRowVector({makeFlatVector<float>({2.00, 1.00})});
Contributor

Did this test fail before the change?

Yes

Contributor

@FelixYBW I wonder what the failure was. I ran this test on 'main' and it passed.

:( @JkSelf Did you run the test? It looks like the UT can't detect the type mismatch.

The PR does solve the customer issue, but the issue is caused by window function validation.

Collaborator Author

@mbasmanova @FelixYBW

When running the sum(float) aggregate window function with Gluten, the following error occurs:

Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Unexpected return type for window function sum(REAL). Expected REAL. Got DOUBLE.
Retriable: False

The error is caused by a mismatch between the function signatures in Spark and Velox. In Spark, sum(real) returns double, whereas Velox registers the same function with a real return type, so the two are incompatible. The exception is hard to reproduce directly in Velox, so I have modified the unit test to trigger an overflow error when the current patch is not applied. Please review again. Thanks.
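For readers unfamiliar with this kind of validation, here is a minimal sketch of how a return-type check can produce the message quoted above. The `Registry` alias and `validateReturnType` function are hypothetical names for illustration only; this is not Velox's actual validation code, only a model of the mismatch between the registered type (REAL) and the type Spark plans for (DOUBLE).

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical registry (an assumption for illustration): maps a function
// signature to the return type it was registered with.
using Registry = std::map<std::string, std::string>;

// Throws if the return type expected by the query plan differs from the
// registered one. The message format mirrors the error quoted in this comment.
void validateReturnType(
    const Registry& registry,
    const std::string& signature,
    const std::string& plannedType) {
  const auto it = registry.find(signature);
  if (it == registry.end()) {
    throw std::invalid_argument("Unknown function: " + signature);
  }
  if (it->second != plannedType) {
    throw std::invalid_argument(
        "Unexpected return type for window function " + signature +
        ". Expected " + it->second + ". Got " + plannedType + ".");
  }
}
```

With a registry containing `{"sum(REAL)", "REAL"}`, a plan that asks for DOUBLE is rejected; once the entry is registered as DOUBLE, the same call passes.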

@mbasmanova mbasmanova added the ready-to-merge label (PRs that have been reviewed and are ready for merging; PRs with this tag notify the Velox Meta oncall) May 15, 2024
@JkSelf JkSelf force-pushed the sum-real-double branch from 602e1f3 to ea6a159 Compare May 16, 2024 03:31
@@ -449,5 +449,17 @@ TEST_F(SumAggregationTest, decimalRangeOverflow) {
{expected},
{});
}

TEST_F(SumAggregationTest, sumFloat) {
auto data =
Contributor

Does this test fail without the change? It looks the same as the Presto test, whose result type is float.

I wonder whether we need to port SumTest to sparksql: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/tests/SumTest.cpp#L78

Collaborator Author

@jinchengchenghh
Without this patch, the current test throws an overflow error.
I believe there is no need to run all of the SumTest cases again, since the only differences between SumAggregate in Spark SQL and Presto now are the decimal type handling and the sum(real) -> double conversion in this PR. The registration of the other functions is the same. If deemed necessary, we can open another PR later to add separate tests.
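To illustrate why the result type matters for overflow, here is a small standalone sketch. It assumes inputs near the largest finite float, which may differ from the values used in the actual unit test, and the helpers `sumIntoFloat` and `sumIntoDouble` are hypothetical, not Velox functions. Accumulating into a float overflows to infinity, while accumulating the same inputs into a double stays finite.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Sums float inputs into a float accumulator (REAL result type).
float sumIntoFloat(const std::vector<float>& values) {
  float acc = 0.0f;
  for (float v : values) {
    acc += v;
  }
  return acc;
}

// Sums the same float inputs into a double accumulator (DOUBLE result type,
// which is what Spark expects for sum(real)).
double sumIntoDouble(const std::vector<float>& values) {
  double acc = 0.0;
  for (float v : values) {
    acc += static_cast<double>(v);
  }
  return acc;
}
```

For example, with two inputs equal to `std::numeric_limits<float>::max()`, `sumIntoFloat` returns infinity, whereas `sumIntoDouble` returns a finite value.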

@facebook-github-bot
Contributor

@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pedroerp merged this pull request in d12fa90.


Conbench analyzed the 1 benchmark run on commit d12fa90b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

NEUpanning pushed a commit to NEUpanning/velox that referenced this pull request May 22, 2024
Summary:
The return type of `sum(real)` in Spark SQL should be `double`, not `real`; see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L81.

Pull Request resolved: facebookincubator#9818

Reviewed By: Yuhta

Differential Revision: D57584541

Pulled By: pedroerp

fbshipit-source-id: 4e750122ce49fc4b446fd5d126d6c213c0ddd553
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
Summary:
The return type of `sum(real)` in Spark SQL should be `double`, not `real`; see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L81.

Pull Request resolved: facebookincubator#9818

Reviewed By: Yuhta

Differential Revision: D57584541

Pulled By: pedroerp

fbshipit-source-id: 4e750122ce49fc4b446fd5d126d6c213c0ddd553