[SPARK-33845][SQL] Remove unnecessary if when trueValue and falseValue are foldable boolean types #30849

Closed
wants to merge 5 commits into from


@wangyum (Member) commented Dec 19, 2020

What changes were proposed in this pull request?

Improve `SimplifyConditionals`:

  • Simplify `If(cond, TrueLiteral, FalseLiteral)` to `cond`.
  • Simplify `If(cond, FalseLiteral, TrueLiteral)` to `Not(cond)`.
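The two rewrites above can be sketched on a toy expression tree. This is a minimal sketch under stated assumptions: `Expr`, `Lit`, `Col`, `Not`, `If`, and `SimplifyIfSketch` are hypothetical stand-ins, not Spark's Catalyst classes, and `Option[Boolean]` models SQL's three-valued logic with `None` as NULL. Like the rules as written, it assumes the condition never evaluates to NULL, so the check in `main` only compares non-NULL inputs.

```scala
// Hypothetical mini expression tree; NOT Spark's Catalyst classes.
// Option[Boolean] models SQL three-valued logic: None stands for NULL.
sealed trait Expr
final case class Lit(value: Option[Boolean]) extends Expr
final case class Col(name: String) extends Expr
final case class Not(child: Expr) extends Expr
final case class If(cond: Expr, trueValue: Expr, falseValue: Expr) extends Expr

object SimplifyIfSketch {
  val TrueLit: Lit = Lit(Some(true))
  val FalseLit: Lit = Lit(Some(false))

  // The two rewrites described in the PR, applied bottom-up.
  // Caveat: both assume `cond` never evaluates to NULL.
  def simplify(e: Expr): Expr = e match {
    case If(c, t, f) =>
      (simplify(c), simplify(t), simplify(f)) match {
        case (sc, TrueLit, FalseLit) => sc      // IF(c, true, false) => c
        case (sc, FalseLit, TrueLit) => Not(sc) // IF(c, false, true) => NOT c
        case (sc, st, sf)            => If(sc, st, sf)
      }
    case Not(c) => Not(simplify(c))
    case other  => other
  }

  // Evaluate against a row mapping column names to (possibly NULL) booleans.
  def eval(e: Expr, row: Map[String, Option[Boolean]]): Option[Boolean] = e match {
    case Lit(v) => v
    case Col(n) => row(n)
    case Not(c) => eval(c, row).map(!_)
    case If(c, t, f) =>
      // SQL IF takes the else branch when the condition is false or NULL.
      if (eval(c, row).contains(true)) eval(t, row) else eval(f, row)
  }

  def main(args: Array[String]): Unit = {
    // Models if(id > 2, false, true), with `gt2` standing in for `id > 2`.
    val expr = If(Col("gt2"), FalseLit, TrueLit)
    println(simplify(expr)) // prints Not(Col(gt2))
    // Original and rewritten expressions agree on non-NULL conditions.
    for (v <- Seq(true, false)) {
      val row = Map("gt2" -> Some(v))
      assert(eval(expr, row) == eval(simplify(expr), row))
    }
  }
}
```

The rewrite works on already-simplified children so nested `If`s collapse in one pass; a real optimizer rule would instead rely on the framework's tree traversal.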

The use case is:

```sql
create table t1 using parquet as select id from range(10);
select if (id > 2, false, true) from t1;
```

Before this PR:

```
== Physical Plan ==
*(1) Project [if ((id#1L > 2)) false else true AS (IF((id > CAST(2 AS BIGINT)), false, true))#2]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
```

After this PR:

```
== Physical Plan ==
*(1) Project [(id#1L <= 2) AS (IF((id > CAST(2 AS BIGINT)), false, true))#2]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
```
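In the plan after this PR, the rewritten `Not(id > 2)` is displayed as `(id#1L <= 2)`; presumably another optimizer rule (we believe `BooleanSimplification`) turns a negated comparison into its complement. The hypothetical sketch below (`NegatedComparisonSketch` is illustrative only, with `None` modelling a NULL `id`) checks that `NOT (id > 2)` and `id <= 2` agree on every input, including NULL, so that extra step is safe on its own.

```scala
// Hypothetical plain-Scala model; None stands for a NULL `id`.
object NegatedComparisonSketch {
  // Comparisons with NULL yield NULL, as in SQL.
  def gt(x: Option[Long], c: Long): Option[Boolean] = x.map(_ > c)
  def le(x: Option[Long], c: Long): Option[Boolean] = x.map(_ <= c)
  // NOT NULL is NULL.
  def not(b: Option[Boolean]): Option[Boolean] = b.map(!_)

  def main(args: Array[String]): Unit = {
    // The ids produced by range(10), plus a NULL for good measure.
    val inputs: Seq[Option[Long]] = (0L until 10L).map(Some(_)) :+ None
    // NOT (id > 2) and (id <= 2) agree on every input, including NULL.
    inputs.foreach(x => assert(not(gt(x, 2L)) == le(x, 2L)))
    println("NOT (id > 2) matches id <= 2 on all inputs")
  }
}
```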

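Why are the changes needed?

Improve query performance: the `If` expression is replaced by its condition (or the condition's negation), so no conditional branch is evaluated per row, as the physical plans above show.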

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

github-actions bot added the SQL label Dec 19, 2020
SparkQA commented Dec 19, 2020

Test build #133045 has finished for PR 30849 at commit 2da6885.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37645/

@wangyum marked this pull request as draft December 19, 2020 03:29
SparkQA commented Dec 19, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37645/

@wangyum changed the title [SPARK-33798][SQL][FOLLOW-UP] Improve SimplifyConditionals and PushFoldableIntoBranches to [SPARK-33845][SQL] Remove unnecessary if when trueValue and falseValue are foldable boolean types Dec 19, 2020
@wangyum marked this pull request as ready for review December 19, 2020 04:37
SparkQA commented Dec 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37651/

SparkQA commented Dec 19, 2020

Test build #133051 has finished for PR 30849 at commit 090890c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 19, 2020

Test build #133052 has finished for PR 30849 at commit f4d8f6b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) left a comment:

Looks reasonable to me.

SparkQA commented Dec 21, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37738/

SparkQA commented Dec 21, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37738/

```
@@ -199,4 +199,20 @@ class SimplifyConditionalSuite extends PlanTest with ExpressionEvalHelper with P
      If(Factorial(5) > 100L, b, nullLiteral).eval(EmptyRow))
    }
  }

  test("remove unnecessary if when the outputs are boolean type") {
```
@HyukjinKwon (Member) commented on this line:

I would add the JIRA prefix, though.

@dongjoon-hyun (Member) left a comment:

+1, LGTM (except for @HyukjinKwon's JIRA prefix comment)

@dongjoon-hyun (Member) left a comment:

Merged to master for Apache Spark 3.2.0. Thank you, @wangyum and all.
(The last commit only adds the JIRA ID prefix to the test case name.)

@wangyum deleted the SPARK-33798-2 branch December 21, 2020 12:33
SparkQA commented Dec 21, 2020

Test build #133139 has finished for PR 30849 at commit c11dbd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun pushed a commit that referenced this pull request Dec 29, 2020
### What changes were proposed in this pull request?

This is a followup of #30849, to fix a correctness issue caused by null value handling.

### Why are the changes needed?

Fix a correctness issue. `If(null, true, false)` should return false, but the simplified expression (`cond`) returns null.

### Does this PR introduce _any_ user-facing change?

Yes, but the bug only exists in the master branch.

### How was this patch tested?

Updated tests.

Closes #30953 from cloud-fan/bug.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
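The null-handling problem can be reproduced in a small plain-Scala model. In this sketch (all names hypothetical; `None` models SQL NULL), a NULL condition makes SQL `IF` take the else branch, so `IF(NULL, true, false)` is false while the naive rewrite to `cond` yields NULL. As one possible null-safe alternative, assumed here rather than taken from #30953, the rewrite can target `cond <=> true` (null-safe equality), which never returns NULL; another option is to apply the original rewrite only when the condition is non-nullable.

```scala
// Hypothetical three-valued boolean model; not Spark's Catalyst API.
// None stands for SQL NULL.
object NullSafeIfSketch {
  type SqlBool = Option[Boolean]

  // SQL IF(cond, t, f): a NULL condition selects the else branch.
  def sqlIf(cond: SqlBool, t: SqlBool, f: SqlBool): SqlBool =
    if (cond.contains(true)) t else f

  // Naive rewrites from the original PR: IF(c, true, false) => c,
  // IF(c, false, true) => NOT c. Both propagate NULL.
  def naiveTrueFalse(cond: SqlBool): SqlBool = cond
  def naiveFalseTrue(cond: SqlBool): SqlBool = cond.map(!_)

  // Null-safe equality `c <=> true` is never NULL.
  def eqNullSafeTrue(cond: SqlBool): SqlBool = Some(cond.contains(true))

  // Assumed null-safe rewrites: IF(c, true, false) => c <=> true,
  // IF(c, false, true) => NOT (c <=> true).
  def safeTrueFalse(cond: SqlBool): SqlBool = eqNullSafeTrue(cond)
  def safeFalseTrue(cond: SqlBool): SqlBool = eqNullSafeTrue(cond).map(!_)

  def main(args: Array[String]): Unit = {
    val T: SqlBool = Some(true)
    val F: SqlBool = Some(false)
    for (c <- Seq(T, F, None)) {
      val expected1 = sqlIf(c, T, F)
      val expected2 = sqlIf(c, F, T)
      // The null-safe rewrites match IF on every input, including NULL.
      assert(safeTrueFalse(c) == expected1)
      assert(safeFalseTrue(c) == expected2)
      // For c = None this shows if = Some(false) but naive = None.
      println(s"cond=$c if=$expected1 naive=${naiveTrueFalse(c)}")
    }
  }
}
```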
9 participants