Evaluate expressions after type coercion #3444

Dandandan · 2022-09-11T08:26:29Z

Which issue does this PR close?

Closes #3431

Rationale for this change

See issue

What changes are included in this PR?

Are there any user-facing changes?

datafusion/optimizer/src/type_coercion.rs

andygrove · 2022-09-11T14:51:42Z

Thanks @Dandandan. This looks like a great improvement.

codecov-commenter · 2022-09-11T18:15:07Z

Codecov Report

Merging #3444 (96334db) into master (c5c1dae) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3444      +/-   ##
==========================================
+ Coverage   85.68%   85.69%   +0.01%     
==========================================
  Files         298      298              
  Lines       54645    54667      +22     
==========================================
+ Hits        46820    46846      +26     
+ Misses       7825     7821       -4

Impacted Files	Coverage Δ
datafusion/core/tests/sql/aggregates.rs	`99.37% <ø> (ø)`
datafusion/core/tests/sql/decimal.rs	`100.00% <ø> (ø)`
datafusion/core/tests/sql/explain_analyze.rs	`83.87% <ø> (ø)`
datafusion/core/tests/sql/subqueries.rs	`94.95% <ø> (ø)`
datafusion/optimizer/src/type_coercion.rs	`99.04% <100.00%> (+0.07%)`	⬆️
datafusion/core/src/physical_plan/metrics/value.rs	`86.93% <0.00%> (-0.51%)`	⬇️
datafusion/core/tests/sql/select.rs	`99.78% <0.00%> (+<0.01%)`	⬆️
datafusion/sql/src/planner.rs	`80.94% <0.00%> (+0.05%)`	⬆️
datafusion/common/src/scalar.rs	`85.12% <0.00%> (+0.06%)`	⬆️
... and 2 more

liukun4515 · 2022-09-12T00:13:52Z

datafusion/core/tests/sql/aggregates.rs

-        "+--------------+-------------------------+-------------------------+-------------------------+",
-        "| 1.5          | 2.5                     | 3.5                     | 2.5                     |",
-        "+--------------+-------------------------+-------------------------+-------------------------+",
+        "+--------------+---------------------------+---------------------------+---------------------------+",


My only comment is about the header of the expression.
The input SQL is AGG(C1) + 1, where 1 is an Int64 literal, but after the cast is evaluated the header shows it as a float.

Is there a way to keep the header consistent, so that it does not change when the optimizer rewrites the plan?
cc @andygrove @alamb

This is a concern I have also had for a long time, and I once had a PR open for it.

One approach would be to add an alias for every unnamed expression based on the original query SQL or expression.
This would avoid having the column names changed by the optimizers.
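A minimal sketch of the aliasing idea, using a toy expression type rather than DataFusion's own `Expr`. The names `display_name`, `pin_name`, and `coerce_to_float` are hypothetical and exist only for this illustration. The point is that an alias captured before any rewrite keeps the output column name stable.

```rust
// Toy model only: these types and functions are hypothetical,
// not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    Float64(f64),
    Add(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

/// Name used for the output column header.
fn display_name(expr: &Expr) -> String {
    match expr {
        Expr::Column(name) => name.clone(),
        Expr::Int64(v) => format!("Int64({v})"),
        Expr::Float64(v) => format!("Float64({v})"),
        Expr::Add(l, r) => format!("{} + {}", display_name(l), display_name(r)),
        // An alias hides whatever the inner expression looks like.
        Expr::Alias(_, name) => name.clone(),
    }
}

/// Wrap an unnamed expression in an alias built from its original form.
fn pin_name(expr: Expr) -> Expr {
    if matches!(expr, Expr::Column(_) | Expr::Alias(_, _)) {
        return expr; // already has a stable name
    }
    let name = display_name(&expr);
    Expr::Alias(Box::new(expr), name)
}

/// Stand-in for an optimizer rewrite: turn Int64 literals into Float64,
/// keeping any alias name untouched.
fn coerce_to_float(expr: Expr) -> Expr {
    match expr {
        Expr::Int64(v) => Expr::Float64(v as f64),
        Expr::Add(l, r) => Expr::Add(
            Box::new(coerce_to_float(*l)),
            Box::new(coerce_to_float(*r)),
        ),
        Expr::Alias(inner, name) => Expr::Alias(Box::new(coerce_to_float(*inner)), name),
        other => other,
    }
}
```

In this toy model, rewriting `c1 + Int64(1)` directly changes its header to `c1 + Float64(1)`, while calling `pin_name` first keeps the header as `c1 + Int64(1)`.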

I really like the idea of adding an alias once (maybe as the initial optimizer pass?)

I am not sure how valuable adding the types in the column names is in general, to be honest. I wouldn't mind if rather than Int(1) this was simply rendered 1

Do you have a plan or a draft PR for that? @Dandandan

Perhaps @Dandandan was referring to #280 / #279

Yes indeed, we can give those a second life 🎉

I had some concerns with the PR, but I believe it is still a big improvement over the current state of things.

alamb

I think it looks like a good improvement to me. Perhaps we can file a follow-on ticket for the column renaming?

alamb · 2022-09-12T13:14:58Z

datafusion/core/tests/sql/aggregates.rs

-        "+--------------+-------------------------+-------------------------+-------------------------+",
-        "| 1.5          | 2.5                     | 3.5                     | 2.5                     |",
-        "+--------------+-------------------------+-------------------------+-------------------------+",
+        "+--------------+---------------------------+---------------------------+---------------------------+",


I really like the idea of adding an alias once (maybe as the initial optimizer pass?)

I am not sure how valuable adding the types in the column names is in general, to be honest. I wouldn't mind if rather than Int(1) this was simply rendered 1

alamb · 2022-09-12T13:15:18Z

datafusion/core/tests/sql/explain_analyze.rs

@@ -653,7 +653,7 @@ order by
    let expected = "\
    Sort: #revenue DESC NULLS FIRST\
    \n  Projection: #customer.c_custkey, #customer.c_name, #SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount) AS revenue, #customer.c_acctbal, #nation.n_name, #customer.c_address, #customer.c_phone, #customer.c_comment\
-    \n    Aggregate: groupBy=[[#customer.c_custkey, #customer.c_name, #customer.c_acctbal, #customer.c_phone, #nation.n_name, #customer.c_address, #customer.c_comment]], aggr=[[SUM(#lineitem.l_extendedprice * CAST(Int64(1) AS Float64) - #lineitem.l_discount)]]\
+    \n    Aggregate: groupBy=[[#customer.c_custkey, #customer.c_name, #customer.c_acctbal, #customer.c_phone, #nation.n_name, #customer.c_address, #customer.c_comment]], aggr=[[SUM(#lineitem.l_extendedprice * Float64(1) - #lineitem.l_discount)]]\
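The change in the expected plan above, from `CAST(Int64(1) AS Float64)` to `Float64(1)`, can be illustrated with a small sketch. It assumes a toy expression type, not DataFusion's `Expr`, and `fold_casts` is a hypothetical name for this example only. It shows a cast of a literal being evaluated once at plan time, with other casts left in place.

```rust
// Toy model only: hypothetical types, not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    Float64(f64),
    Cast(Box<Expr>, DataType),
    Multiply(Box<Expr>, Box<Expr>),
}

/// Recursively replace casts of literals with the equivalent literal.
fn fold_casts(expr: Expr) -> Expr {
    match expr {
        Expr::Cast(inner, ty) => match (fold_casts(*inner), ty) {
            // Lossless or identity casts of literals are evaluated here.
            (Expr::Int64(v), DataType::Float64) => Expr::Float64(v as f64),
            (Expr::Int64(v), DataType::Int64) => Expr::Int64(v),
            (Expr::Float64(v), DataType::Float64) => Expr::Float64(v),
            // Everything else (for example a cast of a column) stays a cast.
            (other, ty) => Expr::Cast(Box::new(other), ty),
        },
        Expr::Multiply(l, r) => {
            Expr::Multiply(Box::new(fold_casts(*l)), Box::new(fold_casts(*r)))
        }
        other => other,
    }
}
```

Applied to `l_extendedprice * CAST(Int64(1) AS Float64)`, this sketch yields `l_extendedprice * Float64(1)`, so the cast is not re-evaluated for every batch at execution time.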


datafusion/optimizer/src/type_coercion.rs

alamb · 2022-09-12T13:19:16Z

This PR appears to have some conflicts now

andygrove · 2022-09-12T17:34:53Z

@Dandandan This now needs a rebase

ursabot · 2022-09-12T19:33:39Z

Benchmark runs are scheduled for baseline = 97b3a4b and contender = f48a997. f48a997 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Evaluate expressions after type coercion

ace211f

github-actions bot added the optimizer Optimizer rules label Sep 11, 2022

Fix some explains

40bb2ad

github-actions bot added the core Core DataFusion crate label Sep 11, 2022

andygrove reviewed Sep 11, 2022

datafusion/optimizer/src/type_coercion.rs

Dandandan added 4 commits September 11, 2022 17:54

Fix some explains

ce3f1a1

Fix some explains

47b4d31

Update test

9ec01c9

Update test

f08452b

liukun4515 reviewed Sep 12, 2022

alamb approved these changes Sep 12, 2022

Dandandan added 3 commits September 12, 2022 17:33

Merge remote-tracking branch 'upstream/master' into evaluate_type_coercion

15cc7bd

Update test

64b1e5f

Update more tests

96334db

Dandandan added 3 commits September 12, 2022 20:02

Merge

3337dcb

Fix tests

3f9dea7

Use supported date string

672715e

Dandandan merged commit f48a997 into apache:master Sep 12, 2022

This was referenced Sep 15, 2022

inlist: move type coercion to logical phase #3472

Merged

make the header not changed when do optimization #3568

Open

liukun4515 mentioned this pull request Sep 22, 2022

make type coercion simple and remove the evaluate logic #3585

Closed
