`common_sub_expression_eliminate` exists bug #4575

jackwener · 2022-12-10T05:04:02Z

Describe the bug

It will make some strange problem.

Error: Internal("Optimizer rule 'common_sub_expression_eliminate' failed, due to generate a different schema, original schema: 

DFSchema { fields: [
DFField { qualifier: Some(\"t1\"), field: Field { name: \"t1_id\", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t1\"), field: Field { name: \"t1_name\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t1\"), field: Field { name: \"t1_int\", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t2\"), field: Field { name: \"t2_id\", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t2\"), field: Field { name: \"t2_name\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t2\"), field: Field { name: \"t2_int\", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }, new schema: 

DFSchema { fields: [
DFField { qualifier: None, field: Field { name: \"t2.t2_int < UInt32(10)UInt32(10)t2.t2_int\", data_type: Boolean, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t1\"), field: Field { name: \"t1_id\", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t1\"), field: Field { name: \"t1_name\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t1\"), field: Field { name: \"t1_int\", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t2\"), field: Field { name: \"t2_id\", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t2\"), field: Field { name: \"t2_name\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }, 
DFField { qualifier: Some(\"t2\"), field: Field { name: \"t2_int\", data_type: UInt32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }")

original:
"Filter: t2.t2_int < Int64(10) OR t1.t1_int > Int64(2) AND t2.t2_name != Utf8("w")
  Left Join: t1.t1_id = t2.t2_id
    TableScan: t1
    TableScan: t2"

after:
"Filter: (t2.t2_int < UInt32(10)UInt32(10)t2.t2_int AS t2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2)) AND (t2.t2_int < UInt32(10)UInt32(10)t2.t2_int AS t2.t2_int < UInt32(10) OR t2.t2_name != Utf8("w"))
  Projection: t2.t2_int < UInt32(10) AS t2.t2_int < UInt32(10)UInt32(10)t2.t2_int, t1.t1_id, t1.t1_name, t1.t1_int, t2.t2_id, t2.t2_name, t2.t2_int
    Inner Join: t1.t1_id = t2.t2_id
      TableScan: t1 projection=[t1_id, t1_name, t1_int]
      TableScan: t2 projection=[t2_id, t2_name, t2_int]"

To Reproduce
Steps to reproduce the behavior:

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

The text was updated successfully, but these errors were encountered:

jackwener · 2022-12-10T05:07:08Z

multiple_or_predicates() generate nasty result, look like this bug cause it. #4487 (comment) it's a related comment

jackwener · 2022-12-10T06:11:16Z

Current this method is depend on the Self::desc_expr(expr)

let desc = Self::desc_expr(expr);

It will cause some strange expr.

id_array:

[
(15, "(t2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2)) AND (t2.t2_int < UInt32(10) OR t2.t2_name != Utf8(\"w\"))t2.t2_int < UInt32(10) OR t2.t2_name != Utf8(\"w\")t2.t2_name != Utf8(\"w\")Utf8(\"w\")t2.t2_namet2.t2_int < UInt32(10)UInt32(10)t2.t2_intt2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2)t1.t1_int > UInt32(2)UInt32(2)t1.t1_intt2.t2_int < UInt32(10)UInt32(10)t2.t2_int")
, (7, "t2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2)t1.t1_int > UInt32(2)UInt32(2)t1.t1_intt2.t2_int < UInt32(10)UInt32(10)t2.t2_int")
, (3, "t2.t2_int < UInt32(10)UInt32(10)t2.t2_int")
, (1, "")
, (2, "")
, (6, "t1.t1_int > UInt32(2)UInt32(2)t1.t1_int")
, (4, "")
, (5, "")
, (14, "t2.t2_int < UInt32(10) OR t2.t2_name != Utf8(\"w\")t2.t2_name != Utf8(\"w\")Utf8(\"w\")t2.t2_namet2.t2_int < UInt32(10)UInt32(10)t2.t2_int")
, (10, "t2.t2_int < UInt32(10)UInt32(10)t2.t2_int")
, (8, "")
, (9, "")
, (13, "t2.t2_name != Utf8(\"w\")Utf8(\"w\")t2.t2_name")
, (11, "")
, (12, "")
]

id_array/arrays_list -> affected_id -> project_exprs/predicate

it will cause strange column/expr

jackwener · 2022-12-10T06:13:10Z

problem in #3635 look like exist multiple bug.

cc @liukun4515 @andygrove @alamb @alex-natzka

alamb · 2022-12-10T10:42:09Z

cc @waynexia whom I think contributed the optimization originally, and perhaps might have some suggestions

alex-spies · 2022-12-13T12:06:17Z

Hi, I'm afraid I won't have time to look into this. I did go over the error message though, here's what I think is happening:

I think some other optimization rule than common_sub_expression_eliminate transforms the filter
t2.t2_int < Int64(10) OR t1.t1_int > Int64(2) AND t2.t2_name != Utf8("w")
into
(t2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2)) AND (t2.t2_int < UInt32(10) OR t2.t2_name != Utf8("w"))
which is weird but technically not wrong.
Then common_sub_expression_eliminate sees that t2.t2_int < UInt32(10) occurs twice, so it adds a projection before the filter where this expression gets computed. (This is correct and expected.) The projection creates the additional column t2.t2_int < UInt32(10)UInt32(10)t2.t2_int as alias for t2.t2_int < UInt32(10).
This changes the output schema, because we don't project out the column again.

The last point is the problem IMO. I guess common_sub_expression_eliminate would need to add another projection after the filter that gets rid of the extra column.

waynexia · 2022-12-15T03:44:20Z

Appreciate the investigation @alex-natzka ❤️

This changes the output schema, because we don't project out the column again.

The last point is the problem IMO. I guess common_sub_expression_eliminate would need to add another projection after the filter that gets rid of the extra column.

This point sounds reasonable to me. I revisited the unit tests in common_sub_expression_eliminate, and looks like I only used the plans that have their own output schema like Projection or Aggregator. Which will ignore the additional column by itself. I plan to look into it this week (and sorry for the late reply 🥲).

waynexia · 2022-12-26T12:10:47Z

Hi, I submit #4733 to fix this. Please check it out 🥰

jackwener added the bug Something isn't working label Dec 10, 2022

jackwener mentioned this issue Dec 10, 2022

reimplement push_down_projection and prune_column. #4465

Merged

jackwener mentioned this issue Dec 10, 2022

fix push_down_projection push redundant columns. #4487

Merged

jackwener mentioned this issue Dec 11, 2022

Optimizer adds redundant projections on casts with confusing alias #3786

Open

waynexia mentioned this issue Dec 26, 2022

fix: add one more projection to recover output schema #4733

Merged

alamb closed this as completed in #4733 Dec 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`common_sub_expression_eliminate` exists bug #4575

`common_sub_expression_eliminate` exists bug #4575

jackwener commented Dec 10, 2022 •

edited

Loading

jackwener commented Dec 10, 2022

jackwener commented Dec 10, 2022

jackwener commented Dec 10, 2022

alamb commented Dec 10, 2022

alex-spies commented Dec 13, 2022

waynexia commented Dec 15, 2022

waynexia commented Dec 26, 2022

common_sub_expression_eliminate exists bug #4575

common_sub_expression_eliminate exists bug #4575

Comments

jackwener commented Dec 10, 2022 • edited Loading

jackwener commented Dec 10, 2022

jackwener commented Dec 10, 2022

jackwener commented Dec 10, 2022

alamb commented Dec 10, 2022

alex-spies commented Dec 13, 2022

waynexia commented Dec 15, 2022

waynexia commented Dec 26, 2022

`common_sub_expression_eliminate` exists bug #4575

`common_sub_expression_eliminate` exists bug #4575

jackwener commented Dec 10, 2022 •

edited

Loading