Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimplifyExpressions errors when simplifying power fn #5996

Closed
wolffcm opened this issue Apr 13, 2023 · 4 comments · Fixed by #6077
Closed

SimplifyExpressions errors when simplifying power fn #5996

wolffcm opened this issue Apr 13, 2023 · 4 comments · Fixed by #6077
Assignees
Labels
bug Something isn't working

Comments

@wolffcm
Copy link
Contributor

wolffcm commented Apr 13, 2023

Describe the bug

When calling the power function with a constant in the second argument, I get an error:

Internal error: Optimizer rule 'simplify_expressions' failed due to unexpected error: Internal error: The expr has more than one schema, could not determine data type. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker.

To Reproduce

with datafusion-cli:

select power(f, 100) from (values(10.0)) as t(f);

I was also able to reproduce this with a unit test in datafusion/optimizer/src/simplify_expressions/simlify_exprs.rs:

    #[test]
    fn simplify_power() -> Result<()> {
        let schema = Schema::new(vec![
            Field::new("f", DataType::Float64, false),
        ]);
        let table_scan = table_scan(Some("test"), &schema, None)
            .expect("creating scan")
            .build()
            .expect("building plan");

        let plan = LogicalPlanBuilder::from(table_scan)
            .project(vec![call_fn("power", vec![col("f"), lit(1)])?])?
            .build()?;
        let rule = SimplifyExpressions::new();
        let _optimized_plan = rule
            .try_optimize(&plan, &OptimizerContext::new())
            .unwrap()
            .expect("failed to optimize plan");
        Ok(())
    }

Expected behavior

Calling power with a constant is a reasonable thing to want to do, so I would expect this to work.

Additional context

Unless the optimizer is configured with skip_failed_rules set to false, this could manifest in other, or the query might succeed just fine.

Related issue: #4685

I found this specifically when invoking power but this same issue probably occurs when simplifying other expressions.

In IOx we want to run with skip_failed_rules set to false specifically so we can return reasonable error messages to the user: https://github.com/influxdata/influxdb_iox/issues/7330

This test failure seems to happen in simpl_pow when trying to get the data type of the exponent argument:
https://github.com/apache/arrow-datafusion/blob/fcd8b899e2a62f798413c536f47078289ece9d05/datafusion/optimizer/src/simplify_expressions/utils.rs#L403
And then the SimplifyContext can't get the type because there is "more than one schema":
https://github.com/apache/arrow-datafusion/blob/fcd8b899e2a62f798413c536f47078289ece9d05/datafusion/optimizer/src/simplify_expressions/context.rs#L135

@wolffcm wolffcm added the bug Something isn't working label Apr 13, 2023
@alamb
Copy link
Contributor

alamb commented Apr 15, 2023

I am not sure why this is happening -- I think some debugging is in order.

Maybe one way to start is to figure out what the two schema are:

https://github.com/apache/arrow-datafusion/blob/03851ce6892c4a682058bbc8689dd7d6d7990ad0/datafusion/optimizer/src/simplify_expressions/context.rs#L130-L138

@wolffcm
Copy link
Contributor Author

wolffcm commented Apr 19, 2023

I am looking into this

@wolffcm
Copy link
Contributor Author

wolffcm commented Apr 20, 2023

For the simple example in the unit test, the plan is:

Projection: power(test.f, Int32(1))
  TableScan: test

And the two schemas in the error message are:

[
    DFSchema { fields: [
        DFField { qualifier: None, field: Field { name: "power(test.f,Int32(1))", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } }
    ], metadata: {} }, 
    DFSchema { fields: [
        DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "f", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }
    ], metadata: {} }
]

The first one is from the Project and the other is from the Scan.

After spending some time looking at this today I think I might have a potential fix, will draft a PR tomorrow.

@alamb
Copy link
Contributor

alamb commented Apr 24, 2023

I believe the upstream fix is now available in IOx after https://github.com/influxdata/influxdb_iox/pull/7630

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants