Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

casting Int64 to Float64 unsuccessfully caused tpch8 to fail #1576

Closed
xudong963 opened this issue Jan 15, 2022 · 3 comments · Fixed by #1601
Closed

casting Int64 to Float64 unsuccessfully caused tpch8 to fail #1576

xudong963 opened this issue Jan 15, 2022 · 3 comments · Fixed by #1601
Labels
bug Something isn't working datafusion Changes in the datafusion crate

Comments

@xudong963
Copy link
Member

Describe the bug
During running tpch8, I got the following error:

➜  benchmarks git:(fix_cross_join) cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path ./data --format tbl --query 8 --batch-size 4096
    Finished release [optimized] target(s) in 0.71s
     Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 8, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "./data", file_format: "tbl", mem_table: false }
thread 'thread 'tokio-runtime-workertokio-runtime-worker' panicked at '' panicked at 'false_values downcast failedfalse_values downcast failed', ', datafusion/src/physical_plan/expressions/case.rsdatafusion/src/physical_plan/expressions/case.rs::218218::3030

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: ArrowError(ExternalError(ArrowError(ExternalError(Execution("Arrow error: External error: oneshot canceled")))))

To Reproduce
Steps to reproduce the behavior:

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

@xudong963 xudong963 added the bug Something isn't working label Jan 15, 2022
@xudong963
Copy link
Member Author

xudong963 commented Jan 15, 2022

I've found out why it failed.

let's see query 8, there is when ... then ... else ....

select
    o_year,
    sum(case
            when nation = 'BRAZIL' then volume
            else 0
        end) / sum(volume) as mkt_share
...

In datafusion, the following code to process when ... then ... else ...

fn if_then_else(
    bools: &BooleanArray,
    true_values: ArrayRef,
    false_values: ArrayRef,
    data_type: &DataType,
) -> Result<ArrayRef> {
    match data_type {
        ...
        DataType::Float64 => if_then_else!(
            array::Float64Builder,
            array::Float64Array,
            bools,
            true_values,
            false_values
        ),
    }
}

In query8, 0 is Int64, then during casing to Float64, it fails.

macro_rules! if_then_else {
    ($BUILDER_TYPE:ty, $ARRAY_TYPE:ty, $BOOLS:expr, $TRUE:expr, $FALSE:expr) => {{
        let true_values = $TRUE
            .as_ref()
            .as_any()
            .downcast_ref::<$ARRAY_TYPE>()
            .expect("true_values downcast failed");

        let false_values = $FALSE
            .as_ref()
            .as_any()
            .downcast_ref::<$ARRAY_TYPE>()
            .expect("false_values downcast failed");

There are two potential approaches to solve the issue

  • Let arrow provide implicit conversion.
  • Do type conversion in datafusion, such as we convert true_values and false_values type to final type in advance.

Any thoughts? @alamb @houqp

@alamb
Copy link
Contributor

alamb commented Jan 16, 2022

Do type conversion in datafusion, such as we convert true_values and false_values type to final type in advance.

I think it would be most appropriate to have datafusion do the conversion (coercion) in this case, rather than implicitly in arrow.

Perhaps by calling coerce https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/type_coercion.rs#L45

@xudong963
Copy link
Member Author

Thanks @alamb , I'll try it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants