
Re-enable TPCH query 6 (#4024) #4229

Closed
wants to merge 2 commits

Conversation

tustvold
Contributor

Which issue does this PR close?

Closes #4024

Rationale for this change

#4199 might have resolved this 🤞 (my laptop is too piddly to confirm locally)

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@tustvold
Contributor Author

Sadly it would appear not 😢

@andygrove
Member

Getting close ... I think there are still some decimal rounding issues to be resolved

   left: `"Decimal128(Some(12314107823),15,2)"`,
 right: `"Decimal128(Some(12314055479),15,2)"`', benchmarks/src/bin/tpch.rs:1208:31

@viirya
Member

viirya commented Nov 15, 2022

Maybe related to decimal casting rounding issue apache/arrow-rs#1043 (comment)

@Dandandan
Contributor

Looking at the plan, the decimal filter values are coming through nicely (though I think we should see if we can default DataFusion to parse_float_as_decimal=true in the parser).

So my guess is that either the multiplication of lineitem.l_extendedprice * lineitem.l_discount is causing the error or the SUM aggregation is causing it.

ProjectionExec: expr=[SUM(lineitem.l_extendedprice * lineitem.l_discount)@0 as revenue], metrics=[output_rows=1, elapsed_compute=250ns, spill_count=0, spilled_bytes=0, mem_used=0]
  AggregateExec: mode=Final, gby=[], aggr=[SUM(lineitem.l_extendedprice * lineitem.l_discount)], metrics=[output_rows=1, elapsed_compute=665ns, spill_count=0, spilled_bytes=0, mem_used=0]
    CoalescePartitionsExec, metrics=[output_rows=2, elapsed_compute=1µs, spill_count=0, spilled_bytes=0, mem_used=0]
      AggregateExec: mode=Partial, gby=[], aggr=[SUM(lineitem.l_extendedprice * lineitem.l_discount)], metrics=[output_rows=2, elapsed_compute=1.550495ms, spill_count=0, spilled_bytes=0, mem_used=0]
        CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=114160, elapsed_compute=1.228926ms, spill_count=0, spilled_bytes=0, mem_used=0]
          FilterExec: l_shipdate@3 >= 8766 AND l_shipdate@3 < 9131 AND l_discount@2 >= Some(5),15,2 AND l_discount@2 <= Some(7),15,2 AND l_quantity@0 < Some(2400),15,2, metrics=[output_rows=114160, elapsed_compute=20.476516ms, spill_count=0, spilled_bytes=0, mem_used=0]
            RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1, metrics=[fetch_time=1.795540669s, repart_time=1ns, send_time=24.394149ms]
              CsvExec: files={1 group: [[Users/danielheres/Code/arrow-datafusion/benchmarks/tpch-dbgen/lineitem.tbl]]}, has_header=false, limit=None, projection=[l_quantity, l_extendedprice, l_discount, l_shipdate], metrics=[output_rows=6001215, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, time_elapsed_scanning_total=1.819781372s, time_elapsed_scanning_until_data=4.780458ms, time_elapsed_processing=1.79542109s, time_elapsed_opening=59.708µs]
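To read the filter literals in that plan: a Decimal128 value is an unscaled i128 paired with a (precision, scale), so `Some(5),15,2` means 5 × 10⁻² = 0.05. A minimal standalone sketch (plain Rust with a hypothetical helper, not DataFusion code):

```rust
// Illustrative helper: interpret a Decimal128 literal's unscaled value
// and scale as the real number it represents.
fn decimal_to_f64(unscaled: i128, scale: u32) -> f64 {
    unscaled as f64 / 10_f64.powi(scale as i32)
}

fn main() {
    // The three decimal bounds from the FilterExec above:
    assert_eq!(decimal_to_f64(5, 2), 0.05); // l_discount >= 0.05
    assert_eq!(decimal_to_f64(7, 2), 0.07); // l_discount <= 0.07
    assert_eq!(decimal_to_f64(2400, 2), 24.0); // l_quantity < 24.00
}
```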

@Dandandan
Contributor

Dandandan commented Feb 1, 2023

Looks like both multiplication and addition are wrong.

pub(crate) fn multiply_decimal(
    left: &Decimal128Array,
    right: &Decimal128Array,
) -> Result<Decimal128Array> {
    // Dividing the product back down to the left scale truncates
    // toward zero, losing the low digits of every row's product.
    let divide = 10_i128.pow(left.scale() as u32);
    let array = multiply(left, right)?;
    let array = divide_scalar(&array, divide)?
        .with_precision_and_scale(left.precision(), left.scale())?;
    Ok(array)
}

The correct logic, which loses no information, is to multiply the unscaled values and add the scales instead of dividing.
Fixing this results in 123141101.83, bringing the result much closer (a difference of only 23.60).
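The scale-addition idea can be shown with plain i128 arithmetic (a standalone sketch, not arrow's Decimal128Array API): the product of two fixed-point values of scales s1 and s2 is exact at scale s1 + s2, whereas dividing back to the left scale truncates per row.

```rust
// Exact: the result carries scale s1 + s2, so no digits are dropped.
fn multiply_exact(left: i128, right: i128) -> i128 {
    left * right // interpret at scale s1 + s2
}

// Buggy approach from the snippet above: divide each product back to the
// left scale, truncating toward zero and losing information per row.
fn multiply_truncating(left: i128, right: i128, left_scale: u32) -> i128 {
    (left * right) / 10_i128.pow(left_scale)
}

fn main() {
    // 1.05 * 0.07 with both operands at scale 2: exact product is 0.0735.
    let exact = multiply_exact(105, 7);
    assert_eq!(exact, 735); // 0.0735 at scale 4, nothing lost

    // Truncating back to scale 2 keeps only 0.07; 0.0035 is dropped.
    let truncated = multiply_truncating(105, 7, 2);
    assert_eq!(truncated, 7);
}
```

Summed over the ~6 million lineitem rows in q6, these per-row truncations accumulate into the revenue discrepancy seen in the failing assertion above.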

I think there is some error in addition as well:

pub(crate) fn add_decimal(
    left: &Decimal128Array,
    right: &Decimal128Array,
) -> Result<Decimal128Array> {
    // Reuses the left precision/scale; a carry past the top digit overflows.
    let array =
        add(left, right)?.with_precision_and_scale(left.precision(), left.scale())?;
    Ok(array)
}

pub(crate) fn add_decimal_scalar(
    left: &Decimal128Array,
    right: i128,
) -> Result<Decimal128Array> {
    // Same problem: the result type should be widened, not inherited.
    let array = add_scalar(left, right)?
        .with_precision_and_scale(left.precision(), left.scale())?;
    Ok(array)
}

This only looks at the left scale / precision, but addition should widen the precision by one digit to capture a possible carry.
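A sketch of the usual widening rule for decimal addition (assuming the SQL-style convention; plain Rust, not DataFusion's type-coercion code): with inputs (p1, s1) and (p2, s2), the result scale is max(s1, s2) and the precision is max(p1 − s1, p2 − s2) + max(s1, s2) + 1, where the extra digit absorbs the carry.

```rust
// Hypothetical helper computing the (precision, scale) of a decimal sum.
fn add_result_type(p1: u8, s1: i8, p2: u8, s2: i8) -> (u8, i8) {
    let scale = s1.max(s2);
    // Digits to the left of the decimal point in each operand.
    let integer_digits = (p1 as i8 - s1).max(p2 as i8 - s2);
    // One extra digit for a possible carry out of the top position.
    ((integer_digits + scale + 1) as u8, scale)
}

fn main() {
    // Adding two (15, 2) values, as in q6, needs (16, 2):
    // e.g. 9_999_999_999_999.99 + 9_999_999_999_999.99 has 14 integer digits.
    assert_eq!(add_result_type(15, 2, 15, 2), (16, 2));

    // Mixed scales: (10, 2) + (10, 4) -> 8 integer digits + scale 4 + carry.
    assert_eq!(add_result_type(10, 2, 10, 4), (13, 4));
}
```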

@Dandandan
Contributor

#5143

@Dandandan
Contributor

Completed in main :)

@Dandandan Dandandan closed this Jun 9, 2023
Successfully merging this pull request may close these issues.

benchmark q6 producing incorrect result
4 participants