Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add test for decimal and pruning for decimal column #2960

Merged
merged 2 commits into from
Jul 26, 2022

Conversation

liukun4515
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Jul 24, 2022

// parquet use the fixed length binary as the physical type to store decimal
// TODO: arrow-rs don't support convert the physical type of binary to decimal
// let exec = get_exec("byte_array_decimal.parquet", None, None).await?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrow-rs doesn't support read decimal value from parquet with binary physical type.
why not support this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you have a variable length decimal type? What would this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tustvold
In the parquet of Java version, we can write decimal with the binary or fixed-length binary, so we can get the decimal using the physical type of binary.

we can get the definition of decimal from the format spec

Decimal can be stored in the parquet as INT32, INT64, FIXED_LEN_BYTE_ARRAY or BYTE_ARRAY.

I can help to implement the BYTE_ARRAY in the arrow-rs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pr apache/arrow-rs#2160 for supporting read decimal from binary physical type in parquet.

@liukun4515 liukun4515 force-pushed the add_decimal_parquet_prune_test branch from ee1cc40 to bb5dece Compare July 24, 2022 09:57
@codecov-commenter
Copy link

Codecov Report

Merging #2960 (bb5dece) into master (834924f) will increase coverage by 0.21%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2960      +/-   ##
==========================================
+ Coverage   85.43%   85.64%   +0.21%     
==========================================
  Files         275      279       +4     
  Lines       49839    51033    +1194     
==========================================
+ Hits        42578    43707    +1129     
- Misses       7261     7326      +65     
Impacted Files Coverage Δ
...afusion/core/src/datasource/file_format/parquet.rs 85.89% <100.00%> (+0.87%) ⬆️
datafusion/core/src/physical_optimizer/pruning.rs 94.17% <100.00%> (+0.38%) ⬆️
datafusion/optimizer/src/test/mod.rs 94.23% <0.00%> (-5.77%) ⬇️
datafusion/core/tests/sql/mod.rs 98.26% <0.00%> (-0.56%) ⬇️
datafusion/core/src/physical_plan/metrics/value.rs 86.93% <0.00%> (-0.51%) ⬇️
datafusion/optimizer/src/decorrelate_where_in.rs 94.11% <0.00%> (ø)
datafusion/core/tests/sql/subqueries.rs 94.32% <0.00%> (ø)
...usion/optimizer/src/decorrelate_scalar_subquery.rs 92.72% <0.00%> (ø)
...tafusion/optimizer/src/decorrelate_where_exists.rs 93.42% <0.00%> (ø)
datafusion/sql/src/planner.rs 81.31% <0.00%> (+0.04%) ⬆️
... and 8 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

23,
2,
),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some large decimal numbers to test the decimal(23,2) cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is just used to test the pruning logic.
I will file follow-up pull request to fix the #2962 with parquet rowgroup filter/prune.

@liukun4515 liukun4515 changed the title test: add test for parquet decimal and pruning for decimal column test: add test for decimal and pruning for decimal column Jul 25, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alamb
Copy link
Contributor

alamb commented Jul 25, 2022

Thanks @liukun4515

@liukun4515 liukun4515 requested a review from tustvold July 26, 2022 07:26
@alamb alamb merged commit 0f19990 into apache:master Jul 26, 2022
@ursabot
Copy link

ursabot commented Jul 26, 2022

Benchmark runs are scheduled for baseline = f386f7a and contender = 0f19990. 0f19990 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants