support byte array for decimal in parquet page and row group filters #4742
@@ -627,6 +638,56 @@ mod tests {
);

// TODO: BYTE_ARRAY support read decimal from parquet, after the 20.0.0 arrow-rs release
this comment can probably be removed as well
.collect();
Some(Arc::new(array))
}
Index::BYTE_ARRAY(index) => match $self.target_type {
I wonder if this code is covered? Is there some test that does page filtering on Decimal values?
No, there is currently no unit test for the page filter, but I can add some test cases for it in a follow-up PR.
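For context on what the `Index::BYTE_ARRAY` branch has to handle: Parquet stores a decimal in a byte array as the big-endian two's-complement unscaled value. The sketch below shows one way to sign-extend such a slice into an `i128`. The function name `be_bytes_to_i128` and the choice to return `None` for empty or over-long input are assumptions made for illustration, not necessarily what this PR does.

```rust
/// Hypothetical helper: decode a big-endian two's-complement byte slice
/// (1 to 16 bytes) into an i128 by sign-extending it to 16 bytes.
/// Returns None for empty input or input longer than 16 bytes.
fn be_bytes_to_i128(bytes: &[u8]) -> Option<i128> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    // A set high bit in the first byte means the value is negative,
    // so the padding bytes must be 0xFF instead of 0x00.
    let fill: u8 = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    // Right-align the input so the least significant byte stays last.
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    Some(i128::from_be_bytes(buf))
}
```

A page-filter test for decimals could feed min/max byte arrays through a helper like this and compare the result against the expected unscaled values.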
false,
)],
);
let rgm2 = get_row_group_meta_data(
I wonder if it is worth adding a test for row group metadata that has nulls for min and / or max?
Do you mean we need to add null values to the decimal column to check min/max?
I think we can refactor the test cases to do that in a follow-up PR.
I wonder if it is worth adding a test for row group metadata that has nulls for min and / or max?
Do you mean the min or the max value is null?
We can add the metadata for one row group like below:
vec![ParquetStatistics::int32(
    None,
    Some(600),
    None,
    0,
    false,
)],
or
vec![ParquetStatistics::int32(
    Some(100),
    None,
    None,
    0,
    false,
)],
Do you mean the min or the max value is null?
Yes, this is what I had in mind.
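To illustrate why the missing min / missing max cases are worth testing, here is a minimal sketch of the behaviour such a test would pin down: when the bound a predicate needs is absent, the row group cannot be ruled out and has to be kept. `ColumnStats`, `may_match_gt`, and `may_match_lt` are hypothetical names, and the logic is a simplified stand-in, not the actual pruning code in this PR.

```rust
/// Hypothetical, simplified per-row-group statistics for one column.
/// A `None` bound means the writer did not record that statistic.
struct ColumnStats {
    min: Option<i32>,
    max: Option<i32>,
}

/// Could any row in the group satisfy `col > threshold`?
/// Only a known max can prove that no row matches.
fn may_match_gt(stats: &ColumnStats, threshold: i32) -> bool {
    match stats.max {
        Some(max) => max > threshold,
        // Unknown max: be conservative and keep the row group.
        None => true,
    }
}

/// Could any row in the group satisfy `col < threshold`?
/// Only a known min can prove that no row matches.
fn may_match_lt(stats: &ColumnStats, threshold: i32) -> bool {
    match stats.min {
        Some(min) => min < threshold,
        // Unknown min: be conservative and keep the row group.
        None => true,
    }
}
```

With the two metadata variants suggested above (`min = None, max = Some(600)` and `min = Some(100), max = None`), a test would check that the filter prunes only when the bound it relies on is present.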
Thank you @liukun4515
Benchmark runs are scheduled for baseline = 54ae432 and contender = f91f623. f91f623 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?