Add support for reading orc column statistics #6588
8b689ff
to
70fa5e9
Compare
@@ -97,9 +97,9 @@ std::unique_ptr<BinaryStripeStreams> BinaryStreamReader::next() {
stripeReaderBase_, columnSelector_, stripeIndex_++);
}

-std::unordered_map<uint32_t, proto::ColumnStatistics>
+std::unordered_map<uint32_t, ColumnStatisticsWrapper>
There is a problem here: the wrapper is only a reference, but this function returns actual values. If ORC does not need it, can we keep the original signature and make it DWRF-only (with a check)?
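To illustrate the concern, here is a minimal sketch using hypothetical stand-in types (`Stats` and `StatsWrapper` are assumptions for illustration, not the actual Velox classes). A wrapper that stores only a pointer does not own the message, so it is valid only while something else keeps the message alive.

```cpp
#include <cstdint>

// Hypothetical stand-in for a protobuf message such as proto::ColumnStatistics.
struct Stats {
  uint64_t numberOfValues = 0;
};

// Hypothetical non-owning view: it stores a pointer and never copies the data.
class StatsWrapper {
 public:
  explicit StatsWrapper(const Stats* impl) : impl_(impl) {}

  uint64_t numberOfValues() const {
    return impl_->numberOfValues;
  }

 private:
  const Stats* impl_; // not owned
};

// Hazard, shown only as a comment: the local is destroyed at return,
// so the returned wrapper would point at a dead object.
//
//   StatsWrapper makeDangling() {
//     Stats local;
//     return StatsWrapper(&local);
//   }

// Safe: the caller owns the Stats, and it outlives the wrapper.
uint64_t readThroughWrapper(const Stats& owned) {
  StatsWrapper view(&owned);
  return view.numberOfValues();
}
```

Returning the values themselves, as the original signature does, avoids the lifetime question entirely.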
@@ -111,7 +111,8 @@ BinaryStreamReader::getStatistics() const {
"Corrupted file detected, Footer stats are missing, but stripes are present");
for (auto node = 0; node < typesSize; node++) {
if (columnSelector_.shouldReadNode(node)) {
-stats[node] = proto::ColumnStatistics();
+const proto::ColumnStatistics cs = proto::ColumnStatistics();
@Yuhta Thank you for your review. The problem is that a temporary local variable is used here; I will fix it.
7dd28a8
to
9c15412
Compare
auto cs =
google::protobuf::Arena::CreateMessage<proto::ColumnStatistics>(
stripeReaderBase_.getReader().arena());
stats.emplace(node, ColumnStatisticsWrapper(cs));
We still have the same problem. I think you should keep this function as it is and, if ORC requires it as well, create another one returning proto::orc::ColumnStatistics. Wrappers can only be created at call sites.
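A rough sketch of that suggestion, again with hypothetical names (`Stats`, `StatsWrapper`, `getOwnedStatistics`, and `totalValues` are illustrative assumptions, not Velox APIs): the function keeps returning owned values, and the caller builds short-lived wrappers while the map is still alive.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a protobuf statistics message.
struct Stats {
  uint64_t numberOfValues = 0;
};

// Hypothetical non-owning view over a Stats object.
class StatsWrapper {
 public:
  explicit StatsWrapper(const Stats* impl) : impl_(impl) {}

  uint64_t numberOfValues() const {
    return impl_->numberOfValues;
  }

 private:
  const Stats* impl_; // not owned
};

// Keeps the value-returning signature: the map owns every entry.
std::unordered_map<uint32_t, Stats> getOwnedStatistics(uint32_t nodeCount) {
  std::unordered_map<uint32_t, Stats> stats;
  for (uint32_t node = 0; node < nodeCount; ++node) {
    stats[node].numberOfValues = node * 10; // placeholder data
  }
  return stats;
}

// Call site: the map outlives the loop, so each wrapper stays valid
// for as long as it is used.
uint64_t totalValues(const std::unordered_map<uint32_t, Stats>& stats) {
  uint64_t total = 0;
  for (const auto& entry : stats) {
    StatsWrapper view(&entry.second);
    total += view.numberOfValues();
  }
  return total;
}
```

With this split, ownership stays in one place and the wrapper remains a cheap view that never escapes the scope of the data it points to.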
@Yuhta I looked at the code again, and ORC does not use this method. I'll keep the previous code logic. Thank you for testing this.
2af7412
to
c5c5986
Compare
@@ -436,12 +766,18 @@ class FooterWrapper : public ProtoWrapperBase {
return dwrfPtr()->statistics();
}

-const ::facebook::velox::dwrf::proto::ColumnStatistics& statistics(
+const ::facebook::velox::dwrf::proto::ColumnStatistics& statisticsByIndex(
A better name is dwrfStatistics
Fixed, thanks.
0d21b06
to
4b3960e
Compare
Hi @Yuhta, do you have any other comments on this PR? Thank you.
Hi @Yuhta, PTAL. Thank you.
6b3e52d
to
5deef28
Compare
0fbf1b2
to
d886334
Compare
Hi @kevinwilfong I moved the implementation of
5ec1137
to
cfc7d0c
Compare
@kevinwilfong has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@kevinwilfong merged this pull request in 0c0a973.
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Currently, the dwrf module can only read column statistics from DWRF files; it cannot read column statistics from ORC files. When I use Velox to read an ORC table, the following exception occurs:
With this PR, reading ORC column statistics is supported, so ORC data can be read through Velox:
CC: @Yuhta