-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parquet: Clean up Parquet generic and internal readers #12102
Parquet: Clean up Parquet generic and internal readers #12102
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is great, thanks for splitting into multiple commits, that made it easy to review. We'll need to update revAPI with the new ParquetValueReader.createStructReader
interface which doesn't take in explicit types
protected abstract ParquetValueReader<T> createStructReader( | ||
List<Type> types, List<ParquetValueReader<?>> fieldReaders, Types.StructType structType); | ||
List<ParquetValueReader<?>> fieldReaders, Types.StructType structType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RevAPI is failing due to the new abstract method that implementations will need to implement (which I understand the rationale, the previous types argument were not used). I think we'll need to add the breaking change to revAPI
./gradlew :iceberg-parquet:revapiAcceptBreak --justification "Implementations of ParquetValueReader.createStructReader should not have to pass in explicit types" \
--code "java.method.abstractMethodAdded" \
--new "method org.apache.iceberg.parquet.ParquetValueReader<T> org.apache.iceberg.data.parquet.BaseParquetReaders<T>::createStructReader(java.util.List<org.apache.iceberg.parquet.ParquetValueReader<?>>, org.apache.iceberg.types.Types.StructType)"
Thanks for cleaning this part of the code. I can work on moving parquet to core (already had a PR long back) after this to avoid conflicts for this PR. |
parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java
Show resolved
Hide resolved
parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java
Outdated
Show resolved
Hide resolved
Thanks @rdblue and @ajantha-bhat for reviewing! |
// this will throw an UnsupportedOperationException | ||
return Optional.empty(); | ||
} | ||
Preconditions.checkArgument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wanted to throw the exception when it is signed right?
Now it throws when it is unsigned. check should be inverted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, got it. Based on the error message, we should not read unsigned. So, previous check was wrong!
While fixing nits, we actually fixed the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have lot of gaps in test coverage for parquet readers like milli-time and timestamp reading, int96 reading, all kinds of signed, unsigned reads. I will create a ticket to follow this up.
This is a refactor that cleans up a few issues I noticed while reviewing #11904 and while working on Parquet variant readers.
ParquetValueReaders
to keep implementations privatetypes
argument fromParquetValueReader.StructReader
subclasses. Some accessible uses had to be left, but most are removed. This allowed removing types from the reader builders as well.TimeUnit
param from factory methods added in Parquet: Add readers and writers for the internal object model #11904setPageSource
LogicalTypeAnnotation
classes to cut down on multi-line method signaturesChanges are grouped into separate commits for easier review.