You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug, including details regarding any error messages, version, and platform.
I am facing an issue while splitting a parquet file into multiple files using the ParquetFileWriter.appendRowGroups API. It is failing to set the dictionary page offsets correctly in the new files. When investigated further, I observed that the API ParquetMetadataConverter.addRowGroup has an assumption on the availability of EncodingStats always. As per the format specification, it is not mandatory to have the encoding_stats. Is it possible to remove this requirement?
Describe the bug, including details regarding any error messages, version, and platform.
I am facing an issue while splitting a parquet file into multiple files using the ParquetFileWriter.appendRowGroups API. It is failing to set the dictionary page offsets correctly in the new files. When investigated further, I observed that the API ParquetMetadataConverter.addRowGroup has an assumption on the availability of EncodingStats always. As per the format specification, it is not mandatory to have the encoding_stats. Is it possible to remove this requirement?
https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L559
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L826
Component(s)
No response
The text was updated successfully, but these errors were encountered: