Unable to set dictionary_page_offset when encoding_stats are missing #2962

mothukur · 2024-07-18T13:21:23Z

Describe the bug, including details regarding any error messages, version, and platform.

I am hitting an issue while splitting a Parquet file into multiple files with the ParquetFileWriter.appendRowGroups API: the dictionary page offsets are not set correctly in the new files. On further investigation, I found that ParquetMetadataConverter.addRowGroup assumes EncodingStats is always available. According to the format specification, encoding_stats is optional. Would it be possible to remove this requirement?

https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L559

https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L826
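
To illustrate the kind of fallback being asked for, here is a minimal sketch. It does not use parquet-java classes and is not the code at the link above: `DictionaryOffsetSketch`, `ChunkInfo`, and `dictionaryOffsetToWrite` are hypothetical names, and the offset-based heuristic (a dictionary offset that is positive and precedes the first data page) is an assumption about what a reasonable fallback might look like when encoding_stats is absent.

```java
import java.util.OptionalLong;

// Illustrative sketch only: these types are hypothetical stand-ins, not parquet-java classes.
final class DictionaryOffsetSketch {

    /** Minimal view of one column chunk's metadata read from the source file. */
    static final class ChunkInfo {
        final Boolean statsReportDictionaryPages; // null models a missing encoding_stats
        final long dictionaryPageOffset;          // 0 models "no dictionary page recorded"
        final long firstDataPageOffset;

        ChunkInfo(Boolean statsReportDictionaryPages, long dictionaryPageOffset, long firstDataPageOffset) {
            this.statsReportDictionaryPages = statsReportDictionaryPages;
            this.dictionaryPageOffset = dictionaryPageOffset;
            this.firstDataPageOffset = firstDataPageOffset;
        }
    }

    /** Returns the dictionary page offset to write, or empty if none should be written. */
    static OptionalLong dictionaryOffsetToWrite(ChunkInfo chunk) {
        if (chunk.statsReportDictionaryPages != null) {
            // encoding_stats is present: trust what it reports.
            return chunk.statsReportDictionaryPages
                    ? OptionalLong.of(chunk.dictionaryPageOffset)
                    : OptionalLong.empty();
        }
        // encoding_stats is absent (assumed fallback): accept the recorded offset
        // only if it is positive and lies before the first data page.
        boolean offsetLooksValid = chunk.dictionaryPageOffset > 0
                && chunk.dictionaryPageOffset < chunk.firstDataPageOffset;
        return offsetLooksValid
                ? OptionalLong.of(chunk.dictionaryPageOffset)
                : OptionalLong.empty();
    }
}
```

With this shape, a chunk whose encoding_stats is missing but whose dictionary offset is 4 and first data page offset is 100 would still get its dictionary offset carried over, while a chunk with a dictionary offset of 0 would not.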

Component(s)

No response

wgtmac · 2024-07-22T01:53:11Z

Thanks for reporting the issue! I think there is a similar effort to resolve this in #1340, but it is more complicated than it appears.

mothukur · 2024-09-16T06:01:00Z

I've submitted a PR with the fix. Could you please review it?

mothukur added the Type: bug label Jul 18, 2024

mothukur added a commit to mothukur/parquet-java that referenced this issue Sep 13, 2024

apacheGH-2962: Fix to set dictionary_page_offset correctly when encoding_stats are missing

b33dbc0

mothukur added a commit to mothukur/parquet-java that referenced this issue Sep 13, 2024

apacheGH-2962: Fix to set dictionary_page_offset correctly when encoding_stats are missing

f495b59

mothukur mentioned this issue Sep 13, 2024

GH-2962: Set dictionary_page_offset even when encoding_stats are missing #3012

Merged

wgtmac pushed a commit that referenced this issue Sep 24, 2024

GH-2962: Set dictionary_page_offset even when encoding_stats are missing (#3012)

ac6a5a0

wgtmac closed this as completed in #3012 Sep 24, 2024
