Support Reading Nested List Arrays #993

Closed
andrei-ionescu opened this issue Dec 2, 2021 · 7 comments · Fixed by #1588
Labels: enhancement (any new improvement worthy of an entry in the changelog), parquet (changes to the parquet crate)

Comments

andrei-ionescu commented Dec 2, 2021

Describe the bug

When trying to read parquet files with deeply nested fields we end up with the following error:

Parquet reader thread terminated due to error: ParquetError(ArrowError("reading List(GroupType {
...
}) into arrow not supported yet"))

To Reproduce

This is easily visible from the code found at array_reader.rs#L1516-L1522

Expected behavior

To have support for reading nested parquet files into arrow.
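
To illustrate what a nested list column looks like once it is in Arrow's layout, here is a minimal sketch, assuming a list-of-lists of `i32` with no nulls. It uses only the standard library; `flatten_nested` and `rebuild` are hypothetical helper names for illustration, not part of the arrow-rs or parquet APIs. Each list level is stored as an offsets buffer pointing into the level below, with a single flat values buffer at the bottom.

```rust
/// Flatten rows of nested lists into Arrow-style buffers:
/// outer offsets (one range of inner lists per row), inner offsets
/// (one range of values per inner list), and a flat values buffer.
/// Hypothetical helper for illustration; nulls are not modelled.
fn flatten_nested(rows: &[Vec<Vec<i32>>]) -> (Vec<usize>, Vec<usize>, Vec<i32>) {
    let mut outer = vec![0];
    let mut inner = vec![0];
    let mut values = Vec::new();
    for row in rows {
        for list in row {
            values.extend_from_slice(list);
            // End of this inner list within the flat values buffer.
            inner.push(values.len());
        }
        // End of this row, counted in inner lists seen so far.
        outer.push(inner.len() - 1);
    }
    (outer, inner, values)
}

/// Rebuild the nested rows from the flat buffers by slicing with the offsets.
fn rebuild(outer: &[usize], inner: &[usize], values: &[i32]) -> Vec<Vec<Vec<i32>>> {
    outer
        .windows(2)
        .map(|o| {
            (o[0]..o[1])
                .map(|j| values[inner[j]..inner[j + 1]].to_vec())
                .collect::<Vec<Vec<i32>>>()
        })
        .collect()
}

fn main() {
    // Three rows: two inner lists, an empty row, and a row with an empty inner list.
    let rows = vec![vec![vec![1, 2], vec![3]], vec![], vec![vec![], vec![4, 5, 6]]];
    let (outer, inner, values) = flatten_nested(&rows);
    println!("outer offsets: {:?}", outer);
    println!("inner offsets: {:?}", inner);
    println!("values:        {:?}", values);
    // The flat buffers carry enough information to recover the nesting.
    assert_eq!(rebuild(&outer, &inner, &values), rows);
}
```

The real reader also has to track null lists and null elements, which this sketch leaves out.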

Additional context

This issue, in my particular case, has been hidden under #1383.

Contributor

alamb commented Jan 21, 2022

This may have been fixed, as there has been significant work on reading nested structures by @helgikrs and others.

Igosuki commented Feb 18, 2022

@alamb https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/array_reader.rs#L1622

@kesavkolla
Contributor

Nested data structure support has been pending for a long time, and I am also eagerly waiting for it. I have a fairly large JSON dataset that I wrote to Parquet files using Apache Spark. I tried to read them with Rust Arrow but had no success.

Contributor

alamb commented Mar 2, 2022

Hi @kesavkolla -- could you possibly share an example file we could use to test any fix?


Igosuki commented Mar 2, 2022


chauhanVritul commented Mar 3, 2022


Igosuki commented Mar 3, 2022

Yes, this use case is supported in DataFusion. For instance, you can read nested Avro and JSON schemas, so the issue should be raised with arrow-rs. I added a comment on #720 with our files.

alamb added the parquet label Mar 7, 2022
tustvold added the enhancement label and removed the bug label Apr 18, 2022
tustvold changed the title from "Reading List(...) into arrow not supported yet" to "Support Reading Nested List Arrays" Apr 18, 2022
tustvold added a commit to tustvold/arrow-rs that referenced this issue Apr 18, 2022
tustvold added a commit to tustvold/arrow-rs that referenced this issue Apr 18, 2022
tustvold added a commit to tustvold/arrow-rs that referenced this issue Apr 18, 2022
tustvold self-assigned this Apr 19, 2022
tustvold added a commit to tustvold/arrow-rs that referenced this issue May 4, 2022
tustvold added a commit that referenced this issue May 9, 2022
#1588)

* Add support for nested list arrays (#993)

* More tests

* Minor cleanup

* Filter nulls

* Update comments

* Fix doc

* Fix clippy

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <[email protected]>

* More tests

* Add sanity check to ListArrayReader

* Fix test_struct_array_reader

Co-authored-by: Andrew Lamb <[email protected]>
6 participants