
Fixes #3589: Add support for Parquet files similar to CSV/Arrow #3711

Merged · 1 commit · Aug 21, 2023


@vga91 vga91 commented Aug 3, 2023

Fixes #3589

This PR adds support for Apache Parquet export/import/load:


  • Added 4 export procedures that stream a list of byte[], one per batch: apoc.export.parquet.all.stream, apoc.export.parquet.graph.stream, apoc.export.parquet.query.stream, apoc.export.parquet.data.stream

  • Added 4 export procedures which create a Parquet file and return a ProgressInfo result, like the CSV ones: apoc.export.parquet.all, apoc.export.parquet.graph, apoc.export.parquet.query, apoc.export.parquet.data

  • Added one load procedure, apoc.load.parquet, that reads a Parquet byte[] or a Parquet file and returns a map for each row

  • Added one import procedure, apoc.import.parquet, that imports data from a Parquet byte[] or a Parquet file
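A minimal usage sketch of the new procedures (assuming a running Neo4j instance with this APOC build installed; the file name, label, and properties are illustrative):

```cypher
// Export the whole database to a Parquet file, returning a ProgressInfo row
CALL apoc.export.parquet.all('test.parquet', {});

// Export the result of a Cypher query as a stream of byte[] batches
CALL apoc.export.parquet.query.stream('MATCH (p:Person) RETURN p.name AS name, p.age AS age', {});

// Read the file back, one map per row
CALL apoc.load.parquet('test.parquet', {})
YIELD value
RETURN value;
```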

In order to load/import complex data not recognized by Parquet, such as Duration, Point, or List of Duration, which are stringified during export,
we can use the mapping: {keyToConvert: valueTypeName} config to convert them.
For example, apoc.import.parquet(fileName, {mapping: {foo: "DurationArray"}}) converts the key foo to a List of Duration.
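A round trip with a stringified value might look like the following sketch (the Event label, the since property, and the use of "Duration" as the valueTypeName for a single duration are illustrative assumptions):

```cypher
// Durations are not a native Parquet type, so this value is stringified on export
CREATE (:Event {since: duration('P5M1DT12H')});
CALL apoc.export.parquet.all('events.parquet', {});

// On import, the mapping config converts the string back to a Duration
CALL apoc.import.parquet('events.parquet', {mapping: {since: 'Duration'}});
```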


Created a follow-up card for doc files and any other additions.
