Add support for Parquet files similar to CSV/Arrow #3589

Closed
jexp opened this issue May 23, 2023 · 2 comments · Fixed by #3711

@jexp
Member

jexp commented May 23, 2023

Sorry, had forgotten to create the issue for this.

Parquet files already carry schema information, which should help smooth the import.

We should also add support for predicate pushdown and column selection to minimize the amount of data loaded from the Parquet source.
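
To make the two ideas concrete, here is a minimal sketch in plain Java. It assumes a hypothetical in-memory stand-in for a Parquet file (`RowGroup`, `GreaterOrEqual`, `ScanResult` and `sampleFile` are invented for illustration and are not part of any Parquet library or of APOC). It only shows the principle: skip whole row groups whose min/max statistics rule out a match, and materialize only the requested columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a Parquet file; NOT a real Parquet reader API.
class PushdownSketch {

    // One row group: column name -> values, plus min/max statistics for one numeric column.
    record RowGroup(Map<String, List<Object>> columns, String statsColumn, long min, long max) {
        int rowCount() {
            return columns.values().iterator().next().size();
        }
    }

    // The only predicate shape this sketch supports: column >= threshold.
    record GreaterOrEqual(String column, long threshold) {}

    // Matching rows (projected columns only) and the number of row groups never scanned.
    record ScanResult(List<Map<String, Object>> rows, int skippedRowGroups) {}

    static ScanResult scan(List<RowGroup> groups, List<String> projection, GreaterOrEqual predicate) {
        List<Map<String, Object>> rows = new ArrayList<>();
        int skipped = 0;
        for (RowGroup group : groups) {
            // Predicate pushdown: the statistics prove no row can match, so skip the group.
            if (group.statsColumn().equals(predicate.column()) && group.max() < predicate.threshold()) {
                skipped++;
                continue;
            }
            List<Object> filterValues = group.columns().get(predicate.column());
            for (int i = 0; i < group.rowCount(); i++) {
                if (((Number) filterValues.get(i)).longValue() < predicate.threshold()) {
                    continue; // row-level check for groups that could not be skipped
                }
                // Column selection: copy only the requested columns into the result row.
                Map<String, Object> row = new LinkedHashMap<>();
                for (String name : projection) {
                    row.put(name, group.columns().get(name).get(i));
                }
                rows.add(row);
            }
        }
        return new ScanResult(rows, skipped);
    }

    static List<Object> col(Object... values) {
        return List.of(values);
    }

    // Two row groups with made-up data; "age" carries the statistics.
    static List<RowGroup> sampleFile() {
        return List.of(
            new RowGroup(Map.of("name", col("Ann", "Bob"), "age", col(25, 31)), "age", 25, 31),
            new RowGroup(Map.of("name", col("Cid", "Dee"), "age", col(38, 52)), "age", 38, 52));
    }

    public static void main(String[] args) {
        ScanResult result = scan(sampleFile(), List.of("name"), new GreaterOrEqual("age", 40));
        System.out.println(result.rows() + " skipped=" + result.skippedRowGroups());
    }
}
```

With a real Parquet reader the projection and the filter would be handed to the library rather than reimplemented like this; the sketch only illustrates why doing so reduces the data that has to be read.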

@michael-simons
Contributor

Here's some inspiration: michael-simons/neo4j-load-parquet@0d87e79

Feel free to copy what you need. I used the least invasive library I could find for reading Parquet in Java. If you use the default Avro, you get Hadoop, Spark, a banana, the monkey holding the banana and a bit of the jungle where it lives…

Ping me if you have questions, @conker84. This was done for a PoC of how fast we can get data into the database.

@conker84
Collaborator

@michael-simons thank you so much!
