Add Delta Lake TableProvider #525
Comments
FYI @houqp what do you think of integrating this into DataFusion?
fwiw, imo this should be discussed over the mailing list.
I agree, if we have some positive reactions I will send something over the mailing list.
I am all for this. I think this is a good move, especially for ballista. I am happy to help maintain the deltalake support in datafusion going forward as well. If we go with this route, I would like to drop the table provider implementation in I am also planning to promote datafusion as the default query engine for executing native delta lake queries in
I like this approach, and I think there might be other approaches to adding IO support to datafusion. How about separate crates implementing functionality through traits, then having a contrib section in the README listing them?
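A minimal sketch of the "separate crates implementing functionality through traits" idea from the comment above. It uses only the standard library, and every name in it (`TableSource`, `Catalog`, `InMemoryDeltaSource`, `demo_catalog`) is hypothetical: the trait is a simplified stand-in for an engine-side table abstraction, not DataFusion's actual `TableProvider` API, and the "Delta" source just holds rows in memory rather than reading a real Delta log.

```rust
use std::collections::HashMap;

/// Hypothetical engine-side trait. A contrib crate would depend on the
/// engine crate only for this trait and implement it for its own format.
pub trait TableSource {
    /// Column names of the table.
    fn schema(&self) -> Vec<String>;
    /// All rows of the table, one Vec<String> per row.
    fn scan(&self) -> Vec<Vec<String>>;
}

/// Hypothetical engine-side registry. It only knows about the trait,
/// never about concrete storage formats.
pub struct Catalog {
    tables: HashMap<String, Box<dyn TableSource>>,
}

impl Catalog {
    pub fn new() -> Self {
        Catalog { tables: HashMap::new() }
    }

    /// Register any implementation of `TableSource` under a table name.
    pub fn register(&mut self, name: &str, source: Box<dyn TableSource>) {
        self.tables.insert(name.to_string(), source);
    }

    /// Column names of a registered table, or None if it is unknown.
    pub fn columns(&self, name: &str) -> Option<Vec<String>> {
        self.tables.get(name).map(|t| t.schema())
    }

    /// Number of rows returned by scanning a registered table.
    pub fn row_count(&self, name: &str) -> Option<usize> {
        self.tables.get(name).map(|t| t.scan().len())
    }
}

/// Hypothetical type that would live in a separate contrib crate.
/// A real implementation would read the Delta transaction log; this
/// one keeps rows in memory so the sketch stays self-contained.
pub struct InMemoryDeltaSource {
    columns: Vec<String>,
    rows: Vec<Vec<String>>,
}

impl TableSource for InMemoryDeltaSource {
    fn schema(&self) -> Vec<String> {
        self.columns.clone()
    }

    fn scan(&self) -> Vec<Vec<String>> {
        self.rows.clone()
    }
}

/// Builds a catalog with one table named "events" backed by the
/// contrib-style source above.
pub fn demo_catalog() -> Catalog {
    let source = InMemoryDeltaSource {
        columns: vec!["id".to_string(), "kind".to_string()],
        rows: vec![
            vec!["1".to_string(), "insert".to_string()],
            vec!["2".to_string(), "delete".to_string()],
        ],
    };
    let mut catalog = Catalog::new();
    catalog.register("events", Box::new(source));
    catalog
}
```

The point of this layout is that the engine never names a concrete format: adding Delta support (or any other source) would mean publishing a crate with one more `impl TableSource`, which a README contrib section could then list.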
Delta Sharing may be a pragmatic alternative. It appears to be nothing more than a small REST API for Parquet catalogs (the client fetches the data directly from S3, etc). The propaganda is that this is intended to be a data exchange protocol, so not tied directly to any particular product.
I think we can close this now since the table provider has already been implemented in https://github.com/delta-io/delta-rs/blob/8c67d78c8c67fdc9dd16c7e0d1fa9867ae7c1a5d/rust/src/delta_datafusion.rs#L322
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Delta is used more and more as a storage format, as it has some nice features like ACID transactions, collection of table statistics and storage optimization.
Describe the solution you'd like
Use delta-rs to add support for reading delta datasets. The library already has a TableProvider (which might be used for inspiration) and some other features like bin packing.
Describe alternatives you've considered
Additional context
Mailing List Thread: https://lists.apache.org/thread.html/r334e90fb7c53930272f264b66aaf2911ba778e55ef4e41f6a938f514%40%3Cdev.arrow.apache.org%3E