Single File Per ParquetExec, AvroExec, etc... #2293
Comments
Do you mean:

```rust
pub struct PartitionedFile {
    /// Path for the file (e.g. URL, filesystem path, etc)
    pub file_meta: FileMeta,
    /// Values of partition columns to be appended to each row
    pub partition_values: Vec<ScalarValue>,
    /// An optional file range for a more fine-grained parallel execution
    pub range: Option<FileRange>,
}
```

Another question: given that we do a one-to-one mapping between the Spark physical plan and the DataFusion physical plan, and serde the plan with proto, is it possible for us to do this?
Yes, although removing the partition_values is likely follow-up work
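For context, here is a minimal, self-contained sketch of how a PartitionedFile might be populated for a Hive-partitioned file. The FileMeta, FileRange, and ScalarValue definitions below are simplified stand-ins so the example compiles on its own, not DataFusion's real types:

```rust
// Simplified stand-ins for DataFusion's actual FileMeta, FileRange,
// and ScalarValue types, which are richer than this.
#[derive(Debug)]
struct FileMeta {
    path: String,
    size: u64,
}

#[derive(Debug)]
struct FileRange {
    start: i64,
    end: i64,
}

#[derive(Debug)]
enum ScalarValue {
    Utf8(Option<String>),
}

#[derive(Debug)]
struct PartitionedFile {
    file_meta: FileMeta,
    partition_values: Vec<ScalarValue>,
    range: Option<FileRange>,
}

fn main() {
    // A file under a Hive-style partition directory `date=2022-04-01`:
    // the partition value travels with the file rather than being
    // stored inside it, and is appended to each row at scan time.
    let file = PartitionedFile {
        file_meta: FileMeta {
            path: "data/date=2022-04-01/part-0.parquet".to_string(),
            size: 4096,
        },
        partition_values: vec![ScalarValue::Utf8(Some("2022-04-01".to_string()))],
        // Scan only the first half of the file, allowing parallel execution
        range: Some(FileRange { start: 0, end: 2048 }),
    };
    println!("{file:#?}");
}
```

The partition_values field here is the one the comment above suggests removing as follow-up work.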
I would rather keep the translation logic out of the file-format-specific operators, but having a free function that can be called by
Sounds great!
This sounds like a great idea. The serial file processing in
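To make the free-function idea from the comments above concrete, here is a minimal sketch of what such a shared helper might look like. Every name in it (SingleFileScan, plan_per_file, and UnionExec as modeled by a toy struct) is a hypothetical stand-in rather than DataFusion's actual API:

```rust
// A minimal sketch of the "free function" idea: the per-file expansion
// lives in one shared helper rather than inside ParquetExec, AvroExec, etc.
// All names here are hypothetical stand-ins, not DataFusion's actual API.

#[derive(Debug)]
struct SingleFileScan {
    path: String, // each operator is responsible for exactly one file
}

#[derive(Debug)]
struct UnionExec {
    children: Vec<SingleFileScan>, // the plan, not the operator, handles multiple files
}

/// Expand a multi-file scan into one single-file operator per file,
/// combined by a union. Any file format's planner could call this,
/// keeping the translation logic out of the format-specific operators.
fn plan_per_file(paths: Vec<String>) -> UnionExec {
    let children = paths
        .into_iter()
        .map(|path| SingleFileScan { path })
        .collect();
    UnionExec { children }
}

fn main() {
    let plan = plan_per_file(vec![
        "a.parquet".to_string(),
        "b.parquet".to_string(),
    ]);
    println!("{plan:#?}");
}
```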
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Part of #2079
Following on from #2292 and #2291, it should be possible to pull the multi-file handling out of each individual file operator and delegate it to the physical plan. As described in #2079, this will greatly simplify the implementations, whilst also hiding fewer details from the physical plan.
Describe the solution you'd like
Currently a FileScanConfig would result in ListingTable::scan generating a physical plan that looks something like the first shape sketched below. I propose instead generating something like the second.
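The original issue's plan diagrams are not reproduced here; as a rough reconstruction, assuming a scan over three Parquet files, the two shapes might look like this (UnionExec is an existing DataFusion operator, but the exact structure is an assumption):

```
Current shape: one operator handles all files
  ParquetExec(file1.parquet, file2.parquet, file3.parquet)

Proposed shape: one file per operator, combined by the plan
  UnionExec
    ParquetExec(file1.parquet)
    ParquetExec(file2.parquet)
    ParquetExec(file3.parquet)
```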
Whilst this makes the plan itself more complex, it reduces the complexity of the file format operators, and should hopefully lead to fewer bugs of the kind seen in #2170 or #2000
Describe alternatives you've considered
We could not do this
FYI @thinkharderdev @matthewmturner