[1.14.0 Regression] scan_delta on empty partitioned tables fails #19854
Comments
By the way, I was having issues with the delta-rs scanner, and moving to the new version of the reader fixes them. For those interested, this serves as a DIRTY workaround in the meantime ;_; :
Hi, I have an issue that is likely related: filtering after scan_delta gives me a ShapeError with polars 1.14.0 but not with polars 1.13.1.
Did you try with use_pyarrow=True? That's likely the old behaviour; I changed that in my code to roll back to the 1.13.1 behaviour for now.
That solves the issue! Thank you for your help!
@AdrienDart you should still create an issue; it might be a different thing.
I was having some issues with the delta reader and schema evolution: as of now, if a parquet file has a nested struct and a new field in the struct gets added ONLY TO THE delta schema (i.e. no parquet writes are done), the schema won't be found. I modified the scan_delta function (locally / in my wrappers). This fixes two issues:
I don't have time to open a PR for it right now (I'm trying to fix my things locally...) and I'm using it in my "architecture-wrapper", but if someone wants to propose it as a fix before I have time, feel free to use the code. This is the "last part" of scan_delta within the "delta.py" code of polars 1.14.0, as seen here
Fixes pola-rs#19854

Fixes:
1. The issue with empty tables not fully complying with the schema, and also parquet files not FULLY complying with the delta schema.
2. Issues with fields missing in nested structs of the parquet files but added to the delta schema; these now succeed.

Questions:
3. I'm not sure if diagonal_relaxed is what you want as "polars" (it works for me locally, of course), as it might have side effects on data sizes... They should be minor though, as in theory the parquet files within a delta table should be compatible with the schema.
4. Should we also rechunk in the concat? I think it's not needed, as it's just an empty_dataframe plus the real, already-rechunked dataframe.
@ion-elgreco I would like to, but I'm currently failing to create a 'minimal reproducible code' with a simple example.
and I get ShapeError('unable to vstack, columns names don't match: ... )
Checks
Reproducible example
test_table.zip
Log output
Or, even worse, if I remove the active filter:
With polars 1.13.1
Issue description
I come from here #19103
The new changes to the delta reader broke reading of empty tables (i.e. most of my tests :) )
Expected behavior
Partition filter working as expected
Installed versions