Incorrect results using multiple buckets with Azure Data Lake Storage URI #20347
Status: Closed
Labels:
A-io-cloud (Area: reading/writing to cloud storage)
accepted (Ready for implementation)
bug (Something isn't working)
P-high (Priority: high)
python (Related to Python Polars)
Checks
Reproducible example
Where file1.parquet exists only in bucket1.
Log output
No response
Issue description
When using an Azure Data Lake Storage URI [1], the first bucket that gets used becomes hardcoded into the Polars object store cache. Subsequent scan operations ignore the bucket specified in the URI and always use the first bucket.
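The behavior described above is what you would expect if the cache key for a store omits the bucket part of the URI. The following is a minimal, self-contained sketch of that idea, not the actual Polars implementation: `StoreCache`, `split_abfs_uri`, the `include_container` flag, and the account name `account` are all hypothetical names used for illustration. It relies only on the documented ABFS URI layout `abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>`.

```python
from urllib.parse import urlsplit


def split_abfs_uri(uri):
    """Split an abfs[s]://container@host/path URI into (container, host, path)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")
    if parts.username is None:
        raise ValueError(f"URI has no 'container@' segment: {uri!r}")
    # urlsplit exposes the text before '@' as `username` and the rest as `hostname`.
    return parts.username, parts.hostname, parts.path.lstrip("/")


class StoreCache:
    """Toy stand-in for an object store cache (hypothetical, for illustration only)."""

    def __init__(self, include_container):
        self.include_container = include_container
        self._stores = {}

    def get(self, uri):
        container, host, _ = split_abfs_uri(uri)
        # If the container is left out of the key, every URI for the same
        # account maps to whichever container was cached first.
        key = (host, container) if self.include_container else (host,)
        return self._stores.setdefault(key, container)


uri1 = "abfss://bucket1@account.dfs.core.windows.net/file1.parquet"
uri2 = "abfss://bucket2@account.dfs.core.windows.net/file1.parquet"

buggy = StoreCache(include_container=False)
buggy.get(uri1)
print(buggy.get(uri2))  # bucket1: the second URI silently reuses the first bucket

fixed = StoreCache(include_container=True)
fixed.get(uri1)
print(fixed.get(uri2))  # bucket2: each bucket gets its own cache entry
```

With the container included in the key, a lookup for a file that exists only in bucket1 would be directed at bucket2 and could fail there, which matches the expected behavior below.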
Expected behavior
The 2nd scan_parquet in the example should fail, as the file does not exist in bucket2, instead of incorrectly scanning from bucket1.
The resolved path should also include the bucket@ segment.
Installed versions
1.17.1
Footnotes
[1] https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction-abfs-uri