write_parquet encoding no longer recognized by PBI Service parquet connector from Polars 1.5.0 onwards #18819
Comments
My guess is that this has to do with the Boolean Hybrid-RLE encoding.
Yes, this is it exactly. When I drop my boolean columns from my parquet file in the latest Polars, PBI Service refreshes the file successfully.
It seems that the service only supports older parquet formats/encodings. For now you can circumvent the issue by writing via pyarrow, which allows you to select different encodings. This is something we could also support to a limited extent.
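A minimal sketch of that pyarrow route (data and file name are hypothetical; `column_encoding` can only be set with dictionary encoding turned off):

```python
import polars as pl
import pyarrow.parquet as pq

# Hypothetical data; "flag" stands in for the boolean column.
df = pl.DataFrame({"id": [1, 2, 3], "flag": [True, False, True]})

# Round-trip through Arrow and force the widely supported PLAIN
# encoding; pyarrow requires use_dictionary=False for this.
pq.write_table(
    df.to_arrow(),
    "data.parquet",
    use_dictionary=False,
    column_encoding="PLAIN",
)
```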
Writing with pyarrow worked in the meantime. I tried creating an issue with the PowerBI team, but it got caught by their triaging vendor, who claimed it had to do with Polars and not Power BI, so they wouldn't escalate it to the product team and recommended I downgrade instead.
I came across the same error. Here's how I worked around it:
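One plausible shape for such a workaround, assuming it casts the boolean columns to integers so write_parquet never emits the boolean-specific encoding (column names hypothetical):

```python
import polars as pl

df = pl.DataFrame({"id": [1, 2, 3], "flag": [True, False, True]})

# Cast every Boolean column to UInt8 before writing; integer columns
# get encodings the PBI Service reader understands.
df.with_columns(pl.col(pl.Boolean).cast(pl.UInt8)).write_parquet("data.parquet")
```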
I ended up using the following, as I wanted to keep booleans in their proper data type:
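A sketch of what that could look like, assuming the pyarrow route with the zstd settings mentioned just below (file name hypothetical):

```python
import polars as pl
import pyarrow.parquet as pq

df = pl.DataFrame({"id": [1, 2, 3], "flag": [True, False, True]})

# pyarrow's writer keeps the Boolean column in its proper type while
# producing encodings the PBI Service connector can read.
pq.write_table(
    df.to_arrow(),
    "data.parquet",
    compression="zstd",
    compression_level=22,
)
```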
In fact, my pyarrow file compressed better than the Polars one when I tested writing with compression='zstd' and compression_level=22 in both at the time. Hopefully Polars adds a way to stay compatible with PBI Service when booleans are in the dataset, because based on my awful experience with their "support team" I can almost guarantee they will not fix it on their end.
This same issue breaks Amazon Redshift import from parquet files. Working around it for now with pyarrow.
This is not really ready yet unless we have compatibility profiles. Fixes pola-rs#18819.
Checks
Reproducible example
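A minimal example of the write path in question (data and file name hypothetical; any DataFrame containing a Boolean column reproduces it):

```python
import polars as pl

df = pl.DataFrame({"id": [1, 2, 3], "flag": [True, False, True]})

# On Polars >= 1.5.0 the boolean column is written with an encoding
# that the PBI Service parquet connector rejects as "Unknown encoding type."
df.write_parquet("data.parquet")
```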
Log output
No response
Issue description
Just to explain the setup a bit:
The parquet file gets written to a network drive. A report published to the PBI Service connects to this parquet file through an on-premises gateway.
Refreshing works on a local copy of the PBI file, but through the PBI Service specifically it now gives an error:
Data source error: {"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","parameters":{},"details":[{"code":"DM_ErrorDetailNameCode_UnderlyingErrorCode","detail":{"type":1,"value":"-2147467259"}},{"code":"DM_ErrorDetailNameCode_UnderlyingErrorMessage","detail":{"type":1,"value":"Parquet: class parquet::ParquetException (message: 'Unknown encoding type.')"}},{"code":"DM_ErrorDetailNameCode_UnderlyingHResult","detail":{"type":1,"value":"-2147467259"}},{"code":"Microsoft.Data.Mashup.ValueError.Reason","detail":{"type":1,"value":"DataFormat.Error"}}],"exceptionCulprit":1}}}
This refreshes fine locally; the problem is the PBI Service specifically. I tested generating my parquet files version by version from Polars 1.2 up to current, and the errors start with Polars 1.5.0's write_parquet specifically.
I believe something changed in the write_parquet output that makes it incompatible with the PBI Service's parquet connector in newer versions. I have compared the schema and the metadata, and they are identical between the old and new output.
Expected behavior
As nothing has changed in my schema or metadata, the files should refresh; instead, it seems that write_parquet's encoding is not recognized by the PBI Service from 1.5.0 onwards.
Installed versions