Feature request: Union dtype support #10827
Comments
For a data analysis library, I think having native union types would result in a lot of very annoying corner cases for pretty much any feature ever. Almost every part of the code base would have to accommodate them in some fashion, and I personally am not convinced that this is a good idea. Could you give an example of how you would want to use them?
Classic examples would be analysis or ETL. You don't always control the upstream generators of data. For instance, someone might be lucky enough to interact with an API that implements union types, or need to consume a file created by another system, e.g., duckdb (https://duckdb.org/docs/sql/data_types/union.html). Parquet also has an open RFC for supporting this (https://issues.apache.org/jira/browse/PARQUET-756, apache/parquet-format#44), though it seems somewhat stale. It might not be feasible in polars; I don't have the depth in the Rust implementation to have a take on this.
I would not be opposed to adding some method of ingesting data that has union types, allowing some choice to be made regarding how to convert it to a Polars type. I'm mainly hesitant about having union types inside Polars itself.
I don't have strong opinions on the level of support, but for context, our main issue was not analysis or modification of data. It was during IO (reading and writing parquet and other formats) that we ran into problems due to the lack of support for union types.
Yeah, I think it would be useful to support at least ingestion, maybe into a struct. Polars should probably support reading all of arrow/parquet's physical types.
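To make the "ingest into a struct" idea concrete, here is a rough pure-Python sketch of how a dense union (per the Arrow columnar format: a type id and an offset per row, plus one child array per variant) could be flattened into struct-like rows with one field per variant. The function name `dense_union_to_struct`, the field names, and the dict-based representation are all made up for illustration; this is not a Polars or pyarrow API, just a model of the conversion.

```python
def dense_union_to_struct(type_ids, offsets, children):
    """Flatten a dense-union-style column into struct-like rows.

    type_ids: per-row id of the active variant
    offsets:  per-row index into the active variant's child values
    children: hypothetical mapping {type_id: (field_name, values)}
    """
    field_names = [name for name, _ in children.values()]
    rows = []
    for type_id, offset in zip(type_ids, offsets):
        active_name, values = children[type_id]
        # Every variant becomes a struct field; only the active one is set.
        row = {name: None for name in field_names}
        row[active_name] = values[offset]
        rows.append(row)
    return rows


# Logical column: [1, "a", 2, "b"] stored as a union of int and str.
type_ids = [0, 1, 0, 1]
offsets = [0, 0, 1, 1]
children = {
    0: ("int_value", [1, 2]),
    1: ("str_value", ["a", "b"]),
}

rows = dense_union_to_struct(type_ids, offsets, children)
for row in rows:
    print(row)
```

Each row ends up with exactly one non-null field, so the variant information survives without Polars needing a union dtype internally. One trade-off of this layout is that a null inside a variant becomes indistinguishable from an inactive field unless the type id is kept as an extra column.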
Problem description
Hi, I looked through the docs and code following an error we had trying to serialize/deserialize union-typed data with polars.
From the data types documentation, it seems that the Arrow union type is not implemented/supported (https://arrow.apache.org/docs/format/Columnar.html#union-layout, https://arrow.apache.org/docs/python/generated/pyarrow.UnionType.html).
This is a core data type, so I guess it makes sense to request it / support it. I didn't find any other issues mentioning this.