Can't write + read null Array type data #15130
Comments
I'm not able to reproduce this issue with JSON write+read:

```python
import polars as pl

pl.show_versions()

df = pl.DataFrame(
    [
        pl.Series("Array_1", [[1, 3], [2, 5]]),
        pl.Series("Array_2", [[1, 7, 3], None]),
    ],
    schema={
        "Array_1": pl.Array(pl.Int64, 2),
        "Array_2": pl.Array(pl.Int64, 3),
    },
)
print(df)

df.write_json("repro.json")
df = pl.read_json("repro.json")
print(df)
```

Output:
With only a single column, we can see the null entry is dropped instead:
This appears to be a more fundamental limitation with arrow + parquet: apache/arrow#24425. Until this is supported, I think failing the …
Had the same issue. If this is a fundamental limitation of the pl.Array type, I think it should be properly mentioned in the docs.
This is fixed and tested via …
Checks
Reproducible example
Log output
Issue description
Null Array entries are dropped when the Parquet file is written.

Expected behavior
The dataframe should be identical after writing to and reading from parquet.
Installed versions