Checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
```python
import polars as pl
from polars.testing import assert_series_equal

values = [[1, 2], None, [4, 5], [None, 3]]

# A list series:
list_series = pl.Series(values, dtype=pl.List(pl.Int64()))

# An array series with the same values:
array_series = pl.Series(values, dtype=pl.Array(pl.Int64(), 2))

# Convert the array series into a list series:
list_from_array_series = array_series.cast(list_series.dtype)

# These two series are seemingly identical:
assert_series_equal(list_series, list_from_array_series)

# BUT! When we encode them, the encoded result is different:
def row_encode(series: pl.Series) -> pl.Series:
    df = pl.DataFrame({"a": series})
    return df._row_encode([(False, False, True)])

encoded1 = row_encode(list_series)
encoded2 = row_encode(list_from_array_series)

# Using encoded1.dtype or encoded1.to_list() panics in different ways (one
# of them says "entered unreachable code"!!!) However, the _way_ the
# list_from_array_series gets row-encoded as bytes is something I've seen
# with pure Rust that doesn't use DataFrame._row_encode but rather
# encode_rows_unordered(), so I think it's a pretty low-level issue.
print(encoded1)
print(encoded2)

# Looking at the encoded bytes output from encoded2... it just looks wrong.
# There's no 3 in there, for example.
```
Log output
We expect the two results to be the same, and yet when we run the above:
Issue description
Two seemingly identical Series ought to have the same row encoding, or at the very least a semantically matching row encoding. However... they don't. I've seen this behavior through much lower-level APIs, so I don't think it's caused by DataFrame._row_encode(); I think the problem is in the low-level code, and from manual testing with Rust I suspect it was introduced in the past few weeks.
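To make the expectation concrete: an order-preserving row encoding is supposed to turn each row into bytes that depend only on the row's values, so two Series holding the same values should encode to the same bytes no matter how they were built. The sketch below is a hypothetical toy encoder for nullable lists of nullable int64 values. The names `encode_int`, `encode_row` and the marker bytes are invented for illustration, and this is NOT Polars' actual byte format. It only shows the property the issue relies on, including that every valid element (such as the 3 in `[None, 3]`) must show up in the output.

```python
from typing import Optional, Sequence

# Hypothetical toy row encoder for illustration only; NOT Polars' real format.
# Byte-wise comparison of the output matches ascending, nulls-first ordering.

NULL_MARKER = b"\x00"    # a null sorts before any valid value
VALID_MARKER = b"\x01"
END_OF_LIST = b"\x00"    # "list ends here" sorts before "another element follows"
MORE_ELEMENTS = b"\x01"


def encode_int(value: Optional[int]) -> bytes:
    """Encode an optional signed 64-bit int so byte order matches numeric order."""
    if value is None:
        # Fixed width: a null element still occupies 1 + 8 bytes.
        return NULL_MARKER + bytes(8)
    # Shifting by 2**63 maps the signed range onto unsigned big-endian order;
    # to_bytes raises OverflowError for values outside the int64 range.
    return VALID_MARKER + (value + (1 << 63)).to_bytes(8, "big")


def encode_row(row: Optional[Sequence[Optional[int]]]) -> bytes:
    """Encode one row holding a nullable list of nullable int64 values."""
    if row is None:
        return NULL_MARKER
    out = bytearray(VALID_MARKER)
    for element in row:
        out += MORE_ELEMENTS + encode_int(element)
    out += END_OF_LIST
    return bytes(out)


values = [[1, 2], None, [4, 5], [None, 3]]

# A tuple stands in for a fixed-size array and a list for a variable-size
# list: the same values give the same bytes regardless of the container.
assert encode_row((None, 3)) == encode_row([None, 3])

# Every valid element, including the 3, appears in the encoded bytes.
assert encode_int(3) in encode_row([None, 3])

for row in values:
    print(row, encode_row(row).hex())
```

Because the encoding is a pure function of the values, sorting rows by their encoded bytes gives the same order for both inputs; a correct encoder can only diverge here if it looks at something other than the values.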
Expected behavior
The output should be identical in both cases.
Installed versions
This is the latest git version as of December 4, 2024, on Linux.
This also suggests a need for more thorough unit testing, although to be fair I'm not sure I would ever think to write a test that looks like the reproducer 😀