Struct with decimals not read properly in parquet #16692
Comments
Also read/write between the
df = pl.DataFrame(data)
df.write_parquet("test.parquet", use_pyarrow=True)
df == pl.read_parquet("test.parquet")
|
This is fixed in |
@cmdlineluser I completely didn't catch that there was a release just 2 days ago... Just upgraded. |
@cmdlineluser Still broken for read operations when the internal values are Decimals AND some other type.
from decimal import Decimal

import polars as pl

print(pl.__version__)
pl.Config.activate_decimals(True)

# Each struct in the list mixes a float field with a Decimal field.
df = pl.DataFrame(
    [
        {
            "tiers": [
                {
                    "in_tier": 10.0,
                    "overage_cents": Decimal("0E-12"),
                },
                {
                    "in_tier": 0.0,
                    "overage_cents": Decimal("0E-12"),
                },
            ]
        },
        {
            "tiers": [
                {
                    "in_tier": 10.0,
                    "overage_cents": Decimal("0.001000000000"),
                }
            ]
        },
    ]
)
print(df.schema)
df.write_parquet("tiers.parquet")
pl.read_parquet("tiers.parquet")
|
Additionally, the decimal values inside the struct aren't being written to or read from the file...
from decimal import Decimal

import polars as pl

print(pl.__version__)
pl.Config.activate_decimals(True)

# Same data as before, but with the float field commented out so that
# each struct holds only the Decimal field.
df = pl.DataFrame(
    [
        {
            "tiers": [
                {
                    # "in_tier": 10.0,
                    "overage_cents": Decimal("0E-12"),
                },
                {
                    # "in_tier": 0.0,
                    "overage_cents": Decimal("0E-12"),
                },
            ]
        },
        {
            "tiers": [
                {
                    # "in_tier": 10.0,
                    "overage_cents": Decimal("0.001000000000"),
                }
            ]
        },
    ]
)
print(df.schema)
print(df)
df.write_parquet("tiers.parquet")
print(pl.read_parquet("tiers.parquet"))
"""
0.20.31
OrderedDict([('tiers', List(Struct({'overage_cents': Decimal(precision=None, scale=12)})))])
| tiers |
| --- |
| list[struct[1]] |
|--------------------------------------|
| [{0.000000000000}, {0.000000000000}] |
| [{0.001000000000}] |
| tiers |
| --- |
| list[struct[1]] |
|-----------------|
"""
|
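Both Decimal literals in the example carry twelve fractional digits, which is consistent with the scale=12 shown in the printed schema. Below is a small standard-library sketch to check this. `scale_of` is a hypothetical helper introduced here for illustration, and how Polars itself infers the scale is not shown in this thread.

```python
from decimal import Decimal


def scale_of(value: Decimal) -> int:
    """Return the number of fractional digits implied by the Decimal's exponent."""
    exponent = value.as_tuple().exponent
    if not isinstance(exponent, int):
        # NaN and Infinity report a string exponent and have no scale.
        raise ValueError(f"{value!r} has no finite scale")
    return max(0, -exponent)


# The two literals used in the reproduction above.
values = [Decimal("0E-12"), Decimal("0.001000000000")]
scales = [scale_of(v) for v in values]
print(scales)
```

Since both values report the same scale, the empty table after the round trip is unlikely to come from mismatched scales in the input data.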
D'oh - apologies. Just for reference, the previous report was (But wasn't decimal related.) |
@cmdlineluser no worries. Want me to open a separate issue for decimals specifically? |
All of the examples above seem to now be working on main and tested, e.g. |
Checks
I marked this as a Python bug since that is where I encountered it; however, I would expect the same bug to exist in Rust.
Reproducible example
This is the smallest reproducible example I could find. Removing ANY row or field, or unnesting the top-level struct, results in a success.
Table
Log output
Issue description
I am attempting to write out a parquet file of data that I fetched from the Stripe API. The API's JSON response is extremely nested. When writing the data structure in the example, the write fails due to a differing number of children. If use_pyarrow=True is set, the write succeeds.
From trial and error, the failure seems to specifically require a column that is a struct containing both a struct field and a list field. Any values deeper than col.struct.{struct,list} don't appear to affect the outcome, and the list can in fact be empty and the write will still fail.
Expected behavior
Dataframe should write to parquet successfully.
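The trigger described in the issue description (a struct holding both a struct field and a list field) can be located in a deeply nested API response before attempting the write. The following is a minimal sketch using only the standard library. `risky_columns` and `_walk` are hypothetical helper names, not part of Polars, and recursing into list elements is an assumption that goes beyond what the report confirms.

```python
from typing import Any, Iterator


def _walk(value: Any, path: str) -> Iterator[str]:
    """Yield paths of dicts that directly hold both a dict and a list."""
    if isinstance(value, dict):
        children = list(value.values())
        has_struct = any(isinstance(c, dict) for c in children)
        has_list = any(isinstance(c, list) for c in children)
        if has_struct and has_list:
            yield path
        for key, child in value.items():
            yield from _walk(child, f"{path}.{key}")
    elif isinstance(value, list):
        # Assumption: structs nested inside lists are worth flagging too.
        for item in value:
            yield from _walk(item, f"{path}[]")


def risky_columns(row: dict) -> list:
    """Return sorted paths in one row that match the struct{struct, list} shape."""
    found = set()
    for column, value in row.items():
        # Top-level siblings are separate columns, so start below the row.
        found.update(_walk(value, column))
    return sorted(found)


# Made-up row for illustration; not taken from the Stripe API.
row = {
    "id": "sub_1",
    "plan": {"meta": {"k": 1}, "tiers": []},
    "items": [{"price": {"a": 1}, "tags": ["x"]}],
}
flagged = risky_columns(row)
print(flagged)
```

Columns flagged this way are candidates for the use_pyarrow=True workaround mentioned in the issue description.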
Installed versions