Severe memory issues with rolling and group_by #18525
Comments
Do we know what the underlying issue is here? Would be great to get this resolved :)
I think exactly the same holds for the …
I don't get it; I get 0.2 GB. Is this fixed now as of 1.22.0?
Got the same issue; is it actually fixed in 1.22.0?
Hm, the memory issues are unchanged for me in 1.22.0. Are you sure that you commented out …?
@MariusMerkleQC Can confirm, the memory issue persists with 1.22.0.

`group_by=None`: Peak memory in GB: 0.000143393874168396

Additionally, I monitored the memory usage with htop; it seems the actual usage is far beyond what is shown here.
Right now I'm trying the tip from #21132 (comment) by @ritchie46, but it has not worked for me; the peak memory has stayed the same. Not very convinced I'm doing it right, though.
Just forwarding a reply from Ritchie:

> The expressions, e.g.:
>
> ```python
> df.with_columns(
>     pl.col("value").rolling_mean_by("timestamp", "5y").over("category")
> )
> ```

@gillan-krishna Perhaps you can try the expressions if they are suitable for your use case.
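To make that suggestion concrete, here is a minimal self-contained sketch of the expression-based approach; the frame, the dates, and the `rolling_mean_5y` alias are illustrative (not from the thread), and the data is pre-sorted by category and timestamp to match the sortedness discussion further down:

```python
import polars as pl
from datetime import date

# Illustrative data: column names follow the snippet above; values are made up.
df = pl.DataFrame(
    {
        "category": ["A"] * 3 + ["B"] * 3,
        "timestamp": [date(2019, 1, 1), date(2021, 6, 1), date(2024, 3, 1)] * 2,
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
).sort("category", "timestamp")

# Rolling 5-year mean per category, expressed with rolling_mean_by + over
# instead of df.rolling(..., group_by=...).
out = df.with_columns(
    pl.col("value")
    .rolling_mean_by("timestamp", "5y")
    .over("category")
    .alias("rolling_mean_5y")
)
print(out)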
This was a pathological memory explosion issue in the implementation. It will be fixed in a new release.
@ritchie46 Does the new implementation in v1.23.0 require the columns to be sorted in a different manner than before? I'm getting a …
Is your data sorted within the group? Got a repro? (I might know what it is, just want to be sure.)
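For context, a small sketch of what "sorted within the group" looks like in practice; the column names here are illustrative, not from the thread:

```python
import polars as pl

df = pl.DataFrame(
    {
        "category": ["B", "A", "B", "A"],
        "n": [2, 1, 1, 2],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# Sorting by the group key first and the index column second makes the index
# non-decreasing inside every group, which rolling(..., group_by=...) expects.
df = df.sort("category", "n")
print(
    df.rolling("n", period="1i", group_by="category").agg(pl.col("value").sum())
)
```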
I am also getting this with:

```python
import polars as pl

df = pl.DataFrame(
    {
        "n": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10],
        "col1": ["A", "B"] * 11,
    }
)
print(df.rolling("n", period="1i", group_by="col1").agg())
```
I think #21444 is where it is being fixed.
Yes, fixed in #21444.
Checks
Reproducible example
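A minimal sketch of the kind of script that reproduces the numbers reported below; apart from `N`, the column names, the `100i` period, and the peak-RSS measurement via `resource` are assumptions, not necessarily the reporter's original code:

```python
import resource  # Unix-only

import numpy as np
import polars as pl

N = 100_000

# Illustrative frame: an integer index to roll over and a low-cardinality
# category column. Names and dtypes are assumptions.
df = pl.DataFrame(
    {
        "n": np.arange(N, dtype=np.int64),
        "value": np.random.rand(N),
        "category": np.random.choice(["a", "b", "c"], size=N),
    }
).sort("category", "n")

# Swap group_by="category" for group_by=None (in a fresh process, since peak
# RSS is cumulative) to compare the two cases.
df.rolling("n", period="100i", group_by="category").agg(pl.col("value").mean())

# ru_maxrss is the peak resident set size: kilobytes on Linux, bytes on macOS.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"Peak memory in GB: {peak / 1e6}")
```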
Log output
No response
Issue description
When using `group_by` in the `rolling()` operation, memory consumption skyrockets, even for small data frames. When using `N=100_000`, the memory reaches the following peak:

- `group_by=None`: 0.18 GB
- `group_by="category"`: 18.01 GB

Expected behavior
The peak memory when using `group_by=None` and `group_by="category"` should be similar.

Installed versions