groupby for list and struct type columns #4175

peterlietz · 2022-07-29T09:04:37Z

Thank you for this absolutely wonderful library!

I'm afraid I hit a snag. What I tried to do was to group by a nested data type, as in:

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": [[1, 3, 4], [2, 4, 6], [17]]})
df.groupby("b").agg(pl.sum("a"))

This results in a "not implemented" panic.

I'm curious as to whether this is simply not implemented yet or whether this would contradict the underlying philosophy of polars.

Best regards
Peter


ritchie46 · 2022-07-29T09:13:21Z

We do not support grouping by a column of type list. I think we should improve the error message on that.

peterlietz · 2022-07-29T09:17:16Z

Thank you very much for the quick answer!

pepelovesvim · 2022-07-29T20:19:26Z

> We do not support grouping by a column of type list. I think we should improve the error message on that.

@ritchie46 what do you think should be the error that comes out? DataTypeMisMatch?

ritchie46 · 2022-07-29T21:13:44Z

I think a ComputeError would be most consistent.

For structs we could temporarily unnest -> do the groupby -> and nest again.
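To make the unnest -> groupby -> nest idea concrete, here is a minimal plain-Python sketch. It is not how polars implements anything; the rows, the field names `x` and `y`, and the helper `group_sum_by_struct` are all hypothetical and exist only to illustrate the three steps.

```python
from collections import defaultdict

# Hypothetical rows: "s" plays the role of a struct column (a dict of fields),
# "a" is the value being summed per group.
rows = [
    {"s": {"x": 1, "y": "u"}, "a": 1},
    {"s": {"x": 1, "y": "u"}, "a": 2},
    {"s": {"x": 2, "y": "v"}, "a": 3},
]


def group_sum_by_struct(rows, struct_col, value_col):
    """Sum value_col per distinct struct value (illustrative helper only)."""
    # Fix a field order once so every row produces a comparable key.
    fields = sorted(rows[0][struct_col]) if rows else []
    totals = defaultdict(int)
    for row in rows:
        # "unnest": flatten the struct into a hashable tuple of field values
        key = tuple(row[struct_col][f] for f in fields)
        # "groupby": aggregate on the flat key
        totals[key] += row[value_col]
    # "nest again": rebuild the struct from each flat key
    return [
        {struct_col: dict(zip(fields, key)), value_col: total}
        for key, total in totals.items()
    ]


result = group_sum_by_struct(rows, "s", "a")
```

The first two rows share the same struct value, so they collapse into one group; the struct is only taken apart for the duration of the grouping.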

peterlietz · 2022-07-30T16:52:43Z

Just in case anybody else stumbles upon this, the workaround I am now using is to convert to "str". Not ideal, but does the trick.

df = pl.DataFrame({"a": [1, 2, 3], "b": [[1, 3, 4], [2, 4, 6], [17]]})
df = df.with_column(pl.col("b").arr.eval(pl.element().cast(pl.Utf8)).arr.join("|"))
df.groupby("b").agg(pl.sum("a"))
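The principle behind this workaround does not depend on polars: each list is turned into a scalar string key, and the grouping happens on that key. A small plain-Python sketch of the same encoding follows. The fourth row is a hypothetical addition so that at least one group contains more than one row, and `list_key` is an illustrative helper, not a polars function.

```python
from collections import defaultdict

# Same data as above, plus one hypothetical extra row that repeats [1, 3, 4].
a = [1, 2, 3, 4]
b = [[1, 3, 4], [2, 4, 6], [17], [1, 3, 4]]


def list_key(values, sep="|"):
    """Encode a list as one string by casting each element to str and joining."""
    return sep.join(str(v) for v in values)


# Group on the encoded key and sum "a" within each group.
totals = defaultdict(int)
for value, lst in zip(a, b):
    totals[list_key(lst)] += value

totals = dict(totals)
```

The encoding is only safe when distinct lists yield distinct strings. That holds for lists of integers, but string elements that contain the separator can collide, and an empty list encodes to the same key as a list holding a single empty string.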

peterlietz added the feature label Jul 29, 2022

avimallu mentioned this issue Jun 26, 2023

Group by crashing with pl.Array #9559

Closed

2 tasks

stinodego added the enhancement label and removed the feature label Jul 14, 2023

This was referenced Mar 3, 2024

Joining on list columns is not implemented #14826

Open

Unintuitive behavior when hashing list[cat] columns #14829

Open

lukemanley mentioned this issue Jan 30, 2025

test: Add tests for resolved issues #20999

Merged

ritchie46 closed this as completed in #20999 Jan 30, 2025
