groupby for list and struct type columns #4175

Closed
peterlietz opened this issue Jul 29, 2022 · 5 comments · Fixed by #20999
Labels
enhancement New feature or an improvement of an existing feature

Comments

@peterlietz

Thank you for this absolutely wonderful library!

I'm afraid I hit a snag. What I tried to do was to group by a nested data type, as in:

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": [[1, 3, 4], [2, 4, 6], [17]]})
df.groupby("b").agg(pl.sum("a"))

This results in a "not implemented" panic.

I'm curious as to whether this is simply not implemented yet or whether this would contradict the underlying philosophy of polars.

Best regards
Peter

@ritchie46
Member

We do not support grouping by a column of type list. I think we should improve the error message on that.

@peterlietz
Author

Thank you very much for the quick answer!

@pepelovesvim

> We do not support grouping by a column of type list. I think we should improve the error message on that.

@ritchie46 what do you think should be the error that comes out? DataTypeMisMatch?

@ritchie46
Member

I think a ComputeError would be most consistent.

For structs we could temporarily unnest -> do the groupby -> and nest again.
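
The unnest, group, nest-again idea can be sketched in plain Python, independent of any polars version. This is only an illustration of the approach, not how polars implements it: struct values are modeled as dicts, and the helper name `group_sum_by_struct` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_sum_by_struct(rows, struct_col, value_col):
    """Sum value_col per distinct struct in struct_col (structs modeled as dicts)."""
    fields = None
    totals = defaultdict(int)
    for row in rows:
        struct = row[struct_col]
        if fields is None:
            # Assumption: every struct has the same fields as the first one.
            fields = tuple(struct)
        # "Unnest": flatten the struct into a hashable tuple of field values.
        key = tuple(struct[f] for f in fields)
        totals[key] += row[value_col]
    # "Nest again": rebuild one struct per group from its flattened key.
    return [
        {struct_col: dict(zip(fields, key)), value_col: total}
        for key, total in totals.items()
    ]


# Hypothetical sample data: column "a" holds values, column "s" holds structs.
rows = [
    {"a": 1, "s": {"x": 1, "y": "u"}},
    {"a": 2, "s": {"x": 1, "y": "u"}},
    {"a": 3, "s": {"x": 2, "y": "v"}},
]
result = group_sum_by_struct(rows, "s", "a")
```

The grouping itself only ever sees flat, hashable field values, which is why unnesting first sidesteps the missing support for nested keys.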

@peterlietz
Author

Just in case anybody else stumbles upon this, the workaround I am now using is to convert to "str". Not ideal, but does the trick.

df = pl.DataFrame({"a": [1, 2, 3], "b": [[1, 3, 4], [2, 4, 6], [17]]})
df = df.with_column(pl.col("b").arr.eval(pl.element().cast(pl.Utf8)).arr.join("|"))
df.groupby("b").agg(pl.sum("a"))
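
One side effect of this workaround is that "b" comes back as a string rather than a list. A minimal plain-Python sketch of the same idea, including the conversion back to lists, might look like the following. The helpers `list_to_key` and `key_to_list` are hypothetical names, a fourth row is added so that one group has two members, and the round trip assumes integer elements whose string form never contains the separator.

```python
from collections import defaultdict

SEP = "|"


def list_to_key(values):
    """Encode a list of ints as a hashable string key, e.g. [1, 3, 4] -> "1|3|4"."""
    return SEP.join(str(v) for v in values)


def key_to_list(key):
    """Decode a key back into a list of ints; the empty key maps to []."""
    return [int(part) for part in key.split(SEP)] if key else []


# Same shape as the example above, plus one repeated list in "b".
a = [1, 2, 3, 4]
b = [[1, 3, 4], [2, 4, 6], [17], [1, 3, 4]]

# Group by the string key and sum "a" within each group.
totals = defaultdict(int)
for value, lst in zip(a, b):
    totals[list_to_key(lst)] += value

# Restore the original list form of each group key.
grouped = [(key_to_list(key), total) for key, total in totals.items()]
```

For string elements the separator would need to be a character that cannot occur in the data, otherwise distinct lists could collapse into the same key.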
