-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: fix groupby std() with nullable dtypes #37433
REGR: fix groupby std() with nullable dtypes #37433
Conversation
At some point we need to properly support nullable dtypes in the cython groupby operations (by passing+using the mask), but for now this ensure it at least works (although performance will not be ideal) |
@@ -277,3 +277,38 @@ def test_read_only_buffer_source_agg(agg): | |||
expected = df.copy().groupby(["species"]).agg({"sepal_length": agg}) | |||
|
|||
tm.assert_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use all_numeric_reductions fixture (although slightly different) doesn't include sem
or count
but does include kurt
and skew
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kurt
is not implemented for groupby, it seems. I mainly copied this from another test in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for now I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah can you create an issue for folks to replace these with fixtures (good first)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jorisvandenbossche lgtm pending green.
(Did try to bisect to be sure of changes that caused regression but 1.0.0.rc raised TypeError: cannot safely cast non-equivalent float64 to int32
)
rerun bisect with updated code sample for runner see #37415 (comment), so fix seems reasonable |
Ah, yes, sorry I should have mentioned, but I already narrowed it down to that PR. Thanks anyway! Ci failure is unrelated I think |
df2 = df.assign(B=df["B"].astype("float64")) | ||
expected = getattr(df2.groupby("A")["B"], op_name)() | ||
|
||
if op_name != "count": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah also an issue to fix this up (I think we have another issue about this) as this is slightly tricky, e.g. are we returning a nullable for the count itself (I don't know if we ever decided that)
thanks @jorisvandenbossche |
@meeseeksdev backport 1.1.x |
…able dtypes) (#37444) Co-authored-by: Joris Van den Bossche <[email protected]>
Fixes #37415