REGR: fix groupby std() with nullable dtypes #37433

jorisvandenbossche · 2020-10-26T21:56:27Z

jorisvandenbossche · 2020-10-26T21:57:50Z

At some point we need to properly support nullable dtypes in the Cython groupby operations (by passing and using the mask), but for now this ensures it at least works (although performance will not be ideal).
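To make "passing and using the mask" concrete, here is a minimal NumPy sketch of a mask-aware grouped standard deviation. The function name `masked_group_std` and its signature are hypothetical; this is not pandas' Cython kernel, only an illustration of computing per-group statistics directly from a values buffer plus a boolean missing-value mask, without first converting the data to float with NaN.

```python
import numpy as np


def masked_group_std(values, mask, codes, ngroups, ddof=1):
    """Hypothetical mask-aware grouped standard deviation (illustration only).

    values: 1-D numeric array (the data buffer of a masked array)
    mask:   1-D bool array, True where the value is missing (pd.NA)
    codes:  1-D int array of group ids in [0, ngroups)
    Returns (result, result_mask); result_mask is True where a group has too
    few non-missing values for the requested ddof.
    """
    values = np.asarray(values, dtype="float64")
    keep = ~np.asarray(mask, dtype=bool)
    codes = np.asarray(codes)[keep]
    vals = values[keep]

    # Per-group count and sum over non-missing entries only
    counts = np.bincount(codes, minlength=ngroups)
    sums = np.bincount(codes, weights=vals, minlength=ngroups)
    means = np.divide(sums, counts, out=np.zeros(ngroups), where=counts > 0)

    # Second pass: sum of squared deviations from each group's mean
    sq_dev = np.bincount(codes, weights=(vals - means[codes]) ** 2, minlength=ngroups)

    denom = counts - ddof
    valid = denom > 0
    result = np.full(ngroups, np.nan)
    result[valid] = np.sqrt(sq_dev[valid] / denom[valid])
    return result, ~valid
```

Returning a separate result mask (rather than relying on NaN) is what would let such a kernel produce a nullable result without a float round-trip.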

simonjayhawkins · 2020-10-27T10:20:09Z

pandas/tests/groupby/aggregate/test_cython.py

@@ -277,3 +277,38 @@ def test_read_only_buffer_source_agg(agg):
    expected = df.copy().groupby(["species"]).agg({"sepal_length": agg})

    tm.assert_equal(result, expected)
+
+
+@pytest.mark.parametrize(


Could use the all_numeric_reductions fixture (although it is slightly different: it doesn't include sem or count, but does include kurt and skew).

kurt is not implemented for groupby, it seems. I mainly copied this from another test in this file.

OK for now, I guess.

Yeah, can you create an issue for folks to replace these with fixtures (good first issue)?

simonjayhawkins

Thanks @jorisvandenbossche, LGTM pending green.

(I did try to bisect to be sure which change caused the regression, but 1.0.0.rc raised TypeError: cannot safely cast non-equivalent float64 to int32.)

simonjayhawkins · 2020-10-27T11:39:24Z

> (Did try to bisect to be sure of changes that caused regression but 1.0.0.rc raised TypeError: cannot safely cast non-equivalent float64 to int32)

Reran the bisect with the updated code sample for the runner (see #37415 (comment)), so the fix seems reasonable.

jorisvandenbossche · 2020-10-27T11:53:36Z

Ah, yes, sorry, I should have mentioned it, but I had already narrowed it down to that PR. Thanks anyway!

The CI failure is unrelated, I think.

jreback · 2020-10-27T12:38:04Z

pandas/tests/groupby/aggregate/test_cython.py

+    df2 = df.assign(B=df["B"].astype("float64"))
+    expected = getattr(df2.groupby("A")["B"], op_name)()
+
+    if op_name != "count":


Yeah, also an issue to fix this up (I think we already have another issue about this), as this is slightly tricky: e.g. are we returning a nullable dtype for the count itself? (I don't know if we ever decided that.)
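The expectation pattern in the diff excerpt above (run the same op on a float64 copy of the column and compare) can be sketched as a standalone check. This is a hedged approximation, not the PR's actual test: the op list, the test data and the helper name `check_nullable_agg` are assumptions. Because the result dtype for nullable input is exactly the open question here (count may come back as a plain integer, other ops as Int64/Float64 or float64 depending on the pandas version), it compares index and values only.

```python
import numpy as np
import pandas as pd

# Assumed list of groupby reductions to exercise (kurt omitted: not implemented for groupby)
OPS = ["count", "sum", "std", "var", "sem", "mean", "median", "prod", "min", "max"]


def check_nullable_agg(op_name):
    """Hypothetical helper: compare a groupby op on Int64 against float64."""
    df = pd.DataFrame(
        {
            "A": ["A", "B"] * 5,
            "B": pd.array([1, 2, 3, 4, 5, 6, 7, 8, 9, pd.NA], dtype="Int64"),
        }
    )
    result = getattr(df.groupby("A")["B"], op_name)()

    # Reference: the same aggregation on plain float64 (pd.NA becomes NaN)
    df2 = df.assign(B=df["B"].astype("float64"))
    expected = getattr(df2.groupby("A")["B"], op_name)()

    # Compare values, not dtypes, since the nullable result dtype is not settled
    assert result.index.equals(expected.index)
    np.testing.assert_allclose(
        result.to_numpy(dtype="float64"), expected.to_numpy(dtype="float64")
    )
```

In the pandas test suite this would more naturally be a `@pytest.mark.parametrize("op_name", OPS)` test; the plain loop keeps the sketch free of a pytest dependency.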

jreback · 2020-10-27T12:38:18Z

thanks @jorisvandenbossche

simonjayhawkins · 2020-10-27T12:50:46Z

@meeseeksdev backport 1.1.x


jorisvandenbossche · 2020-10-29T19:30:59Z

For follow-ups, I opened #37493 for general "native" support for masked arrays in groupby algos, and #37494 for the specific issue to use the correct dtype for the result.

REGR: fix groupby std() with nullable dtypes

a0df763

jorisvandenbossche requested a review from rhshadrach October 26, 2020 21:56

jorisvandenbossche added this to the 1.1.4 milestone Oct 26, 2020

jorisvandenbossche added the Groupby and NA - MaskedArrays labels Oct 26, 2020

jorisvandenbossche mentioned this pull request Oct 26, 2020

BUG: groupby with std aggregation of pandas integer dtype throws exception: 'IntegerArray' object has no attribute 'reshape' #37415

Closed


simonjayhawkins reviewed Oct 27, 2020

simonjayhawkins approved these changes Oct 27, 2020

jreback reviewed Oct 27, 2020

jreback merged commit c642fda into pandas-dev:master Oct 27, 2020

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Oct 27, 2020

Backport PR pandas-dev#37433: REGR: fix groupby std() with nullable dtypes

4cd824d

meeseeksmachine mentioned this pull request Oct 27, 2020

Backport PR #37433 on branch 1.1.x (REGR: fix groupby std() with nullable dtypes) #37444

Merged

simonjayhawkins pushed a commit that referenced this pull request Oct 27, 2020

Backport PR #37433 on branch 1.1.x (REGR: fix groupby std() with nullable dtypes) (#37444) Co-authored-by: Joris Van den Bossche <[email protected]>

9a25cc1

jorisvandenbossche deleted the fix-groupby-std-nullable branch October 29, 2020 19:08

jorisvandenbossche mentioned this pull request Oct 29, 2020

ENH: improve the resulting dtype for groupby operations on nullable dtypes #37494

Open

arw2019 mentioned this pull request Oct 30, 2020

ENH: implement Kleene versions of GroupBy any/all kernels for nullable dtypes #37506

Closed

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

REGR: fix groupby std() with nullable dtypes (pandas-dev#37433)

57af43e

ukarroum pushed a commit to ukarroum/pandas that referenced this pull request Nov 2, 2020

REGR: fix groupby std() with nullable dtypes (pandas-dev#37433)

56cf009
