Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: add test case for user-defined function taking correct path in groupby transform #29631

Merged
merged 7 commits into from
Nov 20, 2019

Conversation

jbrockmendel
Copy link
Member

I kept this one in place because I thought fast_path could be user-defined (see comment on L1425), but it turns out we define it in _define_paths a few lines above. So this try/except is unnecessary.

@jorisvandenbossche
Copy link
Member

it turns out we define it in _define_paths a few lines above

But the func in _define_paths can still be a UDF?

@jreback
Copy link
Contributor

jreback commented Nov 15, 2019

@jbrockmendel these are all failing for the web / docs stuff?

@jbrockmendel
Copy link
Member Author

But the func in _define_paths can still be a UDF?

Yes, but the sole call to _choose_path is itself wrapped in a try/except TypeError.

It's entirely possible that I'm missing something. In recent threads you've shown a talent for finding corner cases that are currently missing from the tests. Any ideas for what cases will raise here that we should worry about?

@jbrockmendel
Copy link
Member Author

these are all failing for the web / docs stuff?

Not sure yet whats going on there. Could well be an example we can/should add to the tests!

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Nov 18, 2019

Not sure yet whats going on there. Could well be an example we can/should add to the tests!

Probably yes, it's in any case a transform with a UDF that is failing

@jbrockmendel
Copy link
Member Author

@jorisvandenbossche i found the case from cookbook.rst that is failing in this PR, but it isn't clear to me that it is behaving correctly on master either. I'd like to get a second opinion.

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, -1, 1, 2]})
gb = df.groupby('A')

def replace(g):
    mask = g < 0
    return g.where(mask, g[~mask].mean())

result = gb.transform(replace)

>>> result
     B
0  1.0
1 -1.0
2  1.5
3  1.5

I expected result.loc[1] to match result.loc[0]. Can you double-check me on this?

@jorisvandenbossche
Copy link
Member

It's what the function seems to be doing:

In [22]: group1 = gb.get_group(1)

In [23]: replace(group1['B']) 
Out[23]: 
0    1
1   -1
Name: B, dtype: int64

So maybe the function is doing something unexpected, but it seems the transform is correctly using the result of the function on the group.

I find where always a difficult one to reason about, but basically this where call is keeping the original negative values, and replacing the positive values with the mean of those positive values. So in this case, that result seems correct to me (as the mean of the positive values is also 1).

@jbrockmendel
Copy link
Member Author

@jorisvandenbossche for the immediate purpose of this PR, I'm implementing a test case where fast_path raises, but I don't want to use a not-clearly-correct case. Can you suggest one? (this might be obvious, but i need caffeine)

The follow-on issues are a) making sure the example in the docs is correct and b) making sure where is correct.

@jorisvandenbossche
Copy link
Member

As far as I understand, the example in the docs is correct

@jbrockmendel
Copy link
Member Author

Updated. Now the behavior is unchanged, just added a test that requires the current behavior. Thanks for helping me understanding it @jorisvandenbossche


def replace(g):
mask = g < 0
return g.where(mask, g[~mask].mean())
Copy link
Member

@jorisvandenbossche jorisvandenbossche Nov 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be good to use a simpler function to test here (otherwise somebody later looking at the test will have the same "what the heck is this UDF doing" thought as we had now). It just needs to be a function that accepts a Series / returns a Series of the same shape, I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@jorisvandenbossche jorisvandenbossche changed the title REF: no need to catch Exception in choose_path TST: add test case for user-defined function taking correct path in groupby transform Nov 20, 2019
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good!

@jreback jreback added Groupby Testing pandas testing functions or related to the test suite labels Nov 20, 2019
@jreback jreback added this to the 1.0 milestone Nov 20, 2019
@jreback jreback merged commit 7e2bbd1 into pandas-dev:master Nov 20, 2019
@jreback
Copy link
Contributor

jreback commented Nov 20, 2019

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the faster-choose_path branch November 20, 2019 15:16
jacobaustin123 pushed a commit to jacobaustin123/pandas that referenced this pull request Nov 20, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants