Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: idxmax/min (and argmax/min) for Series with underlying ExtensionArray #37924

Merged
merged 24 commits into from
Jan 1, 2021
Merged

BUG: idxmax/min (and argmax/min) for Series with underlying ExtensionArray #37924

merged 24 commits into from
Jan 1, 2021

Conversation

tonyyyyip
Copy link
Contributor

@tonyyyyip tonyyyyip commented Nov 17, 2020

Currently pd.Series([1, 2, 3]).astype('Int8').idxmax() raises an error.
This request proposes a fix.
My first time using Github and opening a pull request. Apologise if it's no good.

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tonyyyyip for the PR (and welcome to pandas!)

A few things to get this on the road:

  • is there are an issue in the tracker related to this? If no, let's open one.
  • Link the issue here (just mention #....)
  • Can you add tests to illustrate the new behavior? They should fail on master and pass on this branch. Tests are typically the first thing we look at when reviewing

Ping once you've got the tests up and we'll take another look. Don't hesistate to post any questions you might have at any point!

FYI We have more detailed info on opening PRs in the contributors' guide in case that's helpful now or later on

@pep8speaks
Copy link

pep8speaks commented Nov 18, 2020

Hello @tonyyyyip! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-12-31 01:56:35 UTC

@tonyyyyip
Copy link
Contributor Author

Thanks @arw2019!

This should solve #33719. Currently the code sample raises TypeError: argmax() takes 1 positional argument but 2 were given. This is because the implementation of argmax/min of ExtensionArray takes only 1 argument self but the "interface" passes an additional argument None to it.

Added test. Not sure if I have done right though. Thanks again.

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments on the test, lmk if you have any questions

@arw2019
Copy link
Member

arw2019 commented Nov 18, 2020

You also wanna look at the CI checks (in my experience easiest thing is to use the pre-commit tool locally)

@tonyyyyip
Copy link
Contributor Author

Thanks @arw2019. I am completely new to testing. I will check that out.

@jorisvandenbossche
Copy link
Member

This is possibly also related to #35178 (although that speaks about argmin/argmax, and not idxmin/idxmax)

@tonyyyyip tonyyyyip requested a review from arw2019 November 18, 2020 13:21
@tonyyyyip
Copy link
Contributor Author

This is possibly also related to #35178 (although that speaks about argmin/argmax, and not idxmin/idxmax)

Yes both the argmax and idxmax Series methods call the argmax method of the underlying _values with the additional argument None (for the axis), which is normally fine when the _values is an nparray, but breaks when it's an EA, because the current EA implementation of argmax does not take any additional arguments. This PR fixes that.

@arw2019
Copy link
Member

arw2019 commented Nov 19, 2020

This is possibly also related to #35178 (although that speaks about argmin/argmax, and not idxmin/idxmax)

Yes both the argmax and idxmax Series methods call the argmax method of the underlying _values with the additional argument None (for the axis), which is normally fine when the _values is an nparray, but breaks when it's an EA, because the current EA implementation of argmax does not take any additional arguments. This PR fixes that.

Can you add tests for argmax/argmin so we can decide if this resolves #35178 (if yes great!)

@jreback jreback added the ExtensionArray Extending pandas with custom dtypes or arrays. label Nov 19, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

@tonyyyyip tonyyyyip changed the title BUG: idxmax/min for Series with underlying 'Int' datatype BUG: idxmax/min (and argmax/min) for Series with underlying ExtensionArray Dec 2, 2020
@tonyyyyip
Copy link
Contributor Author

Gentle ping to see if this is going anywhere.

Please let me know if my changes are not what what the requests wanted, as I might have misunderstood the requests.

Also I understand it was decided that argmax and idxmax should be separated because argmax is likely understated. I am happy to follow that philosophy here.

Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonyyyyip I think the way forward here is to first get consensus on #33942 regarding the skipna issue so you know what behavior to implement.

Then we can decide where the code goes (into the Series impl or elsewhere) and bring this across the finish line

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine right as it is. If we decide to change #33942 it would just move logic here inside the EA, no external change. meanwhile this fixes the bug.

pls add a whatsnew note
does this have a reference issue?

@tonyyyyip
Copy link
Contributor Author

@jreback Added whatsnew entry and referenced issues. Thanks.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments on the tests

@jreback jreback added this to the 1.3 milestone Jan 1, 2021
@jreback jreback merged commit a380726 into pandas-dev:master Jan 1, 2021
@jreback
Copy link
Contributor

jreback commented Jan 1, 2021

thanks @tonyyyyip very nice

@@ -2076,8 +2076,7 @@ def idxmin(self, axis=0, skipna=True, *args, **kwargs):
>>> s.idxmin(skipna=False)
nan
"""
skipna = nv.validate_argmin_with_skipna(skipna, args, kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer validating args/kwargs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry missed this comment

cc @tonyyyyip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I think if we let idxmax call self.argmax(axis=axis, skipna=skipna, *args, **kwargs) then the args and kwargs can be validated by argmax's validator. I can make another PR if this is desirable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be great @tonyyyyip

@tonyyyyip
Copy link
Contributor Author

thanks @tonyyyyip very nice

Thanks.

@tonyyyyip tonyyyyip mentioned this pull request Jan 2, 2021
5 tasks
jreback pushed a commit that referenced this pull request Jan 3, 2021
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
6 participants