-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate non-keyword arguments for drop_duplicates. #41500
Deprecate non-keyword arguments for drop_duplicates. #41500
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is off to a good start, thanks @jmholzer !
|
||
|
||
def test_drop_duplicates_pos_args_deprecation(): | ||
# test deprecation warning message for positional arguments GH#41485 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just GH#41485
should be fine here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in the latest commit.
|
||
|
||
def test_drop_duplicates_pos_args_deprecation(): | ||
# test deprecation warning message for positional arguments GH#41485 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in latest commit.
# test deprecation warning message for positional arguments GH#41485 | ||
df = DataFrame({"a": [1, 1, 2], "b": [1, 1, 3], "c": [1, 1, 3]}) | ||
msg = ( | ||
r"Starting with Pandas version 2\.0 all arguments of drop_duplicates except for " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is likely too long, did you run the linting checks before submitting? If you enable pre-commit (pre-commit install
) they'll be run for you, else you can run them manually with pre-commit run --files <any file you've modified>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it was too long. I changed it in my latest commit. I was using git outside of my development environment for convenience. In my latest commits, I made sure to have pre-commit enabled, thanks for the hint.
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -647,6 +647,7 @@ Deprecations | |||
- Deprecated setting :attr:`Categorical._codes`, create a new :class:`Categorical` with the desired codes instead (:issue:`40606`) | |||
- Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`) | |||
- Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`) | |||
- Deprecated passing arguments as positional in :meth:`DataFrame.drop_duplicates` and :meth:`Series.drop_duplicates` (:issue:`41485`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps mention that subset
is allowed (e.g. "except for subset
"), other than that, if the tests all pass, this looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, thanks, I made that change and committed again.
I have checked that all tests on the methods (not just mine) still pass without issue also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks!
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -647,6 +647,7 @@ Deprecations | |||
- Deprecated setting :attr:`Categorical._codes`, create a new :class:`Categorical` with the desired codes instead (:issue:`40606`) | |||
- Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`) | |||
- Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`) | |||
- Deprecated passing arguments as positional in :meth:`DataFrame.drop_duplicates` (except for ``subset``) and :meth:`Series.drop_duplicates` (:issue:`41485`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should do the same for Index at the same time for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will get error: Signature of "drop_duplicates" incompatible with supertype "IndexOpsMixin" [override]
otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I add the warning decorator to Index.drop_duplicates + test in a commit on this branch / pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be great, thanks @jmholzer !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made these changes in the commit 064268f.
However, it looks like a lot of commits that I pulled from the master branch ended up in this pull request also, I don't have enough knowledge of git to understand why this happened (I used rebase, but didn't think the commits from master would show up in the PR). Is this a problem? If so, can I fix it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you'll need to rebase (see https://youtu.be/hv8dhOEzQcM), currently this is showing lots of unrelated changes. Something like
git fetch upstream
git rebase -i upstream/master
and then in the interactive window choose which commits to keep/drop/fixup/edit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the guidance, I appreciate it.
I did the following to fix things:
- Reverted the changes on my local branch to before the rebase.
- Cherry-picked my commit with the changes for Index from the remote.
- Pulled remote master changes to my local master branch.
- Merged my local feature branch with my local master branch.
- Force-pushed my local feature branch to the remote.
That has gotten rid of the spurious commits. I tried using a rebase instead of a merge in step 3, as you suggested, but I couldn't do it cleanly.
Are steps 3-4 acceptable for working on pandas, or is a rebase generally preferred to a merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's fine, commits will get squashed anyway
@MarcoGorelli I made the changes to drop_duplicates that we discussed in the comments of #41551, they are in the latest commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks - couple of minor things, then this looks good to me
msg = ( | ||
"In a future version of pandas all arguments of " | ||
"Index.drop_duplicates will be keyword-only" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message shows Index
- to get it to show MultiIndex
, you'll need to define drop_duplicates
in the MultiIndex
class and then call super().drop_duplicates
inside it - see interpolate
for an example of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1738,3 +1738,20 @@ def test_construct_from_memoryview(klass, extra_kwargs): | |||
result = klass(memoryview(np.arange(2000, 2005)), **extra_kwargs) | |||
expected = klass(range(2000, 2005), **extra_kwargs) | |||
tm.assert_index_equal(result, expected) | |||
|
|||
|
|||
def test_drop_duplicates_pos_args_deprecation(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also put the issue number here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this change in 2cb482f.
@MarcoGorelli regarding the warning message generation for https://docs.python.org/3.8/library/typing.html#typing.final |
sure. we tend to use @Final to identify methods that aren't overloaded rather than methods that shouldn't be overloaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there 💪
pandas/core/indexes/multi.py
Outdated
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"]) | ||
def drop_duplicates(self: _IndexT, keep: str_t | bool = "first") -> _IndexT: | ||
return super(Index, self).drop_duplicates(keep=keep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to annotate self as _IndexT
? If think you should be able to type the return type as MultiIndex
and leave self
untyped, see MultiIndex.dropna
Also, does super().drop_duplicates(keep=keep)
not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, str
should be fine in this fine in this file, I think it's only used in pandas/core/indexes/base.py
because there there's str = CachedAccessor("str", StringMethods)
(note to self: there should probably be a pre-commit rule for this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made all the changes suggested in both comments, they are in fbf70a2.
def test_drop_duplicates_pos_args_deprecation(): | ||
# GH#41485 | ||
df = DataFrame({"a": [1, 1, 2], "b": [1, 1, 3], "c": [1, 1, 3]}) | ||
|
||
msg = ( | ||
"In a future version of pandas all arguments of " | ||
"DataFrame.drop_duplicates except for the argument 'subset' " | ||
"will be keyword-only" | ||
) | ||
|
||
with tm.assert_produces_warning(FutureWarning, match=msg): | ||
result = df.drop_duplicates(["b", "c"], "last") | ||
|
||
expected = DataFrame({"a": [1, 2], "b": [1, 3], "c": [1, 3]}, index=[1, 2]) | ||
|
||
tm.assert_frame_equal(expected, result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove all the extra newlines here, this test could read as a single paragraph
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this change to all my tests in fbf70a2, they now all read as one paragraph.
…ithub.com/jmholzer/pandas into deprecate-nonkeyword-args-drop_duplicates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thanks
* ENH: Deprecate non-keyword arguments for drop_duplicates. * leave newline * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * remove redundant line * ENH: Deprecate non-keyword arguments for drop_duplicates. Co-authored-by: Marco Gorelli <[email protected]>
* ENH: Deprecate non-keyword arguments for drop_duplicates. * leave newline * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * ENH: Deprecate non-keyword arguments for drop_duplicates. * remove redundant line * ENH: Deprecate non-keyword arguments for drop_duplicates. Co-authored-by: Marco Gorelli <[email protected]>
inplace
#41485