Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Make doc decorator a class and replace Appender by doc #33160

Merged
merged 15 commits into from
May 22, 2020

Conversation

HH-MWB
Copy link
Contributor

@HH-MWB HH-MWB commented Mar 31, 2020

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Mar 31, 2020

Error

The error in commit 6e4b036 was:

Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "/home/runner/work/pandas/pandas/pandas/__init__.py", line 54, in <module>
    from pandas.core.api import (
  File "/home/runner/work/pandas/pandas/pandas/core/api.py", line 15, in <module>
    from pandas.core.arrays import Categorical
  File "/home/runner/work/pandas/pandas/pandas/core/arrays/__init__.py", line 8, in <module>
    from pandas.core.arrays.datetimes import DatetimeArray
  File "/home/runner/work/pandas/pandas/pandas/core/arrays/datetimes.py", line 46, in <module>
    from pandas.core.arrays import datetimelike as dtl
  File "/home/runner/work/pandas/pandas/pandas/core/arrays/datetimelike.py", line 51, in <module>
    from pandas.tseries import frequencies
  File "/home/runner/work/pandas/pandas/pandas/tseries/frequencies.py", line 29, in <module>
    from pandas.tseries.offsets import (
  File "/home/runner/work/pandas/pandas/pandas/tseries/offsets.py", line 1289, in <module>
    class CustomBusinessMonthEnd(_CustomBusinessMonth):
TypeError: function() argument 1 must be code, not str
Check import. No warnings, and blacklist some optional dependencies DONE
##[error]Process completed with exit code 1.

Fix

We could fix this by converting doc decorator from function to class.

The change was made in commit 21fa045.

@datapythonista datapythonista changed the title ENH: convert doc decorator from function to class DOC: Replace Appender by doc Mar 31, 2020
@datapythonista datapythonista changed the title DOC: Replace Appender by doc DOC: Make doc decorator a class and replace Appender by doc Mar 31, 2020
@datapythonista
Copy link
Member

Do you mind providing some more details of what are you doing here? Looks reasonable, but not sure I understand what's the problem, and how this change fixes it.

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Mar 31, 2020

Do you mind providing some more details of what are you doing here? Looks reasonable, but not sure I understand what's the problem, and how this change fixes it.

Hi @datapythonista. I didn't investigate this error very thoroughly, but my guess will be this:

Our original decorator returns a function, not a class. We will get this kind of error when we are using doc to decorate a class and then have another class take the decorated class as a base class. The following example will give the same TypeError:

@doc("This is a sample docstring")
class A:
    pass

class B(A):
    pass

The fix here is to convert function decorator to class/callable decorator.

@datapythonista
Copy link
Member

I see, didn't think of that. I don't think you need to inherit from the class to be a problem, even if you don't inherit I think our decorator is transforming it to a function and won't work.

But I see two different changes here:

  • Applying the doc changes to func instead of creating a wrapper
  • Implementing the decorator as a class, and not as a function

You're implementing both, but I think they are independent, aren't they? I think if we remove wrapper and change func, even if the decorator is not a class, that would still solve the problem.

I'm happy with the changes, but just to make sure I'm understanding this correctly. Thanks!

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Apr 1, 2020

I agree with you that the wrapper function causes this problem, but I also think implementing the decorator as a class is the way we solve this problem. I guess I misunderstand what you are trying to say here.

For my understanding, if we have to use a function, we would have something like:

def decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    wrapper.__doc__ = "This is a sample docstring"
    return wrapper

That is the fewer function to be as a decorator I can think of. I am not sure if we can remove either decorator or wrapper here. In my mind, the simplest way to avoid wrapper here is to move decorator to a class and implement the __call__ method.

@datapythonista
Copy link
Member

Decorators are a bit tricky. If you want to do something like:

@decorator
def foo():
   pass

which is equivalent to:

def foo():
   pass
foo = decorator(foo)

Then, we could simply use:

def decorator(func):
    func.__doc__ = "This is a sample docstring"
    return func

Since we call the decorator, then we actually have:

@decorator("PARAM")
def foo():
   pass

Which is equivalent to:

def foo():
   pass
foo = decorator("PARAM")(foo)

And then you need the wrapper, the function that will be returned by decorator("PARAM").

We currently have doc, decorator and wrapper. And wrapper is the final function that will be returned and will replace the original decorated function. If we then decorate a class, the class will be replaced by the function wrapper, and everything will break. My point was that instead of doing that, we can simply modify the function or class received as a parameter func, and that way things will work, whether we decorate a class, or a function.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked it again, and I have a small preference for having the decorator as a function (it'll be slightly simpler), but happy to move forward either way, up to you.

If we finally use the decorator, probably you can rename the params args and kwargs to the names you're giving in the class.


def __call__(self, func: F) -> F:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does F (FuncType) also represent classes, or just functions? func is now a function or a class (probably worth changing the name).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I agree here. It has been replaced to callable.

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Apr 2, 2020

We currently have doc, decorator and wrapper. And wrapper is the final function that will be returned and will replace the original decorated function. If we then decorate a class, the class will be replaced by the function wrapper, and everything will break. My point was that instead of doing that, we can simply modify the function or class received as a parameter func, and that way things will work, whether we decorate a class, or a function.

@datapythonista Yes, you are right. I didn't think about that. Also, I agree that the use function looks better here.

One more thing wants to confirm: args & kwargs vs appenders & substitution, which one makes more sense to you? I am tangled in the names and trying to find evidence to convince myself one is better than the other. If other names make more sense, please also point out.

def wrapper(*args, **kwargs) -> Callable:
return func(*args, **kwargs)

def decorator(call: Callable) -> Callable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call doesn't seem very descriptive to me, may be decorated_obj, func_or_class or something like that?

Is a class of type Callable, even if it doesn't implement __call__? @simonjayhawkins may be you can help, the parameter can be any function or any class, is Callable the appropriate type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC then this seems OK since we are passing a class type not instance, and a class type is callable in order to instantiate.

from typing import Callable

class Foo:
    pass

def func(call:Callable):
    pass

func(Foo)

reveal_type(Foo)

gives
type-test.py:11: note: Revealed type is 'def () -> type-test.Foo'

and no errors. The class type seems to be treated as a function that returns an instance by mypy.

@@ -245,13 +245,13 @@ def wrapper(*args, **kwargs) -> Callable[..., Any]:
return decorate


def doc(*args: Union[str, Callable], **kwargs: str) -> Callable[[F], F]:
def doc(*args: Union[str, Callable], **kwargs: str) -> Callable[[Callable], Callable]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the names of the parameters, I think a descriptive name is better in this case. I think args and kwargs are great names when they are generic, you can pass whatever, and they are passed to another function mostly.

But in this case, they have very specific usage, and they are very different between them.

I think something like *docs or *docs_or_objs and **params could be ok. May be you think of something better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in this case, they have very specific usage, and they are very different between them.
I like this point! Agree that we should be specific in this case.

For the names, I think docs and params are good. I also want to add some other candidates here:

  1. use supplements to replace args
  2. use fillers to replace kwargs

I am not a native English speaker, so I am not sure if those words also make sense to you and other developers. Please let me know which one you like and I will be more than happy to make the change.

@simonjayhawkins
Copy link
Member

@HH-MWB this PR is removing some of the typevars to ensure that decorated functions/classes have their typing signatures preserved. I recall discussing this before. do we have tests?

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Apr 3, 2020

@HH-MWB this PR is removing some of the typevars to ensure that decorated functions/classes have their typing signatures preserved. I recall discussing this before. do we have tests?

Hi @simonjayhawkins. Yes, I would be happy to add some tests and/or switch back to original type, but I guess I am a little bit confusing here.

I was changing type annotation from F to Callable to matches the name changes here, and just simply assuming F is the same as Callable.

Would you mind provide some more information about F? Does it also represent classes? What's the difference between it and Callable?

@datapythonista
Copy link
Member

@simonjayhawkins can you have a look at @HH-MWB question?

@HH-MWB, once you know about the typing, can you address the latest comments, merge master, and leave the CI green, so we can merge this please? Thanks!

@simonjayhawkins
Copy link
Member

Would you mind provide some more information about F? Does it also represent classes? What's the difference between it and Callable?

see https://mypy.readthedocs.io/en/stable/generics.html#declaring-decorators for an explaination on F and why it's necessary.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, if @simonjayhawkins is happy with the typing.

Not sure about the names, may be I'd use:

  • supplements -> docstrings
  • substitutes -> params
  • call -> decorated or func_or_class

But it's just a personal opinion. @simonjayhawkins may be you can also give your opinion on the naming?

Thanks @HH-MWB

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Apr 15, 2020

Not sure about the names, may be I'd use:

  • supplements -> docstrings
  • substitutes -> params
  • call -> decorated or func_or_class

I think this names makes sense to me, especially use decorated to replace call. I would like to make those changes first, and we can always change them if we have future discussion and get a better name.

@simonjayhawkins
Copy link
Member

@datapythonista

Not sure about the names, may be I'd use:

naming is hard.

If we finally use the decorator, probably you can rename the params args and kwargs to the names you're giving in the class.

i don't think it makes sense to change the names if not using a class

I checked it again, and I have a small preference for having the decorator as a function (it'll be slightly simpler), but happy to move forward either way, up to you.

looking again at the 21fa045, i think the class approach may be clearer is some of the logic had been moved from __call__ to __init__.

The __call__ would then be simpler, maybe look similar to the current Appender/Substitution, and the naming maybe be more natural.

We also have an underlying bug to fix. i.e. remove the extra wrapper returning a function.

So my preference would be to do just the fix (maybe raise a issue for clarity and posterity) here. i.e. no renaming.

@datapythonista
Copy link
Member

I personally think implementing the docorator with a function instead of a class is simpler. But just my opinion, happy with any of them.

For the naming, I think the generic args and kwargs are the best choice when you don't really know what you expect to receive. As an example, if plot(kind, *args, **kwargs) will just call a matplotlib function with whatever you're passing them, I think the names are the best. But in the other hand, when the parameters have a very specific meaning, like in this case, I think actual names are better. If I see a signature foo(*args, **kwargs), I'd expect that foo('pandas') and foo(name='pandas') are somehow equivalent. Which in this case it's completely wrong, because args and kwargs are totally different concepts. So in this case, I think it's much more appropriate to use descriptive names.

Of course not a big deal, I'm ok calling them args and kwargs. But I do see a drawback.

@HH-MWB CI is red, if you don't mind having a look. Thanks!

@smartvinnetou
Copy link
Contributor

Hi,

I replaced Appender with doc in a number of places in another PR. I found that the current implementation does not play well with the dynamic method creation. As illustrated in this PR, the test_apply_with_string_funcs test fails. Here are the changes I made. The issue is that the doc() decorator for some reason changes the signature of the DataFrame.std() method.

It would be great if you considered this use of doc() in your reimplementation and made sure that it works for the dynamic method creation.

If it helps here is more detail about the situation:

  • the std() method is created dynamically here
  • when using the current doc() implementation, the test fails for the testcase where axis 1 is selected. It fails because in this case std() is calculated along the wrong axis. This happens because the apply() method does not pass this argument correctly.
  • the axis argument is not passed correctly because when the test runs for std() and the test case where axis=1, this line is not executed.
  • the control does not enter the if branch because sig from this line has no arguments or keyword arguments.
  • this does not happen when Appender() is used. In that case sig has the correct arguments and key word arguments as expected for std() including axis and axis=1 is coeerctly passed down by the apply() machinery.

Hope this helps with improving the doc() decorator.

@smartvinnetou
Copy link
Contributor

It looks like this is changing the stacklevel at which warnings are being issued, not obvious to me why

I guess the reason is this PR removed function wrapper which used to wraps decorated function.

Yes, you are right. I had the same issue in my PR. You need to decrement this value by 2 to make the tests pass after removing the wrapper function.

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Apr 23, 2020

Thanks @smartvinnetou for providing this information. I guess this new implementation would fix the stacklevel issue.

@HH-MWB HH-MWB requested a review from datapythonista April 26, 2020 21:12
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @HH-MWB

Just to make sure I understand the stachlevel issue. Some functions were before wrapped more than once with Appender and Substitution and now they are just wrapped once, and that's changing the numbers of levels the warning is stacked. Is that correct?

Do you mind generating the warning that you're changing in core/arrays/datetimelike.py and add the output in a comment in this PR, so we can have a look, and make sure we're not breaking anything with this PR please?

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Apr 28, 2020

Just to make sure I understand the stachlevel issue. Some functions were before wrapped more than once with Appender and Substitution and now they are just wrapped once, and that's changing the numbers of levels the warning is stacked. Is that correct?

Hi @datapythonista. I am not 100% sure about this,but my guess is: the old doc decorator has 2 extra stacklevel than Appender and Substitution, but the new doc decorator has the same stacklevel.

Our old decorator has that extra wrapper function which increased the stacklevel by 2. @smartvinnetou has a merged PR using the old decorator, and the stacklevel has been increased by 2 to pass the CI/CD checks. Now we have to switch it back.

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Apr 28, 2020

Do you mind generating the warning that you're changing in core/arrays/datetimelike.py and add the output in a comment in this PR, so we can have a look, and make sure we're not breaking anything with this PR please?

The error message might be this:

Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "/home/runner/work/pandas/pandas/pandas/__init__.py", line 54, in <module>
    from pandas.core.api import (
  File "/home/runner/work/pandas/pandas/pandas/core/api.py", line 15, in <module>
    from pandas.core.arrays import Categorical
  File "/home/runner/work/pandas/pandas/pandas/core/arrays/__init__.py", line 8, in <module>
    from pandas.core.arrays.datetimes import DatetimeArray
  File "/home/runner/work/pandas/pandas/pandas/core/arrays/datetimes.py", line 46, in <module>
    from pandas.core.arrays import datetimelike as dtl
  File "/home/runner/work/pandas/pandas/pandas/core/arrays/datetimelike.py", line 51, in <module>
    from pandas.tseries import frequencies
  File "/home/runner/work/pandas/pandas/pandas/tseries/frequencies.py", line 29, in <module>
    from pandas.tseries.offsets import (
  File "/home/runner/work/pandas/pandas/pandas/tseries/offsets.py", line 1289, in <module>
    class CustomBusinessMonthEnd(_CustomBusinessMonth):
TypeError: function() argument 1 must be code, not str
Check import. No warnings, and blacklist some optional dependencies DONE
##[error]Process completed with exit code 1.

We can also goto the failed commit 6e4b036 for more detail information.

@HH-MWB
Copy link
Contributor Author

HH-MWB commented May 9, 2020

Bump @datapythonista @simonjayhawkins

@HH-MWB HH-MWB marked this pull request as draft May 17, 2020 21:58
@HH-MWB HH-MWB marked this pull request as ready for review May 17, 2020 21:59
@pep8speaks
Copy link

pep8speaks commented May 18, 2020

Hello @HH-MWB! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-21 01:44:13 UTC

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @HH-MWB, nice PR

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins
Copy link
Member

Thanks @HH-MWB for sticking with this. can you merge upstream/master one more time and ping on green

@simonjayhawkins simonjayhawkins added this to the 1.1 milestone May 20, 2020
@simonjayhawkins simonjayhawkins merged commit 42a5c1c into pandas-dev:master May 22, 2020
@simonjayhawkins
Copy link
Member

Thanks @HH-MWB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants