Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Replace Appender and Substitution with simpler doc decorator #31060

Merged
merged 15 commits into from
Feb 12, 2020
Merged

CLN: Replace Appender and Substitution with simpler doc decorator #31060

merged 15 commits into from
Feb 12, 2020

Conversation

HH-MWB
Copy link
Contributor

@HH-MWB HH-MWB commented Jan 16, 2020

A new decorator to handle docstring formatting. There is also an update for an existing case to show how it works.

@datapythonista datapythonista added Clean Docs Needs Discussion Requires discussion from core team before further action labels Jan 16, 2020
@datapythonista datapythonista changed the title DOC: add decorator to format doc templates PROPOSAL: Replace Appender and Substitution with simpler doc decorator Jan 16, 2020
@datapythonista
Copy link
Member

@pandas-dev/pandas-core I think this should simplify significantly the reuse of docstrings. Thoughts?

@jorisvandenbossche
Copy link
Member

Can you a bit more specific about what exactly the proposal is?

Looking at the code, it seems to combine the Appender and Substitution decorators into a single decorator (certainly a good idea!), but for the rest works more or less the same?

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Jan 16, 2020

Can you a bit more specific about what exactly the proposal is?

Looking at the code, it seems to combine the Appender and Substitution decorators into a single decorator (certainly a good idea!), but for the rest works more or less the same?

Yes, I think you have summed up this proposal very well. The idea of this is to create a simple solution to re-use docstring. It combined Appender and Substitution into one decorator and retain most of their features.

One additional improvement from this approach: it can take docstring (unrendered) as a template too. This gives us the ability to put docstring under function directly, instead of saving them as a global variable somewhere else. This might help us manage docstring more easily and more conveniently.

@jorisvandenbossche
Copy link
Member

One additional improvement from this approach: it can take docstring (unrendered) as a template too. This gives us the ability to put docstring under function directly, instead of saving them as a global variable somewhere else.

That's something we already started doing (but only a bit) with the current Appender as well (see https://dev.pandas.io/docs/development/contributing_docstring.html#sharing-docstrings, and eg https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L787 in practice), if I understand you correctly. Anyway, I also think this is better than the shared_docs variables, so if this decorator makes that even a bit more simple, all the better!

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think has some potential. Any chance you can do 1-2 more docstrings to see how this works?

if isinstance(arg, str):
templates.append(arg)
elif hasattr(arg, "_docstr_template"):
templates.append(arg._docstr_template) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are all of the type: ignore comments for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems mypy didn't let us declare and use attributes of function objects. They have a issue for this, but seems there are no official solutions.

We might be able to use some trick to handle this, but I am not sure if it worth. It will introducing more "unrelated" code. Also, this call of _docstr_template has been checked by hasattr, so I think it is safe here.

Just to be clear, I am open to make the adjustment to avoid this ignore comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just adding https://github.com/python/mypy/issues/2087 as a comment on a line preceding the ignores would be fine.

"factorize"
] = """
@doc(
values=dedent(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to get rid of the need for those dedent calls ? (thinking about how to make this even easier)

I suppose one option is writing those dedented. Something like

@doc(
    values=(
        """\
values : sequence
    A 1-D sequence. Sequences that aren't pandas objects are
    coerced to ndarrays before factorization.
        """), 
...
)
def factorize(...

but we might consider this ugly? (and does Black allow it?)

And in another this dedent could be handled by the decorator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on adding indentation "magic" to the decorator. Not a big fan of automatic things, but I think making the docstring injection as simple as possible, so the important code is more readable, makes it worth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a good idea to add this "magic" to decorator so we can keep doc simple.

I am considering change from

wrapper.__doc__ = wrapper._docstr_template.format(**kwargs)  # type: ignore

to add dedentto all kwargs like this

wrapper.__doc__ = wrapper._docstr_template.format(**{k: dedent(v) for k, v in kwargs.items()})  # type: ignore

Any thoughts? Comments? Please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I am only wondering if there are some cases where the to-be-inserted string starts with a space that we want to keep (to let it fit in the docstring)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some cases where we don't do a dedent call, like in https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L66. So you might need to ensure something automatic works there as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I am only wondering if there are some cases where the to-be-inserted string starts with a space that we want to keep (to let it fit in the docstring)

Good point. The changes we made here should not change the actual docstring. Since there are mixed cases of usage, it seems hard for me to get a good solution for all of them. At least nothing comes out in my mind now.

I like the "magic", but for now, maybe it is better to keep it without a dedent call. Then, we can manually control what shows there. Also, if we do not add dedent we can change the use of Appender and Substitution to doc easily and straightforward, because the doc's behave is very close to simply combine Appender and Substitution together.

How about we just keep this unchanged now, and we can come back later when we have converted most decorator usages to doc? At that time, we might have more information to decided what could be the best way of solving that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I also think it's better to start simple, see how things work, and add more features later.

I think adding the automatic indentation will be worth, since it'll make things much simpler when documenting, and it's not so complex to implement in the decorator. But let's forget about it for now, this is already a nice improvement.

Thanks!

@jorisvandenbossche
Copy link
Member

@HH-MWB some other relevant discussion: #19932 (I had a PR doing part of that, but never finished #20022)

@datapythonista
Copy link
Member

@HH-MWB since I think we all like this, I think you can add:

  • Couple more cases as @WillAyd said
  • Unit tests for the decorator
  • Documentation about the new decorator

Good point about @jorisvandenbossche regarding @doc(pandas.Series.head.__doc__) vs @doc(pandas.Series.head). I have a small preference for not having to write __doc__, and make the code a bit cleaner when using the decorator, but happy with both options.

Thanks a lot for working on this, I think it'll be a significant improvement. We're using Appender 312 times. :)

@datapythonista datapythonista removed the Needs Discussion Requires discussion from core team before further action label Jan 16, 2020
@datapythonista datapythonista changed the title PROPOSAL: Replace Appender and Substitution with simpler doc decorator CLN: Replace Appender and Substitution with simpler doc decorator Jan 16, 2020
@jorisvandenbossche
Copy link
Member

Documentation about the new decorator

We already document the sharing of docstrings: https://dev.pandas.io/docs/development/contributing_docstring.html#sharing-docstrings. So should we just update that documentation already to show this method?

Good point about @jorisvandenbossche regarding @doc(pandas.Series.head.__doc__) vs @doc(pandas.Series.head). I have a small preference for not having to write __doc__

To be clear, I am also in favor of not having to write __doc__. I mainly wanted to point out that not having to put the docstring in a shared variable was already possible (and done).

@jbrockmendel
Copy link
Member

We're using Appender 312 times.

We could consider using something like this to automatically inherit docstrings. (Though id rather use __init_subclass_ than a metaclass)

@datapythonista
Copy link
Member

Do you have an example to see how this looks like when applied?

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Jan 17, 2020

To be clear, I am also in favor of not having to write __doc__. I mainly wanted to point out that not having to put the docstring in a shared variable was already possible (and done).

Sorry for didn't make it clear. I totally agree with you that we could use func.__doc__ to avoid put docstring in a shared variable.

However, in some very edge cases, I still think it going to be hard to avoid shared variables using Appender and Substitution only. For example, in functions of "register a custom accessor", it might be very difficult to remove _doc from a shared variable. If we try to put _doc under a function, the template will become a rendered string after @Substitution, which makes other functions hard to get the original docstring template.

@jorisvandenbossche
Copy link
Member

However, in some very edge cases, I still think it going to be hard to avoid shared variables using Appender and Substitution only.

Yeah, I also didn't want to say we should not use it at all. Unfortunately, it's not only some edge cases where we will still need to use a shared variable. I think basically every case where there is no base class method that is always overridden (such as the register_.. free functions you mentioned), we will still need to work with a variable that holds the docstring.

While starting to use this better decorator throughout the code base, we should also try to clean up the cases where we are using a shared docstring with substitution where actually nothing is being substituted / or where it is never substituted. The first case leads the unneeded complexity, while the second leads to left-over template variables in the docstring (see #19932 for some example from some time ago)

@jorisvandenbossche
Copy link
Member

(sorry wrong button)

@simonjayhawkins
Copy link
Member

@simonjayhawkins do you mind having a look at the mypy problem?

looks like it may be a mypy import issue. the following produces no errors...

diff --git a/pandas/core/series.py b/pandas/core/series.py
index 8b74ec4f5..9fc18a774 100644
--- a/pandas/core/series.py
+++ b/pandas/core/series.py
@@ -70,6 +70,7 @@ from pandas.core.construction import (
     is_empty_data,
     sanitize_array,
 )
+from pandas.core.generic import NDFrame
 from pandas.core.indexers import maybe_convert_indices
 from pandas.core.indexes.accessors import CombinedDatetimelikeProperties
 from pandas.core.indexes.api import (
@@ -4102,7 +4103,7 @@ Name: Max Speed, dtype: float64
             errors=errors,
         )
 
-    @doc(generic.NDFrame.fillna, **_shared_doc_kwargs)
+    @doc(NDFrame.fillna, **_shared_doc_kwargs)
     def fillna(
         self,
         value=None,

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Jan 20, 2020

@simonjayhawkins do you mind having a look at the mypy problem?

looks like it may be a mypy import issue. the following produces no errors...

It works! Thank you for your help. @simonjayhawkins @datapythonista

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice, looks good to me.

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Jan 30, 2020

Really nice, looks good to me.

@datapythonista Thanks! Any further adjustment you would like to see?

@datapythonista
Copy link
Member

It's all good, just need someone else from the team to have a look.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think this is a nice improvement. Some minor things

@@ -193,122 +193,119 @@ def __get__(self, obj, cls):
return accessor_obj


@doc(klass="", others="")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this decorated here for a particular reason?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to replace the variables, otherwise the values in parenthesis would be in the doc (and doc also keeps the original template)

Copy link
Contributor Author

@HH-MWB HH-MWB Feb 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This decorated to force a string format function to be applied.

I also think that this empty decorator may be a little bit miss-leading. One other possibility could be @doc(klass="klass", others="others"). I am not sure which one (use empty string or the arguments' name) is better here. Open and welcome for any suggestions and advice.

The string which would be used to format docstring template.
"""

def decorator(func: Callable) -> Callable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use a TypeVar here - I think @simonjayhawkins might have done something similar with a Callable in another module

@HH-MWB HH-MWB requested a review from WillAyd February 1, 2020 02:50
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. so plan would be to replace Appender / Substitution with this entirely in a follow up(s) right?

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the doc decorator isn't preserving type annotations for decorated methods.

@@ -247,6 +247,46 @@ def wrapper(*args, **kwargs) -> Callable[..., Any]:
return decorate


def doc(*args: Union[str, Callable], **kwargs: str) -> Callable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def doc(*args: Union[str, Callable], **kwargs: str) -> Callable:
def doc(*args: Union[str, Callable], **kwargs: str) -> Callable[[F], F]:

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Feb 7, 2020

lgtm. so plan would be to replace Appender / Substitution with this entirely in a follow up(s) right?

Yes. If everyone agrees with this update, I will be happy to keep working on this and replace Appender / Substitution with this entirely.

@WillAyd
Copy link
Member

WillAyd commented Feb 8, 2020

@HH-MWB can you merge master one more time? Should resolve CI failures

@jreback any thoughts?

@WillAyd WillAyd added this to the 1.1 milestone Feb 12, 2020
@WillAyd WillAyd merged commit 143b011 into pandas-dev:master Feb 12, 2020
@WillAyd
Copy link
Member

WillAyd commented Feb 12, 2020

Thanks @HH-MWB

@WillAyd
Copy link
Member

WillAyd commented Feb 12, 2020

Looking forward to follow ups

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Feb 13, 2020

Looking forward to follow ups

@WillAyd Thank you too. I have submit a new issue for replacing with doc decorator here.

Also, @datapythonista @simonjayhawkins @jorisvandenbossche thanks for your help too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Replace old string formatting syntax in calling of Appender decorators
6 participants