CLN: Replace Appender and Substitution with simpler doc decorator #31060
@pandas-dev/pandas-core I think this should significantly simplify the reuse of docstrings. Thoughts?
Can you be a bit more specific about what exactly the proposal is? Looking at the code, it seems to combine the Appender and Substitution decorators into a single decorator (certainly a good idea!), but otherwise it works more or less the same?
Yes, I think you have summed up this proposal very well. The idea is to create a simple way to reuse docstrings by combining the two existing decorators. One additional improvement of this approach: it can also take an unrendered docstring as a template. This lets us put the docstring directly under the function instead of saving it as a global variable somewhere else, which might make docstrings easier and more convenient to manage.
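To make the proposal concrete, here is a minimal sketch of a decorator of this kind. It is pieced together from the fragments quoted in this thread and is not the code under review: the ordering of templates, the `F` type variable, and the example functions `fillna` and `series_fillna` are assumptions made for illustration.

```python
from typing import Callable, TypeVar, Union

F = TypeVar("F", bound=Callable)


def doc(*args: Union[str, Callable], **kwargs: str) -> Callable[[F], F]:
    """Sketch only: join docstring templates, then fill them with kwargs."""

    def decorator(func: F) -> F:
        # Assumption: the function's own docstring comes first.
        templates = [func.__doc__] if func.__doc__ else []
        for arg in args:
            if isinstance(arg, str):
                templates.append(arg)
            elif hasattr(arg, "_docstr_template"):
                # Reuse the unrendered template of a decorated function.
                templates.append(arg._docstr_template)  # type: ignore
        # Keep the raw template so other functions can reuse it later.
        func._docstr_template = "".join(templates)  # type: ignore
        func.__doc__ = func._docstr_template.format(**kwargs)  # type: ignore
        return func

    return decorator


@doc(klass="NDFrame")
def fillna(value=None):
    """Fill NA/NaN values of the {klass}."""


@doc(fillna, klass="Series")
def series_fillna(value=None):
    pass
```

In this sketch `series_fillna` has no docstring of its own, so it inherits the template stored on `fillna` and renders it with its own `klass`.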
That's something we already started doing (but only a bit) with the current Appender as well (see https://dev.pandas.io/docs/development/contributing_docstring.html#sharing-docstrings, and eg https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L787 in practice), if I understand you correctly. Anyway, I also think this is better than the |
I think this has some potential. Any chance you can do 1-2 more docstrings to see how this works?
if isinstance(arg, str):
    templates.append(arg)
elif hasattr(arg, "_docstr_template"):
    templates.append(arg._docstr_template)  # type: ignore
What are all of the type: ignore comments for?
It seems mypy does not let us declare and use attributes on function objects. There is an open issue for this, but there seems to be no official solution.
We might be able to work around it with some trick, but I am not sure it is worth it, since it would introduce more "unrelated" code. Also, this access to _docstr_template is guarded by hasattr, so I think it is safe here.
Just to be clear, I am open to making an adjustment to avoid this ignore comment.
I think just adding https://github.com/python/mypy/issues/2087 as a comment on a line preceding the ignores would be fine.
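As an illustration of that suggestion, the ignore could be annotated as follows. The helper `remember_template` and the example function are hypothetical and exist only to show where the comment would sit.

```python
from typing import Callable


def remember_template(func: Callable) -> Callable:
    """Hypothetical helper: stash the raw docstring on the function object."""
    # mypy cannot type attributes set on functions, see
    # https://github.com/python/mypy/issues/2087
    func._docstr_template = func.__doc__ or ""  # type: ignore
    return func


@remember_template
def factorize(values):
    """Encode {values} as an enumerated type."""
```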
"factorize" | ||
] = """
@doc(
    values=dedent(
Would it be possible to get rid of the need for those dedent calls? (thinking about how to make this even easier)
I suppose one option is writing those dedented. Something like
@doc(
    values=(
"""\
values : sequence
    A 1-D sequence. Sequences that aren't pandas objects are
    coerced to ndarrays before factorization.
"""),
    ...
)
def factorize(...
but we might consider this ugly? (and does Black allow it?)
Another option: could this dedent be handled by the decorator?
+1 on adding indentation "magic" to the decorator. I am not a big fan of automatic things, but I think making the docstring injection as simple as possible, so the important code is more readable, makes it worth it.
I think it is a good idea to add this "magic" to the decorator so we can keep the docs simple.
I am considering changing
wrapper.__doc__ = wrapper._docstr_template.format(**kwargs)  # type: ignore
so that dedent is applied to all kwargs, like this:
wrapper.__doc__ = wrapper._docstr_template.format(**{k: dedent(v) for k, v in kwargs.items()})  # type: ignore
Any thoughts or comments?
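The difference between the two variants can be seen in isolation. This is only a sketch: `render_as_is` and `render_dedented` are hypothetical stand-ins for the two assignments quoted above, and the template is shortened.

```python
from textwrap import dedent

template = "Parameters\n----------\n{values}"
# A substitution value that the caller wrote with four spaces of indentation.
values = "    values : sequence\n        A 1-D sequence.\n"


def render_as_is(template: str, **kwargs: str) -> str:
    # Current behaviour: kwargs are inserted exactly as given.
    return template.format(**kwargs)


def render_dedented(template: str, **kwargs: str) -> str:
    # Proposed behaviour: common leading whitespace is stripped first.
    return template.format(**{k: dedent(v) for k, v in kwargs.items()})


print(render_as_is(template, values=values))
print(render_dedented(template, values=values))
```

With automatic dedent, callers no longer need their own dedent calls, but any leading indentation a caller added on purpose is removed as well.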
That sounds good. I am only wondering if there are some cases where the to-be-inserted string starts with a space that we want to keep (to let it fit in the docstring)
There are some cases where we don't do a dedent call, like in https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L66. So you might need to ensure something automatic works there as well.
That sounds good. I am only wondering if there are some cases where the to-be-inserted string starts with a space that we want to keep (to let it fit in the docstring)
Good point. The changes we make here should not change the actual docstring. Since usage is mixed, it is hard for me to find a solution that works for all cases. At least nothing comes to mind right now.
I like the "magic", but for now it may be better to leave out the dedent call. Then we can manually control what shows up there. Also, without dedent we can convert uses of Appender and Substitution to doc easily and straightforwardly, because doc's behavior is very close to simply combining Appender and Substitution.
How about we keep this unchanged for now and come back to it once we have converted most decorator usages to doc? By then, we might have more information to decide what the best way of solving this is.
Sounds good, I also think it's better to start simple, see how things work, and add more features later.
I think adding the automatic indentation will be worth it, since it'll make things much simpler when documenting, and it's not so complex to implement in the decorator. But let's forget about it for now, this is already a nice improvement.
Thanks!
@HH-MWB since I think we all like this, I think you can add:
Good point about @jorisvandenbossche regarding Thanks a lot for working on this, I think it'll be a significant improvement. We're using |
We already document the sharing of docstrings: https://dev.pandas.io/docs/development/contributing_docstring.html#sharing-docstrings. So should we just update that documentation already to show this method?
To be clear, I am also in favor of not having to write |
We could consider using something like this to automatically inherit docstrings. (Though id rather use |
Do you have an example of what this looks like when applied?
Sorry for not making it clear. I totally agree with you that we could use However, in some edge cases, I still think it is going to be hard to avoid shared variables using |
Yeah, I also didn't want to say we should not use it at all. Unfortunately, it's not only some edge cases where we will still need to use a shared variable. I think basically every case where there is no base class method that is always overridden (such as the While starting to use this better decorator throughout the code base, we should also try to clean up the cases where we are using a shared docstring with substitution where actually nothing is being substituted, or where it is never substituted. The first case leads to unneeded complexity, while the second leads to left-over template variables in the docstring (see #19932 for some examples from some time ago) |
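Left-over template variables of the kind mentioned above could be spotted with a small check. This is a hypothetical helper, not something from the PR; it only recognises simple `{name}` and `%(name)s` fields and would also flag literal text of that shape inside examples.

```python
import re
from typing import List

# Matches str.format fields like {klass} and %-style fields like %(klass)s.
_PLACEHOLDER = re.compile(r"\{(\w+)\}|%\((\w+)\)s")


def leftover_placeholders(docstring: str) -> List[str]:
    """Return the names of template fields that were never substituted."""
    return [brace or percent for brace, percent in _PLACEHOLDER.findall(docstring)]


print(leftover_placeholders("Return a {klass} or %(other)s."))
print(leftover_placeholders("Return a Series."))
```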
(sorry wrong button) |
looks like it may be a mypy import issue. the following produces no errors...
diff --git a/pandas/core/series.py b/pandas/core/series.py
index 8b74ec4f5..9fc18a774 100644
--- a/pandas/core/series.py
+++ b/pandas/core/series.py
@@ -70,6 +70,7 @@ from pandas.core.construction import (
     is_empty_data,
     sanitize_array,
 )
+from pandas.core.generic import NDFrame
 from pandas.core.indexers import maybe_convert_indices
 from pandas.core.indexes.accessors import CombinedDatetimelikeProperties
 from pandas.core.indexes.api import (
@@ -4102,7 +4103,7 @@ Name: Max Speed, dtype: float64
             errors=errors,
         )
-    @doc(generic.NDFrame.fillna, **_shared_doc_kwargs)
+    @doc(NDFrame.fillna, **_shared_doc_kwargs)
     def fillna(
         self,
         value=None,
It works! Thank you for your help. @simonjayhawkins @datapythonista |
Really nice, looks good to me.
@datapythonista Thanks! Any further adjustment you would like to see? |
It's all good, just need someone else from the team to have a look. |
Yea I think this is a nice improvement. Some minor things
@@ -193,122 +193,119 @@ def __get__(self, obj, cls):
        return accessor_obj


@doc(klass="", others="")
Is this decorated here for a particular reason?
We need to replace the variables, otherwise the placeholders in curly braces would show up in the doc (and doc also keeps the original template).
Yes. This is decorated to force string formatting to be applied.
I also think that this empty decorator may be a little misleading. Another possibility could be @doc(klass="klass", others="others"). I am not sure which one (empty strings or the arguments' names) is better here. I am open to any suggestions and advice.
pandas/util/_decorators.py
Outdated
        The string which would be used to format docstring template.
    """

    def decorator(func: Callable) -> Callable:
Should use a TypeVar here - I think @simonjayhawkins might have done something similar with a Callable in another module
lgtm. so plan would be to replace Appender / Substitution with this entirely in a follow up(s) right?
the doc decorator isn't preserving type annotations for decorated methods.
pandas/util/_decorators.py
Outdated
@@ -247,6 +247,46 @@ def wrapper(*args, **kwargs) -> Callable[..., Any]:
    return decorate


def doc(*args: Union[str, Callable], **kwargs: str) -> Callable:
- def doc(*args: Union[str, Callable], **kwargs: str) -> Callable:
+ def doc(*args: Union[str, Callable], **kwargs: str) -> Callable[[F], F]:
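A sketch of why the `Callable[[F], F]` return type matters: with a bound `TypeVar`, a type checker sees the decorated function with its original signature instead of a bare `Callable`. The definition of `F` and the reduced, keyword-only `doc` below are assumptions for illustration, not the code in the PR.

```python
from typing import Any, Callable, TypeVar

# F stands for "whatever callable was passed in", so the decorator
# is typed as returning the same callable type it received.
F = TypeVar("F", bound=Callable[..., Any])


def doc(**kwargs: str) -> Callable[[F], F]:
    def decorator(func: F) -> F:
        if func.__doc__:
            func.__doc__ = func.__doc__.format(**kwargs)
        return func

    return decorator


@doc(klass="Series")
def fillna(value: int = 0) -> int:
    """Fill NA values of a {klass}."""
    return value
```

Here `fillna` keeps its `(value: int = 0) -> int` annotations after decoration, both at runtime and from the type checker's point of view.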
Yes. If everyone agrees with this update, I will be happy to keep working on this and replace |
Thanks @HH-MWB |
Looking forward to follow ups |
@WillAyd Thank you too. I have submitted a new issue for replacing with Also, @datapythonista @simonjayhawkins @jorisvandenbossche thanks for your help too. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
A new decorator to handle docstring formatting. There is also an update for an existing case to show how it works.