-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: @doc - base.py & indexing.py #31970
Conversation
b7d7ac9
to
9a4abae
Compare
405589b
to
e580a96
Compare
Replacing all docstring decorator at once will be too hard to accomplish. I would prefer to do it step-by-step. Clean a few / one file in a PR, and do things iteratively. Would you mind help me review the current changes? At the mean time, I will keep working on next part. |
@HH-MWB can you post screenshots of the affected docstrings? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @HH-MWB generally lgtm, however a few questions.
pandas/core/arrays/categorical.py
Outdated
@@ -1352,8 +1353,7 @@ def memory_usage(self, deep=False): | |||
""" | |||
return self._codes.nbytes + self.dtype.categories.memory_usage(deep=deep) | |||
|
|||
@Substitution(klass="Categorical") | |||
@Appender(_shared_docs["searchsorted"]) | |||
@doc(IndexOpsMixin.searchsorted, klass="Categorical") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be inheriting from core.arrays.base.ExtensionArray instead? Also I can't see this docstring in the published docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this comment here - this seems strange to use IndexOpsMixin
since it isn't part of this classes hierarchy. Any reason not to address this?
020c24f
to
ede988c
Compare
You can build the doc strings and just post a screenshot of the HTML |
Get! Will do, thanks! |
a002400
to
6cd8082
Compare
4139d59
to
703f7a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @HH-MWB lgtm. would rather #31970 (comment) was addressed here but could be a follow-up.
would it be feasible to inherit the docstring template from the super class instead of explicitly specifying it? It would address my concern for Categorical array.
@simonjayhawkins Sorry for the late reply. First, I agree with you that inherit the docstring from the super class is very intuitive, and also could saves us a lot of time. Then, when I dive deeper for this case, I found that the original docstring is different with it's super class. This difference confused me and I am trying to figure it out what cause this difference and how can we solve it. Inherit DocstringI think it is a very good idea to inherit the docstring. Maybe this should not be a default behavior but at least we should be able to do inheritance easily. A potential quick solution for this is to create another decorator, for example, For example, class A:
def f(self):
"""func doc"""
class B(A):
@doc_inhert
def f(self):
pass Another example, class A:
def f(self):
"""func doc"""
class B(A):
def f(self):
pass
class C(B):
@doc_inhert
def f(self):
pass I am still looking to see if there are other better solutions. For example, one thing in my mind will be create decorator for the class, and do docstring inhert for all methods belong to that class. Would you mind if I create a new issue about docstring inheritance and propose my idea there? I am not 100% sure, but I think we already have some issues about this. Open for suggestions and advices. What should I do next? Case HereAs what I mentioned above, the original docstring is different from it's super class's docstring. Should I keep it as the original one or switch to be same as the super class one? Here is the original docstring:
Here is the docstring from it's super class:
|
Hi @WillAyd, I might mis-understand what you said. I create HTML via
|
Thanks for the detail. It looks like we should definitely have a separate issue/PR for this discussion. |
am I correct in thinking the only published docstring affected by this PR is https://dev.pandas.io/docs/reference/api/pandas.Series.searchsorted.html? If so the screenshot of that should be OK. |
Thanks for your help @simonjayhawkins! Here is the screen shots. |
The screenshots are from locally generated documentation? |
pandas/util/_decorators.py
Outdated
arg.format(**kwargs) | ||
if isinstance(arg, str) | ||
else dedent(arg.__doc__) # type: ignore | ||
for arg in wrapper._doc_args # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you confirm that these # type: ignore
stem from the fact that we are adding attributes to the wrapper function?
I guess this was discussed originally, but would using a class for the doc decorator help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point here! Most # type: ignore
is because we are adding attributes to function.
The only one not for this reason is else dedent(arg.__doc__) # type: ignore
(line 288). This one is because theoretically, arg.__doc__
can be str
or None
. However, we have checked arg.__doc__
is not None
before putting it in, so that we are safe here.
Back to the most # type: ignore
cases. I strongly agree with you that we could consider solving this issue without using # type: ignore
, because this looks like an abuse. I would like to try putting it under a class. Thanks for this advice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only one not for this reason is
else dedent(arg.__doc__) # type: ignore
(line 288). This one is because theoretically,arg.__doc__
can bestr
orNone
. However, we have checkedarg.__doc__
is notNone
before putting it in, so that we are safe here.
i think ok to remove
$ mypy pandas --warn-unused-ignores
pandas\util\_decorators.py:288: error: unused 'type: ignore' comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins Good catch here! Thanks. Fixing is here.
Hello @simonjayhawkins. It seems very hard to remove all For use case of However, we can reduce using Here is my proposal by using class: class doc:
"""
A decorator take docstring templates, concatenate them and perform string
substitution on it.
This decorator will add a variable "_docs" to the wrapped function to keep
track the original docstring template for potential usage. If it should be
consider as a template, it will be saved as a string. Otherwise, it will be
saved as callable, and later user __doc__ and dedent to get docstring.
Parameters
----------
*args : str or callable
The string / docstring / docstring template to be appended in order
after default docstring under function.
**kwags : str
The string which would be used to format docstring template.
"""
def __init__(self, *args: Union[str, Callable], **kwargs: str) -> None:
self._docs: List[Union[str, Callable]] = []
for arg in args:
if hasattr(arg, "_docs"):
self._docs.extend(arg._docs) # type: ignore
elif isinstance(arg, str) or arg.__doc__:
self._docs.append(arg)
self.parameters = kwargs
def __call__(self, func: F) -> F:
if func.__doc__:
self._docs.insert(0, dedent(func.__doc__))
func.__doc__ = "".join(
[
arg.format(**self.parameters)
if isinstance(arg, str)
else dedent(arg.__doc__) # type: ignore
for arg in self._docs
]
)
func._docs = self._docs # type: ignore
return func We can use same idea to reduce def doc(*args: Union[str, Callable], **kwargs: str) -> Callable[[F], F]:
"""
A decorator take docstring templates, concatenate them and perform string
substitution on it.
This decorator will add a variable "_docs" to the wrapped function to keep
track the original docstring template for potential usage. If it should be
consider as a template, it will be saved as a string. Otherwise, it will be
saved as callable, and later user __doc__ and dedent to get docstring.
Parameters
----------
*args : str or callable
The string / docstring / docstring template to be appended in order
after default docstring under function.
**kwags : str
The string which would be used to format docstring template.
"""
def decorator(func: F) -> F:
@wraps(func)
def wrapper(*args, **kwargs) -> Callable:
return func(*args, **kwargs)
# collecting docstring and docstring templates
docs: List[Union[str, Callable]] = []
if func.__doc__:
docs.append(dedent(func.__doc__))
for arg in args:
if hasattr(arg, "_docs"):
docs.extend(arg._docs) # type: ignore
elif isinstance(arg, str) or arg.__doc__:
docs.append(arg)
# formatting templates and concatenating docstring
wrapper.__doc__ = "".join(
[
arg.format(**kwargs)
if isinstance(arg, str)
else dedent(arg.__doc__) # type: ignore
for arg in docs
]
)
# saving docstring templates list
wrapper._docs = docs # type: ignore
return cast(F, wrapper)
return decorator Open for feed backs here. Which one is better? Any other improvement suggestions please? |
probably best to leave as a follow-up and have more detailed discussion and just keep the changes to the decorator in this PR limited to the changes needed for using a docstring from a method that is not decorated. |
Sounds very reasonable. I will follow-up latter. |
pandas/util/_decorators.py
Outdated
@@ -268,17 +269,24 @@ def decorator(func: F) -> F: | |||
def wrapper(*args, **kwargs) -> Callable: | |||
return func(*args, **kwargs) | |||
|
|||
templates = [func.__doc__ if func.__doc__ else ""] | |||
# collecting and docstring templates | |||
wrapper._doc_args: List[Union[str, Callable]] = [] # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name _doc_args
is misleading. In the context of this function makes sense, since it's the value of the args
parameters of the doc
decorator, but if I see a _doc_args
attribute of for example pandas.Series.head
, I don't think it really tells what's in it. Something like _docstring_components
could be clearer?
Also, when can it contain a callable? Even when we have something like @doc(pandas.Series.head)
we're extending it with the content templates of pandas.Series.head
, not with the function.
Last thing, what about using a docstring_components
variable to avoid all the type: ignore
, and just set wrapper._docstring_components = _docstring_components
at the end? Probably cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista Thanks for this suggestion, especially for the new variable name. I strongly agree with you here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a trailing space and a mypy error, other than that looks good.
pandas/util/_decorators.py
Outdated
This decorator is robust even if func.__doc__ is None. This decorator will | ||
add a variable "_docstr_template" to the wrapped function to save original | ||
docstring template for potential usage. | ||
This decorator will add a variable "_docstring_components" to the wrapped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a trailing whitespace in this line breaking the CI.
@datapythonista, @simonjayhawkins. Sorry to bother you. This mypy error makes me very confused. In this commit, I removed I know we can easily fix this by adding |
pandas/util/_decorators.py
Outdated
# formatting templates and concatenating docstring | ||
wrapper.__doc__ = "".join( | ||
[ | ||
arg.format(**kwargs) if isinstance(arg, str) else dedent(arg.__doc__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decorator can be used like this:
@doc
def foo():
pass
@doc(foo)
def bar():
pass
If that happens, in this line arg
will be the function foo
, and arg.__doc__
will be None. When dedent
is called with None
as parameter we get:
>>> dedent(None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mgarcia/miniconda3/lib/python3.7/textwrap.py", line 430, in dedent
text = _whitespace_only_re.sub('', text)
TypeError: expected string or bytes-like object
And mypy is warning you about it. There are different ways of fixing this. May be this?
arg.format(**kwargs) if isinstance(arg, str) else dedent(arg.__doc__) | |
arg.format(**kwargs) if isinstance(arg, str) else dedent(arg.__doc__ or "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! That make sense.
In line 281 and 282, there are codes to make sure arg.__doc__
won't be None
.
elif isinstance(arg, str) or arg.__doc__:
docstring_components.append(arg)
In this case, maybe we can use # type: ignore
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that code ensures _docstring_components
is not set to None
(in the parent). And the exception won't happen as far as _docstring_components
is always set with the @doc
decorator. But technically speaking there is no guarantee that something like this can be done:
def foo():
pass
foo._docstring_components = [None]
@doc(foo)
def bar():
pass
So, my preference is an actual fix, like the one I proposed, than a typing ignore
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that code ensures
_docstring_components
is not set toNone
(in the parent). And the exception won't happen as far as_docstring_components
is always set with the@doc
decorator. But technically speaking there is no guarantee that something like this can be done:def foo(): pass foo._docstring_components = [None] @doc(foo) def bar(): passSo, my preference is an actual fix, like the one I proposed, than a typing
ignore
.
That is a good point! Error fixed. Thanks!
pandas/util/_decorators.py
Outdated
] | ||
) | ||
|
||
# saving docstring components list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not worth having this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just one idea that could make sense.
Thanks for the work on this @HH-MWB, it's a great improvement.
if hasattr(arg, "_docstring_components"): | ||
docstring_components.extend(arg._docstring_components) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this could be a better fix for this one:
if hasattr(arg, "_docstring_components"): | |
docstring_components.extend(arg._docstring_components) # type: ignore | |
if hasattr(arg, "_docstring_components") and isinstance(arg._docstring_components, list): | |
docstring_components.extend(arg._docstring_components) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @datapythonista. I would consider this as two parts. One is to add type check for _docstring_components
and one for remove # type: ignore
.
For remove # type: ignore
After I made the change, I still get the errors below and not sure if we can remove that.
pandas/util/_decorators.py:280: error: Item "str" of "Union[str, Callable[..., Any]]" has no attribute "_docstring_components"
pandas/util/_decorators.py:280: error: Item "function" of "Union[str, Callable[..., Any]]" has no attribute "_docstring_components"
I guess my settings are not the same as checks here. I even the last passed build, I still get mypy errors on local. So, the change might fix it and just I don't know.
For checking if _docstring_components
is list
I think this is a very good point, we do as defensive programming. But I am a little bit curious about how far we should go to protect it. Also, since this function is internal use for pandas only, would it be better to let it failed? Then the developer would get notified there is a conflict immediately?
Just to be clarified, I would be happy to make this change if you (or anyone) feels needed. Because I see both pros and cons for this, I am not sure which one will be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to merge this as is, was wondering if the suggested change would fix the problem.
We can surely go back to it in the future if needed, but let's merge this first, since this already adds a lot of value, and we can start replacing Appender
by doc
at a larger scale after this.
pandas/core/arrays/categorical.py
Outdated
@@ -51,7 +52,7 @@ | |||
_extension_array_shared_docs, | |||
try_cast_to_ea, | |||
) | |||
from pandas.core.base import NoNewAttributesMixin, PandasObject, _shared_docs | |||
from pandas.core.base import IndexOpsMixin, NoNewAttributesMixin, PandasObject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I am in the minority on this view and don't want to be overly difficult, but can you refactor the IndexOpsMixin
as a pre-cursor to this, or leave this module separate from the rest of changes (which look good btw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is a good idea. I would convert this back to using _shared_docs
. This PR is already too large, and I should really put them separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you merge master. I guess the CI problems are unrelated.
if hasattr(arg, "_docstring_components"): | ||
docstring_components.extend(arg._docstring_components) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to merge this as is, was wondering if the suggested change would fix the problem.
We can surely go back to it in the future if needed, but let's merge this first, since this already adds a lot of value, and we can start replacing Appender
by doc
at a larger scale after this.
c2c4931
to
95a86ee
Compare
Great thanks @HH-MWB ! If you can do a follow up to fix the IndexOpsMixin import would be very much appreciated! |
Yes. I am thinking what is the best way of doing that. Definitely want to keep making contributing on this! Thanks. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff