Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Replace Appender and Substitution with simpler doc decorator #31060

Merged
merged 15 commits into from
Feb 12, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 11 additions & 16 deletions doc/source/development/contributing_docstring.rst
Original file line number Diff line number Diff line change
Expand Up @@ -937,33 +937,31 @@ classes. This helps us keep docstrings consistent, while keeping things clear
for the user reading. It comes at the cost of some complexity when writing.

Each shared docstring will have a base template with variables, like
``%(klass)s``. The variables filled in later on using the ``Substitution``
decorator. Finally, docstrings can be appended to with the ``Appender``
decorator.
``{klass}``. The variables filled in later on using the ``doc`` decorator.
Finally, docstrings can also be appended to with the ``doc`` decorator.

In this example, we'll create a parent docstring normally (this is like
``pandas.core.generic.NDFrame``. Then we'll have two children (like
``pandas.core.series.Series`` and ``pandas.core.frame.DataFrame``). We'll
substitute the children's class names in this docstring.
substitute the class names in this docstring.

.. code-block:: python
class Parent:
@doc(klass="Parent")
def my_function(self):
"""Apply my function to %(klass)s."""
"""Apply my function to {klass}."""
...
class ChildA(Parent):
@Substitution(klass="ChildA")
@Appender(Parent.my_function.__doc__)
@doc(Parent.my_function, klass="ChildA")
def my_function(self):
...
class ChildB(Parent):
@Substitution(klass="ChildB")
@Appender(Parent.my_function.__doc__)
@doc(Parent.my_function, klass="ChildB")
def my_function(self):
...
Expand All @@ -972,18 +970,16 @@ The resulting docstrings are
.. code-block:: python
>>> print(Parent.my_function.__doc__)
Apply my function to %(klass)s.
Apply my function to Parent.
>>> print(ChildA.my_function.__doc__)
Apply my function to ChildA.
>>> print(ChildB.my_function.__doc__)
Apply my function to ChildB.
Notice two things:
Notice:

1. We "append" the parent docstring to the children docstrings, which are
initially empty.
2. Python decorators are applied inside out. So the order is Append then
Substitution, even though Substitution comes first in the file.

Our files will often contain a module-level ``_shared_doc_kwargs`` with some
common substitution values (things like ``klass``, ``axes``, etc).
Expand All @@ -992,14 +988,13 @@ You can substitute and append in one shot with something like

.. code-block:: python
@Appender(template % _shared_doc_kwargs)
@doc(template, **_shared_doc_kwargs)
def my_function(self):
...
where ``template`` may come from a module-level ``_shared_docs`` dictionary
mapping function names to docstrings. Wherever possible, we prefer using
``Appender`` and ``Substitution``, since the docstring-writing processes is
slightly closer to normal.
``doc``, since the docstring-writing processes is slightly closer to normal.

See ``pandas.core.generic.NDFrame.fillna`` for an example template, and
``pandas.core.series.Series.fillna`` and ``pandas.core.generic.frame.fillna``
Expand Down
161 changes: 79 additions & 82 deletions pandas/core/accessor.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
from typing import FrozenSet, Set
import warnings

from pandas.util._decorators import Appender
from pandas.util._decorators import doc


class DirNamesMixin:
Expand Down Expand Up @@ -193,122 +193,119 @@ def __get__(self, obj, cls):
return accessor_obj


@doc(klass="", others="")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this decorated here for a particular reason?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to replace the variables, otherwise the values in parenthesis would be in the doc (and doc also keeps the original template)

Copy link
Contributor Author

@HH-MWB HH-MWB Feb 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This decorated to force a string format function to be applied.

I also think that this empty decorator may be a little bit miss-leading. One other possibility could be @doc(klass="klass", others="others"). I am not sure which one (use empty string or the arguments' name) is better here. Open and welcome for any suggestions and advice.

def _register_accessor(name, cls):
def decorator(accessor):
if hasattr(cls, name):
warnings.warn(
f"registration of accessor {repr(accessor)} under name "
f"{repr(name)} for type {repr(cls)} is overriding a preexisting"
f"attribute with the same name.",
UserWarning,
stacklevel=2,
)
setattr(cls, name, CachedAccessor(name, accessor))
cls._accessors.add(name)
return accessor

return decorator
"""
Register a custom accessor on {klass} objects.
Parameters
----------
name : str
Name under which the accessor should be registered. A warning is issued
if this name conflicts with a preexisting attribute.
_doc = """
Register a custom accessor on %(klass)s objects.
Returns
-------
callable
A class decorator.
Parameters
----------
name : str
Name under which the accessor should be registered. A warning is issued
if this name conflicts with a preexisting attribute.
See Also
--------
{others}
Returns
-------
callable
A class decorator.
Notes
-----
When accessed, your accessor will be initialized with the pandas object
the user is interacting with. So the signature must be
See Also
--------
%(others)s
.. code-block:: python
Notes
-----
When accessed, your accessor will be initialized with the pandas object
the user is interacting with. So the signature must be
def __init__(self, pandas_object): # noqa: E999
...
.. code-block:: python
For consistency with pandas methods, you should raise an ``AttributeError``
if the data passed to your accessor has an incorrect dtype.
def __init__(self, pandas_object): # noqa: E999
...
>>> pd.Series(['a', 'b']).dt
Traceback (most recent call last):
...
AttributeError: Can only use .dt accessor with datetimelike values
For consistency with pandas methods, you should raise an ``AttributeError``
if the data passed to your accessor has an incorrect dtype.
Examples
--------
>>> pd.Series(['a', 'b']).dt
Traceback (most recent call last):
...
AttributeError: Can only use .dt accessor with datetimelike values
In your library code::
Examples
--------
import pandas as pd
In your library code::
@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
def __init__(self, pandas_obj):
self._obj = pandas_obj
import pandas as pd
@property
def center(self):
# return the geographic center point of this DataFrame
lat = self._obj.latitude
lon = self._obj.longitude
return (float(lon.mean()), float(lat.mean()))
@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
def __init__(self, pandas_obj):
self._obj = pandas_obj
def plot(self):
# plot this array's data on a map, e.g., using Cartopy
pass
@property
def center(self):
# return the geographic center point of this DataFrame
lat = self._obj.latitude
lon = self._obj.longitude
return (float(lon.mean()), float(lat.mean()))
Back in an interactive IPython session:
def plot(self):
# plot this array's data on a map, e.g., using Cartopy
pass
>>> ds = pd.DataFrame({{'longitude': np.linspace(0, 10),
... 'latitude': np.linspace(0, 20)}})
>>> ds.geo.center
(5.0, 10.0)
>>> ds.geo.plot()
# plots data on a map
"""

Back in an interactive IPython session:
def decorator(accessor):
if hasattr(cls, name):
warnings.warn(
f"registration of accessor {repr(accessor)} under name "
f"{repr(name)} for type {repr(cls)} is overriding a preexisting"
f"attribute with the same name.",
UserWarning,
stacklevel=2,
)
setattr(cls, name, CachedAccessor(name, accessor))
cls._accessors.add(name)
return accessor

>>> ds = pd.DataFrame({'longitude': np.linspace(0, 10),
... 'latitude': np.linspace(0, 20)})
>>> ds.geo.center
(5.0, 10.0)
>>> ds.geo.plot()
# plots data on a map
"""
return decorator


@Appender(
_doc
% dict(
klass="DataFrame", others=("register_series_accessor, register_index_accessor")
)
@doc(
_register_accessor,
klass="DataFrame",
others="register_series_accessor, register_index_accessor",
)
def register_dataframe_accessor(name):
from pandas import DataFrame

return _register_accessor(name, DataFrame)


@Appender(
_doc
% dict(
klass="Series", others=("register_dataframe_accessor, register_index_accessor")
)
@doc(
_register_accessor,
klass="Series",
others="register_dataframe_accessor, register_index_accessor",
)
def register_series_accessor(name):
from pandas import Series

return _register_accessor(name, Series)


@Appender(
_doc
% dict(
klass="Index", others=("register_dataframe_accessor, register_series_accessor")
)
@doc(
_register_accessor,
klass="Index",
others="register_dataframe_accessor, register_series_accessor",
)
def register_index_accessor(name):
from pandas import Index
Expand Down
63 changes: 29 additions & 34 deletions pandas/core/algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@

from pandas._libs import Timestamp, algos, hashtable as htable, lib
from pandas._libs.tslib import iNaT
from pandas.util._decorators import Appender, Substitution
from pandas.util._decorators import doc

from pandas.core.dtypes.cast import (
construct_1d_object_array_from_listlike,
Expand Down Expand Up @@ -487,9 +487,32 @@ def _factorize_array(
return codes, uniques


_shared_docs[
"factorize"
] = """
@doc(
values=dedent(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to get rid of the need for those dedent calls ? (thinking about how to make this even easier)

I suppose one option is writing those dedented. Something like

@doc(
    values=(
        """\
values : sequence
    A 1-D sequence. Sequences that aren't pandas objects are
    coerced to ndarrays before factorization.
        """), 
...
)
def factorize(...

but we might consider this ugly? (and does Black allow it?)

And in another this dedent could be handled by the decorator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on adding indentation "magic" to the decorator. Not a big fan of automatic things, but I think making the docstring injection as simple as possible, so the important code is more readable, makes it worth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a good idea to add this "magic" to decorator so we can keep doc simple.

I am considering change from

wrapper.__doc__ = wrapper._docstr_template.format(**kwargs)  # type: ignore

to add dedentto all kwargs like this

wrapper.__doc__ = wrapper._docstr_template.format(**{k: dedent(v) for k, v in kwargs.items()})  # type: ignore

Any thoughts? Comments? Please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I am only wondering if there are some cases where the to-be-inserted string starts with a space that we want to keep (to let it fit in the docstring)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some cases where we don't do a dedent call, like in https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L66. So you might need to ensure something automatic works there as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I am only wondering if there are some cases where the to-be-inserted string starts with a space that we want to keep (to let it fit in the docstring)

Good point. The changes we made here should not change the actual docstring. Since there are mixed cases of usage, it seems hard for me to get a good solution for all of them. At least nothing comes out in my mind now.

I like the "magic", but for now, maybe it is better to keep it without a dedent call. Then, we can manually control what shows there. Also, if we do not add dedent we can change the use of Appender and Substitution to doc easily and straightforward, because the doc's behave is very close to simply combine Appender and Substitution together.

How about we just keep this unchanged now, and we can come back later when we have converted most decorator usages to doc? At that time, we might have more information to decided what could be the best way of solving that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I also think it's better to start simple, see how things work, and add more features later.

I think adding the automatic indentation will be worth, since it'll make things much simpler when documenting, and it's not so complex to implement in the decorator. But let's forget about it for now, this is already a nice improvement.

Thanks!

"""\
values : sequence
A 1-D sequence. Sequences that aren't pandas objects are
coerced to ndarrays before factorization.
"""
),
sort=dedent(
"""\
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
"""
),
size_hint=dedent(
"""\
size_hint : int, optional
Hint to the hashtable sizer.
"""
),
)
def factorize(
values, sort: bool = False, na_sentinel: int = -1, size_hint: Optional[int] = None
) -> Tuple[np.ndarray, Union[np.ndarray, ABCIndex]]:
"""
Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
Expand All @@ -499,10 +522,10 @@ def _factorize_array(
Parameters
----------
%(values)s%(sort)s
{values}{sort}
na_sentinel : int, default -1
Value to mark "not found".
%(size_hint)s\
{size_hint}\
Returns
-------
Expand Down Expand Up @@ -580,34 +603,6 @@ def _factorize_array(
>>> uniques
Index(['a', 'c'], dtype='object')
"""


@Substitution(
values=dedent(
"""\
values : sequence
A 1-D sequence. Sequences that aren't pandas objects are
coerced to ndarrays before factorization.
"""
),
sort=dedent(
"""\
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
"""
),
size_hint=dedent(
"""\
size_hint : int, optional
Hint to the hashtable sizer.
"""
),
)
@Appender(_shared_docs["factorize"])
def factorize(
values, sort: bool = False, na_sentinel: int = -1, size_hint: Optional[int] = None
) -> Tuple[np.ndarray, Union[np.ndarray, ABCIndex]]:
# Implementation notes: This method is responsible for 3 things
# 1.) coercing data to array-like (ndarray, Index, extension array)
# 2.) factorizing codes and uniques
Expand Down
Loading