Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Replace old string formatting syntax in calling of Appender decorators #30933

Closed
HH-MWB opened this issue Jan 11, 2020 · 8 comments · Fixed by #31060
Closed

DOC: Replace old string formatting syntax in calling of Appender decorators #30933

HH-MWB opened this issue Jan 11, 2020 · 8 comments · Fixed by #31060
Assignees
Labels
Clean Docs good first issue Needs Discussion Requires discussion from core team before further action

Comments

@HH-MWB
Copy link
Contributor

HH-MWB commented Jan 11, 2020

I found calling of @Appender() are using % to format string. I think it might be better to replace with .format based on PEP 3101 and a discussion on Stack Overflow.

#29547 is working on replacing % with f-strings. This change would also help to keep the code more consistent in string template. Be more specific, the template using % to formate will be something like %(XXX)s, but using .format and f-strings will be the same as {XXX}.

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Jan 11, 2020

An code example to explain this idea

The current implementation is

@Appender(_shared_docs["align"] % _shared_doc_kwargs)

It might be better if we change it to

@Appender(_shared_docs["align"].format(**_shared_doc_kwargs))

This change would also need to update _shared_docs["align"] from

    _shared_docs[
        "align"
    ] = """
        ...
        broadcast_axis : %(axes_single_arg)s, default None
        ...
        """

to

    _shared_docs[
        "align"
    ] = """
        ...
        broadcast_axis : {axes_single_arg}, default None
        ...
        """

@datapythonista
Copy link
Member

I think this change is quite big, and the proposed solution is an improvement, but still not as good as it could be.

Personally, when reusing a docstring, I'd like to write something like:

@doc(method='cummin', operation='minimum')
def cummax(whatever):
    """
    This is the {method} method.

    It computes the cumulative {operation}.
    """

@doc('pandas.Series.cummax', method='cummin', operation='minimum')
def cummin(whatever):
    pass

I think it's not difficult to implement a doc decorator that behaves like this, and IMHO it'll make the code much simpler and readable, that the current Appender / Substitution.

@HH-MWB what's your opinion on this syntax? Do you want to work on a proof of concept to see how this would look like in practice?

CC: @martinagvilas @galuhsahid

@datapythonista datapythonista changed the title Replace old string formatting syntax in calling of Appender decorators DOC: Replace old string formatting syntax in calling of Appender decorators Jan 13, 2020
@datapythonista datapythonista added Clean Docs good first issue Needs Discussion Requires discussion from core team before further action labels Jan 13, 2020
@HH-MWB
Copy link
Contributor Author

HH-MWB commented Jan 14, 2020

@datapythonista Yes, I agree with you. This seems to be simpler and more readable. And of course, I would be happy to work on that.

Should I put take in a comment here?

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Jan 14, 2020

Solution

def doc(*args, **kwargs):
    def function(func):
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        templates = [func.__doc__ if func.__doc__ else ""]
        for arg in args:
            if isinstance(arg, str):
                templates.append(arg)
            elif hasattr(arg, "_doc_template"):
                templates.append(arg._doc_template)
            elif arg.__doc__:
                templates.append(arg.__doc__)

        wrapper._doc_template = '\n'.join(dedent(t) for t in templates)
        wrapper.__doc__ = wrapper._doc_template.format(**kwargs)

        return wrapper
    return function

All non-keyworded arguments will be consider as doc template reference. If it's a string, then it will be use as a template. If it's a function, then consider the original doc string as template.

All templates will be concatenate together, and format by keyword arguments.

Example

Here is some code in /pandas/core/series.py

@Appender(
    """
    Examples
    --------
    >>> s = pd.Series(["elk", "pig", "dog", "quetzal"], name="animal")
    >>> print(s.to_markdown())
    |    | animal   |
    |---:|:---------|
    |  0 | elk      |
    |  1 | pig      |
    |  2 | dog      |
    |  3 | quetzal  |
    """
)
@Substitution(klass="Series")
@Appender(generic._shared_docs["to_markdown"])
def to_markdown(
    self, buf: Optional[IO[str]] = None, mode: Optional[str] = None, **kwargs
) -> Optional[str]:
    return self.to_frame().to_markdown(buf, mode, **kwargs)

We could rewrite it into

@doc(
    DataFrame.to_markdown,
    """
    Examples
    --------
    >>> s = pd.Series(["elk", "pig", "dog", "quetzal"], name="animal")
    >>> print(s.to_markdown())
    |    | animal   |
    |---:|:---------|
    |  0 | elk      |
    |  1 | pig      |
    |  2 | dog      |
    |  3 | quetzal  |
    """,
    klass="Series"
)
def to_markdown(
    self, buf: Optional[IO[str]] = None, mode: Optional[str] = None, **kwargs
) -> Optional[str]:
    return self.to_frame().to_markdown(buf, mode, **kwargs)

(Assuming the doc string in DataFrame.to_markdown is generic._shared_docs["to_markdown"])

@datapythonista
Copy link
Member

Thanks for working on this @HH-MWB. That looks like a good improvement to me.

Do you want to open a PR fixing some random cases, so people can see how it looks like in practice, and discuss whether it's worth all the effort to replace the current syntax.

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Jan 15, 2020

Do you want to open a PR fixing some random cases, so people can see how it looks like in practice, and discuss whether it's worth all the effort to replace the current syntax.

Yes, I will submit a PR, after I briefly structure the code. I want it matches the style here.

@jorisvandenbossche
Copy link
Member

One note about using format instead of % format strings, is that this puts more constraints on the actual docstring. Specifically the use of "normal" { (not to be replaced) gets more complicated then. (I seem to remember this was one of the reasons that we kept it on % strings when I looked at that some years ago).

Now, of course that can be solved (double {{), and probably that doesn't occur to often to be annoying?

@HH-MWB
Copy link
Contributor Author

HH-MWB commented Jan 16, 2020

One note about using format instead of % format strings, is that this puts more constraints on the actual docstring. Specifically the use of "normal" { (not to be replaced) gets more complicated then.

I agree with you. This could be a potential issue when switching from % to format. I have seen some docstring use {} to indicate a set. As you mentioned, one possible solution for this approach would use {{ to replace {, and add an empty decorator like @doc() above the function.

#29547 is working on replacing string formatting to f-string. Foreseeable, in my opinion, the majority string formatting in pandas will be using {xxx}. Personally, I feel using format can keep string formatting more consistent.

I guess what we want to compare is:

  1. an additional step (replacing {} and add @doc()) to some function when it didn't use doc decorator originally but some other functions want to use their docstring as a template.
  2. using % to format docstring but this could be different than we formatting string everywhere else.

I would prefer to keep consistent, but this is just my personal thought. I am happy to switching to % or any other approach as long as we feel it is reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Docs good first issue Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants