Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: add guide on shared docstrings #20016

Merged
merged 6 commits into from
Mar 28, 2018

Conversation

TomAugspurger
Copy link
Contributor

closes #16446

xref #19932

cc @jorisvandenbossche

The main change in best practices here is the recommendation is that Appender is only for appending. Substitution should be used for substituting values.

See the changes to fillna that this policy requires. Personally, I find the new way clearer than the old way.

  • We write the base template normally
  • I like Appender(NDFrame.method.__doc__) better than Appender(_shared_docs['name'] % kwargs)
  • I like Substitution(**kwargs) better than % formatting.

The downsides is that we don't have the template in a dictionary anymore and the docstring in NDFrame has to be the template, it can't have its variables substituted. I think that's OK.

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 6, 2018
@TomAugspurger
Copy link
Contributor Author

Oh, I put this in contributing, but it should go in the docs file once that's ready.

@jorisvandenbossche
Copy link
Member

Thanks!

The downsides is that we don't have the template in a dictionary anymore and the docstring in NDFrame has to be the template, it can't have its variables substituted.

Ah, yes, that's something I didn't think about. But, as long as the templates are used in classes that are never exposed to users (like NDFrame and IndexOpsMixin), then this is not a problem. But should check that we also don't use the system in actual user classes (eg I think base Index also has shared docstrings that are substituted in the index subclasses)

@codecov
Copy link

codecov bot commented Mar 6, 2018

Codecov Report

Merging #20016 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20016      +/-   ##
==========================================
- Coverage   91.85%   91.84%   -0.01%     
==========================================
  Files         152      152              
  Lines       49249    49236      -13     
==========================================
- Hits        45237    45223      -14     
- Misses       4012     4013       +1
Flag Coverage Δ
#multiple 90.23% <100%> (-0.01%) ⬇️
#single 41.88% <100%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/panel.py 97.29% <100%> (ø) ⬆️
pandas/core/series.py 93.85% <100%> (ø) ⬆️
pandas/core/generic.py 95.85% <100%> (-0.01%) ⬇️
pandas/core/frame.py 97.18% <100%> (ø) ⬆️
pandas/core/arrays/base.py 83.33% <0%> (-0.82%) ⬇️
pandas/util/testing.py 84.52% <0%> (-0.21%) ⬇️
pandas/core/algorithms.py 94.29% <0%> (-0.07%) ⬇️
pandas/core/base.py 96.78% <0%> (-0.02%) ⬇️
pandas/core/dtypes/missing.py 91.07% <0%> (ø) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7fd3b85...8c221ea. Read the comment docs.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at the doc guide now as well, looks good.

Maybe worth to mention that often Appender is used to inherit a docstring from partent class, also when no substitution is done (for Index this is eg the majority of the cases)

for the user reading. It comes at the cost of some complexity when writing.

Each shared docstring will have a base template with variables, like
``%(klass)s``. The variables filled in later on using the ``Substitution``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filled -> are filled

where ``template`` may come from a module-level ``_shared_docs`` dictionary
mapping function names to docstrings. Wherever possible, we prefer using
``Appender`` and ``Substitution``, since the docstring-writing processes is
slightly closer to normal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this reasoning (both with using % or Substitution you can write a normal docstring and append it with baseclass.method.__doc__ ?)

@jorisvandenbossche
Copy link
Member

@TomAugspurger I made a PR doing such a clean-up for Index (as this is one where we don't have always a non-user exposed base class where it is no problem to have non-substituted values): #20022

@jorisvandenbossche
Copy link
Member

@jreback you are OK with this approach? (see also discussion in #19932 and other PR #20022)

@jreback
Copy link
Contributor

jreback commented Mar 7, 2018

this is just the guide which is codifying current practice which is good

maybe add something about actually putting functions in Series/Frame that render the doc strings (while the actual doc string itself is in generic.py)

@jorisvandenbossche
Copy link
Member

this is just the guide which is codifying current practice which is good

no, it is not exactly codifying the current practice, but a slight variation of it (without _shared_docs dict).

There is one example in this PR (see for fillna in the diff, or #20022 for a bigger example)

@jreback
Copy link
Contributor

jreback commented Mar 7, 2018

my only comment is actually on your other PR #20022. Should clarify when should be using _shared_docs, I agree in Index subclasses we should prob try not to use it as much as its cumbersome. Though it is nice to have the correct return class rather than Index as the generic return value. Its more informative.

@jorisvandenbossche
Copy link
Member

my only comment is actually on your other PR #20022.

Can you then comment there?

Should clarify when should be using _shared_docs,

The proposal is to never use it, at least if there is a base class that can hold the template docstrings (like NDFrame). For classes like Index (where the base class is already user exposed), we might discuss further. But I show in the PR #20022 that at least currently we don't actually do this much (4 times for all Index methods).

@jorisvandenbossche
Copy link
Member

@jreback for NDFrame, we certainly still use the substitution with shared_kwargs, so are you ok with the approach in this PR for that part of the codebase? (use NDFrame.method.__doc__ instead of _shared_docs['method'])

@jreback
Copy link
Contributor

jreback commented Mar 7, 2018

oh I c now what you did. yes I much prefer that, its way more explict.

@datapythonista
Copy link
Member

I know that this approach is based on the current one, and may be it's too ambitious to define a brand-new one, but given that we'll be working in most of the docstrings, I think it's worth suggesting.

Correct me if I'm wrong, but if the general case it's to reuse the docstring of the parent class as a template, why doesn't the decorator get it for us by default? Also, if _shared_doc_kwargs is defined as standard, why don't we render the parent docstring with the parameters defined on it?

Then, I'm sure you're already use to it, but I think naming the decorator Appender is not very descriptive.

So, with the next example:

class Series(NDFrame):
    @reuse_parent_docstring
    def fillna(self, ...):
        ...

I think the decorator should be able to automatically do the right thing. And for special cases, the next syntax could also be supported:

class Series(NDFrame):
    @reuse_parent_docstring(fillna_template, klass='Series',...)
    def fillna(self, ...):
        ...

or

class Series(NDFrame):
    @reuse_parent_docstring(NotTheParent.fillna.__doc__, **_shared_doc_kwargs)
    def fillna(self, ...):
        ...

Also, may be it's a good time to use the new formatting syntax '{klass}' instead of '%(klass)s'?

I think this adds a bit of complexity to the decorator, but the code mode clear, so I think it's worth. Am I missing something?

@jorisvandenbossche
Copy link
Member

Then, I'm sure you're already use to it, but I think naming the decorator Appender is not very descriptive.

Agreed here, I think "AppendDocstring" or "AddDocstring" (or lower case) would already be more explicit (it is used for more than inheriting docstrings, so "reuse_parent_docstring" might also be confusing in certain cases).

[about @reuse_parent_docstring] I think the decorator should be able to automatically do the right thing.

I have thought about this as well, but the problem is that there are many parent classes, for example for Series it are in order: IndexOpsMixin, NDFrame, PandasObject, etc. So already in this case, it would not work out of the box for Series, as the majority of the methods needs to inherit from NDFrame, and not IndexOpsMixin. For DatetimeIndex there are even 5 parent classes before getting to Index.
We could of course define a private attribute that holds this default parent class to inherit from.

Also, may be it's a good time to use the new formatting syntax '{klass}' instead of '%(klass)s'?

The problem is that this does not work with strings where you also want to use { itself (for example for parameter type description listing the possibilities). It would also be annoying to have to do {{'optionA', 'optionB'}}.
Although of course it is a trade-off, so it is a choice we have to make, and we can also choose the other way.

@TomAugspurger
Copy link
Contributor Author

A few things:

  1. I'm OK with having the contributor explicitly write the exact method that's the template. In my mind, if you're writing a shared docstring, you'll need to be modify both the parent template and the child, so writing @Appender(NDFrame.method) is a good thing.
  2. @jreback mentioned a clear policy for when to use the two styles. Let's call them dict-style and method-style.

Use method-style when the parent class, where the template is written, is not public. This include NDFrame and I think much of IndexOpsMixin. I haven't looked at the groupby ones.

  1. I don't have a preference for new vs. old-style format strings.

If people are ok with that policy I'll add it to this PR.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Mar 7, 2018

Use method-style when the parent class, where the template is written, is not public. This include NDFrame and I think much of IndexOpsMixin. I haven't looked at the groupby ones.

The groupby ones mostly just use variables holding a template (without storing it in a dict), which I think is good. And IndexOpsMixin can all use the method-style (as this is not public).
The only remaining case that uses the dict-style is Index itself, for which I propose to not use substitution, but only inhertiting, so that way we can also use method-style (#20022).

@@ -1077,5 +1077,78 @@ The branch will still exist on GitHub, so to delete it there do::

git push origin --delete shiny-new-feature

Sharing Docstrings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a ref here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Going to wait till the docstring guide is merged and I move it to that document so that the name makes sense.


Each shared docstring will have a base template with variables, like
``%(klass)s``. The variables filled in later on using the ``Substitution``
decorator. Finally, docstrings can be appended to with the ``Appender``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reference the Substituion/Appender via links to the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, there isn't a good way to link to a specific function. I think you can only do a specific line (with or without a commit). If we pin the commit things may go out of date. If don't do a specific commit then the line number will become incorrect.

decorator.

In this example, we'll create a parent docstring normally (this is like
``pandas.core.generic.NDFrame``. Then we'll have two children (like
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll -> we will, don't use contractions :>

In this example, we'll create a parent docstring normally (this is like
``pandas.core.generic.NDFrame``. Then we'll have two children (like
``pandas.core.series.Series`` and ``pandas.core.frame.DataFrame``). We'll
substitute the children's class names in this docstring.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

childrens'

Copy link
Contributor Author

@TomAugspurger TomAugspurger Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Children is a plural noun not ending in "s", so children's is the correct plural possessive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, yeah never get these right :>. We'll -> we will


Notice two things:

1. We "append" the parent docstring to the children docstrings, which are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we use * rather than numbering generally (though this is slightly more clear)


1. We "append" the parent docstring to the children docstrings, which are
initially empty.
2. Python decorators are applied inside out. So the order is Append then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks on Append/Substitution

``Appender`` and ``Substitution``, since the docstring-writing processes is
slightly closer to normal.

See ``pandas.core.generic.NDFrame.fillna`` for an example template, and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you point to actual code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue with linking specific lines / commits, though maybe this one is OK, since we're giving an example.

But unfortunately I can't link to the correct version, since the correction is in this PR :)

@@ -1077,5 +1077,78 @@ The branch will still exist on GitHub, so to delete it there do::

git push origin --delete shiny-new-feature

Sharing Docstrings
==================

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is not long enough

In this example, we'll create a parent docstring normally (this is like
``pandas.core.generic.NDFrame``. Then we'll have two children (like
``pandas.core.series.Series`` and ``pandas.core.frame.DataFrame``). We'll
substitute the children's class names in this docstring.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, yeah never get these right :>. We'll -> we will

@TomAugspurger
Copy link
Contributor Author

Moved to the docstring guide now that it's in master.

@TomAugspurger
Copy link
Contributor Author

@jorisvandenbossche @datapythonista contribution_docstring.rst isn't in any toctree. Is that deliberate?

@jorisvandenbossche
Copy link
Member

Not sure, it is linked too from contributing.rst part about documentation.

I am fine with adding it to the main index.rst, it's already a disaster anyhow :-)
But ideally we would have a more nested contributing section of the docs.


See ``pandas.core.generic.NDFrame.fillna`` for an example template, and
``pandas.core.series.Series.fillna`` and ``pandas.core.generic.frame.fillna``
for the filled versions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fillna is actually not a really good example, as it has an example section only targetting DataFrame .. (but anyhow, that is another issue :))

@datapythonista
Copy link
Member

I linked contributing_docstring.rst from contributing.rst. Personally I think contributing.rst is long enough to be splited. If that sounds good, we can probably leave it out of any toctree until then. I'm happy to take a looks at it when I find some time.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Mar 12, 2018 via email

@TomAugspurger
Copy link
Contributor Author

Anything else you'd like to see here?

@jorisvandenbossche jorisvandenbossche changed the title Shared doc guide DOC: add guide on shared docstrings Mar 27, 2018
@jorisvandenbossche
Copy link
Member

@TomAugspurger can you update with master?

@TomAugspurger
Copy link
Contributor Author

Done.

@jorisvandenbossche jorisvandenbossche merged commit a6ff9e6 into pandas-dev:master Mar 28, 2018
@jorisvandenbossche
Copy link
Member

Thanks!

javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018
@TomAugspurger TomAugspurger deleted the shared-doc-guide branch May 2, 2018 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Explain how our shared_docs system works
4 participants