Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Fix to_latex docstring. #22516

Merged
merged 4 commits into from
Sep 8, 2018
Merged

Conversation

Moisan
Copy link
Contributor

@Moisan Moisan commented Aug 26, 2018

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Fix the DataFrame.to_latex docstring to match scripts/validate_docstrings.py as explained in #22459 and add an example.

The docstring was previously in a variable that was only used in to_latex. I put it in the method docstring instead. The @Substitution wasn't matching anything, I suspect this dates back to the common docstring in io/formats/format.py.

@codecov
Copy link

codecov bot commented Aug 26, 2018

Codecov Report

Merging #22516 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22516      +/-   ##
==========================================
- Coverage   92.04%   92.04%   -0.01%     
==========================================
  Files         169      169              
  Lines       50787    50784       -3     
==========================================
- Hits        46746    46743       -3     
  Misses       4041     4041
Flag Coverage Δ
#multiple 90.45% <100%> (-0.01%) ⬇️
#single 42.29% <100%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 96.43% <100%> (-0.01%) ⬇️
pandas/core/indexes/multi.py 95.41% <0%> (ø) ⬆️
pandas/core/frame.py 97.2% <0%> (ø) ⬆️
pandas/core/indexes/base.py 96.45% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa47b8d...c64fb5d. Read the comment docs.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great contribution, the docstring really needed some care. Added just some minor comments mainly about the parameter types.

Parameters
----------
buf : StringIO-like, optional
Buffer to write to.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More than StringIO-like I'd say this is a file descriptor. I guess as other methods, if None, the output is returned as a string. I think in this cases we usually say file descriptor or None, instead of optional, but in either case, we want to explain that if None it returns the string.

----------
buf : StringIO-like, optional
Buffer to write to.
columns : sequence, optional, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No ened for default None when it's optional. I think we agreed on using label when talking about the objects in the indices, so it could make sense to have the type as list of label.

Write row names (index).
na_rep : str, default 'NaN'
Missing data representation.
formatters : list or dict of one-param. functions, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type is a bit confusing. Don't know what it's expected, would it be list of function or dict of {str: function}?

Formatter functions to apply to columns' elements by position or
name. The result of each function must be a unicode string.
List must be of length equal to the number of columns.
float_format : str, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional instead of default None would make more sense in this case to me. It's not always clear, but in general we use default None when the None value is used as None. When None means the feature is not used, it's optional.

Format string for floating point numbers.
sparsify : bool, optional, default None
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if None and False are the same here? if that's the case I'd prefer to get rid of optional and explain in the description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not the same, in formats.py:

if sparsify is None:
    sparsify = get_option("display.multi_sparse")

sparsify : bool, optional, default None
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names : bool, optional, default True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary optional

Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names : bool, optional, default True
Prints the names of the indexes.
bold_rows : boolean, default False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool instead of boolean

When set to None, the value will default from the pandas config
module. Use a longtable environment instead of tabular. Requires
adding a \usepackage{longtable} to your LaTeX preamble.
escape : bool, default will be read from the pandas config module
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you leave here simple default None, and explain about the config in the description?

See Also
--------
DataFrame.to_csv : Write a DataFrame to CSV format.
DataFrame.to_excel : Write a DataFrame to an Excel file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to_string and to_html in this case, which to me are conceptually more similar than to_csv or to_excel (formatting to present vs formatting to export).

Default: True.
<https://en.wikibooks.org/wiki/LaTeX/Tables>`__ e.g. 'rcl' for 3
columns.
longtable : bool, default None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be bool, optional, default None, like on line 2558?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it should be bool, optional. We never use whatever_type, optional, default None. We use only one of them.

When the None default value is being used, we use default None, imagine for example .fillna(value=None) where the None is the value used to impute.

When the None is just a flag, then we use optional. For example, in this case the longtable won't get the value None itself, but a value from the config. Meaning that it's optional to provide a longtable value, as we can use that.

When set to None, the value will default from the pandas config
module. Use a longtable environment instead of tabular. Requires
adding a \usepackage{longtable} to your LaTeX preamble.
escape : bool, default None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as above.

@pep8speaks
Copy link

Hello @Moisan! Thanks for updating the PR.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run ./scripts/validate_docstrings.py pandas.DataFrame.to_latex?

I think it will complain that the first line should be a single line (this is important for the page with the list of methods)

When the script says everything is all right, I'm happy with it. Really nice change, much better docstring now.

@Moisan
Copy link
Contributor Author

Moisan commented Sep 5, 2018

./scripts/validate_docstrings.py pandas.DataFrame.to_latex returns that everything is correct.

@datapythonista
Copy link
Member

That's an error in the script them. Can you post the output of the script to see how the docstring is veing rendered, and see if it helps to see what's wrong with the script.

And after that fix the docstring, so the first line (short summary) fits in a single line, as described here: https://pandas.pydata.org/pandas-docs/stable/contributing_docstring.html

@Moisan
Copy link
Contributor Author

Moisan commented Sep 5, 2018

Here is the output. I can open an issue regarding the bug of scripts/validate_docstrings.py.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for the contribution @Moisan

If you can open the issue for the validation, that would be great.

@jreback
Copy link
Contributor

jreback commented Sep 8, 2018

thanks @Moisan

aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants