DOC: update the pandas.DataFrame.hist docstring #20113

Merged: 2 commits, Mar 13, 2018

DZPM
Contributor

@DZPM DZPM commented Mar 10, 2018

  • PR title is "DOC: update the pandas.DataFrame.hist docstring"
  • The validation script passes: scripts/validate_docstrings.py pandas.DataFrame.hist
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single pandas.DataFrame.hist
  • It has been proofread on language by another sprint participant
################################################################################
###################### Docstring (pandas.DataFrame.hist)  ######################
################################################################################

Draw histogram of the DataFrame's series using matplotlib / pylab.

A histogram is a representation of the distribution of numerical data.
This function wraps the matplotlib histogram function for each serie in
the DataFrame. It returns an array with a plot for each histogram.

Parameters
----------
data : DataFrame
    The pandas object holding the data.
column : string or sequence
    If passed, will be used to limit data to a subset of columns.
by : object, optional
    If passed, then used to form histograms for separate groups.
grid : boolean, default True
    Whether to show axis grid lines.
xlabelsize : int, default None
    If specified changes the x-axis label size.
xrot : float, default None
    Rotation of x axis labels.
ylabelsize : int, default None
    If specified changes the y-axis label size.
yrot : float, default None
    Rotation of y axis labels.
ax : Matplotlib axes object, default None
    The histogram axes.
sharex : boolean, default True if ax is None else False
    In case subplots=True, share x axis and set some x axis labels to
    invisible; defaults to True if ax is None otherwise False if an ax
    is passed in; Be aware, that passing in both an ax and sharex=True
    will alter all x axis labels for all subplots in a figure!.
sharey : boolean, default False
    In case subplots=True, share y axis and set some y axis labels to
    invisible.
figsize : tuple
    The size of the figure to create in inches by default.
layout : tuple, optional
    Tuple of (rows, columns) for the layout of the histograms.
bins : integer or sequence, default 10
    Number of histogram bins to be used. If an integer is given, bins + 1
    bin edges are calculated and returned. If bins is a sequence, gives
    bin edges, including left edge of first bin and right edge of last
    bin. In this case, bins is returned unmodified.
kwds : Keyword Arguments
    All other plotting keyword arguments to be passed to
    matplotlib's boxplot function.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them

See Also
--------
matplotlib.axes.Axes.hist : Plot a histogram using matplotlib.

Examples
--------

.. plot::
    :context: close-figs

    >>> df = pd.DataFrame({
    ...     'length': [ 1.5, 0.5, 1.2, 0.9, 3],
    ...     'width': [ 0.7, 0.2, 0.15, 0.2,  1.1]
    ...     }, index= ['pig', 'rabbit', 'duck', 'chicken', 'horse'])
    >>> hist = df.hist(bins=3)

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.hist" correct. :)
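
The `bins` description in the docstring above says that an integer yields `bins + 1` bin edges. A minimal pure-Python sketch of equal-width binning may help illustrate that relationship. The helper names `equal_width_edges` and `count_per_bin` are hypothetical, and this is not how matplotlib or NumPy implement it.

```python
def equal_width_edges(values, bins):
    """Return bins + 1 equally spaced edges spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # The last edge is set to hi explicitly to avoid floating-point drift.
    return [lo + i * width for i in range(bins)] + [hi]


def count_per_bin(values, edges):
    """Count values per bin; bins are half-open except the last, which is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


# The 'length' column from the docstring example.
lengths = [1.5, 0.5, 1.2, 0.9, 3]
edges = equal_width_edges(lengths, bins=3)
counts = count_per_bin(lengths, edges)
```

With `bins=3` the sketch produces four edges, and the five lengths fall into the three bins as `[3, 1, 1]`.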

@@ -2130,48 +2130,70 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None,
"""
Draw histogram of the DataFrame's series using matplotlib / pylab.
Contributor

Can you remove the "/ pylab"? It's been deprecated for ~5 years now :)

@@ -2130,48 +2130,70 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None,
"""
Draw histogram of the DataFrame's series using matplotlib / pylab.

A histogram is a representation of the distribution of numerical data.
Contributor

Strike numerical as hist can handle non-numeric data as well.

@@ -2130,48 +2130,70 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None,
"""
Draw histogram of the DataFrame's series using matplotlib / pylab.

A histogram is a representation of the distribution of numerical data.
This function wraps the matplotlib histogram function for each serie in
Contributor

serie -> series.

ax : matplotlib axes object, default None
Rotation of y axis labels.
ax : Matplotlib axes object, default None
The histogram axes.
Contributor

"The axes to plot the histogram on."

sharex : boolean, default True if ax is None else False
In case subplots=True, share x axis and set some x axis labels to
invisible; defaults to True if ax is None otherwise False if an ax
is passed in; Be aware, that passing in both an ax and sharex=True
- will alter all x axis labels for all subplots in a figure!
+ will alter all x axis labels for all subplots in a figure!.
Contributor

You can remove the "." since the sentence already ends with a punctuation character. Did the script complain about ending with a "!"?

Contributor Author

Yes, the validator script complains:

Errors in parameters section
	Parameter "sharex" description should finish with "."

I'm rewriting it to remove the !.
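
The rule behind that error can be approximated as follows. This is a hypothetical sketch, not the code in scripts/validate_docstrings.py; the function name is an assumption, and the message format is modelled on the output quoted above.

```python
def check_description_ends_with_period(name, description):
    """Return an error message if the description does not end with '.', else None."""
    text = description.rstrip()
    if not text.endswith("."):
        return 'Parameter "{}" description should finish with "."'.format(name)
    return None


# A description ending in "!" is flagged; appending "." silences the check.
flagged = check_description_ends_with_period(
    "sharex", "will alter all x axis labels for all subplots in a figure!")
accepted = check_description_ends_with_period(
    "sharex", "will alter all x axis labels for all subplots in a figure!.")
```

Under this rule a sentence ending in "!" can only pass by adding a trailing ".", which is how the "!." ended up in the first version of the docstring.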

figsize : tuple
- The size of the figure to create in inches by default
+ The size of the figure to create in inches by default.
Contributor

How about

The size in inches of the figure to create. Uses the value in `matplotlib.rcParams` by default.

bins : integer or sequence, default 10
Number of histogram bins to be used. If an integer is given, bins + 1
bin edges are calculated and returned. If bins is a sequence, gives
bin edges, including left edge of first bin and right edge of last
bin. In this case, bins is returned unmodified.
`**kwds` : other plotting keyword arguments
To be passed to hist function
kwds : Keyword Arguments
Contributor

I may be wrong, but I think we're just doing these as

kwds:
    All other plotting ...

IOW no type.

Contributor Author

That raises some errors in the validator:

Errors in parameters section
	Parameters {'kwds'} not documented
	Unknown parameters {'kwds :'}
	Parameter "kwds :" has no type
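
One plausible way such messages arise is a set comparison between the names in the function signature and the names parsed from the Parameters section. The sketch below is an assumption about the general approach, not the actual logic of scripts/validate_docstrings.py; `documented_params`, `compare`, and `hist_like` are hypothetical names.

```python
import inspect


def documented_params(docstring):
    """Names of the unindented 'name : type' entries in the Parameters section."""
    lines = inspect.cleandoc(docstring).splitlines()
    names, in_section = set(), False
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ''
        is_header = bool(next_line) and set(next_line) == {'-'}
        if is_header:
            # A title underlined with dashes starts a new section.
            in_section = line.strip() == 'Parameters'
            continue
        if in_section and line and set(line) != {'-'} and not line[0].isspace():
            # Only a full ' : ' separator splits the name from the type.
            names.add(line.split(' : ')[0])
    return names


def compare(func):
    """Return (not documented, unknown) parameter-name sets for func."""
    actual = set(inspect.signature(func).parameters)
    documented = documented_params(func.__doc__ or '')
    return actual - documented, documented - actual


def hist_like(data, bins=10, **kwds):
    """
    Hypothetical stand-in for the function under review.

    Parameters
    ----------
    data : DataFrame
        The data.
    bins : integer, default 10
        Number of bins.
    kwds :
        Passed through.
    """


missing, unknown = compare(hist_like)
```

Because the `kwds :` entry has no type after the colon, the sketch keeps the whole line as the name. That gives `{'kwds'}` as not documented and `{'kwds :'}` as unknown, mirroring the two errors quoted above.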

Contributor

Write kwds : optional.

The validation script is wrong; @datapythonista's team is fixing it from London. Always use **kwargs as the parameter name.

Contributor Author

Ok, done!


Returns
-------
axes : matplotlib.AxesSubplot or np.array of them
Contributor

np.array -> numpy.ndarray

(np.array is the function, np.ndarray is the object).
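
The distinction can be checked directly in an interpreter. This is a small illustrative sketch that assumes NumPy is installed:

```python
import numpy as np

# np.array is a factory function; np.ndarray is the class of what it returns.
arr = np.array([1.5, 0.5, 1.2])

array_is_class = isinstance(np.array, type)      # False: a function, not a type
ndarray_is_class = isinstance(np.ndarray, type)  # True: the array class
result_type = type(arr)                          # numpy.ndarray
```

This is why a Returns section should name `numpy.ndarray` as the type rather than `np.array`.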

@DZPM DZPM force-pushed the docstring_dataframe_hist branch from 8898b45 to d7a1ecd Compare March 10, 2018 12:27
Contributor Author

DZPM commented Mar 10, 2018

Fixed all issues (except the kwargs, as it'd fail the validation), and amended the commit.

Thanks for the review!

@DZPM DZPM force-pushed the docstring_dataframe_hist branch from d7a1ecd to 8f5c449 Compare March 10, 2018 12:56
@@ -2128,50 +2128,74 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None,
xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False,
sharey=False, figsize=None, layout=None, bins=10, **kwds):
"""
- Draw histogram of the DataFrame's series using matplotlib / pylab.
+ Draw histogram of the DataFrame's Series using matplotlib.
Contributor

Perhaps this is written like this in other parts of the docs, but I personally find it easier to understand to say something like Draw histogram of the DataFrame's columns, instead of DataFrame's Series.

xrot : float, default None
- rotation of x axis labels
+ Rotation of x axis labels.
Contributor

Can we add some explanation like "For example, a value of 90 displays the x labels rotated 90° clockwise".

yrot : float, default None
rotation of y axis labels
ax : matplotlib axes object, default None
Rotation of y axis labels.
Contributor

Same here, make clear what a value of 90 means, for example.

- is passed in; Be aware, that passing in both an ax and sharex=True
- will alter all x axis labels for all subplots in a figure!
+ is passed in.
+ Be aware: passing in both an ax and sharex=True will alter all x axis
Contributor

I'd say "Note" instead of "Be aware", which feels a bit more aggressive.

To be passed to hist function
kwds : Keyword Arguments
All other plotting keyword arguments to be passed to
matplotlib's boxplot function.
Contributor

Link to this function using :meth: or :class:.

:context: close-figs

>>> df = pd.DataFrame({
... 'length': [ 1.5, 0.5, 1.2, 0.9, 3],
Contributor

Mind the spacing: [ 1.5, .... In the next line, there are two spaces between the 0.2 and the 1.1.

.. plot::
:context: close-figs

>>> df = pd.DataFrame({
Contributor

Can you write some introduction explaining in words what the code does?

... 'length': [ 1.5, 0.5, 1.2, 0.9, 3],
... 'width': [ 0.7, 0.2, 0.15, 0.2, 1.1]
... }, index= ['pig', 'rabbit', 'duck', 'chicken', 'horse'])
>>> hist = df.hist(bins=3)
Contributor

There are a lot of arguments for this method. Can you write a couple more examples showing how they behave?
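
As an illustration of what such additional examples might look like, the sketch below exercises `column`, explicit `bins` edges, `layout`, `sharex`, `xrot` and `figsize` on the same DataFrame as the docstring example. It is not the text that was merged, and it assumes pandas and matplotlib are installed. The `Agg` backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs

import pandas as pd

df = pd.DataFrame({
    'length': [1.5, 0.5, 1.2, 0.9, 3],
    'width': [0.7, 0.2, 0.15, 0.2, 1.1]
    }, index=['pig', 'rabbit', 'duck', 'chicken', 'horse'])

# Restrict the plot to one column and pass explicit bin edges.
single = df.hist(column='length', bins=[0, 1, 2, 3])

# Stack both histograms vertically, share the x axis, and rotate the x labels.
stacked = df.hist(layout=(2, 1), sharex=True, xrot=90, figsize=(4, 6))
```

Both calls return a NumPy array of matplotlib axes; with `layout=(2, 1)` the array has two rows and one column.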


A histogram is a representation of the distribution of data.
This function wraps the matplotlib histogram function for each series in
the DataFrame. It returns an array with a plot for each histogram.
Contributor

Can you link to Wikipedia (here or in Notes is OK)?


A histogram is a representation of the distribution of data.
This function wraps the matplotlib histogram function for each series in
the DataFrame. It returns an array with a plot for each histogram.
Contributor

"It returns an array with a plot for each histogram" is a bit misleading. What about "A histogram is plotted for every column of the DataFrame"?

@@ -2128,50 +2128,79 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None,
xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False,
sharey=False, figsize=None, layout=None, bins=10, **kwds):
"""
- Draw histogram of the DataFrame's series using matplotlib / pylab.
+ Draw histogram of the DataFrame's columns using matplotlib.
Contributor

As in other cases, I believe we should defer the fact that it uses matplotlib under the hood to the extended description and keep it out of the short one.

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 13, 2018

codecov bot commented Mar 13, 2018

Codecov Report

Merging #20113 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20113      +/-   ##
==========================================
- Coverage   91.72%    91.7%   -0.03%     
==========================================
  Files         150      150              
  Lines       49156    49156              
==========================================
- Hits        45090    45078      -12     
- Misses       4066     4078      +12

Flag         Coverage Δ
#multiple    90.08% <ø> (-0.03%) ⬇️
#single      41.85% <ø> (ø) ⬆️

Impacted Files                  Coverage Δ
pandas/plotting/_core.py        82.23% <ø> (ø) ⬆️
pandas/plotting/_converter.py   65.07% <0%> (-1.74%) ⬇️
pandas/core/generic.py          95.84% <0%> (ø) ⬆️
pandas/core/indexes/multi.py    95.06% <0%> (ø) ⬆️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 840d432...ab1e17d. Read the comment docs.

@TomAugspurger TomAugspurger merged commit ffe297c into pandas-dev:master Mar 13, 2018
@TomAugspurger
Contributor

Thanks @DZPM !


4 participants