DOC: Improve pandas.Series.plot.kde docstring and kwargs rewording for whole file #20041
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2532,7 +2532,8 @@ def line(self, **kwds): | |
Parameters | ||
---------- | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.Series.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2556,7 +2557,8 @@ def bar(self, **kwds): | |
Parameters | ||
---------- | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.Series.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2571,7 +2573,8 @@ def barh(self, **kwds): | |
Parameters | ||
---------- | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.Series.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2586,7 +2589,8 @@ def box(self, **kwds): | |
Parameters | ||
---------- | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.Series.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2603,7 +2607,8 @@ def hist(self, bins=10, **kwds): | |
bins: integer, default 10 | ||
Number of histogram bins to be used | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.Series.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2613,26 +2618,74 @@ def hist(self, bins=10, **kwds): | |
|
||
def kde(self, bw_method=None, ind=None, **kwds): | ||
""" | ||
Kernel Density Estimate plot | ||
Kernel Density Estimate plot using Gaussian kernels. | ||
|
||
In statistics, kernel density estimation (KDE) is a non-parametric way | ||
to estimate the probability density function (PDF) of a random | ||
Review comment: Ideally include a reference to a wiki page (same ones that matplotlib uses?) |
||
variable. This function uses Gaussian kernels and includes automatic | ||
bandwidth determination. | |
|
||
Parameters | ||
---------- | ||
bw_method: str, scalar or callable, optional | ||
The method used to calculate the estimator bandwidth. This can be | ||
bw_method : str, scalar or callable, optional | ||
The method used to calculate the estimator bandwidth. This can be | ||
'scott', 'silverman', a scalar constant or a callable. | ||
If None (default), 'scott' is used. | ||
See :class:`scipy.stats.gaussian_kde` for more information. | ||
ind : NumPy array or integer, optional | ||
Evaluation points. If None (default), 1000 equally spaced points | ||
are used. If `ind` is a NumPy array, the kde is evaluated at the | ||
points passed. If `ind` is an integer, `ind` number of equally | ||
spaced points are used. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. | ||
Evaluation points for the estimated PDF. If None (default), | ||
Review comment: Can you spell out PDF?
Review comment: Both KDE and PDF are spelled out in the short summary above the Parameters section. I would rather not repeat that here. @dukebody what do you think?
Review comment: nvm, I see it. |
||
1000 equally spaced points are used. If `ind` is a NumPy array, the | ||
kde is evaluated at the points passed. If `ind` is an integer, | ||
`ind` number of equally spaced points are used. | ||
kwds : optional | ||
Review comment: This should be
|
||
Additional keyword arguments are documented in | ||
:meth:`pandas.Series.plot`. | ||
|
||
Returns | ||
------- | ||
axes : matplotlib.AxesSubplot or np.array of them | ||
|
||
See also | ||
Review comment: It would be nice to have an additional reference:
Review comment: See also -> See Also |
||
-------- | ||
scipy.stats.gaussian_kde : Representation of a kernel-density | ||
estimate using Gaussian kernels. This is the function used | ||
internally to estimate the PDF. | ||
|
||
Examples | ||
-------- | ||
Given a Series of points randomly sampled from an unknown | ||
distribution, estimate this distribution using KDE with automatic | ||
Review comment: I would rather say |
||
bandwidth determination and plot the results, evaluating them at | ||
1000 equally spaced points (default): | ||
|
||
.. plot:: | ||
:context: close-figs | ||
|
||
>>> s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5]) | ||
>>> ax = s.plot.kde() | ||
|
||
|
||
A scalar fixed bandwidth can be specified. Using a too small bandwidth | |
Review comment: Great work @dukebody! 👍 This is Shivam from the HK Chapter of the Pandas Doc Sprint. Just had a minor suggestion for simplifying this paragraph:
Review comment: Hi, I'm Jonas from the Barcelona Pandas Doc Sprint. I agree, @shivam6294's version sounds a little better. Anyway, you have to change |
||
can lead to overfitting, while a too large bandwidth can result in | ||
underfitting: | ||
|
||
.. plot:: | ||
:context: close-figs | ||
|
||
>>> ax = s.plot.kde(bw_method=0.3) | ||
|
||
.. plot:: | ||
:context: close-figs | ||
|
||
>>> ax = s.plot.kde(bw_method=3) | ||
|
||
Finally, the `ind` parameter determines the evaluation points for the | ||
plot of the estimated PDF: | ||
|
||
.. plot:: | ||
:context: close-figs | ||
|
||
>>> ax = s.plot.kde(ind=[1, 2, 3, 4, 5]) | ||
""" | ||
return self(kind='kde', bw_method=bw_method, ind=ind, **kwds) | ||
|
||
|
@@ -2645,7 +2698,8 @@ def area(self, **kwds): | |
Parameters | ||
---------- | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.Series.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2660,7 +2714,8 @@ def pie(self, **kwds): | |
Parameters | ||
---------- | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.Series.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2711,7 +2766,8 @@ def line(self, x=None, y=None, **kwds): | |
x, y : label or position, optional | ||
Coordinates for each point. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2728,7 +2784,8 @@ def bar(self, x=None, y=None, **kwds): | |
x, y : label or position, optional | ||
Coordinates for each point. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2745,7 +2802,8 @@ def barh(self, x=None, y=None, **kwds): | |
x, y : label or position, optional | ||
Coordinates for each point. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2762,7 +2820,8 @@ def box(self, by=None, **kwds): | |
by : string or sequence | ||
Column in the DataFrame to group by. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2781,7 +2840,8 @@ def hist(self, by=None, bins=10, **kwds): | |
bins: integer, default 10 | ||
Number of histogram bins to be used | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2806,7 +2866,8 @@ def kde(self, bw_method=None, ind=None, **kwds): | |
points passed. If `ind` is an integer, `ind` number of equally | ||
spaced points are used. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2825,7 +2886,8 @@ def area(self, x=None, y=None, **kwds): | |
x, y : label or position, optional | ||
Coordinates for each point. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2842,7 +2904,8 @@ def pie(self, y=None, **kwds): | |
y : label or position, optional | ||
Column to plot. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2863,7 +2926,8 @@ def scatter(self, x, y, s=None, c=None, **kwds): | |
c : label or position, optional | ||
Color of each point. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
@@ -2888,7 +2952,8 @@ def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, | |
gridsize : int, optional | ||
Number of bins. | ||
`**kwds` : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
|
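To make the `bw_method` and `ind` parameters in the `kde` docstring above more concrete, here is a minimal pure-Python sketch of a one-dimensional Gaussian KDE. It is not the pandas or SciPy implementation: the helper name `gaussian_kde_1d` is hypothetical, callables for `bw_method` are not handled, and the half-range padding of the default evaluation grid is an assumption made for illustration. The bandwidth is taken as the rule-of-thumb factor times the sample standard deviation.

```python
import math


def gaussian_kde_1d(sample, bw_method=None, ind=None):
    """Evaluate a 1-D Gaussian KDE and return (points, densities).

    Hypothetical helper for illustration only.
    """
    n = len(sample)
    mean = sum(sample) / n
    # Sample standard deviation (ddof=1).
    std = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))

    # Bandwidth factor: a rule of thumb or a user-supplied scalar.
    if bw_method is None or bw_method == "scott":
        factor = n ** (-1 / 5)            # Scott's rule for one dimension
    elif bw_method == "silverman":
        factor = (n * 3 / 4) ** (-1 / 5)  # Silverman's rule for one dimension
    else:
        factor = float(bw_method)         # scalar; callables are not handled here
    h = factor * std

    # Evaluation points: default grid, integer count, or explicit points.
    if ind is None or isinstance(ind, int):
        count = 1000 if ind is None else ind
        lo, hi = min(sample), max(sample)
        pad = 0.5 * (hi - lo)  # assumption: pad the grid by half the sample range
        start, stop = lo - pad, hi + pad
        step = (stop - start) / (count - 1)
        points = [start + i * step for i in range(count)]
    else:
        points = list(ind)

    # Average of Gaussian bumps of width h centred on each observation.
    norm = 1 / (n * h * math.sqrt(2 * math.pi))
    densities = [
        norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in sample)
        for x in points
    ]
    return points, densities


sample = [1, 2, 2.5, 3, 3.5, 4, 5]
grid, default_density = gaussian_kde_1d(sample)       # automatic bandwidth
_, narrow = gaussian_kde_1d(sample, bw_method=0.3)    # small bandwidth, spiky curve
_, wide = gaussian_kde_1d(sample, bw_method=3)        # large bandwidth, flat curve
points, five = gaussian_kde_1d(sample, ind=[1, 2, 3, 4, 5])
```

The small bandwidth produces a taller, more jagged curve than the large one, which is the overfitting versus underfitting contrast the docstring examples are meant to show.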
Review comment: The short summary should start with an infinitive verb. I would change it to
Review comment: I think for plot functions it is fine to just state the plot type (the rules are to give direction, but sometimes there can be reason to deviate, so in this case the question is if "Draw .." makes it more informative)
Review comment: I agree that adding `Draw` doesn't add much information, but I would still add a prefix since most (all?) of the doc strings created in the Doc Sprint will be written with these rules in mind. Also, the doc string proposed in PR20113 for the `hist` function includes `Draw` as well. On the other hand, `Generate` might be a better fit than `Draw` because if pandas isn't used from within a jupyter notebook there is nothing drawn immediately ..