DOC: Fix quantile docstring #22906

Closed
wants to merge 6 commits into from
Changes from 5 commits
113 changes: 71 additions & 42 deletions pandas/core/frame.py
@@ -7390,20 +7390,26 @@ def f(s):
def quantile(self, q=0.5, axis=0, numeric_only=True,
interpolation='linear'):
"""
Return values at the given quantile over requested axis.
Return value(s) at the given quantile over requested axis.

This function returns the Series of 'q' quantile value(s)
from the DataFrame, dividing data points into groups
along `axis` axis.
In case of insufficient number of data points for clean division
into groups, specify `interpolation` scheme to implement.

Parameters
----------
q : float or array-like, default 0.5 (50% quantile)
0 <= q <= 1, the quantile(s) to compute
axis : {0, 1, 'index', 'columns'} (default 0)
0 or 'index' for row-wise, 1 or 'columns' for column-wise
numeric_only : boolean, default True
q : float or array-like, default 0.5
Member

array-like is used for an object that implements the numpy.array api. I think just list would be better here.

The quantile(s) to compute (0 <= q <= 1) (0.5 == 50% quantile)
If float is passed as `q`, scalar quantile is returned
If `array-like` is passed as `q`, Series is returned.
Member

I think this description is correct when calling quantile on a Series. But as this is for DataFrame, the mentioned types are not correct (or they are not clear enough; a scalar is never returned for a DataFrame, if I'm not wrong). Can you also replace the somewhat mathematical notation in (0 <= q <= 1) and (0.5 == 50% quantile) with an explanation?

Contributor Author

woops! Missed it. Sorry!

Contributor Author

q : float or array-like, default 0.5
The quantile(s) to compute
the quantile(s) should be a floating point number
in the range [0.0, 1.0]
Passing q = 0.5 is equivalent to call for a 50% quantile value

Does this look fine, @datapythonista ?

Member

Up to you, but I'd prefer "should be a float between 0 and 1 (inclusive)." or something like that, which doesn't read as a mix of text and mathematical notation. Also include periods to separate sentences... And you can mention that 50% quantile is the median.

Contributor Author

okay, @datapythonista !

Contributor Author

... And you can mention that 50% quantile is the median.

I am having second thoughts about this, but, as you say ! 👍
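
For reference, a minimal sketch of the return types discussed in this thread. The small frame reuses the values from the existing docstring example; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 10, 100, 100]})

# DataFrame with a float q: a Series indexed by the column names.
by_float = df.quantile(0.5)

# DataFrame with a list q: a DataFrame indexed by the values of q.
by_list = df.quantile([0.1, 0.5])

# Series with a float q: a scalar (this is the case the draft text described).
scalar = df['a'].quantile(0.5)

# The 0.5 quantile with the default linear interpolation matches the median.
matches_median = bool((by_float == df.median()).all())
```

So for a DataFrame the result is a Series or a DataFrame, never a scalar.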

axis : {0 or 'index', 1 or 'columns'}, default 0
For row-wise : 0 or'index', for column-wise : 1 or 'columns'.
Member

There is a typo here, but in any case this seems redundant with the type line. Can you check other docstrings with the axis parameter and copy their description?
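
As an illustration of what the axis parameter changes, here is a short sketch with a made-up two-column frame: with axis=0 the quantile is computed for each column, and with axis='columns' it is computed for each row.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 10, 100, 100]})

# axis=0 (or 'index'): one value per column, so the result is indexed by 'a', 'b'.
per_column = df.quantile(0.5, axis=0)

# axis=1 (or 'columns'): one value per row, so the result is indexed by 0..3.
per_row = df.quantile(0.5, axis='columns')
```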

numeric_only : bool, default True
If False, the quantile of datetime and timedelta data will be
computed as well
computed as well.
interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
.. versionadded:: 0.18.0

This optional parameter specifies the interpolation method to use,
when the desired quantile lies between two data points `i` and `j`:

@@ -7416,46 +7422,69 @@ def quantile(self, q=0.5, axis=0, numeric_only=True,

Returns
-------
quantiles : Series or DataFrame

- If ``q`` is an array, a DataFrame will be returned where the
index is ``q``, the columns are the columns of self, and the
Series or DataFrame
- If `q` is an array, a DataFrame will be returned where the
index is `q`, the columns are the columns of self, and the
values are the quantiles.
- If ``q`` is a float, a Series will be returned where the
- If `q` is a float, a Series will be returned where the
index is the columns of self and the values are the quantiles.

Examples
--------

>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
columns=['a', 'b'])
>>> df.quantile(.1)
a 1.3
b 3.7
dtype: float64
>>> df.quantile([.1, .5])
a b
0.1 1.3 3.7
0.5 2.5 55.0

Specifying `numeric_only=False` will also compute the quantile of
datetime and timedelta data.

>>> df = pd.DataFrame({'A': [1, 2],
'B': [pd.Timestamp('2010'),
pd.Timestamp('2011')],
'C': [pd.Timedelta('1 days'),
pd.Timedelta('2 days')]})
>>> df.quantile(0.5, numeric_only=False)
A 1.5
B 2010-07-02 12:00:00
C 1 days 12:00:00
Name: 0.5, dtype: object

See Also
--------
pandas.core.window.Rolling.quantile
Returns the rolling quantile for the DataFrame.
Member

I think the See Also section has the format func : desc on the same line (continuing on the next). Can you check the docs and the other docstrings and adapt?

numpy.percentile
Returns 'nth' percentile for the DataFrame.
Member

numpy.percentile is used for numpy arrays, not for DataFrame.

Contributor Author

corrected !
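
To make the distinction concrete, a small sketch with illustrative values: numpy.percentile operates on an array and expects q on a 0 to 100 scale, while the pandas quantile methods expect q between 0 and 1.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4])

# numpy.percentile: array input, q given as a percentage (0-100).
np_result = np.percentile(values, 50)

# Series.quantile: q given as a fraction (0-1).
pd_result = pd.Series(values).quantile(0.5)
```

Both calls use linear interpolation by default, so the two results agree here.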


Examples
--------
>>> import pandas as pd
Member

It's being discussed in #22900 how to address the validation error because of the missing import. Can you just remove this line for now and ignore the error.

Contributor Author

sure!

>>> d = {'Data': [416, 493, 423, 859, 32, 548,\
33, 951, 450, 1001, 998]}
>>> df = pd.DataFrame(data=d)
>>> df
Data
0 416
1 493
2 423
3 859
4 32
5 548
6 33
7 951
8 450
9 1001
10 998
>>> for i in sorted(df['Data'],reverse=True): print(i)
Member

The preferred way to sort a Series is df['Data'].sort_values(), which also has an ascending parameter (pass ascending=False for descending order).

Contributor Author

Should've looked at the documentation for this. Will correct this, @datapythonista.
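
A sketch of the suggested replacement for the sorted(...) loop, using the data from the proposed example. Note that Series.sort_values controls the direction through its ascending flag.

```python
import pandas as pd

d = {'Data': [416, 493, 423, 859, 32, 548,
              33, 951, 450, 1001, 998]}
df = pd.DataFrame(data=d)

# Sort the column in descending order; the original index labels are kept.
descending = df['Data'].sort_values(ascending=False)
```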

1001
998
951
859
548
493
450
423
416
33
32
>>> df.quantile()
Data 493.0
Name: 0.5, dtype: float64
>>> type(df.quantile())
<class 'pandas.core.series.Series'>
Member

I don't think it's necessary to show the type

>>> df.quantile(q=0.7)
Data 859.0
Name: 0.7, dtype: float64
>>> df.quantile(q=[0.5, 0.7])
Member

I'd use for q the values 0.05 and 0.95 as they are quite standard in practice
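
What the example could look like with the suggested values, as a sketch on the same data. With 11 points and the default linear interpolation, both quantiles fall halfway between two neighbouring values.

```python
import pandas as pd

d = {'Data': [416, 493, 423, 859, 32, 548,
              33, 951, 450, 1001, 998]}
df = pd.DataFrame(data=d)

# A list for q returns a DataFrame indexed by the requested quantiles.
tails = df.quantile(q=[0.05, 0.95])
```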

Data
0.5 493.0
0.7 859.0
>>> df.quantile(q=[0.55],interpolation='higher')
Data
0.55 548
>>> df.quantile(q=[0.55],interpolation='lower')
Member

Make sure you don't have PEP 8 issues in the code; a space is missing after the comma.
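
The two interpolation calls with the spacing fixed, as a sketch on the same data. The 0.55 quantile lies between the sorted values 493 and 548, so 'lower' and 'higher' pick one neighbour each.

```python
import pandas as pd

d = {'Data': [416, 493, 423, 859, 32, 548,
              33, 951, 450, 1001, 998]}
df = pd.DataFrame(data=d)

# A space after each comma keeps the calls PEP 8 compliant.
lower = df.quantile(q=[0.55], interpolation='lower')
higher = df.quantile(q=[0.55], interpolation='higher')
```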

Data
0.55 493
"""
self._check_percentile(q)
