-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Fix quantile docstring #22906
DOC: Fix quantile docstring #22906
Conversation
…tput of example 1 Added summary for **See Also** functions Some typo fixes .
Hello @brute4s99! Thanks for updating the PR.
Comment last updated on September 30, 2018 at 07:50 Hours UTC |
In this commit -
Comments : |
Codecov Report
@@ Coverage Diff @@
## master #22906 +/- ##
==========================================
+ Coverage 92.18% 92.19% +<.01%
==========================================
Files 169 169
Lines 50830 50873 +43
==========================================
+ Hits 46860 46904 +44
+ Misses 3970 3969 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, thanks for the fixes. I added some comments on things that can be improved.
pandas/core/frame.py
Outdated
Return values at the given quantile over requested axis. | ||
Return value(s) at the given quantile over requested axis. | ||
|
||
This function calculates the 'q' quantile values on the dataframe, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great description, I'm unsure about using the word calculate
, as gives the impression that the values are not present in the data already. Also, I'd use DataFrame
as in the class (capital letters).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typos fixed, and changed terminology. Wait for next commit!
pandas/core/frame.py
Outdated
0 <= q <= 1, the quantile(s) to compute | ||
axis : {0, 1, 'index', 'columns'} (default 0) | ||
0 or 'index' for row-wise, 1 or 'columns' for column-wise | ||
q : float or array-like, default 0.5 (50% quantile) [0 <= q <= 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this line, besides the name, it'd be good to have just the types and the default value. Any additional clarification is better to have it in the description. By keeping only types, we'll eventually be able to catch typos automatically in them. So, q : float or array-like, default 0.5
and the rest as part of the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, @datapythonista ! Looks much better that way!
pandas/core/frame.py
Outdated
q : float or array-like, default 0.5 (50% quantile) [0 <= q <= 1] | ||
The quantile(s) to compute. | ||
axis : boolean{0, 1, 'index', 'columns'} (default 0) | ||
For row-wise : 0 or'index', for column-wise : 1 or 'columns'. | ||
numeric_only : boolean, default True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use bool
(Python type) instead of boolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll switch to the convention axis : {0 or 'index', 1 or 'columns'}, default 0
so boolean/bool won't be needed.
pandas/core/frame.py
Outdated
|
||
- If ``q`` is an array, a DataFrame will be returned where the | ||
index is ``q``, the columns are the columns of self, and the | ||
scalar, Series or DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a case where this function returns a scalar? If so, can you explain in the description when
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll add a few examples !
pandas/core/frame.py
Outdated
columns=['a', 'b']) | ||
>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], \ | ||
[3, 100], [4, 100]]), \ | ||
columns=['a', 'b']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually try to keep the examples small, but I think the results are almost meaningless with just two samples. I'd have may be 6 or 8 in this case. Also, we try to use data that looks kind of real, and that the user know before hand (see this example: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L7580).
Also, can you use the validations mentioned in the issue. This example can't be run, and the backslashes are redundant and a style error is probably being shown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, @datapythonista . I'll think of something!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista please review my PR. I believe these examples should suffice.
pandas/core/frame.py
Outdated
'B': [pd.Timestamp('2010'), \ | ||
pd.Timestamp('2011')], \ | ||
'C': [pd.Timedelta('1 days'), \ | ||
pd.Timedelta('2 days')]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, try to define just an example at the beginning, and use it everywhere. If you need a date column, you can add it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, @datapythonista .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's looking good, but there are some things that could make the examples better I think, added some comments
pandas/core/frame.py
Outdated
|
||
Examples | ||
-------- | ||
>>> import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's being discussed in #22900 how to address the validation error because of the missing import. Can you just remove this line for now and ignore the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure!
pandas/core/frame.py
Outdated
q : float or array-like, default 0.5 | ||
The quantile(s) to compute (0 <= q <= 1) (0.5 == 50% quantile) | ||
If float is passed as `q`, scalar quantile is returned | ||
If `array-like` is passed as `q`, Series is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this description is correct when calling quantile
on a Series. But as this is for DataFrame, the mentioned types are not correct (or they are not clear enough, scalar is never returned for DataFrame if I'm not wrong). Can you also replace the somehow mathematical formulario in (0 <= q <= 1)
and .5 == 50%
by an explanation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woops! Missed it. Sorry!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q : float or array-like, default 0.5
The quantile(s) to compute
the quantile(s) should be a floating point number
in the range [0.0, 1.0]
Passingq
= 0.5 is equivalent to call for a 50% quantile value
Does this look fine, @datapythonista ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you, but I'd prefer should be a float between 0 and 1 (inclusive).
or something like this, that doesn't seem a mix of text and mathematical notation. Also include periods to separate sentences... And you can mention that 50% quantile is the median.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, @datapythonista !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... And you can mention that 50% quantile is the median.
I am having second thoughts about this, but, as you say ! 👍
pandas/core/frame.py
Outdated
numpy.percentile | ||
Returns 'nth' percentile for the DataFrame. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy.percentile
is used for numpy arrays, not for DataFrame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected !
pandas/core/frame.py
Outdated
8 450 | ||
9 1001 | ||
10 998 | ||
>>> for i in sorted(df['Data'],reverse=True): print(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preferred way to sort a Series is df['Data'].sort_values()
which also have a reverse
parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should've looked the documentation for this. Will correct this, @datapythonista .
pandas/core/frame.py
Outdated
Data 493.0 | ||
Name: 0.5, dtype: float64 | ||
>>> type(df.quantile()) | ||
<class 'pandas.core.series.Series'> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary to show the type
pandas/core/frame.py
Outdated
>>> df.quantile(q=0.7) | ||
Data 859.0 | ||
Name: 0.7, dtype: float64 | ||
>>> df.quantile(q=[0.5, 0.7]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use for q
the values 0.05
and 0.95
as they are quite standard in practice
pandas/core/frame.py
Outdated
>>> df.quantile(q=[0.55],interpolation='higher') | ||
Data | ||
0.55 548 | ||
>>> df.quantile(q=[0.55],interpolation='lower') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure you don't have pep8 issues in the code, a space is missing after the comma
@brute4s99 can you update to @datapythonista comments |
It's done, @jreback. I am awaiting a review now. |
@datapythonista please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added many comments, but there are several other pep8 problems. As I mention in the sprint, read the documentation carefully, review your changes in detail before sending them, and run the validation and the rendering of the html to make sure everything is all right. It's a lot of work for us to review if there are so many errors in the PR. Thanks!
should be a float between 0 and 1 (inclusive), | ||
0.5 is equivalent to calculate 50% quantile value ie the median. | ||
axis : {0 or 'index', 1 or 'columns'}, default 0 | ||
For row-wise : 0 or'index', for column-wise : 1 or 'columns'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a typo here, but anyway this seems redundant with the type line. Can you check other docstrings with the axis
parameter and copy their description.
axis : {0, 1, 'index', 'columns'} (default 0) | ||
0 or 'index' for row-wise, 1 or 'columns' for column-wise | ||
numeric_only : boolean, default True | ||
q : float or array-like, default 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array-like
is used for an object that implements the numpy.array
api. I think just list
would be better here.
q : float or array-like, default 0.5 | ||
The quantile(s) to compute, | ||
should be a float between 0 and 1 (inclusive), | ||
0.5 is equivalent to calculate 50% quantile value ie the median. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you finish the lines to close to 79 characters long. Feels a bit weird to have these short lines.
Can you use (i.e. the median)
at the end.
|
||
See Also | ||
-------- | ||
pandas.core.window.Rolling.quantile | ||
Returns the rolling quantile for the DataFrame. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the See Also
section has the format func : desc
in the same line (continuing in the next). Can you check the docs and the other docstrings and adapt.
>>> import numpy as np | ||
>>> d = {'animal':['Cheetah','Falcon','Eagle','Goose','Pigeon'], | ||
... 'class':['mammal','bird','bird','bird','bird'], | ||
... 'max_speed':[120,np.nan,320,142,150]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please respect pep8 and have the right indentation. It'd be better if you can avoid having a separate variable d
, and instead provide the data directly to the constructor.
The import numpy
can be ommited.
3 Goose bird 142.0 | ||
4 Pigeon bird 150.0 | ||
|
||
The `max_speed` in sorted order:- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason for the -
at the end or is a typo?
Examples | ||
-------- | ||
>>> import numpy as np | ||
>>> d = {'animal':['Cheetah','Falcon','Eagle','Goose','Pigeon'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have the name of the animals as the index instead.
@brute4s99 can you update based on the previous review? |
@brute4s99 do you have time to make the required fixes? |
I am currently working on it, @datapythonista . I am sorry it is taking so much time :/ |
Closing in favor of #23936 |
…tput of example 1 Added summary for See Also functions Some typo fixes .
git diff upstream/master -u -- "*.py" | flake8 --diff