Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Fix quantile docstring #22906

Closed
wants to merge 6 commits into from
Closed

Conversation

tm9k1
Copy link
Contributor

@tm9k1 tm9k1 commented Sep 30, 2018

…tput of example 1 Added summary for See Also functions Some typo fixes .

…tput of example 1 Added summary for **See Also** functions Some typo fixes .
@pep8speaks
Copy link

pep8speaks commented Sep 30, 2018

Hello @brute4s99! Thanks for updating the PR.

Comment last updated on September 30, 2018 at 07:50 Hours UTC

@tm9k1
Copy link
Contributor Author

tm9k1 commented Sep 30, 2018

In this commit -

  • Added extended summary for the function
  • corrected output of example 1
  • Added summary for See Also functions
  • Some typo fixes

Comments :
Also, please review my implementation to use '\' for multi-line commands. ./scripts/validate_docstrings.py likes it ! :D

@tm9k1 tm9k1 changed the title In this commit - Added extended summary for the function corrected ou… fix quantile docstring Sep 30, 2018
@codecov
Copy link

codecov bot commented Sep 30, 2018

Codecov Report

Merging #22906 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22906      +/-   ##
==========================================
+ Coverage   92.18%   92.19%   +<.01%     
==========================================
  Files         169      169              
  Lines       50830    50873      +43     
==========================================
+ Hits        46860    46904      +44     
+ Misses       3970     3969       -1
Flag Coverage Δ
#multiple 90.61% <100%> (+0.01%) ⬆️
#single 42.32% <0%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.2% <100%> (ø) ⬆️
pandas/core/dtypes/dtypes.py 95.56% <0%> (-0.56%) ⬇️
pandas/core/internals/blocks.py 93.48% <0%> (-0.37%) ⬇️
pandas/core/reshape/merge.py 93.89% <0%> (-0.26%) ⬇️
pandas/core/ops.py 97.19% <0%> (-0.19%) ⬇️
pandas/core/arrays/categorical.py 95.62% <0%> (-0.13%) ⬇️
pandas/io/pytables.py 92.44% <0%> (-0.05%) ⬇️
pandas/core/generic.py 96.65% <0%> (-0.02%) ⬇️
pandas/core/strings.py 98.63% <0%> (ø) ⬆️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14598c6...39081e1. Read the comment docs.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, thanks for the fixes. I added some comments on things that can be improved.

Return values at the given quantile over requested axis.
Return value(s) at the given quantile over requested axis.

This function calculates the 'q' quantile values on the dataframe,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great description, I'm unsure about using the word calculate, as gives the impression that the values are not present in the data already. Also, I'd use DataFrame as in the class (capital letters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typos fixed, and changed terminology. Wait for next commit!

0 <= q <= 1, the quantile(s) to compute
axis : {0, 1, 'index', 'columns'} (default 0)
0 or 'index' for row-wise, 1 or 'columns' for column-wise
q : float or array-like, default 0.5 (50% quantile) [0 <= q <= 1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this line, besides the name, it'd be good to have just the types and the default value. Any additional clarification is better to have it in the description. By keeping only types, we'll eventually be able to catch typos automatically in them. So, q : float or array-like, default 0.5 and the rest as part of the description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, @datapythonista ! Looks much better that way!

q : float or array-like, default 0.5 (50% quantile) [0 <= q <= 1]
The quantile(s) to compute.
axis : boolean{0, 1, 'index', 'columns'} (default 0)
For row-wise : 0 or'index', for column-wise : 1 or 'columns'.
numeric_only : boolean, default True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use bool (Python type) instead of boolean

Copy link
Contributor Author

@tm9k1 tm9k1 Sep 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch to the convention axis : {0 or 'index', 1 or 'columns'}, default 0 so boolean/bool won't be needed.


- If ``q`` is an array, a DataFrame will be returned where the
index is ``q``, the columns are the columns of self, and the
scalar, Series or DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case where this function returns a scalar? If so, can you explain in the description when

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add a few examples !

columns=['a', 'b'])
>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], \
[3, 100], [4, 100]]), \
columns=['a', 'b'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually try to keep the examples small, but I think the results are almost meaningless with just two samples. I'd have may be 6 or 8 in this case. Also, we try to use data that looks kind of real, and that the user know before hand (see this example: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L7580).

Also, can you use the validations mentioned in the issue. This example can't be run, and the backslashes are redundant and a style error is probably being shown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, @datapythonista . I'll think of something!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista please review my PR. I believe these examples should suffice.

'B': [pd.Timestamp('2010'), \
pd.Timestamp('2011')], \
'C': [pd.Timedelta('1 days'), \
pd.Timedelta('2 days')]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, try to define just an example at the beginning, and use it everywhere. If you need a date column, you can add it there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, @datapythonista .

@datapythonista datapythonista changed the title fix quantile docstring DOC: Fix quantile docstring Sep 30, 2018
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good, but there are some things that could make the examples better I think, added some comments


Examples
--------
>>> import pandas as pd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's being discussed in #22900 how to address the validation error because of the missing import. Can you just remove this line for now and ignore the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure!

q : float or array-like, default 0.5
The quantile(s) to compute (0 <= q <= 1) (0.5 == 50% quantile)
If float is passed as `q`, scalar quantile is returned
If `array-like` is passed as `q`, Series is returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this description is correct when calling quantile on a Series. But as this is for DataFrame, the mentioned types are not correct (or they are not clear enough, scalar is never returned for DataFrame if I'm not wrong). Can you also replace the somehow mathematical formulario in (0 <= q <= 1) and .5 == 50% by an explanation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops! Missed it. Sorry!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q : float or array-like, default 0.5
The quantile(s) to compute
the quantile(s) should be a floating point number
in the range [0.0, 1.0]
Passing q = 0.5 is equivalent to call for a 50% quantile value

Does this look fine, @datapythonista ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you, but I'd prefer should be a float between 0 and 1 (inclusive). or something like this, that doesn't seem a mix of text and mathematical notation. Also include periods to separate sentences... And you can mention that 50% quantile is the median.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, @datapythonista !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... And you can mention that 50% quantile is the median.

I am having second thoughts about this, but, as you say ! 👍

numpy.percentile
Returns 'nth' percentile for the DataFrame.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy.percentile is used for numpy arrays, not for DataFrame.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected !

8 450
9 1001
10 998
>>> for i in sorted(df['Data'],reverse=True): print(i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred way to sort a Series is df['Data'].sort_values() which also have a reverse parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should've looked the documentation for this. Will correct this, @datapythonista .

Data 493.0
Name: 0.5, dtype: float64
>>> type(df.quantile())
<class 'pandas.core.series.Series'>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to show the type

>>> df.quantile(q=0.7)
Data 859.0
Name: 0.7, dtype: float64
>>> df.quantile(q=[0.5, 0.7])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use for q the values 0.05 and 0.95 as they are quite standard in practice

>>> df.quantile(q=[0.55],interpolation='higher')
Data
0.55 548
>>> df.quantile(q=[0.55],interpolation='lower')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure you don't have pep8 issues in the code, a space is missing after the comma

@jreback
Copy link
Contributor

jreback commented Oct 7, 2018

@brute4s99 can you update to @datapythonista comments

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 8, 2018

It's done, @jreback. I am awaiting a review now.

@tm9k1
Copy link
Contributor Author

tm9k1 commented Oct 9, 2018

@datapythonista please review

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added many comments, but there are several other pep8 problems. As I mention in the sprint, read the documentation carefully, review your changes in detail before sending them, and run the validation and the rendering of the html to make sure everything is all right. It's a lot of work for us to review if there are so many errors in the PR. Thanks!

should be a float between 0 and 1 (inclusive),
0.5 is equivalent to calculate 50% quantile value ie the median.
axis : {0 or 'index', 1 or 'columns'}, default 0
For row-wise : 0 or'index', for column-wise : 1 or 'columns'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a typo here, but anyway this seems redundant with the type line. Can you check other docstrings with the axis parameter and copy their description.

axis : {0, 1, 'index', 'columns'} (default 0)
0 or 'index' for row-wise, 1 or 'columns' for column-wise
numeric_only : boolean, default True
q : float or array-like, default 0.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array-like is used for an object that implements the numpy.array api. I think just list would be better here.

q : float or array-like, default 0.5
The quantile(s) to compute,
should be a float between 0 and 1 (inclusive),
0.5 is equivalent to calculate 50% quantile value ie the median.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you finish the lines to close to 79 characters long. Feels a bit weird to have these short lines.

Can you use (i.e. the median) at the end.


See Also
--------
pandas.core.window.Rolling.quantile
Returns the rolling quantile for the DataFrame.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the See Also section has the format func : desc in the same line (continuing in the next). Can you check the docs and the other docstrings and adapt.

>>> import numpy as np
>>> d = {'animal':['Cheetah','Falcon','Eagle','Goose','Pigeon'],
... 'class':['mammal','bird','bird','bird','bird'],
... 'max_speed':[120,np.nan,320,142,150]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please respect pep8 and have the right indentation. It'd be better if you can avoid having a separate variable d, and instead provide the data directly to the constructor.

The import numpy can be ommited.

3 Goose bird 142.0
4 Pigeon bird 150.0

The `max_speed` in sorted order:-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for the - at the end or is a typo?

Examples
--------
>>> import numpy as np
>>> d = {'animal':['Cheetah','Falcon','Eagle','Goose','Pigeon'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have the name of the animals as the index instead.

@datapythonista
Copy link
Member

@brute4s99 can you update based on the previous review?

@datapythonista
Copy link
Member

@brute4s99 do you have time to make the required fixes?

@tm9k1
Copy link
Contributor Author

tm9k1 commented Nov 9, 2018

@brute4s99 do you have time to make the required fixes?

I am currently working on it, @datapythonista . I am sorry it is taking so much time :/

@datapythonista
Copy link
Member

Closing in favor of #23936

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Fix the docstring of quantile in pandas/core/frame.py
5 participants