DOC: Fix quantile docstring #22906

tm9k1 · 2018-09-30T07:49:27Z

…tput of example 1 Added summary for See Also functions Some typo fixes .

closes DOC: Fix the docstring of quantile in pandas/core/frame.py #22898
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2018-09-30T07:49:31Z

Hello @brute4s99! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/frame.py !

Comment last updated on September 30, 2018 at 07:50 Hours UTC

tm9k1 · 2018-09-30T07:49:32Z

In this commit -

Added extended summary for the function
corrected output of example 1
Added summary for See Also functions
Some typo fixes

Comments :
Also, please review my implementation to use '\' for multi-line commands. ./scripts/validate_docstrings.py likes it ! :D

codecov · 2018-09-30T08:33:43Z

Codecov Report

Merging #22906 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22906      +/-   ##
==========================================
+ Coverage   92.18%   92.19%   +<.01%     
==========================================
  Files         169      169              
  Lines       50830    50873      +43     
==========================================
+ Hits        46860    46904      +44     
+ Misses       3970     3969       -1

Flag	Coverage Δ
#multiple	`90.61% <100%> (+0.01%)`	⬆️
#single	`42.32% <0%> (-0.05%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.2% <100%> (ø)`	⬆️
pandas/core/dtypes/dtypes.py	`95.56% <0%> (-0.56%)`	⬇️
pandas/core/internals/blocks.py	`93.48% <0%> (-0.37%)`	⬇️
pandas/core/reshape/merge.py	`93.89% <0%> (-0.26%)`	⬇️
pandas/core/ops.py	`97.19% <0%> (-0.19%)`	⬇️
pandas/core/arrays/categorical.py	`95.62% <0%> (-0.13%)`	⬇️
pandas/io/pytables.py	`92.44% <0%> (-0.05%)`	⬇️
pandas/core/generic.py	`96.65% <0%> (-0.02%)`	⬇️
pandas/core/strings.py	`98.63% <0%> (ø)`	⬆️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14598c6...39081e1. Read the comment docs.

datapythonista

Looking good, thanks for the fixes. I added some comments on things that can be improved.

datapythonista · 2018-09-30T09:51:50Z

pandas/core/frame.py

-        Return values at the given quantile over requested axis.
+        Return value(s) at the given quantile over requested axis.
+
+        This function calculates the 'q' quantile values on the dataframe,


Great description. I'm unsure about using the word "calculate", as it gives the impression that the values are not already present in the data. Also, I'd use DataFrame, as in the class name (capitalized).

Typos fixed, and changed terminology. Wait for next commit!

datapythonista · 2018-09-30T09:54:22Z

pandas/core/frame.py

-            0 <= q <= 1, the quantile(s) to compute
-        axis : {0, 1, 'index', 'columns'} (default 0)
-            0 or 'index' for row-wise, 1 or 'columns' for column-wise
+        q : float or array-like, default 0.5 (50% quantile) [0 <= q <= 1]


In this line, besides the name, it'd be good to have just the types and the default value. Any additional clarification is better placed in the description. By keeping only types here, we'll eventually be able to catch typos in them automatically. So, q : float or array-like, default 0.5, with the rest as part of the description.

Sure, @datapythonista ! Looks much better that way!

datapythonista · 2018-09-30T09:56:13Z

pandas/core/frame.py

+        q : float or array-like, default 0.5 (50% quantile) [0 <= q <= 1]
+            The quantile(s) to compute.
+        axis : boolean{0, 1, 'index', 'columns'} (default 0)
+            For row-wise : 0 or'index', for column-wise : 1 or 'columns'.
        numeric_only : boolean, default True


Can you use bool (the Python type) instead of boolean?

I'll switch to the convention axis : {0 or 'index', 1 or 'columns'}, default 0 so boolean/bool won't be needed.

datapythonista · 2018-09-30T09:57:26Z

pandas/core/frame.py

-
-            - If ``q`` is an array, a DataFrame will be returned where the
-              index is ``q``, the columns are the columns of self, and the
+        scalar, Series or DataFrame


Is there a case where this function returns a scalar? If so, can you explain in the description when?

Sure, I'll add a few examples !
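For reference, a minimal sketch of the return types under discussion, reusing the small a/b frame from the diff above. The variable names are only illustrative, and the behaviour shown assumes a reasonably recent pandas release.

```python
import pandas as pd

# Same values as the example frame in the diff.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 10, 100, 100]})

single = df.quantile(0.5)          # float q on a DataFrame -> Series (one value per column)
several = df.quantile([0.1, 0.5])  # list q on a DataFrame -> DataFrame indexed by q
scalar = df["a"].quantile(0.5)     # float q on a Series -> scalar

print(type(single).__name__, type(several).__name__)
print(scalar)
```

So a scalar only appears when quantile is called on a Series, not on a DataFrame.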

datapythonista · 2018-09-30T10:02:55Z

pandas/core/frame.py

-                              columns=['a', 'b'])
+        >>> df = pd.DataFrame(np.array([[1, 1], [2, 10], \
+                                        [3, 100], [4, 100]]), \
+                                        columns=['a', 'b'])


We usually try to keep the examples small, but I think the results are almost meaningless with just two samples. I'd have maybe 6 or 8 in this case. Also, we try to use data that looks kind of real and that the user knows beforehand (see this example: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L7580).

Also, can you use the validations mentioned in the issue? This example can't be run, the backslashes are redundant, and a style error is probably being shown.

Alright, @datapythonista . I'll think of something!

@datapythonista please review my PR. I believe these examples should suffice.
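On the redundant backslashes: a small sketch, using the same data as the diff, of how the constructor call can span several lines without them. Python continues a statement implicitly inside brackets; in a docstring the continuation lines would start with `...` instead of ending the previous line with `\`.

```python
import numpy as np
import pandas as pd

# Inside brackets the statement continues implicitly,
# so no trailing backslashes are needed.
df = pd.DataFrame(
    np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
    columns=["a", "b"],
)

tenth = df.quantile(0.1)
print(tenth)
```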

datapythonista · 2018-09-30T10:03:43Z

pandas/core/frame.py

+                                'B': [pd.Timestamp('2010'), \
+                                      pd.Timestamp('2011')], \
+                                'C': [pd.Timedelta('1 days'), \
+                                      pd.Timedelta('2 days')]})


If possible, try to define just an example at the beginning, and use it everywhere. If you need a date column, you can add it there.

okay, @datapythonista .

datapythonista

It's looking good, but there are some things that could make the examples better I think, added some comments

datapythonista · 2018-10-01T08:42:21Z

pandas/core/frame.py

+
+        Examples
+        --------
+        >>> import pandas as pd


It's being discussed in #22900 how to address the validation error because of the missing import. Can you just remove this line for now and ignore the error.

datapythonista · 2018-10-01T08:47:29Z

pandas/core/frame.py

+        q : float or array-like, default 0.5
+            The quantile(s) to compute (0 <= q <= 1) (0.5 == 50% quantile)
+            If float is passed as `q`, scalar quantile is returned
+            If `array-like` is passed as `q`, Series is returned.


I think this description is correct when calling quantile on a Series. But as this is for DataFrame, the mentioned types are not correct (or they are not clear enough; a scalar is never returned for a DataFrame, if I'm not wrong). Can you also replace the somewhat mathematical notation in (0 <= q <= 1) and (0.5 == 50% quantile) with an explanation?

woops! Missed it. Sorry!

q : float or array-like, default 0.5
The quantile(s) to compute
the quantile(s) should be a floating point number
in the range [0.0, 1.0]
Passing q = 0.5 is equivalent to call for a 50% quantile value

Does this look fine, @datapythonista ?

Up to you, but I'd prefer "should be a float between 0 and 1 (inclusive)." or something like this, which doesn't read as a mix of text and mathematical notation. Also include periods to separate sentences... And you can mention that 50% quantile is the median.

okay, @datapythonista !

... And you can mention that 50% quantile is the median.

I am having second thoughts about this, but, as you say ! 👍
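A quick check of the wording being proposed, as a sketch on the a/b frame from earlier in the thread: q=0.5 gives the same values as the median, and the inclusive end points 0 and 1 give the minimum and maximum.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 10, 100, 100]})

half = df.quantile(0.5).tolist()    # 50% quantile per column
median = df.median().tolist()       # median per column
lowest = df.quantile(0).tolist()    # q=0 is allowed (inclusive lower bound)
highest = df.quantile(1).tolist()   # q=1 is allowed (inclusive upper bound)

print(half, median)
```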

datapythonista · 2018-10-01T08:48:32Z

pandas/core/frame.py

        numpy.percentile
+            Returns 'nth' percentile for the DataFrame.


numpy.percentile is used for numpy arrays, not for DataFrame.

corrected !
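To illustrate the distinction, a minimal sketch: numpy.percentile works on arrays and takes a percentage between 0 and 100, while the pandas quantile method takes a fraction between 0 and 1. The sample values are made up for the illustration.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4])

p50 = np.percentile(values, 50)        # percentile scale: 0 to 100
q50 = pd.Series(values).quantile(0.5)  # quantile scale: 0 to 1

print(p50, q50)
```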

datapythonista · 2018-10-01T09:00:09Z

pandas/core/frame.py

+        8    450
+        9   1001
+        10   998
+        >>> for i in sorted(df['Data'],reverse=True): print(i)


The preferred way to sort a Series is df['Data'].sort_values(), which also has an ascending parameter (pass ascending=False for descending order).

Should've looked at the documentation for this. Will correct this, @datapythonista .
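A short sketch of the suggested replacement for the `sorted(..., reverse=True)` loop. Note that Series.sort_values controls direction through `ascending`, not `reverse`. Only the three values visible in the diff are used here.

```python
import pandas as pd

df = pd.DataFrame({"Data": [450, 1001, 998]})

# ascending=False gives the descending order the loop was printing.
descending = df["Data"].sort_values(ascending=False)

print(descending.tolist())  # [1001, 998, 450]
```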

datapythonista · 2018-10-01T09:00:35Z

pandas/core/frame.py

+        Data    493.0
+        Name: 0.5, dtype: float64
+        >>> type(df.quantile())
+        <class 'pandas.core.series.Series'>


I don't think it's necessary to show the type

datapythonista · 2018-10-01T09:01:12Z

pandas/core/frame.py

+        >>> df.quantile(q=0.7)
+        Data    859.0
+        Name: 0.7, dtype: float64
+        >>> df.quantile(q=[0.5, 0.7])


I'd use for q the values 0.05 and 0.95 as they are quite standard in practice

datapythonista · 2018-10-01T09:01:43Z

pandas/core/frame.py

+        >>> df.quantile(q=[0.55],interpolation='higher')
+              Data
+        0.55   548
+        >>> df.quantile(q=[0.55],interpolation='lower')


Make sure you don't have pep8 issues in the code; a space is missing after the comma.
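With the spacing fixed, the calls in this hunk could look like the sketch below. The four-row frame is made up for illustration (the PR's full data is not shown in the thread), so the numbers differ from the diff.

```python
import pandas as pd

# Hypothetical data, chosen so the 0.55 quantile falls between 20 and 30.
df = pd.DataFrame({"Data": [10, 20, 30, 40]})

lower = df.quantile(q=[0.55], interpolation="lower")    # nearest value below
higher = df.quantile(q=[0.55], interpolation="higher")  # nearest value above
linear = df.quantile(q=[0.55])                          # default: linear interpolation

print(lower.iloc[0, 0], higher.iloc[0, 0], linear.iloc[0, 0])
```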

jreback · 2018-10-07T23:11:21Z

@brute4s99 can you update to address @datapythonista's comments?

tm9k1 · 2018-10-08T17:20:54Z

It's done, @jreback. I am awaiting a review now.

tm9k1 · 2018-10-09T09:18:05Z

@datapythonista please review

datapythonista

Added many comments, but there are several other pep8 problems. As I mentioned in the sprint, read the documentation carefully, review your changes in detail before sending them, and run the validation and the rendering of the html to make sure everything is all right. It's a lot of work for us to review if there are so many errors in the PR. Thanks!

datapythonista · 2018-10-10T07:04:53Z

pandas/core/frame.py

+            should be a float between 0 and 1 (inclusive),
+            0.5 is equivalent to calculate 50% quantile value ie the median.
+        axis : {0 or 'index', 1 or 'columns'}, default 0
+            For row-wise : 0 or'index', for column-wise : 1 or 'columns'.


There is a typo here (a space is missing in or'index'), but anyway this seems redundant with the type line. Can you check other docstrings with the axis parameter and copy their description?

datapythonista · 2018-10-10T07:05:58Z

pandas/core/frame.py

-        axis : {0, 1, 'index', 'columns'} (default 0)
-            0 or 'index' for row-wise, 1 or 'columns' for column-wise
-        numeric_only : boolean, default True
+        q : float or array-like, default 0.5


array-like is used for an object that implements the numpy.array api. I think just list would be better here.

datapythonista · 2018-10-10T07:07:45Z

pandas/core/frame.py

+        q : float or array-like, default 0.5
+            The quantile(s) to compute,
+            should be a float between 0 and 1 (inclusive),
+            0.5 is equivalent to calculate 50% quantile value ie the median.


Can you extend the lines to close to 79 characters long? It feels a bit weird to have these short lines.

Can you use (i.e. the median) at the end?

datapythonista · 2018-10-10T07:09:03Z

pandas/core/frame.py


        See Also
        --------
        pandas.core.window.Rolling.quantile
+            Returns the rolling quantile for the DataFrame.


I think the See Also section has the format func : desc in the same line (continuing in the next). Can you check the docs and the other docstrings and adapt.
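As a rough sketch of the layout being asked for across these review comments (types only on the parameter line, the axis convention, and `func : desc` entries under See Also), here is a placeholder function. The name `quantile_sketch` and the description wording are assumptions for illustration, not the text that was eventually merged.

```python
def quantile_sketch(q=0.5, axis=0):
    """
    Return values at the given quantile over requested axis.

    Parameters
    ----------
    q : float or list, default 0.5
        Value between 0 and 1 (inclusive), the quantile(s) to compute.
        0.5 is the 50% quantile (i.e. the median).
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Axis along which the quantiles are computed.

    See Also
    --------
    numpy.percentile : Compute percentiles of a NumPy array.
    """


print(quantile_sketch.__doc__)
```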

datapythonista · 2018-10-10T07:11:11Z

pandas/core/frame.py

+        >>> import numpy as np
+        >>> d = {'animal':['Cheetah','Falcon','Eagle','Goose','Pigeon'],
+        ... 'class':['mammal','bird','bird','bird','bird'],
+        ... 'max_speed':[120,np.nan,320,142,150]}


Please respect pep8 and use the right indentation. It'd be better if you could avoid having a separate variable d and instead provide the data directly to the constructor.

The import numpy can be omitted.

datapythonista · 2018-10-10T07:12:24Z

pandas/core/frame.py

+        3    Goose    bird        142.0
+        4   Pigeon    bird        150.0
+
+        The `max_speed` in sorted order:-


Any reason for the - at the end, or is it a typo?

datapythonista · 2018-10-10T07:13:52Z

pandas/core/frame.py

+        Examples
+        --------
+        >>> import numpy as np
+        >>> d = {'animal':['Cheetah','Falcon','Eagle','Goose','Pigeon'],


I'd have the name of the animals as the index instead.
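One way the suggestions on this example could be combined, as a sketch: the data goes straight into the constructor, the animal names become the index, and the spacing follows pep8. The imports are included only so the snippet runs standalone; in the docstring they would be left out, as noted above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "class": ["mammal", "bird", "bird", "bird", "bird"],
        "max_speed": [120, np.nan, 320, 142, 150],
    },
    index=["Cheetah", "Falcon", "Eagle", "Goose", "Pigeon"],
)

# Missing values are skipped, so this is the median of 120, 142, 150 and 320.
median_speed = df["max_speed"].quantile(0.5)
print(median_speed)
```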

datapythonista · 2018-11-03T07:15:07Z

@brute4s99 can you update based on the previous review?

datapythonista · 2018-11-09T15:06:47Z

@brute4s99 do you have time to make the required fixes?

tm9k1 · 2018-11-09T15:17:46Z

@brute4s99 do you have time to make the required fixes?

I am currently working on it, @datapythonista . I am sorry it is taking so much time :/

datapythonista · 2018-12-07T12:46:19Z

Closing in favor of #23936

In this commit - Added extended summary for the function corrected ou…

72d8082

…tput of example 1 Added summary for **See Also** functions Some typo fixes .

corrected PEP8 issue

6ad3c1d

tm9k1 changed the title ~~In this commit - Added extended summary for the function corrected ou…~~ fix quantile docstring Sep 30, 2018

fixed **Returns** section

dce5a3e

datapythonista reviewed Sep 30, 2018

datapythonista added the Docs label Sep 30, 2018

datapythonista changed the title ~~fix quantile docstring~~ DOC: Fix quantile docstring Sep 30, 2018

tm9k1 added 2 commits October 1, 2018 13:07

More fixes and better examples

4910bbe

.

ef09ccc

datapythonista requested changes Oct 1, 2018

datapythonista mentioned this pull request Oct 1, 2018

DOC: Fix the docstring of resample in pandas/core/generic.py #22894

Closed

fixed

39081e1

datapythonista requested changes Oct 10, 2018

Moisan mentioned this pull request Nov 27, 2018

DOC: fix DataFrame.quantile docstring and doctests #23936

Merged

3 tasks

datapythonista closed this Dec 7, 2018
