DOC: update the pd.DataFrame.memory_usage/empty docstring(Seoul) #20102

Merged
merged 6 commits into from
Mar 15, 2018

Conversation

ohahohah
Contributor

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

# paste output of "scripts/validate_docstrings.py <your-function-or-method>" here
# between the "```" (remove this comment, but keep the "```")

################################################################################
###################### Docstring (pandas.DataFrame.empty) ######################
################################################################################

True if DataFrame is empty.

True if DataFrame is entirely empty [no items], meaning any of the
axes are of length 0.

Returns
-------
empty : boolean
    if DataFrame is empty, return true, if not return false.

Notes
-----
If DataFrame contains only NaNs, it is still not considered empty. See
the example below.

Examples
--------
An example of an actual empty DataFrame. Notice the index is empty:

>>> df_empty = pd.DataFrame({'A' : []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True

If we only have NaNs in our DataFrame, it is not considered empty! We
will need to drop the NaNs to make the DataFrame empty:

>>> df = pd.DataFrame({'A' : [np.nan]})
>>> df
    A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True

See also
--------
pandas.Series.dropna
pandas.DataFrame.dropna

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Missing description for See Also "pandas.Series.dropna" reference
	Missing description for See Also "pandas.DataFrame.dropna" reference


################################################################################
################## Docstring (pandas.DataFrame.memory_usage)  ##################
################################################################################

Memory usage of DataFrame columns.

Memory usage of DataFrame is accessing pandas.DataFrame.info method.
A configuration option, `display.memory_usage` (see Parameters)

Parameters
----------
index : bool
    Specifies whether to include memory usage of DataFrame's
    index in returned Series. If `index=True` (default is False)
    the first index of the Series is `Index`.
deep : bool
    Introspect the data deeply, interrogate
    `object` dtypes for system-level memory consumption.

Returns
-------
sizes : Series
    A series with column names as index and memory usage of
    columns with units of bytes.

Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False

See Also
--------
numpy.ndarray.nbytes

Examples
--------
>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
>>> data = dict([(t, np.random.randint(100, size=5000).astype(t))
...              for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.memory_usage()
Index            80
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
>>> df.memory_usage(index=False)
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
>>> df.memory_usage(index=True)
Index            80
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
>>> df.memory_usage(index=True).sum()
205080

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Missing description for See Also "numpy.ndarray.nbytes" reference

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

Lastly, I left the validation errors that already existed in the previous version unchanged.

@jorisvandenbossche
Member

There is a related PR on Series.memory_usage: #20086
It may be worth looking at, to make sure both docstrings explain the keywords in a similar way.
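
For context, the two keywords in question are `index` and `deep`, which both `Series.memory_usage` and `DataFrame.memory_usage` accept. The following is a minimal sketch, not taken from either PR, of how `deep` changes the result for `object` data; the column name `text` and the sample strings are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: an explicit object-dtype Series of Python strings.
s = pd.Series(["a", "bb", "ccc"] * 100, dtype=object)
df = s.to_frame(name="text")

# deep=False counts only the array of object pointers.
shallow = s.memory_usage(index=False, deep=False)
# deep=True also interrogates each Python object for its own size.
deep = s.memory_usage(index=False, deep=True)

# The DataFrame method reports the same per-column figures.
frame_shallow = df.memory_usage(index=False, deep=False)["text"]
frame_deep = df.memory_usage(index=False, deep=True)["text"]

print(shallow, deep, frame_shallow, frame_deep)
```

Because the per-column figures agree, it seems reasonable for the two docstrings to describe `index` and `deep` in the same words.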

Contributor

@rth rth left a comment

Thanks for this PR!

A few comments are below.

Also, please change "default is False" for `index` in the docstring: the actual default is True.
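
One way to double-check the defaults, sketched here with the standard-library `inspect` module (this check is an illustration and was not part of the review):

```python
import inspect

import pandas as pd

# Read the keyword defaults straight from the method signature.
sig = inspect.signature(pd.DataFrame.memory_usage)
index_default = sig.parameters["index"].default
deep_default = sig.parameters["deep"].default

print("index default:", index_default)
print("deep default:", deep_default)
```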

object 40000
bool 5000
dtype: int64
>>> df.memory_usage(index=False)
Contributor

@rth rth Mar 10, 2018

I'm not sure the last two examples (with index=False and index=True) add anything. The first example alone might be enough.

@@ -1436,12 +1436,20 @@ def __contains__(self, key):

@property
def empty(self):
"""True if NDFrame is entirely empty [no items], meaning any of the
"""
True if DataFrame is empty.
Contributor

@rth rth Mar 10, 2018

I think it should be """True [...] (no empty line after the opening quotes).

Edit: never mind, the official docstring example doesn't seem to do that.

Returns
-------
empty : boolean
if DataFrame is empty, return true, if not return false.
Contributor

True, False
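
As a small illustration of what the property returns, here is a sketch of the "any of the axes are of length 0" wording from the docstring. The variable names are assumptions, and the no-columns case does not appear in the PR's own examples.

```python
import numpy as np
import pandas as pd

# Zero rows, one column: the index axis has length 0.
no_rows = pd.DataFrame({"A": []})
# Three rows, zero columns: the columns axis has length 0.
no_columns = pd.DataFrame(index=[0, 1, 2])
# One row holding only NaN: neither axis has length 0.
only_nan = pd.DataFrame({"A": [np.nan]})

for name, frame in [("no_rows", no_rows),
                    ("no_columns", no_columns),
                    ("only_nan", only_nan)]:
    print(name, frame.shape, frame.empty)
```

The first two frames report `True` and the NaN-only frame reports `False`, which matches the Notes section of the docstring.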

@jreback jreback added the Docs label Mar 10, 2018
@jreback
Contributor

jreback commented Mar 10, 2018

Coordinate the text with #20086.


Examples
--------
>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
Contributor

add a categorical type here as well
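
A rough sketch of why a categorical column is worth showing; this is not the example that ended up in the docstring, and the sample labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data: a few distinct strings repeated many times.
labels = ["red", "green", "blue"] * 1000
as_object = pd.Series(labels, dtype=object)
as_category = as_object.astype("category")

# deep=True so the Python string objects are counted for the object column.
object_bytes = as_object.memory_usage(index=False, deep=True)
# A categorical stores small integer codes plus one copy of each category.
category_bytes = as_category.memory_usage(index=False, deep=True)

print("object:  ", object_bytes)
print("category:", category_bytes)
```

With few distinct values, the categorical representation should come out considerably smaller than the `object` one.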

@@ -1969,6 +1973,38 @@ def memory_usage(self, index=True, deep=False):
See Also
--------
numpy.ndarray.nbytes
Contributor

add Series.memory_usage
Series.nbytes
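
To show how these references relate, here is a minimal sketch for fixed-width numeric columns. The column names `ints` and `floats` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ints": np.arange(5000, dtype="int64"),
    "floats": np.zeros(5000, dtype="float64"),
})

# Per-column memory usage reported by the DataFrame, index excluded.
per_column = df.memory_usage(index=False)

for name in df.columns:
    column = df[name]
    # For numeric dtypes the three figures below are expected to agree.
    print(name,
          per_column[name],
          column.memory_usage(index=False),
          column.nbytes,
          column.to_numpy().nbytes)
```

Each column holds 5000 eight-byte values, so every figure is 40000 bytes.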

* Consistent with Series.memory_usage
* Added Categorical notes

[ci skip]
Contributor

@TomAugspurger TomAugspurger left a comment

Merging later today.

@codecov

codecov bot commented Mar 15, 2018

Codecov Report

Merging #20102 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20102      +/-   ##
==========================================
- Coverage   91.72%    91.7%   -0.03%     
==========================================
  Files         150      150              
  Lines       49149    49149              
==========================================
- Hits        45083    45071      -12     
- Misses       4066     4078      +12
Flag        Coverage Δ
#multiple   90.08% <ø> (-0.03%) ⬇️
#single     41.85% <ø> (ø) ⬆️

Impacted Files                  Coverage Δ
pandas/core/generic.py          95.84% <ø> (ø) ⬆️
pandas/core/frame.py            97.18% <ø> (ø) ⬆️
pandas/plotting/_converter.py   65.07% <0%> (-1.74%) ⬇️
pandas/core/indexes/base.py     96.66% <0%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52cffa3...d4cc71d. Read the comment docs.

The memory usage can optionally include the contribution of
the index and elements of `object` dtype.

A configuration option, `display.memory_usage` (see Parameters)
Member

Something seems to be missing from this sentence.

Contributor

Fixed.
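
For readers unfamiliar with the option named in that sentence, the sketch below assumes the sentence is meant to describe how `display.memory_usage` controls whether `DataFrame.info` reports memory usage. The helper `info_text` is hypothetical and exists only for this illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})


def info_text(frame):
    """Capture the output of DataFrame.info as a string (illustrative helper)."""
    buf = io.StringIO()
    frame.info(buf=buf)
    return buf.getvalue()


# With the default option value, info() ends with a "memory usage" line.
default_text = info_text(df)

# Temporarily switch the option off; the line is expected to disappear.
with pd.option_context("display.memory_usage", False):
    suppressed_text = info_text(df)

print(default_text)
print(suppressed_text)
```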

[ci skip]
@jorisvandenbossche jorisvandenbossche merged commit bf9e4f3 into pandas-dev:master Mar 15, 2018
@jorisvandenbossche
Member

@ohahohah Thanks for the PR!
