Merging upstream/master into my dev branch
# Conflicts:
#	doc/source/whatsnew/v0.23.2.txt
SaturninoMateus committed Jun 21, 2018
2 parents 53aca81 + 506935c commit 726755f
Showing 67 changed files with 1,448 additions and 983 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -110,3 +110,4 @@ doc/source/styled.xlsx
doc/source/templates/
env/
doc/source/savefig/
+*my-dev-test.py
10 changes: 7 additions & 3 deletions asv_bench/benchmarks/categoricals.py
@@ -202,7 +202,11 @@ class Contains(object):
     def setup(self):
         N = 10**5
         self.ci = tm.makeCategoricalIndex(N)
-        self.cat = self.ci.categories[0]
+        self.c = self.ci.values
+        self.key = self.ci.categories[0]

-    def time_contains(self):
-        self.cat in self.ci
+    def time_categorical_index_contains(self):
+        self.key in self.ci
+
+    def time_categorical_contains(self):
+        self.key in self.c
6 changes: 3 additions & 3 deletions doc/source/api.rst
@@ -1200,9 +1200,9 @@ Attributes and underlying data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Axes**

-* **items**: axis 0; each item corresponds to a DataFrame contained inside
-* **major_axis**: axis 1; the index (rows) of each of the DataFrames
-* **minor_axis**: axis 2; the columns of each of the DataFrames
+* **items**: axis 0; each item corresponds to a DataFrame contained inside
+* **major_axis**: axis 1; the index (rows) of each of the DataFrames
+* **minor_axis**: axis 2; the columns of each of the DataFrames

.. autosummary::
:toctree: generated/
43 changes: 21 additions & 22 deletions doc/source/basics.rst
@@ -50,9 +50,8 @@ Attributes and the raw ndarray(s)

pandas objects have a number of attributes enabling you to access the metadata

-* **shape**: gives the axis dimensions of the object, consistent with ndarray
-* Axis labels
-
+* **shape**: gives the axis dimensions of the object, consistent with ndarray
+* Axis labels
* **Series**: *index* (only axis)
* **DataFrame**: *index* (rows) and *columns*
* **Panel**: *items*, *major_axis*, and *minor_axis*
@@ -131,9 +130,9 @@ Flexible binary operations
With binary operations between pandas data structures, there are two key points
of interest:

-* Broadcasting behavior between higher- (e.g. DataFrame) and
-  lower-dimensional (e.g. Series) objects.
-* Missing data in computations.
+* Broadcasting behavior between higher- (e.g. DataFrame) and
+  lower-dimensional (e.g. Series) objects.
+* Missing data in computations.

We will demonstrate how to manage these issues independently, though they can
be handled simultaneously.
@@ -462,10 +461,10 @@ produce an object of the same size. Generally speaking, these methods take an
**axis** argument, just like *ndarray.{sum, std, ...}*, but the axis can be
specified by name or integer:

-- **Series**: no axis argument needed
-- **DataFrame**: "index" (axis=0, default), "columns" (axis=1)
-- **Panel**: "items" (axis=0), "major" (axis=1, default), "minor"
-  (axis=2)
+* **Series**: no axis argument needed
+* **DataFrame**: "index" (axis=0, default), "columns" (axis=1)
+* **Panel**: "items" (axis=0), "major" (axis=1, default), "minor"
+  (axis=2)
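The axis-by-name convention in the hunk above can be sketched with a toy DataFrame (a minimal illustration; names and data here are invented):

```python
import pandas as pd

# A small DataFrame to exercise axis-by-name reductions.
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
                  index=["a", "b", "c"])

# "index" (axis=0) collapses the rows; "columns" (axis=1) collapses the columns.
col_sums = df.sum(axis="index")
row_sums = df.sum(axis="columns")
```

`col_sums` is a Series indexed by column name; the string names and the integer axis numbers are interchangeable.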

For example:

@@ -1187,11 +1186,11 @@ It is used to implement nearly all other features relying on label-alignment
functionality. To *reindex* means to conform the data to match a given set of
labels along a particular axis. This accomplishes several things:

-* Reorders the existing data to match a new set of labels
-* Inserts missing value (NA) markers in label locations where no data for
-  that label existed
-* If specified, **fill** data for missing labels using logic (highly relevant
-  to working with time series data)
+* Reorders the existing data to match a new set of labels
+* Inserts missing value (NA) markers in label locations where no data for
+  that label existed
+* If specified, **fill** data for missing labels using logic (highly relevant
+  to working with time series data)
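The three reindex behaviours listed above can be sketched on a toy Series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

reordered = s.reindex(["c", "b", "a"])               # reorders existing data
with_na = s.reindex(["a", "b", "f"])                 # unknown label 'f' becomes NaN
filled = s.reindex(["a", "b", "f"], fill_value=0.0)  # or fill it explicitly
```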

Here is a simple example:

@@ -1911,10 +1910,10 @@ the axis indexes, since they are immutable) and returns a new object. Note that
**it is seldom necessary to copy objects**. For example, there are only a
handful of ways to alter a DataFrame *in-place*:

-* Inserting, deleting, or modifying a column.
-* Assigning to the ``index`` or ``columns`` attributes.
-* For homogeneous data, directly modifying the values via the ``values``
-  attribute or advanced indexing.
+* Inserting, deleting, or modifying a column.
+* Assigning to the ``index`` or ``columns`` attributes.
+* For homogeneous data, directly modifying the values via the ``values``
+  attribute or advanced indexing.
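A short sketch of the in-place cases in the list above, contrasted with the copy-on-return behaviour of methods:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
same = df

df["b"] = df["a"] * 10   # inserting a column modifies df in place
del df["b"]              # so does deleting one
df.columns = ["first"]   # assigning to the columns attribute

renamed = df.rename(columns={"first": "x"})  # methods return a new object
```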

To be clear, no pandas method has the side effect of modifying your data;
almost every method returns a new object, leaving the original object
@@ -2112,22 +2111,22 @@ Because the data was transposed the original inference stored all columns as obj
The following functions are available for one dimensional object arrays or scalars to perform
hard conversion of objects to a specified type:

-- :meth:`~pandas.to_numeric` (conversion to numeric dtypes)
+* :meth:`~pandas.to_numeric` (conversion to numeric dtypes)

  .. ipython:: python

     m = ['1.1', 2, 3]
     pd.to_numeric(m)

-- :meth:`~pandas.to_datetime` (conversion to datetime objects)
+* :meth:`~pandas.to_datetime` (conversion to datetime objects)

  .. ipython:: python

     import datetime
     m = ['2016-07-09', datetime.datetime(2016, 3, 2)]
     pd.to_datetime(m)

-- :meth:`~pandas.to_timedelta` (conversion to timedelta objects)
+* :meth:`~pandas.to_timedelta` (conversion to timedelta objects)

  .. ipython:: python
Expand Down
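The three hard-conversion helpers touched by this hunk can be sketched together (toy inputs):

```python
import datetime
import pandas as pd

nums = pd.to_numeric(["1.1", "2", 3])                   # -> float ndarray
dates = pd.to_datetime(["2016-07-09",
                        datetime.datetime(2016, 3, 2)])  # -> DatetimeIndex
deltas = pd.to_timedelta(["1 days", "2 days"])           # -> TimedeltaIndex
```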
10 changes: 5 additions & 5 deletions doc/source/categorical.rst
@@ -542,11 +542,11 @@ Comparisons

Comparing categorical data with other objects is possible in three cases:

-* Comparing equality (``==`` and ``!=``) to a list-like object (list, Series, array,
-  ...) of the same length as the categorical data.
-* All comparisons (``==``, ``!=``, ``>``, ``>=``, ``<``, and ``<=``) of categorical data to
-  another categorical Series, when ``ordered==True`` and the `categories` are the same.
-* All comparisons of a categorical data to a scalar.
+* Comparing equality (``==`` and ``!=``) to a list-like object (list, Series, array,
+  ...) of the same length as the categorical data.
+* All comparisons (``==``, ``!=``, ``>``, ``>=``, ``<``, and ``<=``) of categorical data to
+  another categorical Series, when ``ordered==True`` and the `categories` are the same.
+* All comparisons of a categorical data to a scalar.

All other comparisons, especially "non-equality" comparisons of two categoricals with different
categories or a categorical with any list-like object, will raise a ``TypeError``.
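A sketch of the allowed comparison cases; the category order ``3 < 2 < 1`` is deliberately reversed to show that ordering, not the raw value, decides:

```python
import pandas as pd

cat = pd.Series(pd.Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True))

eq = cat == [1, 2, 3]  # equality to a same-length list-like is allowed
gt = cat > 2           # any comparison to a scalar uses the category order
# Non-equality comparisons to a plain list-like raise TypeError instead.
```

Because 1 is the "largest" category here, only the first element compares greater than 2.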
10 changes: 5 additions & 5 deletions doc/source/comparison_with_r.rst
@@ -18,11 +18,11 @@ was started to provide a more detailed look at the `R language
party libraries as they relate to ``pandas``. In comparisons with R and CRAN
libraries, we care about the following things:

-- **Functionality / flexibility**: what can/cannot be done with each tool
-- **Performance**: how fast are operations. Hard numbers/benchmarks are
-  preferable
-- **Ease-of-use**: Is one tool easier/harder to use (you may have to be
-  the judge of this, given side-by-side code comparisons)
+* **Functionality / flexibility**: what can/cannot be done with each tool
+* **Performance**: how fast are operations. Hard numbers/benchmarks are
+  preferable
+* **Ease-of-use**: Is one tool easier/harder to use (you may have to be
+  the judge of this, given side-by-side code comparisons)

This page is also here to offer a bit of a translation guide for users of these
R packages.
46 changes: 23 additions & 23 deletions doc/source/computation.rst
@@ -344,20 +344,20 @@ The weights used in the window are specified by the ``win_type`` keyword.
The list of recognized types are the `scipy.signal window functions
<https://docs.scipy.org/doc/scipy/reference/signal.html#window-functions>`__:

-- ``boxcar``
-- ``triang``
-- ``blackman``
-- ``hamming``
-- ``bartlett``
-- ``parzen``
-- ``bohman``
-- ``blackmanharris``
-- ``nuttall``
-- ``barthann``
-- ``kaiser`` (needs beta)
-- ``gaussian`` (needs std)
-- ``general_gaussian`` (needs power, width)
-- ``slepian`` (needs width).
+* ``boxcar``
+* ``triang``
+* ``blackman``
+* ``hamming``
+* ``bartlett``
+* ``parzen``
+* ``bohman``
+* ``blackmanharris``
+* ``nuttall``
+* ``barthann``
+* ``kaiser`` (needs beta)
+* ``gaussian`` (needs std)
+* ``general_gaussian`` (needs power, width)
+* ``slepian`` (needs width).

.. ipython:: python
@@ -537,10 +537,10 @@ Binary Window Functions
two ``Series`` or any combination of ``DataFrame/Series`` or
``DataFrame/DataFrame``. Here is the behavior in each case:

-- two ``Series``: compute the statistic for the pairing.
-- ``DataFrame/Series``: compute the statistics for each column of the DataFrame
+* two ``Series``: compute the statistic for the pairing.
+* ``DataFrame/Series``: compute the statistics for each column of the DataFrame
   with the passed Series, thus returning a DataFrame.
-- ``DataFrame/DataFrame``: by default compute the statistic for matching column
+* ``DataFrame/DataFrame``: by default compute the statistic for matching column
   names, returning a DataFrame. If the keyword argument ``pairwise=True`` is
   passed then computes the statistic for each pair of columns, returning a
   ``MultiIndexed DataFrame`` whose ``index`` are the dates in question (see :ref:`the next section
@@ -741,10 +741,10 @@ Aside from not having a ``window`` parameter, these functions have the same
interfaces as their ``.rolling`` counterparts. Like above, the parameters they
all accept are:

-- ``min_periods``: threshold of non-null data points to require. Defaults to
+* ``min_periods``: threshold of non-null data points to require. Defaults to
   minimum needed to compute statistic. No ``NaNs`` will be output once
   ``min_periods`` non-null data points have been seen.
-- ``center``: boolean, whether to set the labels at the center (default is False).
+* ``center``: boolean, whether to set the labels at the center (default is False).
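The effect of ``min_periods`` on an expanding window, as a minimal sketch:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])
out = s.expanding(min_periods=3).sum()
# The first two entries are NaN; from the third point on,
# the cumulative sum appears.
```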

.. _stats.moments.expanding.note:
.. note::
@@ -903,12 +903,12 @@ of an EW moment:
One must specify precisely one of **span**, **center of mass**, **half-life**
and **alpha** to the EW functions:

-- **Span** corresponds to what is commonly called an "N-day EW moving average".
-- **Center of mass** has a more physical interpretation and can be thought of
+* **Span** corresponds to what is commonly called an "N-day EW moving average".
+* **Center of mass** has a more physical interpretation and can be thought of
   in terms of span: :math:`c = (s - 1) / 2`.
-- **Half-life** is the period of time for the exponential weight to reduce to
+* **Half-life** is the period of time for the exponential weight to reduce to
   one half.
-- **Alpha** specifies the smoothing factor directly.
+* **Alpha** specifies the smoothing factor directly.
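The parameterisations are interchangeable; a sketch showing that span 5, center of mass 2, and alpha 1/3 describe the same decay:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

by_span = s.ewm(span=5).mean()        # alpha = 2 / (span + 1)
by_com = s.ewm(com=2).mean()          # c = (s - 1) / 2, alpha = 1 / (1 + c)
by_alpha = s.ewm(alpha=1 / 3).mean()  # smoothing factor given directly
```

All three produce the same exponentially weighted mean.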

Here is an example for a univariate time series:

70 changes: 35 additions & 35 deletions doc/source/contributing.rst
@@ -138,11 +138,11 @@ steps; you only need to install the compiler.

For Windows developers, the following links may be helpful.

-- https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/
-- https://github.com/conda/conda-recipes/wiki/Building-from-Source-on-Windows-32-bit-and-64-bit
-- https://cowboyprogrammer.org/building-python-wheels-for-windows/
-- https://blog.ionelmc.ro/2014/12/21/compiling-python-extensions-on-windows/
-- https://support.enthought.com/hc/en-us/articles/204469260-Building-Python-extensions-with-Canopy
+* https://blogs.msdn.microsoft.com/pythonengineering/2016/04/11/unable-to-find-vcvarsall-bat/
+* https://github.com/conda/conda-recipes/wiki/Building-from-Source-on-Windows-32-bit-and-64-bit
+* https://cowboyprogrammer.org/building-python-wheels-for-windows/
+* https://blog.ionelmc.ro/2014/12/21/compiling-python-extensions-on-windows/
+* https://support.enthought.com/hc/en-us/articles/204469260-Building-Python-extensions-with-Canopy

Let us know if you have any difficulties by opening an issue or reaching out on
`Gitter`_.
@@ -155,11 +155,11 @@ Creating a Python Environment
Now that you have a C compiler, create an isolated pandas development
environment:

-- Install either `Anaconda <https://www.anaconda.com/download/>`_ or `miniconda
+* Install either `Anaconda <https://www.anaconda.com/download/>`_ or `miniconda
   <https://conda.io/miniconda.html>`_
-- Make sure your conda is up to date (``conda update conda``)
-- Make sure that you have :ref:`cloned the repository <contributing.forking>`
-- ``cd`` to the *pandas* source directory
+* Make sure your conda is up to date (``conda update conda``)
+* Make sure that you have :ref:`cloned the repository <contributing.forking>`
+* ``cd`` to the *pandas* source directory

We'll now kick off a three-step process:

@@ -286,15 +286,15 @@ complex changes to the documentation as well.

Some other important things to know about the docs:

-- The *pandas* documentation consists of two parts: the docstrings in the code
+* The *pandas* documentation consists of two parts: the docstrings in the code
itself and the docs in this folder ``pandas/doc/``.

The docstrings provide a clear explanation of the usage of the individual
functions, while the documentation in this folder consists of tutorial-like
overviews per topic together with some other information (what's new,
installation, etc).

-- The docstrings follow a pandas convention, based on the **Numpy Docstring
+* The docstrings follow a pandas convention, based on the **Numpy Docstring
Standard**. Follow the :ref:`pandas docstring guide <docstring>` for detailed
instructions on how to write a correct docstring.

@@ -303,7 +303,7 @@ Some other important things to know about the docs:

contributing_docstring.rst

-- The tutorials make heavy use of the `ipython directive
+* The tutorials make heavy use of the `ipython directive
<http://matplotlib.org/sampledoc/ipython_directive.html>`_ sphinx extension.
This directive lets you put code in the documentation which will be run
during the doc build. For example::
@@ -324,7 +324,7 @@ Some other important things to know about the docs:
doc build. This approach means that code examples will always be up to date,
but it does make the doc building a bit more complex.

-- Our API documentation in ``doc/source/api.rst`` houses the auto-generated
+* Our API documentation in ``doc/source/api.rst`` houses the auto-generated
documentation from the docstrings. For classes, there are a few subtleties
around controlling which methods and attributes have pages auto-generated.

@@ -488,8 +488,8 @@ standard. Google provides an open source style checker called ``cpplint``, but w
use a fork of it that can be found `here <https://github.com/cpplint/cpplint>`__.
Here are *some* of the more common ``cpplint`` issues:

-- we restrict line-length to 80 characters to promote readability
-- every header file must include a header guard to avoid name collisions if re-included
+* we restrict line-length to 80 characters to promote readability
+* every header file must include a header guard to avoid name collisions if re-included

:ref:`Continuous Integration <contributing.ci>` will run the
`cpplint <https://pypi.org/project/cpplint>`_ tool
@@ -536,8 +536,8 @@ Python (PEP8)
There are several tools to ensure you abide by this standard. Here are *some* of
the more common ``PEP8`` issues:

-- we restrict line-length to 79 characters to promote readability
-- passing arguments should have spaces after commas, e.g. ``foo(arg1, arg2, kw1='bar')``
+* we restrict line-length to 79 characters to promote readability
+* passing arguments should have spaces after commas, e.g. ``foo(arg1, arg2, kw1='bar')``

:ref:`Continuous Integration <contributing.ci>` will run
the `flake8 <https://pypi.org/project/flake8>`_ tool
@@ -715,14 +715,14 @@ Using ``pytest``

Here is an example of a self-contained set of tests that illustrate multiple features that we like to use.

-- functional style: tests are like ``test_*`` and *only* take arguments that are either fixtures or parameters
-- ``pytest.mark`` can be used to set metadata on test functions, e.g. ``skip`` or ``xfail``.
-- using ``parametrize``: allow testing of multiple cases
-- to set a mark on a parameter, ``pytest.param(..., marks=...)`` syntax should be used
-- ``fixture``, code for object construction, on a per-test basis
-- using bare ``assert`` for scalars and truth-testing
-- ``tm.assert_series_equal`` (and its counter part ``tm.assert_frame_equal``), for pandas object comparisons.
-- the typical pattern of constructing an ``expected`` and comparing versus the ``result``
+* functional style: tests are like ``test_*`` and *only* take arguments that are either fixtures or parameters
+* ``pytest.mark`` can be used to set metadata on test functions, e.g. ``skip`` or ``xfail``.
+* using ``parametrize``: allow testing of multiple cases
+* to set a mark on a parameter, ``pytest.param(..., marks=...)`` syntax should be used
+* ``fixture``, code for object construction, on a per-test basis
+* using bare ``assert`` for scalars and truth-testing
+* ``tm.assert_series_equal`` (and its counter part ``tm.assert_frame_equal``), for pandas object comparisons.
+* the typical pattern of constructing an ``expected`` and comparing versus the ``result``
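A minimal, self-contained sketch of the ``result``/``expected`` pattern from the list above, using the public ``pandas.testing`` module in place of the internal ``tm`` helpers and invoked directly rather than through a pytest run:

```python
import pandas as pd
import pandas.testing as pdt

def test_series_doubling():
    ser = pd.Series([1, 2, 3])
    result = ser * 2                 # exercise the behaviour under test
    expected = pd.Series([2, 4, 6])  # construct what we expect to see
    pdt.assert_series_equal(result, expected)

test_series_doubling()  # raises AssertionError on mismatch, silent on success
```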

We would name this file ``test_cool_feature.py`` and put in an appropriate place in the ``pandas/tests/`` structure.

@@ -969,21 +969,21 @@ Finally, commit your changes to your local repository with an explanatory messag
uses a convention for commit message prefixes and layout. Here are
some common prefixes along with general guidelines for when to use them:
-* ENH: Enhancement, new functionality
-* BUG: Bug fix
-* DOC: Additions/updates to documentation
-* TST: Additions/updates to tests
-* BLD: Updates to the build process/scripts
-* PERF: Performance improvement
-* CLN: Code cleanup
+* ENH: Enhancement, new functionality
+* BUG: Bug fix
+* DOC: Additions/updates to documentation
+* TST: Additions/updates to tests
+* BLD: Updates to the build process/scripts
+* PERF: Performance improvement
+* CLN: Code cleanup
The following defines how a commit message should be structured. Please reference the
relevant GitHub issues in your commit message using GH1234 or #1234. Either style
is fine, but the former is generally preferred:
-* a subject line with `< 80` chars.
-* One blank line.
-* Optionally, a commit message body.
+* a subject line with `< 80` chars.
+* One blank line.
+* Optionally, a commit message body.
Now you can commit your changes in your local repository::
Expand Down