pandas-dev · jreback · Aug 19, 2014 · Jul 16, 2014 · jorisvandenbossche · Aug 12, 2014
diff --git a/doc/source/10min.rst b/doc/source/10min.rst
@@ -66,7 +66,8 @@ Creating a ``DataFrame`` by passing a dict of objects that can be converted to s
                         'B' : pd.Timestamp('20130102'),
                         'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                         'D' : np.array([3] * 4,dtype='int32'),
-                        'E' : 'foo' })
+                        'E' : pd.Categorical(["test","train","test","train"]),
+                        'F' : 'foo' })
    df2
 
 Having specific :ref:`dtypes <basics.dtypes>`
@@ -635,6 +636,32 @@ the quarter end:
    ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
    ts.head()
 
+Categoricals
+------------
+
+Since version 0.15, pandas can include categorical data in a ``DataFrame``. For full docs, see the
+:ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>`.
+
+.. ipython:: python
+
+    df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})
+
+    # convert the raw grades to a categorical
+    df["grade"] = pd.Categorical(df["raw_grade"])
+
+    # Alternative: df["grade"] = df["raw_grade"].astype("category")
+    df["grade"]
+
+    # Rename the levels
+    df["grade"].cat.levels = ["very good", "good", "very bad"]
+
+    # Reorder the levels and simultaneously add the missing levels
+    df["grade"].cat.reorder_levels(["very bad", "bad", "medium", "good", "very good"])
+    df["grade"]
+    df.sort("grade")
+    df.groupby("grade").size()
+
+
 
 Plotting
 --------

diff --git a/doc/source/api.rst b/doc/source/api.rst
@@ -521,51 +521,33 @@ Categorical
 .. currentmodule:: pandas.core.categorical
 
 If the Series is of dtype ``category``, ``Series.cat`` can be used to access the the underlying
-``Categorical``. This data type is similar to the otherwise underlying numpy array
-and has the following usable methods and properties (all available as
-``Series.cat.<method_or_property>``).
-
+``Categorical``. This accessor is similar to ``Series.dt`` or ``Series.str`` and has the
+following usable methods and properties (all available as ``Series.cat.<method_or_property>``).
 
 .. autosummary::
    :toctree: generated/
 
-   Categorical
-   Categorical.from_codes
    Categorical.levels
    Categorical.ordered
    Categorical.reorder_levels
    Categorical.remove_unused_levels
-   Categorical.min
-   Categorical.max
-   Categorical.mode
-   Categorical.describe
 
-``np.asarray(categorical)`` works by implementing the array interface. Be aware, that this converts
-the Categorical back to a numpy array, so levels and order information is not preserved!
+The following methods are considered API when using ``Categorical`` directly:
 
 .. autosummary::
    :toctree: generated/
 
-   Categorical.__array__
+   Categorical
+   Categorical.from_codes
+   Categorical.codes
 
-To create compatibility with `pandas.Series` and `numpy` arrays, the following (non-API) methods
-are also introduced.
+``np.asarray(categorical)`` works by implementing the array interface. Be aware that this converts
+the Categorical back to a numpy array, so levels and order information are not preserved!
 
 .. autosummary::
    :toctree: generated/
 
-   Categorical.from_array
-   Categorical.get_values
-   Categorical.copy
-   Categorical.dtype
-   Categorical.ndim
-   Categorical.sort
-   Categorical.equals
-   Categorical.unique
-   Categorical.order
-   Categorical.argsort
-   Categorical.fillna
-
+   Categorical.__array__
 
 Plotting
 ~~~~~~~~

diff --git a/doc/source/categorical.rst b/doc/source/categorical.rst
@@ -90,6 +90,7 @@ By using some special functions:
     df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels)
     df.head(10)
 
+See the :ref:`documentation <reshaping.tile.cut>` for :func:`~pandas.cut`.
 
 `Categoricals` have a specific ``category`` :ref:`dtype <basics.dtypes>`:
 
@@ -331,6 +332,57 @@ Operations
 
 The following operations are possible with categorical data:
 
+Comparing `Categoricals` with other objects is possible in two cases:
+
+ * comparing a `Categorical` to another `Categorical`, when `levels` and `ordered` are the same, or
+ * comparing a `Categorical` to a scalar.
+
+All other comparisons will raise a TypeError.
+
+.. ipython:: python
+
+    cat = pd.Series(pd.Categorical([1,2,3], levels=[3,2,1]))
+    cat_base = pd.Series(pd.Categorical([2,2,2], levels=[3,2,1]))
+    cat_base2 = pd.Series(pd.Categorical([2,2,2]))
+
+    cat
+    cat_base
+    cat_base2
+
+Comparing to a categorical with the same levels and ordering or to a scalar works:
+
+.. ipython:: python
+
+    cat > cat_base
+    cat > 2
+
+This doesn't work because the levels are not the same:
+
+.. ipython:: python
+
+    try:
+        cat > cat_base2
+    except TypeError as e:
+        print("TypeError: " + str(e))
+
+.. note::
+
+    Comparisons with `Series`, `np.array` or a `Categorical` with different levels or ordering
+    will raise a `TypeError` because custom level ordering could give two valid results:
+    one that takes the ordering into account and one that does not. If you want to compare a `Categorical`
+    with such a type, you need to be explicit and convert the `Categorical` to values:
+
+.. ipython:: python
+
+    base = np.array([1,2,3])
+
+    try:
+        cat > base
+    except TypeError as e:
+        print("TypeError: " + str(e))
+
+    np.asarray(cat) > base
+
 Getting the minimum and maximum, if the categorical is ordered:
 
 .. ipython:: python
@@ -489,34 +541,38 @@ but the levels of these `Categoricals` need to be the same:
 
 .. ipython:: python
 
-        cat = pd.Categorical(["a","b"], levels=["a","b"])
-        vals = [1,2]
-        df = pd.DataFrame({"cats":cat, "vals":vals})
-        res = pd.concat([df,df])
-        res
-        res.dtypes
+    cat = pd.Categorical(["a","b"], levels=["a","b"])
+    vals = [1,2]
+    df = pd.DataFrame({"cats":cat, "vals":vals})
+    res = pd.concat([df,df])
+    res
+    res.dtypes
 
-        df_different = df.copy()
-        df_different["cats"].cat.levels = ["a","b","c"]
+In this case the levels are not the same and so an error is raised:
 
-        try:
-            pd.concat([df,df])
-        except ValueError as e:
-            print("ValueError: " + str(e))
+.. ipython:: python
+
+    df_different = df.copy()
+    df_different["cats"].cat.levels = ["a","b","c"]
+    try:
+        pd.concat([df,df_different])
+    except ValueError as e:
+        print("ValueError: " + str(e))
 
 The same applies to ``df.append(df)``.
 
 Getting Data In/Out
 -------------------
 
-Writing data (`Series`, `Frames`) to a HDF store that contains a ``category`` dtype will currently raise ``NotImplementedError``.
+Writing data (`Series`, `Frames`) that contains a ``category`` dtype to an HDF store will currently
+raise ``NotImplementedError``.
 
 Writing to a CSV file will convert the data, effectively removing any information about the
 `Categorical` (levels and ordering). So if you read back the CSV file you have to convert the
 relevant columns back to `category` and assign the right levels and level ordering.
 
 .. ipython:: python
-   :suppress:
+    :suppress:
 
     from pandas.compat import StringIO
 
@@ -548,7 +604,7 @@ default not included in computations. See the :ref:`Missing Data section
 <missing_data>`
 
 There are two ways a `np.nan` can be represented in `Categorical`: either the value is not
-available or `np.nan` is a valid level.
+available ("missing value") or `np.nan` is a valid level.
 
 .. ipython:: python
 
@@ -560,9 +616,25 @@ available or `np.nan` is a valid level.
     s2.cat.levels = [1,2,np.nan]
     s2
     # three levels, np.nan included
-    # Note: as int arrays can't hold NaN the levels were converted to float
+    # Note: as int arrays can't hold NaN the levels were converted to object
     s2.cat.levels
 
+.. note::
+    Missing value methods like ``isnull`` and ``fillna`` will take both missing values and
+    `np.nan` levels into account:
+
+.. ipython:: python
+
+    c = pd.Categorical(["a","b",np.nan])
+    c.levels = ["a","b",np.nan]
+    # will be inserted as an NA level:
+    c[0] = np.nan
+    s = pd.Series(c)
+    s
+    pd.isnull(s)
+    s.fillna("a")
+
+
 Gotchas
 -------
 
@@ -579,15 +651,18 @@ object and not as a low level `numpy` array dtype. This leads to some problems.
     try:
         np.dtype("category")
     except TypeError as e:
-         print("TypeError: " + str(e))
+        print("TypeError: " + str(e))
 
     dtype = pd.Categorical(["a"]).dtype
     try:
         np.dtype(dtype)
     except TypeError as e:
          print("TypeError: " + str(e))
 
-    # dtype comparisons work:
+Dtype comparisons work:
+
+.. ipython:: python
+
     dtype == np.str_
     np.str_ == dtype
 

diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst
@@ -505,3 +505,10 @@ handling of NaN:
 
    pd.factorize(x, sort=True)
    np.unique(x, return_inverse=True)[::-1]
+
+.. note::
+    If you just want to handle one column as a categorical variable (like R's factor),
+    you can use ``df["cat_col"] = pd.Categorical(df["col"])`` or
+    ``df["cat_col"] = df["col"].astype("category")``. For full docs on :class:`~pandas.Categorical`,
+    see the :ref:`Categorical introduction <categorical>` and the
+    :ref:`API documentation <api.categorical>`. This feature was introduced in version 0.15.
diff --git a/doc/source/v0.15.0.txt b/doc/source/v0.15.0.txt
@@ -283,9 +283,10 @@ Categoricals in Series/DataFrame
 
 :class:`~pandas.Categorical` can now be included in `Series` and `DataFrames` and gained new
 methods to manipulate. Thanks to Jan Schultz for much of this API/implementation. (:issue:`3943`, :issue:`5313`, :issue:`5314`,
-:issue:`7444`, :issue:`7839`, :issue:`7848`, :issue:`7864`, :issue:`7914`).
+:issue:`7444`, :issue:`7839`, :issue:`7848`, :issue:`7864`, :issue:`7914`, :issue:`7768`, :issue:`8006`, :issue:`3678`).
 
-For full docs, see the :ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>`.
+For full docs, see the :ref:`Categorical introduction <categorical>` and the
+:ref:`API documentation <api.categorical>`.
 
 .. ipython:: python