Categorical fixups #7768
Changes from all commits
c2f490e
90a81df
f4bf9ee
130b61e
65d9d6e
5c4f1bd
0438a30
19f4d46
47953a2
2958ce1
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -90,6 +90,7 @@ By using some special functions: | |
df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels) | ||
df.head(10) | ||
|
||
See :ref:`documentation <reshaping.tile.cut>` for :func:`~pandas.cut`. | ||
|
||
`Categoricals` have a specific ``category`` :ref:`dtype <basics.dtypes>`: | ||
|
||
|
@@ -331,6 +332,45 @@ Operations | |
|
||
The following operations are possible with categorical data: | ||
|
||
Comparing `Categoricals` with other objects is possible in two cases: | ||
* comparing a `Categorical` to another `Categorical`, when `levels` and `ordered` are the same, or | ||
* comparing a `Categorical` to a scalar. | ||
All other comparisons will raise a TypeError. | ||
|
||
.. ipython:: python | ||
|
||
cat = pd.Series(pd.Categorical([1,2,3], levels=[3,2,1])) | ||
cat_base = pd.Series(pd.Categorical([2,2,2], levels=[3,2,1])) | ||
cat_base2 = pd.Series(pd.Categorical([2,2,2])) | ||
|
||
cat > cat_base | ||
|
||
# This doesn't work because the levels are not the same | ||
try: | ||
there is a way to do this in the docs (showing an exception); can also do a code block
I haven't found a way to do that. Just letting the exception happen results in long stacktraces and I don't like codeblocks, where the exception message has to be manually inserted (and maintained). Maybe that would be a nice PR for the ipython directive....
yep that's fine (I bet there is a way with
I looked at the sphinx extension source and don't think there is a way without modifying it. `:okexcept:` basically only prevents sphinx from writing the exception to stdout. A
maybe can create a small function and put in utils for this purpose (basically what u r doing)
something like
|
||
cat > cat_base2 | ||
except TypeError as e: | ||
print("TypeError: " + str(e)) | ||
|
||
cat > 2 | ||
put a comment above (e.g. comparison vs. scalar)
done
||
|
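For readers on a current pandas release, the comparison rules in this hunk can be sketched as follows. This is a minimal sketch and not the PR's own code: it assumes the later API in which the `levels=` keyword is spelled `categories=` and ordered comparisons have to be requested with `ordered=True`.

```python
import pandas as pd

# Assumption: a recent pandas, where `levels=` from this diff is `categories=`
# and ordered comparisons require ordered=True.
order = [3, 2, 1]  # custom order: 3 < 2 < 1
cat = pd.Series(pd.Categorical([1, 2, 3], categories=order, ordered=True))
cat_base = pd.Series(pd.Categorical([2, 2, 2], categories=order, ordered=True))
cat_base2 = pd.Series(pd.Categorical([2, 2, 2], ordered=True))

# Same categories and ordering: values are compared by their position in `order`.
same_categories = (cat > cat_base).tolist()

# Comparison against a scalar that is one of the categories.
versus_scalar = (cat > 2).tolist()

# Different categories: the comparison is refused with a TypeError.
try:
    cat > cat_base2
    mismatch_error = None
except TypeError as e:
    mismatch_error = e
    print("TypeError: " + str(e))
```

Because the custom order puts `1` last, only the first element counts as greater than `2` in both successful comparisons.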
||
.. note:: | ||
|
||
Comparisons with `Series`, `np.array` or a `Categorical` with different levels or ordering | ||
will raise a `TypeError` because custom level ordering would result in two valid results: | ||
one taking the ordering into account and one without. If you want to compare a `Categorical` | ||
with such a type, you need to be explicit and convert the `Categorical` to values: | ||
|
||
.. ipython:: python | ||
|
||
base = np.array([1,2,3]) | ||
|
||
try: | ||
cat > base | ||
except TypeError as e: | ||
print("TypeError: " + str(e)) | ||
|
||
np.asarray(cat) > base | ||
|
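The "two valid results" mentioned in the note can be made concrete with a small sketch. It assumes a recent pandas (`categories=` instead of `levels=`, with `ordered=True` set explicitly) and uses a base array of `[2, 2, 2]` rather than the `[1, 2, 3]` from the diff, chosen here only so that the two readings visibly disagree.

```python
import numpy as np
import pandas as pd

# Assumption: recent pandas (`categories=` instead of `levels=`).
# Custom order: 3 < 2 < 1.
cat = pd.Series(pd.Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True))
base = np.array([2, 2, 2])

# Reading 1: respect the category order by comparing against the scalar 2.
by_category_order = (cat > 2).tolist()

# Reading 2: ignore the category order and compare the plain values.
by_plain_values = (np.asarray(cat) > base).tolist()

# Comparing directly against the array is ambiguous, so it is refused.
try:
    cat > base
    ambiguous_error = None
except TypeError as e:
    ambiguous_error = e
    print("TypeError: " + str(e))
```

The two readings pick out different elements, which is why the direct comparison raises instead of silently choosing one.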
||
Getting the minimum and maximum, if the categorical is ordered: | ||
|
||
.. ipython:: python | ||
|
@@ -509,7 +549,8 @@ The same applies to ``df.append(df)``. | |
Getting Data In/Out | ||
------------------- | ||
|
||
Writing data (`Series`, `Frames`) to a HDF store that contains a ``category`` dtype will currently raise ``NotImplementedError``. | ||
Writing data (`Series`, `Frames`) to a HDF store that contains a ``category`` dtype will currently | ||
raise ``NotImplementedError``. | ||
|
||
Writing to a CSV file will convert the data, effectively removing any information about the | ||
`Categorical` (levels and ordering). So if you read back the CSV file you have to convert the | ||
|
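The loss of categorical information on a CSV round trip, as described in this hunk, can be sketched as below. The column name and values are made up for illustration, and the code assumes a recent pandas (`categories=` instead of `levels=`, and `pd.CategoricalDtype` available at the top level).

```python
import io

import pandas as pd

# Hypothetical data; `grade` and its values are illustrative only.
grades = ["low", "mid", "high"]
df = pd.DataFrame({
    "grade": pd.Categorical(["mid", "low", "high"], categories=grades, ordered=True)
})

# Write to an in-memory CSV and read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
roundtrip = pd.read_csv(buffer)

# The categorical dtype, and with it the category order, is gone.
lost = not isinstance(roundtrip["grade"].dtype, pd.CategoricalDtype)

# Restore it explicitly by restating the categories and the ordering.
roundtrip["grade"] = pd.Categorical(
    roundtrip["grade"], categories=grades, ordered=True
)
restored = roundtrip["grade"].cat.categories.tolist()
```

Restating `categories` and `ordered` is needed because the CSV file stores only the values.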
@@ -579,7 +620,7 @@ object and not as a low level `numpy` array dtype. This leads to some problems. | |
try: | ||
np.dtype("category") | ||
except TypeError as e: | ||
print("TypeError: " + str(e)) | ||
print("TypeError: " + str(e)) | ||
|
||
dtype = pd.Categorical(["a"]).dtype | ||
try: | ||
|
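A self-contained sketch of the point this hunk makes, that `category` is a pandas-level dtype which NumPy does not know about. The `isinstance` check against `pd.CategoricalDtype` assumes a recent pandas and is not part of the diff.

```python
import numpy as np
import pandas as pd

# NumPy has no dtype named "category", so asking for one fails.
try:
    np.dtype("category")
    numpy_error = None
except TypeError as e:
    numpy_error = e
    print("TypeError: " + str(e))

# Assumption: recent pandas, where the dtype object is a pd.CategoricalDtype.
dtype = pd.Categorical(["a"]).dtype
is_pandas_categorical = isinstance(dtype, pd.CategoricalDtype)

# pandas itself understands the string alias.
s = pd.Series(["a"], dtype="category")
```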
show the cats after they are created
done