Concatenation of category value counts mixes up the index order #14600
The underlying reason is possibly that
|
Hello!
where _outer_indexer() expected an ndarray and not a categorical. That caused a TypeError. But after I wrote tests, another problem was discovered.
This causes IndexError: list index out of range. So which should union() operate on: the indices or the underlying categories? Or should both be joined, taking into account that the list of categories may be wider than the list of indices? Or maybe it is worth sorting the indices first and then passing them to union()? |
@nathalier thanks for having a look! Yes, both of these things appear wrong. The general principle that we try to follow is that Categoricals (or CategoricalIndex) can be combined if they are dtype equal (is_dtype_equal is True). In other words, their categories and ordered flags match. Then they should be combined and stay categoricals. Otherwise we still allow combinations, BUT coerce to object, then follow union / diff logic. So happy to have more tests / fixes. |
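The dtype-equality rule described above can be illustrated with a short sketch. It uses plain `pd.concat` on Series rather than the index `union()` path discussed in this issue, and the sample values are made up for illustration, so treat it as an approximation of the principle and not as the internal logic.

```python
import pandas as pd

# Hypothetical sample data: a and b share the same categories, c does not.
a = pd.Series(pd.Categorical(["x", "y"], categories=["x", "y"]))
b = pd.Series(pd.Categorical(["y", "x"], categories=["x", "y"]))
c = pd.Series(pd.Categorical(["y", "z"], categories=["y", "z"]))

# Equal dtypes (same categories, same ordered flag): the result stays categorical.
same_dtype = a.dtype == b.dtype
combined_same = pd.concat([a, b], ignore_index=True)

# Different categories: the dtypes are not equal and the result
# is coerced to a non-categorical dtype.
different_dtype = a.dtype == c.dtype
combined_mixed = pd.concat([a, c], ignore_index=True)

print(same_dtype, combined_same.dtype)
print(different_dtype, combined_mixed.dtype)
```

When categoricals with different categories really do need to stay categorical, `pandas.api.types.union_categoricals` is the documented way to merge them explicitly.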
I am starting to work on this issue. |
This bug was fixed in version 0.20.1. Using the current development version, this code:
returns the correct result:
And nathalier’s code runs without error too:
The result is:
Please, let me know if I can help with anything else! |
@jcontesti Thanks for checking this! |
I'll continue with this one. |
Hi, I have the tests ready to commit, but now the bug is back again :-( This code:
returns:
instead of:
Version 0.23.4 executes it correctly, but v0.24.0rc1 and the development version fail. Remember that this bug had been fixed since version 0.20.1. I can help with the solution, but I need some guidance on how to proceed because I know little about the internals of this project. Thank you! |
@jcontesti can you open a new issue for that constructor bug? |
@TomAugspurger Could it already be covered by #24845? It's a very similar bug. Let me know if you want me to open a new issue anyway. |
I tried to investigate this issue. I agree with the previous findings that the problem is in union. But what is the expected behavior of union in general? I tried to analyze it on a normal Index, not a Categorical, and I see that the results differ based on
What is correct? |
Let's say we have two Series s1 and s2, which can be the output of the pd.value_counts() function, and we want to combine them into one DataFrame
The result is
where the order of categories in the first row is changed.
And the current workaround is
which gives the correct result
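As an illustration only, one way to sidestep the categorical index union is to convert each index to object dtype before combining. The sample counts and the helper name `combine_counts` below are hypothetical, and this sketch is not necessarily the same workaround as the one referred to above.

```python
import pandas as pd

# Hypothetical value counts indexed by category labels.
s1 = pd.Series([39, 6, 4],
               index=pd.CategoricalIndex(["female", "male", "unknown"]))
s2 = pd.Series([2, 152, 2, 242, 150],
               index=pd.CategoricalIndex(["f", "female", "m", "male", "unknown"]))


def combine_counts(a, b):
    """Combine two count Series column-wise, aligning on plain object labels."""
    a = a.copy()
    b = b.copy()
    # Drop the categorical dtype so alignment uses ordinary label matching.
    a.index = a.index.astype(object)
    b.index = b.index.astype(object)
    return pd.concat([a, b], axis=1, keys=["s1", "s2"])


combined = combine_counts(s1, s2)
print(combined)
```

Each count stays attached to its own label, and labels missing from one Series show up as NaN in that column.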
Output of pd.show_versions()
pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None