"invalid dtype determination in get_concat_dtype" when concating dfs with certain columns #20597
Comments
You are fighting pandas here. I suppose this could be supported, but it's not efficient in the least, nor very useful in terms of indexing. You would very likely need a custom index type to have any real support here, which is quite a major effort. If you wanted to contribute this, great.
cc @toobaz
Oh, so you are aware of the problem? Could you explain a bit more about why it fails, please? I can assure you that I'm not fighting pandas on purpose. ;) In case you were wondering, what I actually do is convert tree structures to a pandas DataFrame, with one row representing one tree. (The trees are very similar but not always identical in structure.) The tuples (columns) give the path through the tree, and the data is given by the leaves.
Why are you not using a MultiIndex?
Originally I wanted to convert it to a MultiIndex after concatenating, but sure, that would be an acceptable workaround. However, in my case it also failed with the same error when concatenating (though not for the example above).
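A minimal sketch of the MultiIndex approach discussed above, not the code from the original report. It assumes each tree is given as a dict mapping a path tuple to a leaf value, and that shorter paths are padded with empty strings so that all column tuples have the same length. The helper names `pad_path` and `tree_row_to_frame`, the sample trees, and the `""` fill value are hypothetical and chosen only for illustration.

```python
import pandas as pd


def pad_path(path, depth, fill=""):
    """Pad a path tuple to a fixed depth (the fill value is an assumption)."""
    return tuple(path) + (fill,) * (depth - len(path))


def tree_row_to_frame(leaves, depth):
    """Build a one-row DataFrame whose columns are a MultiIndex of padded paths."""
    columns = pd.MultiIndex.from_tuples([pad_path(p, depth) for p in leaves])
    return pd.DataFrame([list(leaves.values())], columns=columns)


# Hypothetical trees: paths of different lengths, strings reappearing in
# different positions, and unequal sets of columns.
tree_a = {("a", "b"): 1, ("a", "c", "d"): 2}
tree_b = {("a", "b"): 3, ("c", "a"): 4}

depth = max(len(path) for tree in (tree_a, tree_b) for path in tree)
frames = [tree_row_to_frame(tree, depth) for tree in (tree_a, tree_b)]

# Outer join on the columns: paths missing from a tree become NaN in its row.
result = pd.concat(frames, ignore_index=True, sort=False)
print(result)
```

Building the MultiIndex before concatenating means every column label has the same number of levels, so the concatenation never has to reconcile tuples of different lengths inside a flat object Index.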
Then show an example using MI that fails.
Weird... my test case must have been flawed; I can't reproduce it anymore. So thanks a lot for the help. Still, if you have the patience to explain, I would be really interested in what is going wrong in the example above.
this actually breaks in a different place in master. cc @TomAugspurger
I didn't think a
#20757 might be what caused my observation that this issue also occurred when using a MultiIndex.
Is this a blocker for 0.23?
No, it's pretty unusual.
Code Sample, a copy-pastable example if possible
There might be a simpler minimal example, but I was already really struggling to identify this problem and to find this example. The problem seems to be related to strings reappearing in different positions of the tuples, tuples of different lengths, and unequal sets of columns. (By the way, I'm aware of MultiIndex; I would like to convert the Index to a MultiIndex after the concatenation.)
Problem description
This yields
Expected Output
Something similar to
Output of pd.show_versions()
(same result with pandas=0.17.1)
pandas: 0.23.0.dev0+38.g6552718
pytest: 2.8.7
pip: 9.0.1
setuptools: 20.7.0
Cython: 0.23.4
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.3.6
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.9999999
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None