BUG: Unexpected behaviour when reading large text files with mixed datatypes #3866
So the way parsing works (when you don't specify a specific dtype) is that conversions are tried on a particular column so that the end result is the correct dtype. You essentially want downcasting back to strings for the object dtype; easy enough, specify object as the dtype for this column. If you want this to be automatic, I think we'd have to provide an option for it, because it would be inefficient in terms of parsing speed: you'd have to copy the column for every dtype you try. Can you explain why this actually matters?
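A minimal sketch of the suggested workaround; "mixed.csv" and the column name "a" are placeholders, not from the thread:

```python
import pandas as pd

# With an explicit object dtype, no per-chunk int conversion is attempted,
# so every value in the column comes back as a string.
df = pd.read_csv("mixed.csv", dtype={"a": object})

print(df["a"].map(type).value_counts())  # a single Python type throughout
```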
I'm not sure I understand. Why aren't there 500K integers and 500K+2 strings? This matters because if you try to aggregate using the object-dtype column, the mixed int/str values break the aggregation.
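A hypothetical illustration of why the mixed column matters: the same logical key can end up as both int 1 and str "1", so counting treats them as distinct and numeric aggregation fails outright.

```python
import pandas as pd

# Object column where the same value appears as int and as str.
s = pd.Series([1, "1", 2, "2"], dtype=object)

print(s.value_counts())      # 1 and "1" are counted as separate keys

try:
    s.sum()                  # adding int and str raises
except TypeError as exc:
    print(exc)
```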
@wesm pls take a look. The int conversion stops at 262144, which is exactly 2**16 * 4... weird; must be something odd going on.
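A diagnostic sketch for locating where the inferred type flips; "mixed.csv" and column "a" are placeholders for the file described in the report below:

```python
import pandas as pd

# Per-cell Python type after parsing.
types = pd.read_csv("mixed.csv")["a"].map(type)

# First row whose parsed type differs from the type of row 0.
boundary = types.ne(types.iloc[0]).idxmax()
print(boundary)  # reportedly 262144 == 2**18, suggesting a fixed chunk size
```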
I can repro, but the fix is eluding me :)
Closes pandas-dev#3866: silently fix the problem rather than warning if we can coerce to a numerical type.
@jreback Did this issue get fixed? This is a very common source of bugs in code written by my developers. There's a reason for that: pandas is doing something unexpected. If datatype inference fails, it should not fail silently and produce a mixed-datatype column; it should fail with an exception.
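For what it's worth, later pandas versions emit a DtypeWarning ("Columns have mixed types...") rather than staying silent. A sketch, assuming pandas >= 0.20 where pandas.errors.DtypeWarning is public, that escalates the warning to get the exception-on-failure behaviour requested above:

```python
import warnings
import pandas as pd

# Turn the mixed-types warning into a hard error.
with warnings.catch_warnings():
    warnings.simplefilter("error", pd.errors.DtypeWarning)
    df = pd.read_csv("mixed.csv")  # raises instead of mixing dtypes silently
```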
read_csv gives unexpected behaviour with large files if a column contains both strings and integers, e.g.:
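The original snippet wasn't preserved in this copy of the issue; a minimal reconstruction of the reported setup (the file and column names are assumptions):

```python
import pandas as pd

# One column: two strings followed by a large run of integers, big enough
# to span several of the parser's internal chunks.
n = 1_000_000
pd.DataFrame({"a": ["x", "y"] + list(range(n))}).to_csv("mixed.csv", index=False)

df = pd.read_csv("mixed.csv")
print(df["a"].map(type).value_counts())
# Expected: n ints and 2 strs; observed: the int/str split lands at a
# chunk boundary instead.
```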
It seems some of the integers are parsed as integers and others as strings.