data.table_1.9.4 fread failing on field with newline in quoted field. #1201
Comments
It would be good to check it using 1.9.5.
Thanks for the quick response. I'm having issues compiling 1.9.5 and am not sure what I'm doing wrong (error at bottom of post). When I extracted several examples of the issue into a test file (along with the rows before and after), I found that 1.9.4 was able to read them in successfully. But when I try to process the entire file, those same rows give me errors. The zipped file can be found here:
ERROR WHEN TRYING TO BUILD 1.9.5
Just to add the 1.9.5 status of fread on your source file:
> dt <- fread("redata-pa2-StormData.csv", verbose=TRUE)
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.523066 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... ','
Detected 37 columns. Longest stretch was from line 1 to line 30
Starting data input on line 1 (either column names or first row of data). First 10 characters: "STATE__",
All the fields on line 1 are character fields. Treating as the column names.
Count of eol: 1307675 (including 1 at the end)
Count of sep: 34819802
nrow = MIN( nsep [34819802] / ncol [37] -1, neol [1307675] - nblank [1] ) = 967216
Type codes ( first 5 rows): 3444344430000303003343333430000333303
Type codes (+ middle 5 rows): 3444344434444303443343333434440333343
Type codes (+ last 5 rows): 3444344434444303443343333434444333343
Type codes: 3444344434444303443343333434444333343 (after applying colClasses and integer64)
Type codes: 3444344434444303443343333434444333343 (after applying drop or select (if supplied)
Allocating 37 column slots (37 - 0 dropped)
Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:13
0.090s ( 1%) Memory map (rerun may be quicker)
0.000s ( 0%) sep and header detection
1.975s ( 15%) Count rows (wc -l)
0.001s ( 0%) Column type detection (first, middle and last 5 rows)
1.833s ( 14%) Allocation of 902297x37 result (xMB) in RAM
9.068s ( 70%) Reading data
0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
0.000s ( 0%) Coercing data already read in type bumps (if any)
0.044s ( 0%) Changing na.strings to NA
13.011s Total
Warning message:
In fread("redata-pa2-StormData.csv", verbose = TRUE) :
Read less rows (902297) than were allocated (967216). Run again with verbose=TRUE and please report.
According to the warning, the data were only partially loaded:
> str(dt)
Classes ‘data.table’ and 'data.frame': 902297 obs. of 37 variables:
$ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
$ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
$ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
...
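The row estimate printed in the verbose log can be rechecked by hand. This is a minimal sketch, assuming the log's `nsep / ncol -1` means integer division by `ncol - 1`; that reading reproduces the printed 967216, but it is inferred from the numbers and not confirmed against the fread source. The variable names are illustrative only.

```r
# Counts copied from the verbose output above.
nsep   <- 34819802  # "Count of sep"
ncol   <- 37        # "Detected 37 columns"
neol   <- 1307675   # "Count of eol"
nblank <- 1         # nblank value shown in the nrow line
nread  <- 902297    # rows actually read

# Assumed reading of the log formula:
#   nrow = MIN(nsep %/% (ncol - 1), neol - nblank)
est_from_sep <- nsep %/% (ncol - 1)
est_from_eol <- neol - nblank
nrow_est <- min(est_from_sep, est_from_eol)

# Rows that were allocated but never filled.
surplus <- nrow_est - nread
```

Both estimates overshoot the 902297 rows that were read. One plausible explanation, not confirmed in this thread, is that commas and line breaks inside quoted fields are counted as separators and line ends, which would inflate the estimate and lead to the "Read less rows" warning.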
When I use fread on a large CSV file that has a newline in a quoted column, I get the following error. I see that other similar issues have been fixed, but I am still having the same problem.
Error in fread(input = unzFileName, sep = ",", header = TRUE, select = c("EVTYPE", :
Expected sep (',') but new line or EOF ends field 36 on line 249453 when reading data: 1.00,12/18/1996 0:00:00,"02:00:00 PM","CST",28.00,"ALZ028>029 - 035>038 - 040>049","AL","WINTER STORM",0.00,,,12/18/1996 0:00:00,"07:00:00 PM",0.00,,0.00,,,0.00,0.00,,0.00,0.00,0.00,240.00,"K",320.00,"K","BMX","ALABAMA, Central","CLAY - CLAY - RANDOLPH - CHILTON - COOSA - TALLAPOOSA - CHAMBERS - DALLAS - AUTAUGA - LOWNDES - ELMORE - MONTGOMERY - MACON - BULLOCK - LEE - RUSSELL - PIKE",0.00,0.00,0.00,0.00,"A snow storm that began in the early afternoon hours across the central sections of the state dumped 1 to 3 inches of snow on parts of the state. It was over by early evening. Schools and businesses let out early on the 18th across much of the area affected. A few roads became slick but there were no major travel problems reported. The snow remained on the ground in some areas for about two days. Here is a list of snowfall totals by county:
Autauga 2-3"" Bullock 1"" C
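A small reproducible file can help narrow down this kind of failure. The sketch below builds a made-up three-column CSV (not the storm data itself) whose last field contains a newline and doubled quotes, similar in shape to the remarks text in the error above. It reads the file with base R's `read.csv` as a reference; the commented `fread` call is the one to try against a given data.table version.

```r
# Made-up sample: the REMARKS field of the first record spans two lines
# and contains doubled quotes, like the field in the error message.
csv_lines <- c(
  'STATE,EVTYPE,REMARKS',
  '"AL","WINTER STORM","Snowfall totals by county:',
  'Autauga 2-3"" Bullock 1"""',
  '"AL","TORNADO","No remarks"'
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Reference read with base R: the quoted newline stays inside the field,
# so the four physical lines yield a header plus two records.
df <- read.csv(tmp, stringsAsFactors = FALSE)

# Call to compare against, if data.table is installed:
# dt <- data.table::fread(tmp, sep = ",", header = TRUE)
```

If `fread` reads this small file correctly but fails on the full file, that matches the behaviour described above, where extracted rows load correctly in isolation.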