fread with "text" argument is very slow #4919
Comments
Thanks for the report, I can reproduce on Windows. Here is a self-contained reprex:
library(data.table)
tmp = tempfile()
fwrite(data.table(id = 1:10000), tmp)
string = readLines(tmp)
microbenchmark::microbenchmark(fread(tmp), fread(text = string), times = 1)
#> Unit: milliseconds
#> expr min lq mean median uq
#> fread(tmp) 2.495301 2.495301 2.495301 2.495301 2.495301
#> fread(text = string) 217.293502 217.293502 217.293502 217.293502 217.293502
Here is the relevant source: Lines 37 to 39 in 3fa8b20
#4572 is slightly relevant. #4805 would largely close this issue, as the proposed implementation is much faster, although it would still be slower than reading directly from the .csv. Could you expand on the use case?
microbenchmark::microbenchmark(old = cat(string, file=(tmpFile<-tempfile(tmpdir=tempdir())), sep="\n"),
                               new = writeLines(string, (tmpFile<-tempfile(tmpdir=tempdir()))), times = 10)
## Unit: milliseconds
## expr min lq mean median uq max neval
## old 173.6504 181.8860 194.84379 187.7294 201.8676 246.2789 10
## new 19.4778 30.4866 30.32118 31.4903 33.5262 35.5296 10
Thanks for the reply.
The example passes a greater-than-one length character vector to
library(data.table)
fread("
id var
1 a")
To accept a character vector of length greater than one without writing to disk, significant changes would be needed. AFAIU I am not super familiar with
My workaround is to use paste (or stri_c) to collapse the character vector into one string:
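A minimal sketch of that workaround, not the commenter's original code: the `lines` vector and the variable names are made up for illustration. It assumes, as discussed above, that a length-one string containing newlines is parsed by fread as literal data, while a longer character vector is first written to a temporary file.

```r
library(data.table)

# Hypothetical input: one CSV line per element, as readLines() would return
lines <- c("id,var", "1,a", "2,b")

# Collapse the vector into a single newline-separated string, so that
# fread() receives a length-one `text` argument
one_string <- paste(lines, collapse = "\n")

dt <- fread(text = one_string)
print(dt)
```

Collapsing builds one large string in memory, so this trades the disk write for extra RAM. Whether it is actually faster for an input of around 150MB would need to be benchmarked on the data in question.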
I haven't found this issue reported.
To reproduce:
Here are my results (confirmed on different files and macOS + Linux), with a ~150MB file.
Expected results: the second command should be at least as fast as the first.
On macOS 11.2, I noticed that the slower command involves some write activity to disk. I haven't checked this on Linux (this was on a remote server). In any case, both commands use 100% of a CPU.