big_fread1 function and the possibility to parallelize it #11
Comments
What is the size of the file, and how many lines and columns does it have?
File size: 6 GB
If you have 81M lines and only 4 columns, you should use a much larger every_nlines, maybe up to 1M.
Yes, I will try that. In fact, because of the physical conditions, I have to do my calculation on every 200 lines, which is why I chose every_nlines = 200. So I modified my script to do the calculation on every 200 lines within each chunk of 1e+6 lines, as below:

my_results <- big_fread1( csv
)
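For reference, a minimal sketch of what such a call could look like, assuming a plain CSV at a hypothetical path huge_file.csv whose first column is the time variable, and with a placeholder compute_window() standing in for the actual physics calculation on three columns (this is not the real script from this issue):

library(bigreadr)

# Placeholder for the real calculation on one 200-row window;
# column positions (1 = time, 2 and 3 = the other two variables) are assumed.
compute_window <- function(win) {
  data.frame(time    = win[[1]][1],
             mean_v2 = mean(win[[2]]),
             mean_v3 = mean(win[[3]]))
}

# Applied to each chunk of 1e6 lines: cut it into consecutive 200-row
# windows and run the calculation on each window.
process_chunk <- function(chunk) {
  windows <- split(chunk, (seq_len(nrow(chunk)) - 1) %/% 200)
  do.call(rbind, lapply(windows, compute_window))
}

my_results <- big_fread1("huge_file.csv",
                         every_nlines = 1e6,   # larger chunks, as suggested above
                         .transform = process_chunk)

The .transform function is applied to each chunk as it is read, so only about 1e6 lines are held in memory at a time, and the default .combine stacks the per-chunk results in order.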
The splitting is faster, but I still have the same problem with the running time. In fact, it is my calculation that takes a long time. Would it be possible to parallelize the work done on every chunk of lines?
Yes, it should be possible.
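One possible approach (a sketch under assumptions, not necessarily what the linked resource recommends): split the file once with bigreadr's split_file() helper, then process the resulting chunk files in parallel with parallel::mclapply(), reusing the hypothetical process_chunk() from the sketch above, and finally reorder on the (assumed) time column:

library(bigreadr)
library(parallel)

# Write the big CSV out as chunk files of 1e6 lines each, repeating the
# header in every part so each file can be read on its own.
infos <- split_file("huge_file.csv", every_nlines = 1e6, repeat_header = TRUE)
part_files <- get_split_files(infos)

# Read and process the chunk files in parallel (mclapply forks, so this
# works on Linux/macOS; on Windows a PSOCK cluster would be needed instead).
res_list <- mclapply(part_files, function(f) {
  process_chunk(fread2(f))
}, mc.cores = 4)

# Put the independent results back together and restore the original
# order using the time variable.
my_results <- do.call(rbind, res_list)
my_results <- my_results[order(my_results$time), ]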
Thank you for the link. I will look at it and try something.
Any update on this? |
Hello,
I have a very large file to process and little RAM, so I decided to use the big_fread1 function. I use a window of 200 lines, and on every window I do the calculation I want on only three columns of my data frame. The script below works fine on a file of modest size, but on my actual file it takes many, many hours. Could you please tell me if it would be possible to easily parallelize this script? I can do the calculation independently in every window, and afterwards I could sort the results on the time variable to put them back together in the right order.
Thank you
Best Regards
Laurent