Memory usage of read_csv_chunked() in conjunction with a gzip compressed file #1200
Comments
This is likely a duplicate of #1161; it is fixed in the devel version of readr, but not yet on CRAN.
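For anyone who wants to try the fix before it reaches CRAN, here is a minimal sketch of installing the development version from GitHub (this assumes the remotes package and the standard tidyverse/readr repository location; it is not a step given in the thread itself):

```r
# Install the remotes helper if it is not already available
install.packages("remotes")

# Install the development version of readr from GitHub
remotes::install_github("tidyverse/readr")
```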
@jimhester On the development version:

```r
COVID19.BR_Municipality <- read_delim(
  "https://github.com/wcota/covid19br/raw/master/cases-brazil-cities-time.csv.gz",
  delim = ",",
  col_types = cols(
    epi_week = col_integer(),
    date = col_date(format = "%Y-%m-%d"),
    country = col_character(),
    state = col_character(),
    city = col_character(),
    ibgeID = col_character(),
    cod_RegiaoDeSaude = col_character(),
    name_RegiaoDeSaude = col_character(),
    newDeaths = col_integer(),
    deaths = col_integer(),
    newCases = col_integer(),
    totalCases = col_integer(),
    deaths_per_100k_inhabitants = col_double(),
    totalCases_per_100k_inhabitants = col_double(),
    deaths_by_totalCases = col_double(),
    `_source` = col_character(),
    last_info_date = col_date(format = "%Y-%m-%d")
  ),
  lazy = FALSE
)
```
@hsbadr, that issue is tracked by tidyverse/vroom#331 and should be fixed.
Thanks @jimhester! I confirm that tidyverse/vroom@5fc54e6 fixed the problem. I'll let you know if I run into a related issue.
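For reference, a fix like this can be verified before the CRAN release by installing vroom pinned to the specific commit mentioned above; a minimal sketch (again assuming the remotes package, which is not named in the thread):

```r
# Install vroom at the exact commit that contains the fix
remotes::install_github("tidyverse/vroom", ref = "5fc54e6")
```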
Recently I have been running into

```
Error: vector memory exhausted (limit reached?)
```

errors when reading large gzip compressed .csv files using the chunked API. IIRC, earlier versions of readr would explicitly create a temporary file containing the full uncompressed data, which was then fed into read_csv_chunked(). Looking at the reported memory usage, this no longer seems to be the case. If this change was intentional, I apologize for having missed it, but I could not find any announcement hinting at it (neither in NEWS nor in the docs). I also feel this takes away some of the convenience of the chunked API. Of course, this can easily be resolved outside of readr by decompressing files manually beforehand (for example with R.utils::gunzip()); a sketch of that workaround follows below.

As it's not straightforward to create a reproducible example for this, I'll just add my session info (but I'm happy to provide further information if requested):
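Here is a minimal sketch of the manual workaround mentioned above: decompress first, then feed the plain .csv into the chunked reader. The file path and the chunk callback are illustrative, not taken from the issue; it assumes the R.utils and readr packages are installed:

```r
library(readr)

# Decompress the gzip archive to a temporary file first, so that
# read_csv_chunked() operates on plain, uncompressed CSV data
csv_path <- R.utils::gunzip(
  "cases-brazil-cities-time.csv.gz",   # illustrative input path
  destname = tempfile(fileext = ".csv"),
  remove = FALSE                       # keep the original .gz file
)

# Process the file in bounded-memory chunks of 10,000 rows;
# the callback here just reports each chunk's row count
read_csv_chunked(
  csv_path,
  callback = SideEffectChunkCallback$new(function(chunk, pos) {
    message("rows in this chunk: ", nrow(chunk))
  }),
  chunk_size = 10000
)
```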