Add caching for HTTP resources #51

Closed
Bisaloo opened this issue Sep 5, 2022 · 2 comments · Fixed by #132

Comments

@Bisaloo (Contributor) commented Sep 5, 2022

As @sbfnk mentioned, it would be useful (and polite) to cache surveys downloaded from Zenodo instead of re-downloading them each time the code is re-run.

The simplest option is probably to use memoise, but there are also other tools specific to HTTP resources (e.g., https://github.com/sckott/webmiddens), so I'm open to discussion / suggestions.
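For illustration, a minimal sketch of what the memoise route could look like, assuming get_survey() is the function performing the HTTP request; the wrapper name and cache directory are hypothetical:

library(memoise)
library(cachem)

# Cache get_survey() results on disk so repeated calls with the same DOI
# are served from the local cache instead of hitting Zenodo again.
cache_dir <- tools::R_user_dir("surveypkg", which = "cache")  # hypothetical package name
get_survey_cached <- memoise::memoise(
  get_survey,
  cache = cachem::cache_disk(dir = cache_dir)
)

peru_survey <- get_survey_cached("https://doi.org/10.5281/zenodo.1095664")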

@sbfnk (Collaborator) commented Oct 14, 2022

I now wonder if this is worth the overhead, given that one can just do (from the vignette):

peru_survey <- get_survey("https://doi.org/10.5281/zenodo.1095664")
saveRDS(peru_survey, "peru.rds")

and later, in a future session:

peru_survey <- readRDS("peru.rds")

or alternatively, via #61:

peru_files <- download_survey("https://doi.org/10.5281/zenodo.1095664", dir = "Surveys")
peru_survey <- load_survey(peru_files)
saveRDS(peru_files, file.path("Surveys", "peru_files.rds"))

and later:

peru_files <- readRDS(file.path("Surveys", "peru_files.rds"))
peru_survey <- load_survey(peru_files)

which also enables inspection and use of the raw CSV files in "Surveys".
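Combining the two snippets, a small guard (a sketch reusing the calls above; the cache check itself is my own addition) would only hit Zenodo when no local copy exists:

# Re-use a previous download if the saved file list exists; otherwise
# download the survey files from Zenodo and save the list for next time.
cache_file <- file.path("Surveys", "peru_files.rds")
if (file.exists(cache_file)) {
  peru_files <- readRDS(cache_file)
} else {
  peru_files <- download_survey("https://doi.org/10.5281/zenodo.1095664", dir = "Surveys")
  saveRDS(peru_files, cache_file)
}
peru_survey <- load_survey(peru_files)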

@Bisaloo (Contributor, issue author) commented Oct 17, 2022

I think it depends on the position you're taking:

  • from the user's point of view, you're entirely right: they could "manually cache" the results if they wish.

  • from the server's point of view (zenodo.org), we're hitting it with unnecessary requests for the same result over and over, so it would be more polite to cache repeated requests (see the sketch after this list). I believe this is especially important here because we're scraping the website rather than using an official API, which would usually be better set up to handle automated requests.
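At the HTTP level, one option (an assumption, not necessarily the client this package uses for its requests) is a library with built-in response caching, e.g. httr2; the URL and cache path here are just placeholders:

library(httr2)

# Cache HTTP responses on disk; repeated identical requests are answered
# from the cache, subject to the server's caching headers.
resp <- request("https://doi.org/10.5281/zenodo.1095664") |>
  req_cache(file.path(tempdir(), "zenodo-cache")) |>
  req_perform()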

@sbfnk mentioned this issue Sep 18, 2024
@sbfnk linked a pull request Sep 18, 2024 that will close this issue