Cache styling #538
320ec21
to
bfb5053
Compare
a5899f8
to
5ab8389
Compare
Codecov Report
@@ Coverage Diff @@
## master #538 +/- ##
==========================================
- Coverage 90.83% 90.46% -0.38%
==========================================
Files 43 45 +2
Lines 1801 1898 +97
==========================================
+ Hits 1636 1717 +81
- Misses 165 181 +16
Continue to review full report at Codecov.
|
With the latest two commits, we use my fork of R.cache to store empty files for caching, as suggested in HenrikBengtsson/R.cache#37, because otherwise each cached code expression needs 4KB on disk.
f6a8923
to
cb2dc17
Compare
Is this creating one file per expression, or one file per file processed? |
One per file. I thought we could also do one per expression, which would give an additional speed boost in case you have big expressions and modify just one. The implementation would be more complex though. As of now, I think it's pretty slim (excluding all the cache management tools that most people will never touch). However, there is (obviously) a trade-off with file size as long as each cache entry costs us one block, e.g. 4KB of disk space on macOS.
Can we do one cache object per R project? |
Do you mean RStudio Projects associated with a working directory? |
Project = package, analysis project, RStudio project -- in the sense of {here} or |
I am not sure I understand what you mean. I don't think we should clutter the project directory with a styler cache to make the caching directory specific. What do you think would be the advantages of this approach? Advantages of a central cache:
Also, I think it would not help solve the problem with the block size. Also, if possible, I'd like to use
|
Good idea to make this package easier to use in practice!
DESCRIPTION
Outdated
@@ -27,7 +27,8 @@ Imports:
  tibble (>= 1.4.2),
  tools,
  withr (>= 1.0.0),
- xfun (>= 0.1)
+ xfun (>= 0.1),
+ R.cache (>= 0.13.0.9000)
Did you mean to add {R.cache} to "Suggests"? Leaving it in "Imports" seems easier.
I see it was only added in the very last commit cb2dc17363dfaa39f4025a1895ffed86556e5ceb when I added my own fork as a remote dependency. Probably caused by unintentional usethis::use_dep(). Do you think it's unnecessary to keep it in Suggests? It needs some extra handling in other places, I agree.
setdiff(miniCRAN::pkgDep("R.cache"), miniCRAN::pkgDep("styler"))
#> [1] "R.cache" "R.methodsS3" "R.oo" "R.utils"
Created on 2019-09-23 by the reprex package (v0.3.0)
)
should_use_cache <- cache_is_activated()
use_cache <- is_cached && should_use_cache
if (!use_cache) {
Add an early return like if (use_cache) return(text)?
R/transform-files.R
Outdated
R.cache::findCache(key = hash_standardize(text), dir = cache_dir)
)
should_use_cache <- cache_is_activated()
use_cache <- is_cached && should_use_cache
- use_cache <- is_cached && should_use_cache
+ use_cache <- should_use_cache && is_cached(text)
with a suitable is_cached() function avoids calling hash_standardize() if not needed.
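The effect of the suggested operand order can be illustrated with a minimal sketch. It relies on the fact that `&&` in R short-circuits. The bodies of `hash_standardize()` and `is_cached()`, the counter `hash_calls` and the vector `known_keys` are hypothetical stand-ins, not the code from this PR.

```r
# Hypothetical stand-ins: the real helpers in the PR may look different.
hash_calls <- 0
hash_standardize <- function(text) {
  hash_calls <<- hash_calls + 1   # count how often hashing actually runs
  paste(text, collapse = "\n")    # placeholder for a real hash
}

known_keys <- character()         # pretend cache index, empty here
is_cached <- function(text) hash_standardize(text) %in% known_keys

# With the cache switched off, `&&` never evaluates its right-hand side,
# so is_cached() and therefore hash_standardize() are not called.
should_use_cache <- FALSE
use_cache <- should_use_cache && is_cached("x <- 1")
```

After running this, `hash_calls` is still 0, whereas with the operands swapped the hash would be computed even though the cache is deactivated.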
Thanks for reviewing @krlmlr. Feels like the good old times are back 🎉 . |
Do we need to include the style in the hash key, so that we recompute when the style changes? |
I guess we should. The question is how. Because currently, the version of styler determines which cache to use by default. Also, if people have their own style guide they supply, we should probably also hash that function and include it in the hash, leaving the R.cache directory structure as is, i.e.
In addition, if we
x <- 3
gg <- function() {
  if (TRUE) {
    f(x, na = 3)
  }
}
3479b98
to
e1adb92
Compare
still failing: mixed case
hack: convert list with envs to text, otherwise it does not work
… return different things. for reference 43219ixmypi.
a7eae8f
to
d013cea
Compare
I tweaked a bit, works for me with {dm}.
|
Tracked in HenrikBengtsson/R.cache#37.
I am not sure this is necessary because it seems quite cumbersome to implement. You mean in the console output or also in the data frame returned invisibly by Also, if you want to read in only once (i.e.
|
Creating file manually instead of mapping NULL to empty file
I think we should not communicate to the user if the cache was used, at least not in this PR. It involves refactoring quite a few things and the PR is already getting too large. Please open another issue if you think it's important. If something is wrong, the user can always turn off the cache completely. |
Seems that with 36606a2, we are able to consume 0 KB instead of block size per cached value, so that one is resolved. |
Plus the inode, but these are cheap. Let's see how it works without notifying the user. |
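The idea of using empty files as cache entries can be sketched in base R as follows. This is only an illustration under assumptions: the directory name and the helpers `cache_mark()` and `cache_has()` are hypothetical, and the PR itself goes through R.cache rather than these calls.

```r
# Hypothetical cache location inside the session's temp directory.
cache_dir <- file.path(tempdir(), "styler-cache-sketch")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# A cache entry is an empty file named after the key (e.g. a hash).
cache_mark <- function(key) file.create(file.path(cache_dir, key))
cache_has <- function(key) file.exists(file.path(cache_dir, key))

cache_mark("abc123")
cache_has("abc123")                              # lookup is a file existence check
entry_size <- file.size(file.path(cache_dir, "abc123"))  # size in bytes of the entry
```

Because only the existence of the file matters, the entry carries no content and its reported size is zero bytes; the file system still needs an inode for it.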
There are two remaining issues with this PR:
We will merge this and open new issues for these tasks. |
Closes #320.
Goal
The goal is to make styler remember what it styled so the pre-commit hook in https://github.com/lorenzwalthert/pre-commit-hooks is much faster and functions like styler::style_pkg() are much faster if run on already styled files.
Requirements
The requirements are:
must work on all platforms and must not be error-prone.
the user should be able to manage it (know its size and location, deactivate, clear, delete).
should be as close as possible to the functionality "once seen by style_text(), the Addin or style_file(), remembered forever".
The cache should be enabled by default, but the required packages should be in Suggests to keep the core installation lightweight. If R.cache is not installed, we should issue a warning message, deactivate the caching feature for the current R session and ask the user to install the dependencies and/or permanently disable the feature in their .Rprofile with usethis::edit_r_profile(). For this reason, the styling should also work when R.cache is not installed, which requires every R.cache call to be wrapped in a conditional.
The advanced user should be able to understand how R.cache was used to implement the caching.
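The "wrap every R.cache call in a conditional" requirement could look roughly like the sketch below. `requireNamespace()` is base R; the helper names `r_cache_installed()`, `cache_is_usable()` and `style_text_sketch()`, the option `example.cache_name` and the toy styling rule are assumptions made for illustration only.

```r
# TRUE if R.cache can be loaded, without attaching it or printing messages.
r_cache_installed <- function() requireNamespace("R.cache", quietly = TRUE)

# Hypothetical guard: warn once and switch caching off if R.cache is missing.
cache_is_usable <- function() {
  if (r_cache_installed()) {
    return(TRUE)
  }
  warning(
    "R.cache is not installed; caching is deactivated for this session.",
    call. = FALSE
  )
  options(example.cache_name = NULL)  # hypothetical option holding the cache name
  FALSE
}

# Toy stand-in for styling: strips trailing whitespace only.
style_text_sketch <- function(text) {
  if (cache_is_usable()) {
    # Calls into R.cache would go here, guarded by the conditional.
  }
  trimws(text, which = "right")
}
```

With this shape, styling returns the same result whether or not R.cache is installed; only the warning and the cache lookup differ.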
Conceptual
In this PR we introduce caching to styler. We follow the approach outlined in
#320 (comment), which basically is:
check if text to style is in cache.
if not, style it and add styled text to cache.
The approach has the following advantages:
Very simple to implement.
API agnostic. Works for files, text, Addin because it operates on the text
level, which is quite low level.
Because the approach is independent of path and modification time, it recognizes
the same content in different locations, which covers renamed, copied and
moved files. It can also cache multiple versions of a file, e.g. on
different branches, when going back and forth in the git history.
Cache remains very small because no actual code is cached.
The cache must be styler version specific because if not, updated styling rules
won't be applied.
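The two-step approach above can be sketched in a few lines of base R. This is a simplified illustration, not the implementation in this PR: the in-memory environment `cache`, the helpers `hash_text()` and `toy_style()`, and the MD5-via-temp-file hashing are all assumptions chosen to keep the example self-contained.

```r
# In-memory stand-in for the on-disk cache; it stores hashes only, no code.
cache <- new.env()
style_calls <- 0

# Hash text by writing it to a temp file and using tools::md5sum().
hash_text <- function(text) {
  path <- tempfile()
  on.exit(unlink(path))
  writeLines(text, path)
  unname(tools::md5sum(path))
}

# Toy styler: strips trailing whitespace and counts its invocations.
toy_style <- function(text) {
  style_calls <<- style_calls + 1
  sub("\\s+$", "", text)
}

style_with_cache <- function(text) {
  # Step 1: if the text is known to be styled already, return it unchanged.
  if (exists(hash_text(text), envir = cache, inherits = FALSE)) {
    return(text)
  }
  # Step 2: otherwise style it and remember the hash of the styled result.
  styled <- toy_style(text)
  assign(hash_text(styled), TRUE, envir = cache)
  styled
}

first <- style_with_cache("x <- 1   ")  # styled, hash of result is stored
second <- style_with_cache(first)       # cache hit, toy_style() is skipped
```

Since the key depends only on the content, the same text is recognized regardless of which file or path it came from.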
Implementation
We use R.cache to power the caching.
We use R options to manage it, with additional functional wrappers to modify
the options.
There must be one cache per styler version and, for testing purposes, the user
must be able to specify a cache manually (mainly to be able to delete the
cache of the tests without deleting the cache they use as a user).
To not clutter the R.cache caching directory, we use a dir under the caching
root that corresponds to /styler/cache_name, where cache_name is the
user-specified cache name, defaulting to the installed version number from
DESCRIPTION.
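A rough sketch of how options, wrapper functions and the /styler/cache_name layout could fit together is shown below. Only `cache_is_activated()` appears by name in the diff above; the option name `example.cache_name`, the other helpers, the default `"1.0.0"` and the use of `tempdir()` as a stand-in for the R.cache root are assumptions for illustration.

```r
# Stand-in for the R.cache root directory (assumption for this sketch).
cache_root <- file.path(tempdir(), "R.cache-root")

# Functional wrappers around a single R option (hypothetical option name).
cache_activate <- function(cache_name = "1.0.0") {
  options(example.cache_name = cache_name)
  invisible(cache_name)
}
cache_deactivate <- function() {
  options(example.cache_name = NULL)  # setting an option to NULL removes it
  invisible(NULL)
}
cache_is_activated <- function() !is.null(getOption("example.cache_name"))

# One sub-directory per cache name below <root>/styler.
cache_dir <- function() {
  file.path(cache_root, "styler", getOption("example.cache_name"))
}

cache_activate("testing")    # a dedicated cache, e.g. for the test suite
testing_dir <- cache_dir()   # ends in styler/testing
cache_deactivate()
```

Keeping the test cache under its own name means it can be deleted without touching the cache that belongs to the installed styler version.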
We modify .travis.yml to also test behavior if R.cache is not installed.
Todo:
R.cache be in suggest and certainly fail in a vanilla installation of styler when caching is not explicitly disabled and R.cache is not installed? No, just warning as described above.
R.cache is not installed?