-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizer: cache runtime artifacts on filesystem #747
Conversation
This will download all runtime parameters once and stores them in an in memory cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional thoughts:
FileSystemCache
has no eviction mechanism and no size limit on values you put in the map. You may run out of memory. I'd limit the number of keys at least (but better to use a proper LRU cache).- You'll make copies of duplicate files (two urls returning same data). This can be avoided you keep an index of url -> hash(data) and use hash(data) instead of hash(url) for the filename.
- If you can't write a generic
fetch
-compatible wrapper, perhaps you can narrow down whatfetch
does to something simpler, e.g. async function that only takes a URL and returns a string? That would make it easier to wrap in a cached call.
packages/core/lib/FileSystemCache.js
Outdated
*/ | ||
'use strict'; | ||
const {join} = require('path'); | ||
const {rmdir, readdir, readFile, writeFile, unlink, existsSync, mkdirSync} = require('fs'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering you have this many imports, I'd argue destructuring isn't a good idea, some of these are more readable as part of fs
(e.g. fs.unlink
vs unlink
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol - yeah. Frog in boiling water...
packages/core/lib/FileSystemCache.js
Outdated
'use strict'; | ||
const {join} = require('path'); | ||
const {rmdir, readdir, readFile, writeFile, unlink, existsSync, mkdirSync} = require('fs'); | ||
const {promisify} = require('util'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this? Unless you want to support Node.js 9 and below (which is probably no longer needed... oldest supported LTS is 10), use fs.promises
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Old habit...fixed.
packages/core/lib/FileSystemCache.js
Outdated
const unlinkAsync = promisify(unlink); | ||
|
||
class FileSystemCache { | ||
static get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get
to me implies a singleton-like pattern, but this actually creates a new instance (also, you used create
below for a similar pattern). Also, do you really need this, since it doesn't do much other than provide a default (which you can just do in the constructor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to create. Leftover from a previous implementation.
packages/core/lib/FileSystemCache.js
Outdated
* limitations under the License. | ||
*/ | ||
'use strict'; | ||
const {join} = require('path'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you're importing path
twice. Also, I suggest not destructuring, join
can be very confusing (as opposed to path.join
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
packages/core/lib/FileSystemCache.js
Outdated
await Promise.all( | ||
entries.map((entry) => { | ||
let fullPath = path.join(dir, entry.name); | ||
return entry.isDirectory() ? this.deleteDir_(fullPath) : unlinkAsync(fullPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is potentially dangerous: directories can be cyclic. That said, I guess this is not a big deal and hard to fix. I'd consider using fs.rmdir
which is experimental or rimraf
. Or add a max depth.
Actually, if I understand correctly, nested directories are never created by this cache. I'd throw if there are non-files in a directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rmdir is unfortunately only Node >=12. rimraf has to many dependencies for a non-dev dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this further, there's another danger here: if there's a symlink outside this directory (worst case, to ~
or /
), this will delete everything it can. You can argue that's not very likely, but I'd prefer if this was impossible.
A few options:
- Ignore directories and only delete files (since directories are never created this way anyway).
- Check if a directory is a symlink first.
- Resolve symlinks and check if the directory has the initial directory as a prefix.
Regarding fs.rmdir
, you can add an if
to check if it's available and fallback to your own implementation if not. But probably not needed anyway. Just make sure it doesn't recursively delete my home folder. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg - changed to only delete files in cache dir
/** | ||
* @private | ||
*/ | ||
async function downloadAmpRuntimeStyles_(config, runtimeCssUrl) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, consider moving to a more generic fetch
wrapper.
packages/optimizer/scripts/init.js
Outdated
|
||
async function warmupCaches() { | ||
// run a dummy transformation to pre-fill the caches | ||
await ampOptimizer.transformHtml('<h1>hello world</h1>', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: just return it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
const cacheFile = this.createCacheFileName(key); | ||
try { | ||
const content = await readFileAsync(cacheFile, 'utf-8'); | ||
value = JSON.parse(content); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why must it be JSON? If you use a raw buffer, your cache becomes binary-compatible (and less error prone and less code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you got an example for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const content = await fs.readFile(cacheFile); // returns Buffer
this.cache.set(key, value); // Buffer in Map
...
fs.writeFile(cacheFile, value); // writes Buffer to file, no need to specify encoding
JSON.parse
also works on a Buffer
, by coercing it to string first (i.e. JSON.parse(str) === JSON.parse(Buffer.from(str))
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, so the advantage is that you don't need to mess with encoding?
await initValidatorRules(runtimeParameters, customRuntimeParameters, config); | ||
await initRuntimeVersion(runtimeParameters, customRuntimeParameters, config); | ||
await initRuntimeStyles(runtimeParameters, config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These will execute sequentially, but it can be done in parallel with one (annoying) tranformation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good tip! done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Filip! Comments:
- re LRU Cache: not if that's a realistic scenario with releases once a week, but better save than sorry. Using an LRU Cache now.
- same value for different URLs is not a use case as we only store version specific artifacts which are immutable.
- fetch wrapper: I don't see the need here. There are three different fetches happening (ValidatorRules with built-int fetch, JSON and text), some require a max-age some don't. Unifying this feels like over-engineering at this point. This might change if we need to fetch more artifacts.
await initValidatorRules(runtimeParameters, customRuntimeParameters, config); | ||
await initRuntimeVersion(runtimeParameters, customRuntimeParameters, config); | ||
await initRuntimeStyles(runtimeParameters, config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good tip! done
packages/core/lib/FileSystemCache.js
Outdated
*/ | ||
'use strict'; | ||
const {join} = require('path'); | ||
const {rmdir, readdir, readFile, writeFile, unlink, existsSync, mkdirSync} = require('fs'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol - yeah. Frog in boiling water...
packages/core/lib/FileSystemCache.js
Outdated
'use strict'; | ||
const {join} = require('path'); | ||
const {rmdir, readdir, readFile, writeFile, unlink, existsSync, mkdirSync} = require('fs'); | ||
const {promisify} = require('util'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Old habit...fixed.
packages/core/lib/FileSystemCache.js
Outdated
const unlinkAsync = promisify(unlink); | ||
|
||
class FileSystemCache { | ||
static get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to create. Leftover from a previous implementation.
const cacheFile = this.createCacheFileName(key); | ||
try { | ||
const content = await readFileAsync(cacheFile, 'utf-8'); | ||
value = JSON.parse(content); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you got an example for this?
packages/optimizer/scripts/init.js
Outdated
|
||
async function warmupCaches() { | ||
// run a dummy transformation to pre-fill the caches | ||
await ampOptimizer.transformHtml('<h1>hello world</h1>', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
packages/core/lib/FileSystemCache.js
Outdated
* limitations under the License. | ||
*/ | ||
'use strict'; | ||
const {join} = require('path'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
packages/core/lib/MaxAge.js
Outdated
* @param {Number} timestampInMs time when max-age value was received | ||
* @param {Number} value max-age value in seconds | ||
**/ | ||
static fromJson(timestampInMs, value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, renamed to from/toObject. I miss Java in these cases...
const cacheFile = this.createCacheFileName(key); | ||
try { | ||
const content = await readFileAsync(cacheFile, 'utf-8'); | ||
value = JSON.parse(content); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const content = await fs.readFile(cacheFile); // returns Buffer
this.cache.set(key, value); // Buffer in Map
...
fs.writeFile(cacheFile, value); // writes Buffer to file, no need to specify encoding
JSON.parse
also works on a Buffer
, by coercing it to string first (i.e. JSON.parse(str) === JSON.parse(Buffer.from(str))
).
packages/core/lib/FileSystemCache.js
Outdated
async set(key, value) { | ||
const cacheFile = this.createCacheFileName(key); | ||
try { | ||
this.cache[key] = value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test worked because it was always getting a cache miss and loading from a file. You could maybe add a test that clears the folder on the filesystem after a set
to make sure it loads it from the LRU.
packages/core/lib/FileSystemCache.js
Outdated
await Promise.all( | ||
entries.map((entry) => { | ||
let fullPath = path.join(dir, entry.name); | ||
return entry.isDirectory() ? this.deleteDir_(fullPath) : unlinkAsync(fullPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this further, there's another danger here: if there's a symlink outside this directory (worst case, to ~
or /
), this will delete everything it can. You can argue that's not very likely, but I'd prefer if this was impossible.
A few options:
- Ignore directories and only delete files (since directories are never created this way anyway).
- Check if a directory is a symlink first.
- Resolve symlinks and check if the directory has the initial directory as a prefix.
Regarding fs.rmdir
, you can add an if
to check if it's available and fallback to your own implementation if not. But probably not needed anyway. Just make sure it doesn't recursively delete my home folder. :)
2677bb4
to
8427903
Compare
@fstanis agree, only removing the files now Thanks everyone for the review! |
This PR addresses the problem that currently transformations rely on working access to the internet:
postinstall
hook that will download the latest runtime artifacts to prime the cacheFixes #378 #719 #650
@mdmower would be great if you could check if this works on windows.