-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implements content addressing inside copyPromise #4586
Conversation
I'm thinking about landing it soon, to let people experiment with it in RCs; I think it's fine if reviews are done post-merge if you don't have the bandwidth right now, I'll do a follow-up if necessary. |
async checksumFilePromise(path: P, {algorithm = `sha512`}: {algorithm?: string} = {}) { | ||
const fd = await this.openPromise(path, `r`); | ||
|
||
try { | ||
const CHUNK_SIZE = 65536; | ||
const chunk = Buffer.allocUnsafeSlow(CHUNK_SIZE); | ||
|
||
const hash = createHash(algorithm); | ||
|
||
let bytesRead = 0; | ||
while ((bytesRead = await this.readPromise(fd, chunk, 0, CHUNK_SIZE)) !== 0) | ||
hash.update(bytesRead === CHUNK_SIZE ? chunk : chunk.slice(0, bytesRead)); | ||
|
||
return hash.digest(`hex`); | ||
} finally { | ||
await this.closePromise(fd); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this duplicated in here instead of using the one in hashUtils
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because otherwise fslib
would depend on the core, which isn't necessary. The best would be to change the core to use this new FakeFS function rather than the hashUtils one, but I wanted to avoid touching the core in this PR 🤔
this.indexFolderPromise = setupCopyIndex(xfs, { | ||
indexPath: ppath.join(opts.project.configuration.get(`globalFolder`), `index` as Filename), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that the constructor has side-effects, can't this happen later when it's actually needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it early to avoid having to either do IO calls for each package, or having to keep a boolean state (which I feel like would add complexity to avoid a situation - a pnpm install on an empty project without any other install - that doesn't happen much, and doesn't cost much even if it happens).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather have that boolean state, this will do 257 mkdir
calls every time PnpmInstaller
is constructed, regardless of which linker is in use and if the directories already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks good to me. A couple of observations that might be optimised in the future. They both seem to eat around 30-35% of the overall performance.
lockPromise
- it is generally a complex operation which creates the file by opening it in exclusive mode and then removes the file and it is done for EACH copied file. I think it eats majority of these 30%. I It can be optimised at least by doinglockPromise
once per package instead or by switching to atomic IO operations or to daemon server aspnpm
does.- Computing hashes per each file. I think this one doesn't actually contribute much, but can be optimised by storing precomputed hashes in the storage instead and addressing the storage via hash(package_contents | file_path_inside_package).
Locking by package wouldn't work, because multiple packages may contain the same file content (and thus point to the same index file) - I want to improve this part, I agree it's a low-hanging fruit, but I'll probably address that as a follow-up. |
This PR is connected to #4533
What's the problem this PR addresses?
We want to support content-addressable cache when running installs. The implementation in #4533 is a great start, but implements this support inside a
copyPromise
function unique to the nm linker - subsequently importing it into the pnpm linker so that the two share the same code.While avoiding code duplication is a good thing generally speaking, I strongly feel like we should leverage tools from the core (improving them if needed) rather than implement special-cases that we then reuse in multiple places. It doesn't give us as much freedom to implement behaviours tailored to specific use cases, but this cost is imo offset by the easier maintenance now that we have a clearly delimited core to rely on.
Note that this only covers the pnpm linker - the nm linker is a little special in that @larixer has definitely more experience there so I trust him to implement and maintain his own versions of our tools if really needed.
How did you fix it?
The main difference compared to #4533 is that I check whether the index is busted by just checking its mtime, rather than computing its checksum.
I have two further optimizations in mind, although they'll have to be implemented separately:
I'm thinking we could use the Zip extra fields to precompute sha512 checksums for all files inside zip archives; this might then speedup the
checksumFilePromise
calls, which wouldn't need to fetch the file contents anymore.The copyFile implementation currently does a
write(dest, read(src))
instead of aclone(dest, src)
. We should allow FakeFS implementation to know whether they are compatible for fast cloning (at minima by keeping track of whether the deepest underlyingbaseFs
isxfs
).Performance-wise, using the Yarn repository as test subject, on my fairly slow machine:
The
lockPromise
is probably where a lot of perfs are lost - I may try to replace it withrenamePromise
instead (I initially usedlockPromise
because I'd like to preserve the inode when repairing a modified file, so that the fix affects all packages on the machine).Checklist