-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion: metro-file-map roadmap ideas that would be fantastic for monorepos #1325
Comments
Hi @stevenpetryk - this is an awesome analysis and we'd love to work with you to make improvements here. It's probably worth a few words of background context to help explain why things are as they are:
That said, I appreciate the pain this behaviour causes for large repos outside Meta - we've heard similar stories from Microsoft, in particular, and I'm keen to work on a design that works well for our largest external users. So let's dig in...
Interesting - there could be a gap in Metro's logic here, where we assume that requesting Taking advantage of cached SHA1 on non-Eden WatchmanFor non-Watchman, our metro/packages/metro-file-map/src/index.js Lines 534 to 535 in 0a8cfb1
But for Watchman, Metro effectively always asks Watchman to This could be as simple as changing: metro/packages/metro-file-map/src/crawlers/watchman/index.js Lines 207 to 213 in 0a8cfb1
So that we includeSha1 only if the backing watcher is Eden.
Of course for cold builds this makes no difference, but it ought to be a big win if you do have a cache.
This is believable but worth validating, I think - where is that 16GB actually being used? If you use the "minimal" Another possibility is that the memory is being used by Lazily hashing
IMO, this is very much the right thing to do - there are a few small things to design for:
Excluding node_modules
I think this is where we need to be a bit careful to distinguish, say, the contents of But, we might not need to keep the whole project in memory - the watching part alone can be cheap, especially if we don't even need to stat files. On macOS, we can subscribe to changes for a whole file tree in constant time and with no application memory cost, though it's more costly and requires a walk on Linux. A non-trivial complication with some files being in the in-memory map and others not, is how you can quickly tell one case from the other to know whether you need to perform disk IO - especially when symlinks can bridge the two cases. I've got some work in draft somewhere that builds this logic into TreeFS so that it can fall through to file IO. I'll see if I can revive that tomorrow and get back to you here. A second complication is correctly implementing incremental resolution when typical file IO makes symlinks transparent. But yes, broadly I do think there are good opportunities here and I'd be very grateful for your help! |
Thanks for the context! I suspected that Eden would provide a lot of this for free. No judgment on my end btw, I think Metro's watchman integration makes a lot of sense as-is for tons of repos. Responses to specific questions
I can definitely confirm this. When I manually run the same query as Metro against Watchman on a fresh watch, asking for
I do, but I probably can't share since they contain a list of file paths. I'm seeing this:
Responses to solution responses (lol)Lazily hashing
In my testing it's plenty fast, but for blazing fast™, CRC32 would probably be the most blazin' and sufficient. But it may be easier to use the same algorithm for all (even if one's coming from Watchman and another from metro-file-map). Re: the other two points, yes and yes. I noticed there was no mechanism to persist the file map cache. I'm glad this is what you proposed, since I felt a little shy suggesting periodic persistence. And the extra read in my patch was just for testing, I'd definitely want to only read once. Excluding node_modulesAgreed that Metro should/must continue watching node_modules (would be pretty inconvenient otherwise). My latest thinking on this is that Metro should tolerate symlink targets being missing from the Watchman query. The reason being that it would be a huge boon to Watchman performance if we could ignore I think we're aligned but just to clarify, this is the state of the world I'm imagining:
|
@stevenpetryk This article is very interesting! We have a monorepo of 4 React Native apps and MSA + Nestjs (about 50). When we start metro, it is so slow that we have to wait 2 minutes and 30 seconds for it to start (I think it's because of the large number of node_mdoules in root). Is there an update? 🤔 |
Keep an eye on recent commits/PRs 🙂. It's coming... |
) Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing config `watcher.unstable_lazySha1` This diff introduces a new opt-in config that - Disables eager computation of `sha1` for all watched files. - Adds support in `Transformer` to accept a callback that asynchronously returns SHA1, and optionally file content. - Maintains support for the old sync API, for anyone using `Transformer` directly. This will likely be dropped in a coming major. Along with the already landed, default-on [auto-saving cache](#1434), this should provide order of magnitude[1] faster startup on large projects, with no compromise to warm build perf, and very little slowdown in cold builds in most cases[2]. [1] Metro needs to watch file subtrees, but typically only a small proportion of those files are used in a build. By hashing up front, we can spend up to several minutes hashing files that will never be used. [2] Cold file caches with warm transform caches - typically only when using a remote cache - may be observably slower due to the need to read and hash a file that wouldn't otherwise need to be read, though this still only moves the cost from startup to build. For truly cold builds, this change adds SHA1 computation time to transform time, but requires no additional IO. SHA1 computation is typically much faster than Babel transformation, and we might consider faster algorithms in future (SHA1 is Eden-native). Pull Request resolved: #1435 Changelog: ``` - **[Experimental]**: Add `watcher.unstable_lazySha1` to defer SHA1 calculation until files are needed by the transformer Differential Revision: D69373618
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Reading files at most once With the exception of Eden+Watchman, computing the SHA1 requires reading the content, but if we lazily compute the SHA1, find no cache hit (likely, given no cached SHA1 implies no local transform cache), we'll then want to transform the file we've just read. This change will allow us to pass the content we read to derive the SHA1 through to the transform worker *without reading the file from disk a second time*. `maybeReturnContent` is admittedly a slightly awkward API, but it's Metro-internal, and it was the best I could come up with for returning content only if it's already been read (otherwise, it's faster for transform workers to read it from disk), and only if needed (otherwise there's potentially significant extra IPC for no reason). Changelog: Internal Reviewed By: vzaidman Differential Revision: D69372887
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing a lazy hashing API in metro-file-map This diff adds a new async `getOrComputeSha1` API to the exposed `FileSystem` interface implemented by `TreeFS`. `TreeFS` is provided during construction with a means to use the `FileProcessor`. - In common with other `FileSystem` methods, it returns null if the path does not point to a watched regular file. - It dereferences symlinks to pass only real, absolute paths to `processFile`, so that it accepts the same range of inputs as `getSha1`. - Caches SHA1 in metro-file-map, so that it may be persisted. Safely clears the value on modification, including when modification races processing. - Emits a `metadata` event to inform auto-saving caches about a change to internal state. - This diff does not exercise the new API, except in tests. - TreeFS consumers are required to pass `processFile`, non-breaking as this API is [experimental](https://github.com/facebook/metro/blob/main/packages/metro-file-map/README.md#experimental-metro-file-map). Changelog: Internal Reviewed By: vzaidman Differential Revision: D69123130
) Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing config `watcher.unstable_lazySha1` This diff introduces a new opt-in config that - Disables eager computation of `sha1` for all watched files. - Adds support in `Transformer` to accept a callback that asynchronously returns SHA1, and optionally file content. - Maintains support for the old sync API, for anyone using `Transformer` directly. This will likely be dropped in a coming major. Along with the already landed, default-on [auto-saving cache](#1434), this should provide order of magnitude[1] faster startup on large projects, with no compromise to warm build perf, and very little slowdown in cold builds in most cases[2]. [1] Metro needs to watch file subtrees, but typically only a small proportion of those files are used in a build. By hashing up front, we can spend up to several minutes hashing files that will never be used. [2] Cold file caches with warm transform caches - typically only when using a remote cache - may be observably slower due to the need to read and hash a file that wouldn't otherwise need to be read, though this still only moves the cost from startup to build. For truly cold builds, this change adds SHA1 computation time to transform time, but requires no additional IO. SHA1 computation is typically much faster than Babel transformation, and we might consider faster algorithms in future (SHA1 is Eden-native). Changelog: ``` - **[Experimental]**: Add `watcher.unstable_lazySha1` to defer SHA1 calculation until files are needed by the transformer Differential Revision: D69373618
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Reading files at most once With the exception of Eden+Watchman, computing the SHA1 requires reading the content, but if we lazily compute the SHA1, find no cache hit (likely, given no cached SHA1 implies no local transform cache), we'll then want to transform the file we've just read. This change will allow us to pass the content we read to derive the SHA1 through to the transform worker *without reading the file from disk a second time*. `maybeReturnContent` is admittedly a slightly awkward API, but it's Metro-internal, and it was the best I could come up with for returning content only if it's already been read (otherwise, it's faster for transform workers to read it from disk), and only if needed (otherwise there's potentially significant extra IPC for no reason). Changelog: Internal Differential Revision: D69372887 Reviewed By: vzaidman
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing a lazy hashing API in metro-file-map This diff adds a new async `getOrComputeSha1` API to the exposed `FileSystem` interface implemented by `TreeFS`. `TreeFS` is provided during construction with a means to use the `FileProcessor`. - In common with other `FileSystem` methods, it returns null if the path does not point to a watched regular file. - It dereferences symlinks to pass only real, absolute paths to `processFile`, so that it accepts the same range of inputs as `getSha1`. - Caches SHA1 in metro-file-map, so that it may be persisted. Safely clears the value on modification, including when modification races processing. - Emits a `metadata` event to inform auto-saving caches about a change to internal state. - This diff does not exercise the new API, except in tests. - TreeFS consumers are required to pass `processFile`, non-breaking as this API is [experimental](https://github.com/facebook/metro/blob/main/packages/metro-file-map/README.md#experimental-metro-file-map). Changelog: Internal Differential Revision: D69123130
) Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing config `watcher.unstable_lazySha1` This diff introduces a new opt-in config that - Disables eager computation of `sha1` for all watched files. - Adds support in `Transformer` to accept a callback that asynchronously returns SHA1, and optionally file content. - Maintains support for the old sync API, for anyone using `Transformer` directly. This will likely be dropped in a coming major. Along with the already landed, default-on [auto-saving cache](#1434), this should provide order of magnitude[1] faster startup on large projects, with no compromise to warm build perf, and very little slowdown in cold builds in most cases[2]. [1] Metro needs to watch file subtrees, but typically only a small proportion of those files are used in a build. By hashing up front, we can spend up to several minutes hashing files that will never be used. [2] Cold file caches with warm transform caches - typically only when using a remote cache - may be observably slower due to the need to read and hash a file that wouldn't otherwise need to be read, though this still only moves the cost from startup to build. For truly cold builds, this change adds SHA1 computation time to transform time, but requires no additional IO. SHA1 computation is typically much faster than Babel transformation, and we might consider faster algorithms in future (SHA1 is Eden-native). Pull Request resolved: #1435 Changelog: ``` - **[Experimental]**: Add `watcher.unstable_lazySha1` to defer SHA1 calculation until files are needed by the transformer Differential Revision: D69373618
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Reading files at most once With the exception of Eden+Watchman, computing the SHA1 requires reading the content, but if we lazily compute the SHA1, find no cache hit (likely, given no cached SHA1 implies no local transform cache), we'll then want to transform the file we've just read. This change will allow us to pass the content we read to derive the SHA1 through to the transform worker *without reading the file from disk a second time*. `maybeReturnContent` is admittedly a slightly awkward API, but it's Metro-internal, and it was the best I could come up with for returning content only if it's already been read (otherwise, it's faster for transform workers to read it from disk), and only if needed (otherwise there's potentially significant extra IPC for no reason). Changelog: Internal Differential Revision: D69372887 Reviewed By: vzaidman
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing a lazy hashing API in metro-file-map This diff adds a new async `getOrComputeSha1` API to the exposed `FileSystem` interface implemented by `TreeFS`. `TreeFS` is provided during construction with a means to use the `FileProcessor`. - In common with other `FileSystem` methods, it returns null if the path does not point to a watched regular file. - It dereferences symlinks to pass only real, absolute paths to `processFile`, so that it accepts the same range of inputs as `getSha1`. - Caches SHA1 in metro-file-map, so that it may be persisted. Safely clears the value on modification, including when modification races processing. - Emits a `metadata` event to inform auto-saving caches about a change to internal state. - This diff does not exercise the new API, except in tests. - TreeFS consumers are required to pass `processFile`, non-breaking as this API is [experimental](https://github.com/facebook/metro/blob/main/packages/metro-file-map/README.md#experimental-metro-file-map). Changelog: Internal Differential Revision: D69123130
) Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing config `watcher.unstable_lazySha1` This diff introduces a new opt-in config that - Disables eager computation of `sha1` for all watched files. - Adds support in `Transformer` to accept a callback that asynchronously returns SHA1, and optionally file content. - Maintains support for the old sync API, for anyone using `Transformer` directly. This will likely be dropped in a coming major. Along with the already landed, default-on [auto-saving cache](#1434), this should provide order of magnitude[1] faster startup on large projects, with no compromise to warm build perf, and very little slowdown in cold builds in most cases[2]. [1] Metro needs to watch file subtrees, but typically only a small proportion of those files are used in a build. By hashing up front, we can spend up to several minutes hashing files that will never be used. [2] Cold file caches with warm transform caches - typically only when using a remote cache - may be observably slower due to the need to read and hash a file that wouldn't otherwise need to be read, though this still only moves the cost from startup to build. For truly cold builds, this change adds SHA1 computation time to transform time, but requires no additional IO. SHA1 computation is typically much faster than Babel transformation, and we might consider faster algorithms in future (SHA1 is Eden-native). Pull Request resolved: #1435 Changelog: ``` - **[Experimental]**: Add `watcher.unstable_lazySha1` to defer SHA1 calculation until files are needed by the transformer Differential Revision: D69373618
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Reading files at most once With the exception of Eden+Watchman, computing the SHA1 requires reading the content, but if we lazily compute the SHA1, find no cache hit (likely, given no cached SHA1 implies no local transform cache), we'll then want to transform the file we've just read. This change will allow us to pass the content we read to derive the SHA1 through to the transform worker *without reading the file from disk a second time*. `maybeReturnContent` is admittedly a slightly awkward API, but it's Metro-internal, and it was the best I could come up with for returning content only if it's already been read (otherwise, it's faster for transform workers to read it from disk), and only if needed (otherwise there's potentially significant extra IPC for no reason). Changelog: Internal Reviewed By: vzaidman Differential Revision: D69372887
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing a lazy hashing API in metro-file-map This diff adds a new async `getOrComputeSha1` API to the exposed `FileSystem` interface implemented by `TreeFS`. `TreeFS` is provided during construction with a means to use the `FileProcessor`. - In common with other `FileSystem` methods, it returns null if the path does not point to a watched regular file. - It dereferences symlinks to pass only real, absolute paths to `processFile`, so that it accepts the same range of inputs as `getSha1`. - Caches SHA1 in metro-file-map, so that it may be persisted. Safely clears the value on modification, including when modification races processing. - Emits a `metadata` event to inform auto-saving caches about a change to internal state. - This diff does not exercise the new API, except in tests. - TreeFS consumers are required to pass `processFile`, non-breaking as this API is [experimental](https://github.com/facebook/metro/blob/main/packages/metro-file-map/README.md#experimental-metro-file-map). Changelog: Internal Reviewed By: vzaidman Differential Revision: D69123130
) Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing config `watcher.unstable_lazySha1` This diff introduces a new opt-in config that - Disables eager computation of `sha1` for all watched files. - Adds support in `Transformer` to accept a callback that asynchronously returns SHA1, and optionally file content. - Maintains support for the old sync API, for anyone using `Transformer` directly. This will likely be dropped in a coming major. Along with the already landed, default-on [auto-saving cache](#1434), this should provide order of magnitude[1] faster startup on large projects, with no compromise to warm build perf, and very little slowdown in cold builds in most cases[2]. [1] Metro needs to watch file subtrees, but typically only a small proportion of those files are used in a build. By hashing up front, we can spend up to several minutes hashing files that will never be used. [2] Cold file caches with warm transform caches - typically only when using a remote cache - may be observably slower due to the need to read and hash a file that wouldn't otherwise need to be read, though this still only moves the cost from startup to build. For truly cold builds, this change adds SHA1 computation time to transform time, but requires no additional IO. SHA1 computation is typically much faster than Babel transformation, and we might consider faster algorithms in future (SHA1 is Eden-native). Changelog: ``` - **[Experimental]**: Add `watcher.unstable_lazySha1` to defer SHA1 calculation until files are needed by the transformer Differential Revision: D69373618
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Reading files at most once With the exception of Eden+Watchman, computing the SHA1 requires reading the content, but if we lazily compute the SHA1, find no cache hit (likely, given no cached SHA1 implies no local transform cache), we'll then want to transform the file we've just read. This change will allow us to pass the content we read to derive the SHA1 through to the transform worker *without reading the file from disk a second time*. `maybeReturnContent` is admittedly a slightly awkward API, but it's Metro-internal, and it was the best I could come up with for returning content only if it's already been read (otherwise, it's faster for transform workers to read it from disk), and only if needed (otherwise there's potentially significant extra IPC for no reason). Changelog: Internal Differential Revision: D69372887 Reviewed By: vzaidman
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing a lazy hashing API in metro-file-map This diff adds a new async `getOrComputeSha1` API to the exposed `FileSystem` interface implemented by `TreeFS`. `TreeFS` is provided during construction with a means to use the `FileProcessor`. - In common with other `FileSystem` methods, it returns null if the path does not point to a watched regular file. - It dereferences symlinks to pass only real, absolute paths to `processFile`, so that it accepts the same range of inputs as `getSha1`. - Caches SHA1 in metro-file-map, so that it may be persisted. Safely clears the value on modification, including when modification races processing. - Emits a `metadata` event to inform auto-saving caches about a change to internal state. - This diff does not exercise the new API, except in tests. - TreeFS consumers are required to pass `processFile`, non-breaking as this API is [experimental](https://github.com/facebook/metro/blob/main/packages/metro-file-map/README.md#experimental-metro-file-map). Changelog: Internal Differential Revision: D69123130
) Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing config `watcher.unstable_lazySha1` This diff introduces a new opt-in config that - Disables eager computation of `sha1` for all watched files. - Adds support in `Transformer` to accept a callback that asynchronously returns SHA1, and optionally file content. - Maintains support for the old sync API, for anyone using `Transformer` directly. This will likely be dropped in a coming major. Along with the already landed, default-on [auto-saving cache](#1434), this should provide order of magnitude[1] faster startup on large projects, with no compromise to warm build perf, and very little slowdown in cold builds in most cases[2]. [1] Metro needs to watch file subtrees, but typically only a small proportion of those files are used in a build. By hashing up front, we can spend up to several minutes hashing files that will never be used. [2] Cold file caches with warm transform caches - typically only when using a remote cache - may be observably slower due to the need to read and hash a file that wouldn't otherwise need to be read, though this still only moves the cost from startup to build. For truly cold builds, this change adds SHA1 computation time to transform time, but requires no additional IO. SHA1 computation is typically much faster than Babel transformation, and we might consider faster algorithms in future (SHA1 is Eden-native). Pull Request resolved: #1435 Changelog: ``` - **[Experimental]**: Add `watcher.unstable_lazySha1` to defer SHA1 calculation until files are needed by the transformer Reviewed By: GijsWeterings Differential Revision: D69373618
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Reading files at most once With the exception of Eden+Watchman, computing the SHA1 requires reading the content, but if we lazily compute the SHA1, find no cache hit (likely, given no cached SHA1 implies no local transform cache), we'll then want to transform the file we've just read. This change will allow us to pass the content we read to derive the SHA1 through to the transform worker *without reading the file from disk a second time*. `maybeReturnContent` is admittedly a slightly awkward API, but it's Metro-internal, and it was the best I could come up with for returning content only if it's already been read (otherwise, it's faster for transform workers to read it from disk), and only if needed (otherwise there's potentially significant extra IPC for no reason). Changelog: Internal Reviewed By: vzaidman Differential Revision: D69372887 fbshipit-source-id: aa65780f3bda1c012c8533a5496649973cb94966
Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing a lazy hashing API in metro-file-map This diff adds a new async `getOrComputeSha1` API to the exposed `FileSystem` interface implemented by `TreeFS`. `TreeFS` is provided during construction with a means to use the `FileProcessor`. - In common with other `FileSystem` methods, it returns null if the path does not point to a watched regular file. - It dereferences symlinks to pass only real, absolute paths to `processFile`, so that it accepts the same range of inputs as `getSha1`. - Caches SHA1 in metro-file-map, so that it may be persisted. Safely clears the value on modification, including when modification races processing. - Emits a `metadata` event to inform auto-saving caches about a change to internal state. - This diff does not exercise the new API, except in tests. - TreeFS consumers are required to pass `processFile`, non-breaking as this API is [experimental](https://github.com/facebook/metro/blob/main/packages/metro-file-map/README.md#experimental-metro-file-map). Changelog: Internal Reviewed By: vzaidman Differential Revision: D69123130 fbshipit-source-id: 74d50ed7c74bd54322e6357bf4a7c26db6a51d56
) Summary: ## Stack In this stack we're moving towards metro-file-map being able to *lazily* compute file metadata - in particular the SHA1 hash - only when required by the transformer. More context in #1325 (comment) ## Implementing config `watcher.unstable_lazySha1` This diff introduces a new opt-in config that - Disables eager computation of `sha1` for all watched files. - Adds support in `Transformer` to accept a callback that asynchronously returns SHA1, and optionally file content. - Maintains support for the old sync API, for anyone using `Transformer` directly. This will likely be dropped in a coming major. Along with the already landed, default-on [auto-saving cache](#1434), this should provide order of magnitude[1] faster startup on large projects, with no compromise to warm build perf, and very little slowdown in cold builds in most cases[2]. [1] Metro needs to watch file subtrees, but typically only a small proportion of those files are used in a build. By hashing up front, we can spend up to several minutes hashing files that will never be used. [2] Cold file caches with warm transform caches - typically only when using a remote cache - may be observably slower due to the need to read and hash a file that wouldn't otherwise need to be read, though this still only moves the cost from startup to build. For truly cold builds, this change adds SHA1 computation time to transform time, but requires no additional IO. SHA1 computation is typically much faster than Babel transformation, and we might consider faster algorithms in future (SHA1 is Eden-native). Pull Request resolved: #1435 Changelog: ``` - **[Experimental]**: Add `watcher.unstable_lazySha1` to defer SHA1 calculation until files are needed by the transformer Reviewed By: GijsWeterings Differential Revision: D69373618 fbshipit-source-id: 2c67e0710b678e491b3760f5d235b4767339d841
Lazy hashing is shipped behind All being well it'll be default in 0.82 for RN 0.79. |
TL;DR: thank you for adding symlink support! It works extremely well for us, but there are some directions we think metro-file-map could be taken that would work better for large non-Facebook monorepos (non-Facebook as in, on standard filesystems).
These directions include (more below):
content.sha1hex
for files in node_modulesOur usecase
At Discord, we have a large monorepo, whose node_modules are managed by pnpm, with the following structure:
Because of this layout (which we can't really change), Metro ends up issuing a Watchman query that can take around 20 seconds on even the fastest machines: it requests the
content.sha1hex
of every single file in node_modules, which for us, is around 300k.If we remove
node_modules
as a watchFolder, then no node_modules end up in the file map, and Metro cannot find them (which is understandable).Because of this behavior of metro-file-map, we're unable to exclude
node_modules/.pnpm
from our Watchman config, which means thatwatchman watch-project .
is also incredibly slow—even though Metro is the only tool we use that actually needs to have Watchman readnode_modules/.pnpm
(the other tools just follow symlinks).Some directions I've explored
I set out this past week to try patching Metro in various ways and seeing how behaviors improved. I have a couple of approaches that I think could be worth upstreaming:
Idea 1: Lazily computing sha1s
I noticed that the Watchman query issued by Metro was taking around 20 seconds, and realized that it went down to <1s when removing
content.sha1hex
as a field from the query. So I tried the following:Disgusting. But it made it so that we never get stuck on a slow query. Now, I think that asking watchman for sha1's is actually really reasonable—for app code. For node_modules, we're asking it to sha1sum hundreds of thousands of files that Metro will likely never actually have to read.
One thing to consider may be a hybrid approach here, then: issue two watchman queries—one for all the app code (with
content.sha1hex
), and one for all node_modules (withoutcontent.sha1hex
). Then, around the moment the sha1 is needed by the transformer, we compute it and write it to the filemap (preferably more elegantly than above).However, this still has a problem: we are still loading hundreds of thousands of files into the filemap, most of which we'll probably never read, which is probably part of why, at Discord, we have to give Metro 16 gigabytes of RAM.
Idea 2: Excluding node_modules from the query entirely
This would, imo, be the most monorepo-friendly approach. Instead of querying watchman for node_modules, Metro could resolve imports as they are encountered (by, yes, following symlinks), and just read from the filesystem (caching as you go). I understand this would probably be a large behavioral change to metro-file-map (much larger than Idea 1).
But the downstream benefits of this would be massive:
node_modules/.pnpm
in.watchmanconfig
, which for us at least, takeswatch-project
down from 500k files to just 200k, speeding it up by the same proportion.In all this exploration, I've started to feel a little bit comfortable inside of the Metro monorepo, and I'd be interested in contributing some of these ideas—but wanted to talk direction/interest first. I hope this perspective from someone dealing with a hefty repo has been valuable. Thanks :)
The text was updated successfully, but these errors were encountered: