unpack-trees: enable fscache for sparse-checkout #151

Closed

derrickstolee

When updating the skip-worktree bits in the index to align with new
values in a sparse-checkout file, Git scans the entire working
directory with lstat() calls. In a sparse-checkout, many of these
lstat() calls are for paths that do not exist.

Enable the fscache feature during this scan.
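To illustrate why a cache helps when most lookups are for absent paths, here is a minimal sketch of a directory-listing cache. It reads a directory once and then answers "does this child exist?" from memory. It assumes POSIX `opendir()`/`readdir()`; the real fscache is Windows-specific code in Git for Windows and differs in detail. All names (`struct dir_cache`, `load_listing()`, `cached_exists()`, `listings_read`) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 64 /* toy limit: extra entries are dropped (false negatives) */

/* Hypothetical single-slot cache holding one directory's listing. */
struct dir_cache {
	char dir[256];              /* directory whose listing is cached */
	int valid;                  /* 1 once dir[] holds a listing */
	int missing;                /* opendir() failed: every child is absent */
	int nr;                     /* number of names[] entries in use */
	char names[MAX_NAMES][256]; /* entry names returned by readdir() */
};

static int listings_read; /* counts real directory reads (the costly step) */

/* Read a directory once and remember every entry name in it. */
static void load_listing(struct dir_cache *c, const char *dir)
{
	DIR *d;
	struct dirent *e;

	assert(strlen(dir) < sizeof(c->dir));
	listings_read++;
	strcpy(c->dir, dir);
	c->valid = 1;
	c->nr = 0;
	d = opendir(dir);
	c->missing = (d == NULL);
	if (!d)
		return;
	while ((e = readdir(d)) != NULL && c->nr < MAX_NAMES) {
		if (strlen(e->d_name) < sizeof(c->names[0]))
			strcpy(c->names[c->nr++], e->d_name);
	}
	closedir(d);
}

/*
 * Answer "does path exist?" from the cached listing of its parent,
 * reading that directory only when it is not already cached.
 * Returns 1 (exists), 0 (absent) or -1 (path not handled by this sketch).
 */
static int cached_exists(struct dir_cache *c, const char *path)
{
	const char *slash = strrchr(path, '/');
	char dir[256];
	size_t len;
	int i;

	if (!slash || slash == path)
		return -1; /* sketch skips bare names and top-level entries */
	len = (size_t)(slash - path);
	if (len >= sizeof(dir))
		return -1;
	memcpy(dir, path, len);
	dir[len] = '\0';

	if (!c->valid || strcmp(c->dir, dir) != 0)
		load_listing(c, dir);
	if (c->missing)
		return 0;
	for (i = 0; i < c->nr; i++)
		if (!strcmp(c->names[i], slash + 1))
			return 1;
	return 0;
}
```

Once a directory has been listed, lookups of any number of its children, present or absent, cost no further system calls, and a missing parent directory is remembered as "empty" after a single failed `opendir()`. A real cache would keep many directories and also record file metadata, not just names.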

@derrickstolee
Author

With this change, running `git read-tree -m -u HEAD` on a sparse-checkout repo shows `lstat()` calls for directories only, not for their contained blobs (as observed by procmon). This reduced my test from 2 minutes to ~16s.

A more involved change could recognize the missing parent folder and stop recursing on all lower folders.
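One way that more involved change might look is sketched below, assuming paths arrive in sorted order (as index entries do), so that every path under a missing directory is contiguous. When a path is missing, the helper climbs to the highest missing ancestor and then skips later paths beneath it without calling `lstat()` at all. The names `check_sorted_paths()`, `probe()` and `lstat_calls` are hypothetical; this is not code from the PR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

static int lstat_calls; /* counts real lstat() probes */

/* Return 1 if path exists (without following a final symlink), else 0. */
static int probe(const char *path)
{
	struct stat st;

	lstat_calls++;
	return lstat(path, &st) == 0;
}

/*
 * Hypothetical helper: check sorted paths for existence, setting out[i]
 * to 1 or 0. When a path is missing, climb to the highest missing
 * ancestor directory and skip every later path beneath it without
 * calling lstat() again. Sorting keeps each subtree contiguous, so one
 * remembered prefix is enough.
 */
static void check_sorted_paths(const char **paths, int n, int *out)
{
	char missing[256] = ""; /* highest known-missing directory */
	size_t mlen = 0;
	int i;

	assert(n == 0 || (paths && out));
	for (i = 0; i < n; i++) {
		const char *p = paths[i];
		char dir[256];
		char *slash;

		/* Under a directory already known to be missing: no syscall. */
		if (mlen && !strncmp(p, missing, mlen) && p[mlen] == '/') {
			out[i] = 0;
			continue;
		}
		out[i] = probe(p);
		if (out[i] || strlen(p) >= sizeof(dir))
			continue;

		/* Missing: walk up while each parent is missing too. */
		mlen = 0;
		strcpy(dir, p);
		while ((slash = strrchr(dir, '/')) != NULL && slash != dir) {
			*slash = '\0';
			if (probe(dir))
				break;
			strcpy(missing, dir);
			mlen = strlen(missing);
		}
	}
}
```

The climb costs a few extra `lstat()` calls the first time a missing subtree is found, but every further path under that folder is answered for free, so the savings grow with the size of the pruned subtree.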

@derrickstolee
Author

This applies only to git-for-windows, so this PR is mirrored in git-for-windows#2224.

@kewillford
Member

> `lstat()` calls for directories only

Does the file system guarantee that, when a file deep within the directory structure changes, the modified timestamps of all its parent folders are updated? Or does that not matter here, since only the `ce_flags` are being updated?

@jeffhostetler
Copy link

@kewillford The file system DOES NOT propagate the mtime changes up the tree. If a file is created or deleted, the mtime on the immediate parent directory is updated, but that's about it. In particular, modifying a file does not affect the mtime of the parent directory.
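The behavior described above can be checked with a small POSIX experiment. This is a sketch for POSIX file systems, not a statement about NTFS internals, and `run_demo()`, `backdate()`, `mtime_moved()` and `OLD_TIME` are hypothetical names. It backdates directory mtimes with `utimensat()`, then observes which ones move: creating `root/child/file.txt` should update `child` but not `root`, and appending to the existing file should update neither directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define OLD_TIME 1000 /* arbitrary old timestamp (seconds since the epoch) */

/* Which directory mtimes moved away from OLD_TIME after each step. */
struct mtime_report {
	int parent_changed_on_create;
	int grandparent_changed_on_create;
	int parent_changed_on_modify;
};

/* Backdate a path's atime and mtime so that any later update is obvious. */
static int backdate(const char *path)
{
	struct timespec ts[2] = {
		{ .tv_sec = OLD_TIME, .tv_nsec = 0 },
		{ .tv_sec = OLD_TIME, .tv_nsec = 0 },
	};

	return utimensat(AT_FDCWD, path, ts, 0);
}

static int mtime_moved(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && st.st_mtime != OLD_TIME;
}

/* Build root/child/file.txt and record which mtimes each step changes. */
static int run_demo(struct mtime_report *r)
{
	char root[] = "/tmp/mtime-demoXXXXXX";
	char child[64], file[96];
	int fd;

	if (!mkdtemp(root))
		return -1;
	snprintf(child, sizeof(child), "%s/child", root);
	snprintf(file, sizeof(file), "%s/file.txt", child);
	if (mkdir(child, 0700) < 0)
		return -1;

	/* Step 1: create a new file two levels below root. */
	if (backdate(root) < 0 || backdate(child) < 0)
		return -1;
	fd = open(file, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;
	close(fd);
	r->parent_changed_on_create = mtime_moved(child);
	r->grandparent_changed_on_create = mtime_moved(root);

	/* Step 2: change the contents of the existing file. */
	if (backdate(child) < 0)
		return -1;
	fd = open(file, O_WRONLY | O_APPEND);
	if (fd < 0)
		return -1;
	if (write(fd, "x", 1) != 1) {
		close(fd);
		return -1;
	}
	close(fd);
	r->parent_changed_on_modify = mtime_moved(child);

	unlink(file);
	rmdir(child);
	rmdir(root);
	return 0;
}
```

Because only the immediate parent's mtime moves, and only when entries are created or deleted, a directory timestamp cannot prove that nothing deeper in the tree has changed.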

When updating the skip-worktree bits in the index to align with new
values in a sparse-checkout file, Git scans the entire working
directory with lstat() calls. In a sparse-checkout, many of these
lstat() calls are for paths that do not exist.

Enable the fscache feature during this scan. Since enable_fscache()
calls nest, the disable_fscache() method decrements a counter and
only clears the cache when that counter reaches zero.

In a local test of a repo with ~2.2 million paths, updating the index
with git read-tree -m -u HEAD, using a sparse-checkout file containing
only /.gitattributes, improved from 2-3 minutes to ~6 seconds.

Signed-off-by: Derrick Stolee <[email protected]>
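The nesting rule in the commit message is ordinary reference counting. A minimal sketch, with hypothetical names (`cache_enable()`, `cache_disable()`, `cache_depth`, `cache_builds`, `cache_clears`) standing in for `enable_fscache()` and `disable_fscache()`, whose real signatures are not shown here:

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for a nestable cache; these are not the real
 * Git for Windows functions.
 */
static int cache_depth;  /* number of outstanding enable calls */
static int cache_builds; /* times a fresh cache was created */
static int cache_clears; /* times the cache was actually freed */

static void cache_enable(void)
{
	if (cache_depth++ == 0)
		cache_builds++; /* only the outermost caller creates it */
}

static void cache_disable(void)
{
	assert(cache_depth > 0); /* an unbalanced disable is a caller bug */
	if (--cache_depth == 0)
		cache_clears++; /* only the outermost caller frees it */
}
```

If a caller that has already enabled the cache runs the sparse-checkout scan, the scan's own enable/disable pair only moves the counter from 1 to 2 and back, so the outer caller's cache survives until its own disable.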
@derrickstolee
Author

Merged in git-for-windows/git.
