feat: resolve git file attrs #150

Merged
merged 17 commits into from
Jun 11, 2024
14 changes: 7 additions & 7 deletions fmt/src/git.rs
@@ -106,17 +106,17 @@ pub fn resolve_file_attrs(repo: &Repository) -> anyhow::Result<HashMap<String, G
     let mut cache = repo.diff_resource_cache(mode, Default::default())?;

     let head = repo.head_commit()?;
-    let mut prev_commit = head.clone();
+    let mut prev_tree = head.tree()?;

-    for info in head.ancestors().all()? {
+    let sorting = gix::traverse::commit::simple::Sorting::ByCommitTimeNewestFirst;
+    for info in head.ancestors().sorting(sorting).all()? {
         let info = info?;
-        let this_commit = info.object()?;
-        let time = this_commit.time()?;
+        let time = gix::date::Time::new(info.commit_time(), 0);

-        let tree = this_commit.tree()?;
+        let tree = info.id().object()?.peel_to_tree()?;
         let mut changes = tree.changes()?;
         changes.track_path().for_each_to_obtain_tree_with_cache(
-            &prev_commit.tree()?,
+            &prev_tree,
             &mut cache,
             |change| {
                 let filepath = workdir.join(change.location.to_string());
Contributor

Here you want to use gix::path::from_bstr(change.location). Never use String for anything path-related, as it can't represent everything that's possible on the filesystem.

Member Author

Thanks for the advice. I tried writing:

                let filepath = gix::path::from_bstring(change.location);
                match attrs.entry(filepath) {

But then filepath is something like "fmt/tests/tests.rs", which cannot match the selection "/Users/tison/Brittani/hawkeye-native/fmt/tests/tests.rs" on the later get.

Member Author

Calling filepath.canonicalize() here results in the IO error "No such file or directory". I guess that's because no base directory is specified.

Perhaps I can still use workdir.join(filepath)?

Contributor

Git tracks paths relative to the repository root, hence one will have to join them with the worktree root before using them on the filesystem.
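
To illustrate the two points above (join repository-relative paths with the worktree root, and avoid String for paths), here is a minimal std-only sketch. It does not use gix: the helper name to_worktree_path and the root /work/repo are hypothetical, and the byte-wise conversion assumes a Unix target. In the actual code, gix::path::from_bstr plays the role of the conversion step.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: treats OS strings as raw bytes
use std::path::{Path, PathBuf};

/// Hypothetical helper: turn a repository-relative path, given as the raw
/// bytes Git stores, into a filesystem path under `worktree_root`.
fn to_worktree_path(worktree_root: &Path, repo_relative: &[u8]) -> PathBuf {
    // No UTF-8 validation happens here, so no bytes are lost or replaced.
    worktree_root.join(OsStr::from_bytes(repo_relative))
}

fn main() {
    // Hypothetical worktree root, for illustration only.
    let root = Path::new("/work/repo");

    // A repository-relative path becomes absolute once joined with the root.
    let joined = to_worktree_path(root, b"fmt/tests/tests.rs");
    assert_eq!(joined, PathBuf::from("/work/repo/fmt/tests/tests.rs"));

    // A file name that is not valid UTF-8 (0xE9 is Latin-1 "e acute").
    let raw: &[u8] = b"fmt/caf\xe9.rs";

    // Going through String replaces the invalid byte, so the name changes.
    let lossy = String::from_utf8_lossy(raw);
    assert_ne!(lossy.as_bytes(), raw);

    // The byte-wise conversion keeps the name exactly as Git recorded it.
    let exact = to_worktree_path(root, raw);
    assert!(exact.as_os_str().as_bytes().ends_with(raw));
}
```

The joined path is what can be compared against absolute paths collected elsewhere, which is why a bare "fmt/tests/tests.rs" key did not match on lookup.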

Member Author

Yeah, thanks for the information.

This iteration over changes currently takes about 4 seconds on a repo with ~1300 files and ~2600 commits.

I wonder what is the major performance factor and whether we can improve it (e.g., does git work well in such situations?).

It seems most cycles are spent parsing the tree data, which we can do nothing to improve.

BTW, gitoxide already outperforms the git command here. For comparison:

for i in $(git ls-files); do git --no-pager log --follow --format=%ad -- $i > /dev/null; done

takes 1 minute and 5 seconds to finish.

Contributor

It's hard to get an apples-to-apples comparison since Git definitely does way more work here: it goes through the whole history once per file, whereas the Rust code only has to go through it once. From that point of view, Git's performance is impressive.

> I wonder what is the major performance factor and whether we can improve it (e.g., does git work well in such situations?).

Object-access performance is critical, and there are some variables regarding caches that can be set. They can improve performance by a couple of percent, but it's nothing more significant.
But wait, the max-performance feature can be added in this line, and that should be noticeable.
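
The "once through history" approach mentioned above can be sketched without gix. This is a simplified model, not the PR's implementation: Commit, FileTimes and resolve_times are hypothetical names, and each commit is assumed to already carry the list of paths it changed (in the real code that list comes from diffing adjacent trees). Keys are String only to mirror the signature in the hunk header.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a commit: its timestamp and the paths it touched.
struct Commit {
    time: i64,
    changed: Vec<&'static str>,
}

/// Hypothetical per-file result: oldest and newest commit time seen.
#[derive(Debug, Clone, Copy, PartialEq)]
struct FileTimes {
    created: i64,
    modified: i64,
}

/// Walk the history a single time, newest commit first, and fill in the
/// timestamps for every file along the way.
fn resolve_times(history_newest_first: &[Commit]) -> HashMap<String, FileTimes> {
    let mut attrs: HashMap<String, FileTimes> = HashMap::new();
    for commit in history_newest_first {
        for path in &commit.changed {
            attrs
                .entry(path.to_string())
                // Already seen in a newer commit: only move `created` back.
                .and_modify(|t| t.created = commit.time)
                // First sighting is the newest change to this path.
                .or_insert(FileTimes {
                    created: commit.time,
                    modified: commit.time,
                });
        }
    }
    attrs
}

fn main() {
    let history = vec![
        Commit { time: 300, changed: vec!["a.rs"] },
        Commit { time: 200, changed: vec!["a.rs", "b.rs"] },
        Commit { time: 100, changed: vec!["a.rs"] },
    ];
    let attrs = resolve_times(&history);
    assert_eq!(attrs["a.rs"], FileTimes { created: 100, modified: 300 });
    assert_eq!(attrs["b.rs"], FileTimes { created: 200, modified: 200 });
}
```

The cost is one traversal regardless of how many files exist, whereas the shell loop quoted earlier runs one git log traversal per file.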

@@ -141,7 +141,7 @@ pub fn resolve_file_attrs(repo: &Repository) -> anyhow::Result<HashMap<String, G
                 Ok::<_, Infallible>(Default::default())
             },
         )?;
-        prev_commit = this_commit;
+        prev_tree = tree;
         cache.clear_resource_cache();
     }

Member Author

These changes don't seem to help. Is that because we still look up the object for each tree, and iterate over all the changes at every location?

Member Author

For e576dad
