Need to optimize hard links in the Windows version. #2227

Closed
1 task
Driwars opened this issue Jun 12, 2019 · 5 comments

Comments

@Driwars

Driwars commented Jun 12, 2019

There are more than 13 thousand! This is not serious.

  • I was not able to find an open or closed issue matching what I'm seeing

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
$ git --version --build-options

git version 2.17.1.windows.2
cpu: x86_64
built from commit: a60968cf435951d9411fc0f980a2e362d5cccea2
sizeof-long: 4
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.17763.107]
  • What options did you set as part of the installation? Or did you choose the
    defaults?
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

Editor Option: VIM
Path Option: Cmd
SSH Option: OpenSSH
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Enabled
Enable Symlinks: Disabled
  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

** insert your response here **

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

no matter


no matter
  • What did you expect to occur after running these commands?

no matter

  • What actually happened instead?

no matter

  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?
    no matter
@dscho
Member

dscho commented Jun 12, 2019

There are more than 13 thousand! This is not serious.

What exactly do you mean? Please do spend the time to report issues clearly, otherwise the only consequence will be frustration all around.

@Driwars
Author

Driwars commented Jun 12, 2019

I will try to explain. Git creates a lot of hard links during installation: the NTFSLinksView program reports 13884 of them inside the C:\Program Files\Git directory. Here are 5 lines of that report, as an example of the 13 thousand.

git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-add.exe | 07.04.2019 14:11:24
git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-archive.exe | 07.04.2019 14:11:24
git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-am.exe | 07.04.2019 14:11:24
git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-annotate.exe | 07.04.2019 14:11:24
git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-apply.exe | 07.04.2019 14:11:24

git-add.exe, git-archive.exe, and many others all have the same hash, F77A8DB3E3F0BFB68917A834CFD0F5C6. This is one file with a lot of links to it.

On the one hand, a link does not take up space; on the other hand, Explorer displays a bunch of files, and archiving and backup systems treat the files as different.

In the "Git thumbdrive edition" this problem is solved. There are only 94 hardlinks, which is much better. But they were replaced by another dirty hack: git-am.exe is a duplicate (not a link) of git-add.exe, and so are many more files.

So I have a big request: please add a fix that allows removing all these garbage links and duplicate files. Thanks.

@PhilipOakley

I think this is a case of a bit of misunderstanding and confusion.

Just because they are 'hard links' doesn't mean that any more space is used locally. If some older backup system is following the links, resulting in an explosion in storage requirements (at the backup), then it is time to upgrade or replace that backup method as being past its sell-by date.
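
To make the space argument concrete, here is a minimal sketch that can be run in any bash with GNU find and du (Git Bash should do). The file names are hypothetical stand-ins for git.exe and its dashed siblings, and the helper functions `link_count` and `usage_kib` are made up for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: extra hard links add names, not data. Assumes GNU find and du.
# All file names here are hypothetical stand-ins, not real Git files.

# Print the link count of a file (the same %n that `find -printf` reports).
link_count() {
    find "$1" -maxdepth 0 -printf '%n\n'
}

# Print the disk usage of a directory in KiB; GNU du counts a
# hard-linked file only once per invocation.
usage_kib() {
    du -sk "$1" | cut -f1
}

demo=$(mktemp -d)

# One 1 MiB file standing in for git.exe.
head -c 1048576 /dev/urandom > "$demo/tool"
echo "before: $(usage_kib "$demo") KiB, links: $(link_count "$demo/tool")"

# Three more names for the same file, like git-add.exe, git-am.exe, ...
for name in tool-add tool-am tool-apply; do
    ln "$demo/tool" "$demo/$name"
done
echo "after:  $(usage_kib "$demo") KiB, links: $(link_count "$demo/tool")"

rm -rf "$demo"
```

The link count should go from 1 to 4 while the reported disk usage stays essentially the same; a tool that sums the size of every name separately would instead report roughly four times as much.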

Or dig into the build-extra install code and see if there are tweaks that would be acceptable (on your side) such that you can propose a Pull Request for review (you will need to understand why the portable version reports a different number from the installed version).

There is already a note in the release notes (C:\Program Files\Git\ReleaseNotes.html) about the confusion from hard links:

Older versions of the Windows Explorer do not calculate Git for Windows' on-disk size correctly, as it is unaware of hard links. Therefore, it might look like Git for Windows takes up 1.5GB when in reality it is about a third of that.

@dscho
Member

dscho commented Jun 13, 2019

On the one hand, a link does not take up space; on the other hand, Explorer displays a bunch of files, and archiving and backup systems treat the files as different.

I agree with @PhilipOakley that any backup system that mishandles hardlinks is not actually a backup system.

In the "Git thumbdrive edition" this problem is solved.

No, no, no.

What is "solved" in the portable Git is that it allows us to use a hardlink-unaware 7-Zip extractor to unpack Git without a horrible disk footprint.

Git creates a lot of hard links during installation: the NTFSLinksView program reports 13884 of them inside the C:\Program Files\Git directory. Here are 5 lines of that report, as an example of the 13 thousand.

git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-add.exe | 07.04.2019 14:11:24
git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-archive.exe | 07.04.2019 14:11:24
git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-am.exe | 07.04.2019 14:11:24
git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-annotate.exe | 07.04.2019 14:11:24
git.exe | C:\Program Files\Git\mingw64\bin\git.exe | Hard Link | C:\Program Files\Git\mingw64\libexec\git-core\git-apply.exe | 07.04.2019 14:11:24

That analysis is far from complete, and the 13k number is patently bogus:

$ find /c/Program\ Files/Git/ -type f -printf "%n %p\\n" | grep -v '^1 ' | wc -l
  219
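
Note that the pipeline above counts names, not distinct files: a file with four names contributes four entries. As a rough sketch (assuming GNU find; the function names `count_linked_names` and `count_linked_files` are invented for this example), the two numbers can be told apart by grouping on device and inode number:

```shell
#!/usr/bin/env bash
# Sketch: count hard-linked names versus the distinct files behind them.
# Assumes GNU find. Function and file names are hypothetical.

# Number of file names whose link count is greater than 1
# (same idea as the find | grep -v '^1 ' | wc -l pipeline).
count_linked_names() {
    find "$1" -type f -links +1 | wc -l
}

# Number of distinct files behind those names
# (unique device:inode pairs).
count_linked_files() {
    find "$1" -type f -links +1 -printf '%D:%i\n' | sort -u | wc -l
}

demo=$(mktemp -d)
echo data > "$demo/tool"
ln "$demo/tool" "$demo/tool-add"
ln "$demo/tool" "$demo/tool-am"
echo other > "$demo/readme"    # link count 1, so it is not counted

echo "names: $(count_linked_names "$demo")"
echo "files: $(count_linked_files "$demo")"

rm -rf "$demo"
```

In the toy directory this reports three names but only one underlying file. Pointing the two functions at a real installation directory would give the equivalent pair of numbers for it.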

Out of those 219 entries (a far lower number than the 13,884 you claimed), 125 are hardlinks from git.exe to the built-ins. This is Git's current design, to still keep supporting super ancient scripts that relied on the "dashed" forms of the Git commands.
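
For readers unfamiliar with the pattern, this is the general "one program, many names" idea: the program looks at the name it was invoked under and picks its behavior from that. The following is only an illustrative sketch with a hypothetical `tool` script and a made-up `make_tool` helper; it is not Git's actual code:

```shell
#!/usr/bin/env bash
# Sketch of a multi-call program: one file, several names, behavior chosen
# from the invocation name. "tool" and its subcommands are hypothetical.

# Write a tiny dispatcher script to $1/tool and hard-link it as $1/tool-add.
make_tool() {
    cat > "$1/tool" <<'EOF'
# Pick the subcommand from the name this file was invoked as.
name=$(basename "$0")
case "$name" in
    tool-*) sub=${name#tool-} ;;   # dashed form: tool-add
    *)      sub=${1:-help} ;;      # plain form:  tool add
esac
echo "subcommand: $sub"
EOF
    ln "$1/tool" "$1/tool-add"
}

demo=$(mktemp -d)
make_tool "$demo"
bash "$demo/tool" add    # plain form
bash "$demo/tool-add"    # dashed form, same file under another name
rm -rf "$demo"
```

Both invocations end up running the same subcommand, because both names point at the same file. Old scripts that call the dashed name keep working without a second copy of the program on disk.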

Two of those entries refer to git-lfs.exe, as it lives both in the bin and the libexec\git-core folder.

The remainder are .dll files that live both in the bin and the libexec\git-core folder (and must be present in both, to accommodate the DLL search order).

In short, this report suggests that the hardlinks are a problem, while they actually are a solution.

If we were to appease the (incorrect) impression that the hardlinks pose a problem themselves, we would re-open far more serious problems that we thought we had solved already.

a big request: please add a fix that allows removing all these garbage links and duplicate files.

It is incorrect that those links are garbage, and it is also incorrect to characterize the files as "duplicate". The hardlinks are not the problem here; the problem is a lack of understanding of why they are needed. Hopefully I was able to provide enough information to fix that lack.

@dscho closed this as completed Jun 13, 2019
@PhilipOakley

@Driwars Also, that Nirsoft package appears to get slightly (British understatement;-) confused about the links anyway.

When I asked for just one level, I got just git.exe and git-lfs.exe, which, when double-clicked, each report each other as their target, so it is as though the tool is reporting both ends of the linkages.

Definitely time to upgrade those tools...
