15 second hang when doing commit on repo with LFS enabled #4275

Closed
1 task done
AntAgna opened this issue Feb 7, 2023 · 18 comments

Comments

@AntAgna

AntAgna commented Feb 7, 2023

  • I was not able to find an open or closed issue matching what I'm seeing

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
$ git --version --build-options

git version 2.39.1.windows.1
cpu: x86_64
built from commit: b03dafd9c26b06c92d509a07ab01b01e6d0d85ee
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.22621.1194]
  • What options did you set as part of the installation? Or did you choose the
    defaults?
> type "C:\Program Files\Git\etc\install-options.txt"

Editor Option: VisualStudioCode
Custom Editor Path:
Default Branch Option: main
Path Option: Cmd
SSH Option: OpenSSH
Tortoise Option: false
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Git Pull Behavior Option: Rebase
Use Credential Manager: Enabled
Performance Tweaks FSCache: Enabled
Enable Symlinks: Enabled
Enable Pseudo Console Support: Disabled
Enable FSMonitor: Enabled

  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

Happens on my work laptop, which is configured to be part of a Domain.

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

Happens in git bash, cmd, PowerShell and also from GUI apps.

git commit -m "test"
  • What did you expect to occur after running these commands?

I expect the commit to be finished in less than one second

  • What actually happened instead?

Very often, it gets stuck for 15 or 16 seconds.

  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?

It happens on multiple repositories that have LFS enabled.
I did not try to reproduce on a public repository.

My Analysis

The problem started a few weeks ago. I think it must have started after an update to Git or an update to Windows.

The problem is intermittent: Some days it does not happen at all, some days it happens most of the time.

Many Git commands hang for ~15 seconds, including git commit and git reset.

Using Git Trace2 and SysInternals Process Monitor, I have been able to see what git.exe is doing while it is hung:

  • git.exe calls git-lfs post-commit
  • git-lfs.exe calls uname
  • uname.exe tries to do network lookups and hangs for 15 seconds
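
A minimal sketch of how the slow child process could be picked out of a Trace2 event log, for anyone wanting to repeat this analysis. It assumes the log was written with the GIT_TRACE2_EVENT environment variable pointing at a file, and that child_start events carry child_id and argv while child_exit events carry child_id and t_rel (the child's run time in seconds), as I understand the Trace2 event format. The function name and the sample events are made up for illustration.

```python
import json


def slow_children(trace_lines, threshold_s=1.0):
    """Return (argv, seconds) for each child process that ran at least threshold_s."""
    started = {}  # (sid, child_id) -> argv recorded by the child_start event
    slow = []
    for line in trace_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON event
        if not isinstance(event, dict):
            continue
        key = (event.get("sid"), event.get("child_id"))
        if event.get("event") == "child_start":
            started[key] = event.get("argv", [])
        elif event.get("event") == "child_exit":
            elapsed = event.get("t_rel", 0.0)
            if elapsed >= threshold_s:
                slow.append((started.get(key, []), elapsed))
    return slow


# Hypothetical events, shaped like Trace2 output but not taken from a real run.
sample = [
    '{"event":"child_start","sid":"s1","child_id":0,"argv":["git-lfs","post-commit"]}',
    '{"event":"child_exit","sid":"s1","child_id":0,"code":0,"t_rel":15.2}',
    '{"event":"child_start","sid":"s1","child_id":1,"argv":["git","gc","--auto"]}',
    '{"event":"child_exit","sid":"s1","child_id":1,"code":0,"t_rel":0.05}',
]

for argv, seconds in slow_children(sample):
    print(f"{seconds:6.2f}s  {' '.join(argv)}")
```

To use it on a real log, pass the open file instead of the sample list, for example slow_children(open(path, encoding="utf-8")).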

Looking at the uname source, I see that it tries to look up the hostname and domain name. I think this triggers a DNS lookup.

When I am away from the office and the VPN is disconnected, the DNS lookup for the domain fails, probably after a 15-second timeout.
This explains the intermittency: the problem does not happen when I am at the office or when I am connected to the VPN.

I think it is likely that many others are having this annoying problem.

This may be more of an MSYS/Cygwin problem.

@rimrul
Member

rimrul commented Feb 8, 2023

My Analysis

The problem started a few weeks ago. I think it must have started after an update to Git or an update to Windows.

The problem is intermittent: Some days it does not happen at all, some days it happens most of the time.

Many Git commands hang for ~15 seconds, including git commit and git reset.

Using Git Trace2 and SysInternals Process Monitor, I have been able to see what git.exe is doing while it is hung:

  • git.exe calls git-lfs post-commit
  • git-lfs.exe calls uname
  • uname.exe tries to do network lookups and hangs for 15 seconds

That's a good analysis.

Looking at the uname source, I see that it tries to look up the hostname and domain name. I think this triggers a DNS lookup.

Both gethostname_cygwin() and getdomainname() just call GetNetworkParams(); that shouldn't trigger a DNS lookup.

@dscho
Member

dscho commented Feb 8, 2023

git-lfs.exe calls uname

So it seems that this is the actual root cause, and the problem could be alleviated by changing Git LFS?

@AntAgna
Author

AntAgna commented Feb 8, 2023

Both gethostname_cygwin() and getdomainname() just call GetNetworkParams(); that shouldn't trigger a DNS lookup.

I was intrigued by this.
I managed to attach a debugger to uname.exe, and here is the stack trace:

 	ntdll.dll!NtAlpcSendWaitReceivePort()	Unknown
 	rpcrt4.dll!LRPC_BASE_CCALL::DoSendReceive()	Unknown
 	rpcrt4.dll!LRPC_CCALL::SendReceive()	Unknown
 	rpcrt4.dll!I_RpcSendReceive()	Unknown
 	rpcrt4.dll!NdrSendReceive()	Unknown
 	rpcrt4.dll!NdrpClientCall3()	Unknown
 	rpcrt4.dll!NdrClientCall3()	Unknown
>	logoncli.dll!DsEnumerateDomainTrustsW()	Unknown
 	msys-2.0.dll!000000021016a464()	Unknown
 	msys-2.0.dll!000000021016e9f1()	Unknown
 	msys-2.0.dll!0000000210171070()	Unknown
 	msys-2.0.dll!0000000210132c12()	Unknown
 	msys-2.0.dll!0000000210048ab8()	Unknown
 	msys-2.0.dll!0000000210047716()	Unknown
 	msys-2.0.dll!00000002100477c4()	Unknown

So maybe GetNetworkParams() triggers a call to DsEnumerateDomainTrusts() which tries to contact the Active Directory server.

@AntAgna
Author

AntAgna commented Feb 8, 2023

git-lfs.exe calls uname

So it seems that this is the actual root cause, and the problem could be alleviated by changing Git LFS?

Yes, changing git-lfs to stop calling uname would fix this.

In the git-lfs source, I see only one call to uname, in isCygwin().

It seems to be used only to determine whether git-lfs is running in a Cygwin/MSYS environment, in which case it sometimes needs to translate paths.

Is there another way to determine whether it is running in Cygwin?

Should I post this issue in the git-lfs repository?
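
For illustration only, here is a rough Python sketch of the kind of check described above, with two changes that would bound the cost: the uname call gets a timeout, and the answer is cached so the call happens at most once per process. This is not the git-lfs code (which is written in Go). The function name, the 2-second timeout, the "treat a timeout as not Cygwin" policy, and the assumption that the check looks for "cygwin" or "msys" in the uname output are all mine.

```python
import subprocess
import sys

_cache = {}  # command tuple -> bool, so each command runs at most once


def is_cygwin_like(cmd=("uname",), timeout_s=2.0):
    """Guess whether we run under Cygwin/MSYS by inspecting uname output.

    A missing command or a timeout is treated as "not Cygwin" (a policy
    chosen for this sketch, not something the thread establishes).
    """
    key = tuple(cmd)
    if key in _cache:
        return _cache[key]
    try:
        out = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout_s
        ).stdout
    except (OSError, subprocess.TimeoutExpired):
        result = False
    else:
        lowered = out.lower()
        result = "cygwin" in lowered or "msys" in lowered
    _cache[key] = result
    return result


# Stand-ins for uname so the sketch can be tried on any machine.
fake_msys = (sys.executable, "-c", "print('MSYS_NT-10.0-22621')")
fake_hang = (sys.executable, "-c", "import time; time.sleep(30)")

print(is_cygwin_like(fake_msys, timeout_s=20.0))
print(is_cygwin_like(fake_hang, timeout_s=0.5))
```

A timeout only caps the delay; it does not answer the question of how to detect Cygwin without running uname at all.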

@dscho
Member

dscho commented Feb 9, 2023

@AntAgna for a brief moment, I thought that changing https://github.com/git-lfs/git-lfs/blob/3bf8ad72b3738deca0d66d47e895710990aafd92/tools/cygwin_windows.go#L41 to call uname -s instead of uname might avoid the need for that network call, but I am no longer certain. Can you test whether uname -s is fast while uname is slow?

@AntAgna
Author

AntAgna commented Feb 10, 2023

Can you test whether uname -s is fast while uname is slow?

I have tested that, and I get the same 15-second wait with uname -s.

@AntAgna
Author

AntAgna commented Feb 10, 2023

I have installed Cygwin from cygwin.com and tried calling uname:

  • Calling uname from the Cygwin environment does not stall.
  • Calling uname from a standard command prompt in C:\cygwin64\bin regularly stalls for 15 seconds.
  • Calling uname from a standard command prompt in C:\Program Files\Git\usr\bin regularly stalls for 15 seconds.
  • Calling uname from git-bash does not stall.

I tried the hostname command and observed the same result: a 15-second wait in cmd with both versions, but no wait from inside the Cygwin environment.
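
Since the stall is intermittent, comparisons like the ones above are easier to trust when each command is timed several times. A minimal sketch of such a timing harness follows; the function names and the 10-second threshold are arbitrary choices for illustration, and the demo times the Python interpreter itself so that it runs anywhere.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.perf_counter() - start)
    return durations


def stalled(durations, threshold_s=10.0):
    """Return only the durations long enough to count as a stall."""
    return [d for d in durations if d >= threshold_s]


# Demo with a command that exists everywhere; replace the argv with the
# full path to the uname.exe or hostname.exe under test.
demo = time_command([sys.executable, "-c", "pass"], runs=2)
print([round(d, 3) for d in demo], "stalls:", len(stalled(demo)))
```

Running it once from cmd and once from inside Git Bash, with the same uname.exe path, would reproduce the comparison in the list above.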

I find it peculiar that the same .exe file behaves differently depending on how it was launched. I looked at the difference between the environment variables in the cmd and cygwin environments. Cygwin adds an environment variable HOSTNAME (and others that seemed unrelated). Adding the HOSTNAME variable in cmd does not prevent the 15 second wait.

I installed the source for the cygwin coreutils package.
I see that uname.exe calls the uname() function, which imitates the Linux uname system call.
The uname() function calls the Windows API gethostname().
So gethostname() is called regardless of the command line arguments given to uname.exe.

In the gethostname() documentation, we see :

The gethostname function queries namespace providers to determine the local host name ...
If no namespace provider responds, then the gethostname function returns the NetBIOS name of the local computer.

So querying the network with a 15-second timeout seems to be normal behavior for this function.

Launching Git Bash (either from the start menu or from c:\Program Files\Git\usr\bin\bash.exe) also regularly hangs for 15 seconds. But once Git Bash is launched, calling uname or hostname seems to be always fast.

@dscho
Member

dscho commented Feb 15, 2023

The uname() function calls the Windows API gethostname().

Indeed.

So gethostname() is called regardless of the command line arguments given to uname.exe.

Yes, this is the common problem with some POSIX functions (stat() is in the same boat): they pretend to be granular, but return so much information that it takes multiple Win32 API calls to gather all that data.

For the record, here is the implementation of that cygwin_gethostname() function.

@dscho
Member

dscho commented Feb 15, 2023

Apparently that implementation changed in Cygwin v3.0.0: before, it used GetComputerName(), now it uses GetNetworkParams(): git-for-windows/msys2-runtime@2166f7d

My suspicion is that the latter call is slow in your case, the former is fast. Could you verify my suspicion @AntAgna?

@AntAgna
Author

AntAgna commented Feb 15, 2023

I have written a simple program to test this: TestGetComputerName

It calls both GetComputerNameEx() and GetNetworkParams(), and also runs git's hostname.exe.

Results:

  • Calling GetComputerNameEx() and GetNetworkParams() is instantaneous
  • Calling git's hostname.exe or uname.exe hangs for 15 seconds

So calling the GetNetworkParams() function does not trigger the problem.
Maybe something else happens in MSYS's uname.exe & hostname.exe that causes a network lookup.

@dscho
Member

dscho commented May 26, 2023

Maybe something else happens in MSYS's uname.exe & hostname.exe that causes a network lookup.

That's strange. The default uname.exe invocation only calls the uname() function, and that function seems not to call anything costly except cygwin_gethostname().

Maybe the cost is in the startup of the MSYS2 runtime? Could you run strace -o uname.trace uname.exe and analyze the generated uname.trace file to see whether there is any obvious culprit causing long delays?

@AntAgna
Author

AntAgna commented Jun 6, 2023

Maybe the cost is in the startup of the MSYS2 runtime? Could you run strace -o uname.trace uname.exe and analyze the generated uname.trace file to see whether there is any obvious culprit causing long delays?

I have captured a strace of uname.exe while the problem was happening:
uname.trace.zip

In that trace, the time seems to be spent in mount_info::conv_to_posix_path.
cygwin_gethostname() does not seem to take much time.
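
One way to find where the time goes in such a trace is to sort the lines by their time delta. The sketch below assumes each trace line starts with two integers, the microseconds elapsed since the previous line and the cumulative microseconds, followed by the message; that is how I understand Cygwin's strace output, but check it against the actual file. The sample lines are invented and are not taken from uname.trace.

```python
import re

# delta-usecs, cumulative-usecs, rest of the line (assumed layout)
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.*)$")


def largest_gaps(trace_lines, top=5):
    """Return the `top` trace lines with the largest delta, biggest first."""
    rows = []
    for line in trace_lines:
        match = LINE_RE.match(line)
        if match:  # skips "--- Process ..." and other non-timed lines
            delta, total, message = match.groups()
            rows.append((int(delta), int(total), message.strip()))
    return sorted(rows, key=lambda row: row[0], reverse=True)[:top]


# Invented sample in the assumed layout.
sample = [
    "--- Process 1234 created",
    "    1       1 [main] uname (1234) Program name: uname.exe",
    "  120     121 [main] uname 1234 cygwin_gethostname: name example-host",
    "15000000 15000121 [main] uname 1234 mount_info::conv_to_posix_path: ...",
]

for delta, total, message in largest_gaps(sample, top=2):
    print(f"{delta / 1e6:9.3f}s  {message}")
```

Note that a large delta means the time passed before that line was written, so the slow operation is whatever ran between the previous line and the flagged one.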

I have noticed that things other than Git sometimes also freeze for 15 seconds when the VPN is disconnected.
In particular, opening Windows File Explorer on the This PC page freezes sometimes for 15 seconds.
So I think the problem happens when a process asks for information about files or paths that cause Windows to try to contact an unreachable remote server.
Windows must cache the result for some time, which would explain why the problem is intermittent.

When working in git repos without LFS, the problem does not get triggered.
When working in git repos with LFS, the problem happens very often.

As a workaround, I now keep the office VPN active whenever possible when working outside the office.

@dscho
Member

dscho commented Aug 14, 2023

the time seems to be spent in mount_info::conv_to_posix_path
cygwin_gethostname() does not seem to take much time.

Would you be able to try with a custom build of the MSYS2 runtime where you modify cygwin_gethostname() to essentially hard-code the host name?

@dscho
Member

dscho commented Sep 3, 2023

Git for Windows v2.42.0(2) was released with a fix for the same symptom: long hangs when disconnected from the domain controller. Could y'all give it a try and report back if this fix happened to fix this here issue, too?

@AntAgna
Author

AntAgna commented Sep 6, 2023

I have installed version 2.42.0.windows.2.
The 15-second delay still happens with version 2.42, both when performing git operations in a repo with LFS enabled and when calling uname.exe in C:\Program Files\Git\usr\bin.

Issue #4459 does seem to be related, as it also describes 15-second delays.
As I read it, #4459 reports that version 2.41 triggered a 15-second delay very frequently and several times in a row, rather than once and only when LFS is enabled.
In both cases, the delay seems to be caused by a timeout when performing a domain lookup in the Cygwin code.

I have used a debugger to see the stack trace of uname.exe when it hangs:

 	ntdll.dll!00007ffaa530fec4()	Unknown
 	rpcrt4.dll!LRPC_BASE_CCALL::DoSendReceive()	Unknown
 	rpcrt4.dll!LRPC_CCALL::SendReceive()	Unknown
 	rpcrt4.dll!I_RpcSendReceive()	Unknown
 	rpcrt4.dll!NdrSendReceive()	Unknown
 	rpcrt4.dll!NdrpClientCall3()	Unknown
 	rpcrt4.dll!NdrClientCall3()	Unknown
>	logoncli.dll!DsEnumerateDomainTrustsW()	Unknown
 	msys-2.0.dll!00000002100e8cd4()	Unknown
 	msys-2.0.dll!00000002100ed379()	Unknown
 	msys-2.0.dll!00000002100ef850()	Unknown
 	msys-2.0.dll!0000000210183302()	Unknown
 	msys-2.0.dll!0000000210047012()	Unknown
 	msys-2.0.dll!0000000210045c86()	Unknown
 	msys-2.0.dll!0000000210045d34()	Unknown

The hang seems to happen in the API DsEnumerateDomainTrustsW().
In the Cygwin code, I see only one place that calls this API: cygheap_domain_info::init().

@dscho
Member

dscho commented Nov 15, 2023

Could you please test with v2.43.0-rc2?

@AntAgna
Author

AntAgna commented Nov 15, 2023

Could you please test with v2.43.0-rc2?

My employer has replaced my work laptop with a brand new one. The new laptop is not joined to the domain. So I can't reproduce the issue anymore.

@dscho
Member

dscho commented Nov 16, 2023

Could you please test with v2.43.0-rc2?

My employer has replaced my work laptop with a brand new one. The new laptop is not joined to the domain. So I can't reproduce the issue anymore.

Okay, let's close this ticket, then.

dscho closed this as not planned on Nov 16, 2023