Checkout is 10 times slower on Windows #1851
Comments
not sure if #1803 (Up for grabs, though 'closed') has any extra info regarding this. Found using https://github.com/git-for-windows/git/issues?utf8=%E2%9C%93&q=jobs |
#1083 was the one I'd seen. It mentions the difficulty of some aspects of job/process control between the *nix world and the Windows world. It may not be directly applicable but gives a view of that aspect (IIUC you have multiple jobs running but they are being run serially, so their 'control' is failing as they are not all being released together, which has a pseudo similarity). The other aim was to highlight the search capability across all of the issues (open and closed), which is not always easy to 'see' within the GUI. So I was just pointing it out. |
I don't think #1083 is related to this, as the |
I built git from source, run I am not sure where to look, so I performed comparison between linux and windows logs with
clone_linux.txt clone_win.txt these are normalized versions (stripped timestamps, pc names, pids, path differences and also on windows "resolved executable dir" lines). The strange difference in Windows
Linux
|
Sometime on windows the output is messed up (like multiple threads are printing simultaneously):
I do not see this happen on Linux. Another strange difference between Windows and Linux: is it because git forks on Linux, but on Windows it spawns a new process and runs the command? |
It looks as if maybe the submodule shape of Boost gives Git headaches? Can you run with |
I already did it in #1851 (comment)
|
Sorry, I had missed that. One thing that I noticed in those logs was that there are tons of recursive calls... this is a very, very deeply recursed repository, isn't it? So how to proceed? I do not, unfortunately, have time to spare. So the best I can do is to try to help you figure it out. What I would do in your shoes would be to try to instrument the code in |
It is not recursive. https://github.com/boostorg/boost is a super-project with around 150 submodules; there is no (or I am not aware of any) submodule that has its own submodules. So after cloning the super-project repository, git should easily clone all of its submodules in parallel.
The main problem for me is that From what I see, repositories are fetched in parallel (but Windows is slower: 12 sec vs 39 sec), but BOTH Linux and Windows check out repositories one by one, with no signs of parallel work, and because any git action on Windows is slower, 150 checkouts add up to that big difference:
|
I fear that there really is no alternative to digging into the Unix shell script called |
I tracked the huge time difference down to the checkout process. Writing files on Windows takes 20 times longer than on Linux. It looks like the Linux caching mechanism is much better, while Windows on |
Additional note on |
@Kojoley :-( So how do we proceed from here? Git for Windows cannot possibly make file I/O faster, I don't think. I guess the only remaining idea I have is to suggest cloning recursively using WSL, but you would need to upgrade to Windows 10 to do that (and that would probably improve the file I/O to begin with). Close? |
Originally I spotted this problem on CI (Appveyor vs Travis), which has Windows Server 2016.
I am not a Windows expert; I tried to google but failed to find anything useful. The last bit I can contribute is a benchmark of file creation on Windows and Linux. I have done it, and I suggest switching from the POSIX file API to WinAPI, as it gives a minor but persistent speedup. Windows:
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <Windows.h>
#else
#include <time.h>
long GetTickCount(void)
{
struct timespec now;
if (clock_gettime(CLOCK_MONOTONIC, &now))
return -1;
return now.tv_sec * 1000.0 + now.tv_nsec / 1000000.0;
}
#endif
void generate_file(int n, unsigned char const* data, unsigned size)
{
char file_name[32];
#ifndef USE_WINAPI
FILE * f;
#else
HANDLE h;
#endif
#ifdef _WIN32
sprintf(file_name, "tree\\%i.txt", n);
#else
sprintf(file_name, "tree/%i.txt", n);
#endif
#ifndef USE_WINAPI
f = fopen(file_name, "w");
if (!f) abort(); // fopen does not create the "tree" directory; it must already exist
fwrite(data, 1, size, f);
fclose(f);
#else
h = CreateFileA(
file_name, // lpFileName
GENERIC_WRITE, // dwDesiredAccess
0, // dwShareMode
NULL, // lpSecurityAttributes
CREATE_ALWAYS, // dwCreationDisposition
FILE_ATTRIBUTE_NORMAL, // dwFlagsAndAttributes
NULL // hTemplateFile
);
if (h == INVALID_HANDLE_VALUE) abort();
DWORD written;
BOOL r = WriteFile(h, data, size, &written, NULL);
if (!r || written < size) abort();
CloseHandle(h);
#endif
}
int main(void)
{
int i;
long ts;
unsigned char data[10000];
for (i = 0; i < 10000; ++i) {
data[i] = rand() % UCHAR_MAX;
}
ts = GetTickCount();
for (i = 0; i < 10000; ++i) {
generate_file(i, data, 10000);
}
printf("took %li ms\n", GetTickCount() - ts);
}
Windows POSIX API:
Windows (MinGW) POSIX API:
Linux:
Windows WinAPI:
|
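The benchmark above times file creation, writing and closing as one number. As a minimal sketch (not part of the original measurements), the same loop can be split so that the time spent in `fopen`, `fwrite` and `fclose` is accumulated separately, which may help show which phase dominates on each platform. The names `phase_times`, `now_ms` and `time_file_creation` are hypothetical and introduced only for illustration. The sketch assumes a C11 library that provides `timespec_get`, and it opens files in binary mode (`"wb"`), unlike the benchmark above.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Accumulated wall-clock time per phase, in milliseconds (hypothetical helper type). */
struct phase_times {
    double open_ms;
    double write_ms;
    double close_ms;
};

/* Current time in milliseconds via C11 timespec_get (wall clock, not monotonic). */
double now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1000000.0;
}

/*
 * Create `count` files named <dir>/<n>.txt, each holding `size` bytes of `data`,
 * and add the time spent in fopen, fwrite and fclose to `*t`.
 * Returns 0 on success, -1 on the first failure. `dir` must already exist.
 */
int time_file_creation(const char *dir, int count,
                       const unsigned char *data, size_t size,
                       struct phase_times *t)
{
    char name[512];
    int i;

    assert(dir && data && t);

    for (i = 0; i < count; ++i) {
        double t0, t1, t2, t3;
        FILE *f;
        size_t written;
        int n, rc;

        n = snprintf(name, sizeof name, "%s/%d.txt", dir, i);
        if (n < 0 || (size_t)n >= sizeof name)
            return -1;

        t0 = now_ms();
        f = fopen(name, "wb");
        t1 = now_ms();
        if (!f)
            return -1;

        written = fwrite(data, 1, size, f);
        t2 = now_ms();

        /* Any data still buffered by stdio is flushed here. */
        rc = fclose(f);
        t3 = now_ms();

        if (written != size || rc != 0)
            return -1;

        t->open_ms  += t1 - t0;
        t->write_ms += t2 - t1;
        t->close_ms += t3 - t2;
    }
    return 0;
}
```

To mirror the benchmark, one would zero-initialize a `struct phase_times`, call `time_file_creation("tree", 10000, data, 10000, &t)` with an existing `tree` directory, and print the three fields. Because stdio buffers writes, part of the write cost can show up in the close phase, so the split is only a rough guide.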
I thought we already side-stepped many of the POSIX emulations in |
Maybe I am wrong, but |
I have recently done more research and benchmarks; there are even lower-level WinAPI calls for file manipulation like Note: I have short name generation disabled on my system. |
Do you have Git smudge / clean filter? |
I do not know what that is. I did not see git install a file system minifilter driver, so I am almost sure it has nothing to do with slow |
I fear that this ticket has run its course and all we can do is to close it now. |
I did not receive any help on it here. I spent quite a bit of time profiling, and I think I gave enough information to at least optimize file opening by about 30%. |
You mean by switching from So while I would love to have a 30% improvement on performance, I just don't see how this could be done efficiently (and safely!) here. |
The details are washed out from my memory, but I remember that checkout file handling was an isolated part, so altering it should not be a huge patch. Switching to |
Don't get me wrong, I would really love to see a 30% improvement. But Git's source code was born on Linux, and Linux remains its primary development platform (with a few macOS-based developers thrown in), so changing everything to Having said that, even if I closed this ticket, it would be awesome if we could find ways to benefit from your deep analysis. Do you have (even experimental, hacky) patches that demonstrate a speed-up of the actual |
I could understand that argument 10 years ago, when it was crucial to just get git working on Windows, but now git is broadly used on Windows, and Microsoft bought GitHub and uses git in its product development, so I do not understand why Linux should hold things back. I am sure a file system abstraction would benefit everyone and solve a lot of git issues. Most of the time, having files written to a hard drive has no real value; you simply need a virtual file system over the git repository.
I am sorry, it was a year ago and now I do not have the dev environment I was experimenting in to check whether I had something to share. |
Setup
defaults?
to the issue you're seeing?
Nope
Details
CMD
Minimal, Complete, and Verifiable example
this will help us understand the issue.
I expect to see cloning and checkout things go in parallel.
It goes sequentially, so there is no actual difference between --jobs=10 and --jobs=1. In Process Explorer I see git.exe processes spawned one by one, never multiple simultaneously.
The timings are as follows: on Windows, 5 minutes for cloning Boost; on Linux, 30 seconds.
URL to that repository to help us with testing?
https://github.com/boostorg/boost.git