Perform shallow clones on the TypeScript submodule. #9

Open: wants to merge 1 commit into main


DanielRosenwasser
Member

This should make clones of the TypeScript repo way quicker, since we really don't need full history available.
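The diff itself isn't shown in this thread. A change like this is usually made by setting the standard `shallow` key for the submodule in `.gitmodules`. The sketch below assumes that is what the PR does. The submodule name `_submodules/TypeScript` comes from the clone log in this thread.

```shell
# Mark the TypeScript submodule as shallow in .gitmodules (run from the repository root).
# Assumption: this mirrors the PR's change; the actual diff is not shown in this thread.
# Git writes (or extends) this stanza:
#   [submodule "_submodules/TypeScript"]
#       shallow = true
git config -f .gitmodules submodule._submodules/TypeScript.shallow true

# Read the value back (prints "true"). Per gitmodules(5), Git consults this key
# when it clones the submodule, requesting a depth-1 clone unless told otherwise.
git config -f .gitmodules --get submodule._submodules/TypeScript.shallow
```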

@jakebailey
Member

Tested this, and it still gave me a full clone, it seems.

@no-yan

no-yan commented Mar 13, 2025

Although this PR itself is appropriate, it does not work as expected due to the Git issues described below.

@DanielRosenwasser
If necessary, I think it would be great to update the README with alternative cloning instructions as a workaround. Of course, I'm happy to take care of it myself!


Current Issues

We have two ways to clone submodules, but both suffer from performance issues:

  1. git clone --recurse-submodules --shallow-submodules fetches all branches:

Even with --shallow-submodules, Git clones the submodule with --no-single-branch (visible in the trace below), so it fetches a depth-1 snapshot of every branch rather than only the one that is needed. For a large repository like TypeScript, which has hundreds of branches, this downloads a significant amount of unnecessary data and greatly increases clone time.

Related issue: git/git#1740

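  2. git submodule update --init ignores shallow settings:

Even if .gitmodules sets shallow = true, git submodule update --init does not fully honor it. The follow-up git fetch of the recorded submodule commit is sent without a depth limit (there is no deepen line in its packet trace), so Git downloads essentially the whole history leading up to that commit (about 1.9 GiB in the log below). This makes clones slow and wasteful.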

Command logs

  1. git clone --recurse-submodules --shallow-submodules

As the log shows, the submodule clone sends want requests for more than 300 branches, which results in a large download.

> GIT_TRACE=1 GIT_TRACE_PACKET=1 GIT_TRACE_PERFORMANCE=1 GIT_CURL_VERBOSE=1 git clone --branch shallowClone --recurse-submodules --shallow-submodules git@github.com:microsoft/typescript-go.git

// Submodule processing begins
Submodule '_submodules/TypeScript' (https://github.com/microsoft/TypeScript.git) registered for path '_submodules/TypeScript'

// Executes git clone --no-single-branch, requesting objects for all branches from the server
18:38:15.581439 run-command.c:759       trace: start_command: /opt/homebrew/opt/git/libexec/git-core/git clone --no-checkout --progress --depth=1 --separate-git-dir /Users/noyan/tmp/go-recur-clone/.git/modules/_submodules/TypeScript --no-single-branch -- https://github.com/microsoft/TypeScript.git /Users/noyan/tmp/go-recur-clone/_submodules/TypeScript

// Requests objects for 300 branches from the server
17:51:55.092224 pkt-line.c:85           packet:        clone> command=fetch
17:51:55.092238 pkt-line.c:85           packet:        clone> agent=git/2.48.1
17:51:55.092243 pkt-line.c:85           packet:        clone> object-format=sha1
17:51:55.092246 pkt-line.c:85           packet:        clone> 0001
17:51:55.092249 pkt-line.c:85           packet:        clone> thin-pack
17:51:55.092252 pkt-line.c:85           packet:        clone> ofs-delta
17:51:55.092256 pkt-line.c:85           packet:        clone> deepen 1
17:51:55.092261 pkt-line.c:85           packet:        clone> want 0aac72020ee8414218273f654eb7ce1dc2dd0d6b
17:51:55.092265 pkt-line.c:85           packet:        clone> want 1ae7dbbcf091c9e52296bccfbb376f6fc397bd85
17:51:55.092268 pkt-line.c:85           packet:        clone> want 5aa2eb744a3cffe570e54a4d382d67013284742b
17:51:55.092271 pkt-line.c:85           packet:        clone> want 78ee4cacafc20491fca5557da7908580df18db0e
...

remote: Enumerating objects: 84605, done.
remote: Counting objects: 100% (34907/34907), done.
remote: Compressing objects: 100% (17581/17581), done.
remote: Total 25934 (delta 15901), reused 16546 (delta 7746), pack-reused 0 (from 0)

18:41:57.210076 git.c:476               trace: built-in: git fetch --depth=1

Receiving objects: 100% (25934/25934), 34.26 MiB | 5.91 MiB/s, done.
Resolving deltas: 100% (15901/15901), completed with 4849 local objects.
remote: Enumerating objects: 569, done.
remote: Counting objects: 100% (303/303), done.
remote: Compressing objects: 100% (51/51), done.
remote: Total 59 (delta 53), reused 10 (delta 6), pack-reused 0 (from 0)
Unpacking objects: 100% (59/59), 49.10 KiB | 88.00 KiB/s, done.
remote: Enumerating objects: 569, done.
remote: Counting objects: 100% (303/303), done.

remote: Compressing objects: 100% (51/51), done.
  2. git submodule update --init

Even with shallow = true in .gitmodules, the setting is not fully honored, and a large amount of history is downloaded.

> git clone --branch shallowClone git@github.com:microsoft/typescript-go.git; cd typescript-go
> git submodule update --init

// Retrieves HEAD and its history
// i.e., all commits before HEAD in the main branch
18:38:00.970221 git.c:476               trace: built-in: git fetch origin 52c59dbcbee274e523ef39e6c8be1bd5e110c2f1

19:28:36.112895 pkt-line.c:85           packet:        fetch> command=fetch
19:28:36.112905 pkt-line.c:85           packet:        fetch> agent=git/2.48.1
19:28:36.112908 pkt-line.c:85           packet:        fetch> object-format=sha1
19:28:36.112910 pkt-line.c:85           packet:        fetch> 0001
19:28:36.112912 pkt-line.c:85           packet:        fetch> thin-pack
19:28:36.112914 pkt-line.c:85           packet:        fetch> ofs-delta
19:28:36.112917 pkt-line.c:85           packet:        fetch> shallow 0aac72020ee8414218273f654eb7ce1dc2dd0d6b
19:28:36.112921 pkt-line.c:85           packet:        fetch> want 52c59dbcbee274e523ef39e6c8be1bd5e110c2f1
19:28:36.112926 pkt-line.c:85           packet:        fetch> have 0aac72020ee8414218273f654eb7ce1dc2dd0d6b

remote: Enumerating objects: 622228, done.
remote: Counting objects: 100% (622214/622214), done.
remote: Compressing objects: 100% (165732/165732), done.
Receiving objects: 100% (598243/598243), 1.91 GiB | 10.97 MiB/s, done.
Resolving deltas: 100% (443668/443668), completed with 19471 local objects.
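One way to reproduce the "more than 300 branches" observation is to count the want lines in a packet trace like the ones above. This is a minimal sketch. `count_wants` is a hypothetical helper name, and the sketch assumes the trace line format shown in these logs (`clone> want <sha>` and `fetch> want <sha>`).

```shell
# Hypothetical helper: count the "want" lines in a GIT_TRACE_PACKET log read from stdin.
# Each such line is one object ID (for a clone, typically a ref tip) requested from the server.
count_wants() {
  # grep -c prints the number of matching lines (0 if none);
  # "|| true" keeps a zero count from returning a failure status.
  grep -cE '(clone|fetch)> want ' || true
}
```

Usage (this clones over the network; the trace goes to stderr, hence 2>&1): GIT_TRACE_PACKET=1 git clone --recurse-submodules --shallow-submodules git@github.com:microsoft/typescript-go.git 2>&1 | count_wants. The total also includes the superproject's own want lines, so it will be a little higher than the number of submodule branches.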

Workaround

Clone the repository normally (without --recurse-submodules), then run git submodule update with --depth 1.

git clone git@github.com:microsoft/typescript-go.git
cd typescript-go
git submodule update --init --depth 1

Use the same approach when updating submodules:

git submodule update --depth 1
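After running the workaround, you can confirm that the submodule really is shallow by asking Git directly. This is a minimal sketch. `check_shallow` is a hypothetical helper name, and the submodule path `_submodules/TypeScript` is taken from the logs in this thread.

```shell
# Hypothetical helper: report whether a repository is shallow and how many
# commits are reachable from HEAD.
# Usage: check_shallow [path]   (defaults to the current directory)
check_shallow() {
  repo="${1:-.}"
  # --is-shallow-repository prints "true" or "false" (Git 2.15 or later).
  shallow=$(git -C "$repo" rev-parse --is-shallow-repository) || return 1
  # In a depth-1 checkout only the checked-out commit is reachable, so this is expected to be 1.
  count=$(git -C "$repo" rev-list --count HEAD) || return 1
  printf 'shallow=%s commits=%s\n' "$shallow" "$count"
}
```

For example, run check_shallow _submodules/TypeScript from the repository root. After the workaround it should report shallow=true and a commit count of 1. A full-history count points to the problem described above.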

3 participants