-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use git's partial clone feature to speed up pip #9086
Conversation
4f9f4d3
to
f7ac0d5
Compare
Unfortunately, I found that this only works if the server has a couple of config options set. I've set them in the unit tests here, and I know that github supports those config options, but without a way to detect if the server supports filtering, I'm not sure if it's possible to proceed here |
This comment has been minimized.
This comment has been minimized.
What would happen when the server is not configured so? Since the most majority of our users likely use services that do support this, it may be reasonable to implement a fallback mechanism if possible. |
f7ac0d5
to
7079edf
Compare
Sorry for the delay. Fell off my radar, but back to clean it up. I updated the test to confirm that the git client actually behaves totally fine when the server lacks support for filtering (emits a warning upon git pull, but proceeds correctly w/o filtering). As you mentioned, majority of users at this point are using services that will support this feature. |
7079edf
to
fdf47ac
Compare
Ended up running black on |
Looks good to me now, @pypa/pip-committers do we want a |
To answer that it would help knowing if this feature has been supported by git (client) since long enough. |
It's been around since around 2018 https://git-scm.com/docs/partial-clone since git 2.19 (though support increased over the versions) If using an older version of git which doesn't understand the "--filter" flag, it may break pip. In that case, something like "--use-deprecated" or even a silent fallback to a version without the --filter flag I went and manually tested by building old versions of git from source and v2.16.0 seems to crash without filter flag
and v2.7.0 seems to work fine
According to https://en.wikipedia.org/wiki/Git#Releases 2.17 released in 2018 Depending on our willingness to break people using >3 year old versions of git, we could opt to include a |
Doing a case study with ubuntu: ubuntu 18.04 LTS ships with 2.17 https://packages.ubuntu.com/bionic/git However, if you go back further to ubuntu 16.04 LTS, it ships with 2.7.4 of git https://launchpad.net/ubuntu/xenial/+source/git Ubuntu 16.04 LTS ended support in 2021 recently - though there is still some kind of "extended security maintenance" until 2024, though I suspect we can allow those folks to be on their own - installing older versions of pip. |
I happened to have the opportunity to deal with this “foo is released 5 years ago so we could probably drop it” stuff, and it turns out it’s entirely not uncommon for pip users to run things on pretty old setups. If the support was first released in 2018, we probably couldn’t safely assume its existence until somewhere like 2028 (and people will still loudly complain, it’s just we’ll have better ground refuting them by the time) |
2018 is not that old. So I'd suggest the silent fallback on older git version. We already have |
We need to detect the Git server's version as well, so |
Git clients that support —filter should all automatically do a fallback if
the server doesn’t support. I can manually validate this for all major git
versions since 2.17
The unit test in the diff actually validates this
Is there a way to unit test with different git versions?
—Nipunn
…On Sun, Aug 8, 2021 at 2:57 AM Tzu-ping Chung ***@***.***> wrote:
We need to detect the Git *server*'s version as well, so get_git_version
isn't enough (and I don't think there's a reliable way to detect that).
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#9086 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJ5PI27UQGCJZ5TRWINAF3T3ZIHDANCNFSM4TG2J35Q>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email>
.
|
I confirmed that since git client version 2.17, cloning with
Agreed that 2018 is recent and that it'll probably be a long time since we stop seeing it. At my last company, we were using ubuntu 14.04 still (trying hard to get off it, but it wasn't so easy). I'll update the diff to leverage |
fdf47ac
to
7266efe
Compare
7266efe
to
35ba1e4
Compare
35ba1e4
to
55ec2f9
Compare
745ca16
to
bf90abf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great contrib thanks !
Perhaps, in a followup, it would be interesting to lru_cache git_get_version()
.
One last question, have you done some feature and/or performance comparisons between |
No I haven't! Docs are here https://git-scm.com/docs/git-rev-list#Documentation/git-rev-list.txt---filterltfilter-specgt From reading the docs, --filter=blob:none is sort of the 0IQ - don't get anything option. I have done some perf comparisons between |
Clone with --filter=blob:none - as it fetches all metadata, but only dynamically fetches the blobs as needed by checkout. Since typically, pip only needs the blobs for a single revision, this can be a big improvement, especially when fetching from repositories with a lot of history, particularly on slower network connections. Added unit test for the rev-less path. Confirmed that both of the if/else paths are tested by the unit tests.
bf90abf
to
2d35b80
Compare
(rebased to handle conflicts). Should be better now. |
Ok, I did some more testing and all looks good. I also re-read this GitHub article which seems to indicate that blobless clone is the good compromise for pip. Thanks again! |
Clone with --filter=blob:none - as it fetches all
metadata, but only dynamically fetches the blobs as
needed by checkout. Since typically, pip only needs the blobs for
a single revision, this can be a big improvement, especially
when fetching from repositories with a lot of history,
particularly on slower network connections.
Added unit test for the rev-less path. Confirmed that both
of the if/else paths are tested by the unit tests.