-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] caching VCS requirements that are considered immutables #6640
Comments
This sounds like a good idea to me. /cc @cjerdonek since he's most familiar with the VCS code. |
This is actually the thing I'm concerned about with this (and the reason I hadn't responded to this issue yet). I feel uncomfortable about the idea of changing pip's behavior based on what the revision "looks like." This kind of approach seems brittle to me, and yes, would break edge cases where someone has or might want to name a branch that happens to have the form of a sha. I can see that capability being useful as a work-around or in testing situations, or maybe for reasons we don't know. My instinct is that pip should really have a more structured way to specify "this revision is a SHA" as opposed to relying on whether the string "looks like" something. But I haven't really given thought to what that should be since it could likely involve a new format / notation of URL, etc. |
Would you then be open to having an option that enables the feature? Or an option to disable it if we think it could still be on by default (in which case |
I'm certainly open to having a way to specify "this revision is a SHA," but I'm not sure yet what that should be. Do you have any ideas? Having a global option that means "interpret revisions that look like SHA's as commit hashes" seems less attractive to me because it would be an option that is basically a work-around for not being able to specify revisions in a structured way (and pip already has a lot of specialized options). I do wonder if this (caching) could be leveraged with the PEP that we're working on -- because at that point, we would have the information we need in a structured way (in the JSON file). So maybe working on this after the PEP would make more sense? What do you think? |
I've no immediate idea on how to change the pip requirements format to express that a rev is a commit id. Especially to do it in a way that would not break tools out there that interpret it. Another way to address this concern would be to decide to cache or not after building the wheel (here?), when we have a local checkout that we can inspect to know if the rev is a commit or a branch. The only remaining edge case would then be when a branch has the same name as an existing commit id. That, if possible at all, is exotic enough to be ignored. This approach indeed relates closely to the PEP for #609, since to implement it we need to extract the commit id to store it in the direct url origin metadata as So yes, we can keep this in mind while implementing #609 and come back to it just after. The ball is with you to get the PEP rolling :) |
Also, I think there's also the possibility of doing something where the revision is cached, but only after pip inspects it to see if the revision corresponds to a branch, tag, or SHA, etc. So the difference in perspective (e.g. from your PR I looked at) would be: when a revision is encountered, we don't ask "does this revision correspond to a commit hash because it looks like a SHA," but rather "is this revision cached" and use its presence in the cache to know if it corresponded to a commit hash. And seeing if it looks like a hash would basically be a check / optimization to know whether it has the possibility of being in the cache. |
Yes, what I just typed up is I think the same as what you're suggesting, though the way I would phrase it is "decide to cache after cloning the URL and inspecting the refs and SHA, etc. to see if the revision was a commit hash." pip already has this logic (except possibly for the last step of checking whether the revision matches the SHA). I do think we should keep pip's behavior of preferring the branch name over the SHA if there is a collision (since that is what Git does). So it would only be cached if the revision is a SHA that doesn't match a branch or tag name, etc. |
By the way, when this (vcs caching) and #609 (vcs freeze) are implemented, it will make |
#609 is probably more inline with something that we should look into adding into Pipfile, if it isn't already included. |
From what I recall of the discussion, @cjerdonek had offered to sponsor this PEP (I'm not 100% sure, because the sponsor details haven't been added to the PEP yet) so he'll be better able to help you navigate the process, but I think that, as PEP author, the ball is really with you :-) From a quick skim to remind myself of where the discussion is at, I think that the debate on Discourse has died down. I'm not aware of any actions people are waiting on, so I think what you should probably do now is summarise the position, and list out what you think the next steps are, and how you think they should be progressed (remembering that this is all being done on volunteer time, so it's important to allow for people simply not having the free time to work on this). In my experience, PEPs do need a lot of work to "manage" the process, as otherwise they tend to die through lack of interest, rather than any inherent flaw in the proposal. |
@pfmoore He was actually giving me a friendly nudge because he's waiting on an email from me. :) I'm now co-authoring the PEP with him, and he was awaiting a reply from me confirming it's ready for submission. The list should see an update within the next couple days or so once I get to replying to his email.. |
Oh excellent - sorry for butting in in that case :-) The problems of things happening in a mixture of public and private forums. Sorry @sbidoul for the unwarranted assumption on my part. |
What's the problem this feature will solve?
Pip allows installing from VCS URL references such as git+https://github.com/org/repo@{ref}#egg=name.
When {ref} points to an immutable revision (ie a commit hash / sha in case of git),
it would help performance to cache the wheel that was built from that reference,
and subsequently reuse it.
Describe the solution you'd like
I'm willing to implement the PR myself.
My plan is to add a
is_immutable(cls, revision/link)
to theVersionControl
class.In the case of git it would check revision looks like a sha ('^[a-fA-F0-9]{40}$').
Then exploit it in caching mechanisms.
Potential issues:
with exactly 40 characters in a way that git could confuse with a a sha
in case of binary wheels (ie containing C code), the VCS URL may not contain enough information to ensure a reproducible build, as some build parameters may be environmental; in such cases the cache may trigger returning a wheel that otherwise would have to be rebuilt (eg for a different architecture): I'm not seeing any practical issue here either; if there would be a risk we can consider controlling the feature with a flag ([update] pip's wheel cache takes care of this as wheel names contain platform specific tagssuch as --(no)-vcs-cache
)Alternative Solutions
I currently have a pip wrapper that does something close, but I thought it would be useful to others, so I propose to implement it in pip itself.
Before attempting an implementation, I wanted to check if such a feature is of interest to pip maintainers, or if any of the potential issues would be considered blocking.
Additional context
N/A
The text was updated successfully, but these errors were encountered: