`pex3 lock create` does not work with VCS requirements as input #1556
Comments
@Eric-Arellano the question is what is the desired outcome? I think a valid one is to more cleanly reject creating the lock since, afaict, the state of the art is that no one does this currently. Of course another one, which will require more effort, is to advance the state of the art and hash cloned repo contents and verify against that. The final one I can think of is the status quo for other lockers: don't actually record a hash for VCS URLs at all and, as a consequence, don't verify them when using the lock later.
Somehow the lock file should record that a package was installed from VCS. With git, branches and tags should be resolved to the commit hash. Otherwise the lock file does not accurately represent the full set of dependencies.
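A minimal sketch of resolving a git branch or tag to a commit hash, assuming the `git` CLI is on the `PATH`. The function names `parse_ls_remote` and `resolve_ref` are hypothetical and are not part of Pex; the sketch only relies on `git ls-remote` printing one `<sha>\t<ref>` pair per line.

```python
import subprocess
from typing import Optional


def parse_ls_remote(output: str, ref: str) -> Optional[str]:
    """Return the commit id a branch or tag name maps to in `git ls-remote` output."""
    found = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, _, name = line.partition("\t")
        found[name.strip()] = sha.strip()
    # For annotated tags, the peeled "^{}" entry names the commit rather than
    # the tag object, so it is checked first. Preferring tags over branches on
    # a name clash is an assumption of this sketch.
    for candidate in (
        f"refs/tags/{ref}^{{}}",
        f"refs/tags/{ref}",
        f"refs/heads/{ref}",
    ):
        if candidate in found:
            return found[candidate]
    return None


def resolve_ref(repo_url: str, ref: str) -> Optional[str]:
    """Ask the remote which commit `ref` currently points to (needs network access)."""
    output = subprocess.run(
        ["git", "ls-remote", repo_url],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_ls_remote(output, ref)
```

Note that the answer is only valid at the moment of the query; a branch can move afterwards, which is why the resolved commit id, not the branch name, is what would need to be written down.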
@cognifloyd I agree with this from a Pex perspective. It should lock all or nothing. A partially repeatable / partially secure lock is no lock at all.
Note that git is only part of the story as well. Currently locking outright fails purposefully for local projects. Missing a similar outright failure for git was really just an oversight. So the current bias is to fail these locks outright. That said, local projects could work by hashing the contents of the local project source tree and requiring future lock consumption to observe the local project with the same project tree content hash.
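A minimal sketch of what hashing a local project source tree could look like. This is not Pex's implementation; `hash_project_tree` is a hypothetical name, and skipping `.git` directories and using SHA-256 are assumptions made for illustration.

```python
import hashlib
import os


def hash_project_tree(root: str) -> str:
    """Return a hex digest over the relative paths and contents of all files under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place fixes the traversal order so the digest is stable;
        # dropping ".git" (an assumption) ignores VCS bookkeeping.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            relpath = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(relpath.encode("utf-8"))
            digest.update(b"\0")
            with open(path, "rb") as fp:
                for chunk in iter(lambda: fp.read(65536), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()
```

A lock consumer could recompute this digest and refuse to proceed if it differs from the one recorded at lock time.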
Either way though, whether refusing to lock or accommodating project trees from VCS or a local path, the current set of PyPIRequirement, URLRequirement, and LocalProjectRequirement will probably need to grow to support a VCSRequirement or some such.
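A minimal sketch of what singling out a VCS requirement could involve. The class `VcsRequirementSketch` and the function `parse_vcs_requirement` are hypothetical and do not mirror Pex's real types; the sketch assumes Pip-style `vcs+transport://host/path@ref#egg=name` URLs, optionally in the `name @ url` direct reference form, and treats only a full 40-character hex git ref as a commit pin.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit

_VCS_NAMES = ("git", "hg", "svn", "bzr")  # the 4 VCS systems Pip supports


@dataclass(frozen=True)
class VcsRequirementSketch:
    vcs: str
    url: str
    ref: Optional[str]
    project_name: Optional[str]

    @property
    def pinned_to_commit(self) -> bool:
        # Assumption: only a full SHA-1 hex id on a git URL counts as a pin.
        return self.vcs == "git" and bool(
            self.ref and re.fullmatch(r"[0-9a-f]{40}", self.ref)
        )


def parse_vcs_requirement(requirement: str) -> Optional[VcsRequirementSketch]:
    """Return a parsed VCS requirement, or None if the string is not one."""
    project_name = None
    url = requirement.strip()
    # Direct reference form: "name @ vcs+transport://..."
    match = re.match(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*@\s*(\S+)\s*$", requirement)
    if match:
        project_name, url = match.group(1), match.group(2)
    parts = urlsplit(url)
    vcs, sep, transport = parts.scheme.partition("+")
    if not sep or vcs not in _VCS_NAMES:
        return None
    # Any "user@" prefix lives in the netloc, so the last "@" in the path
    # separates the repo path from the requested ref.
    repo_path, at, ref = parts.path.rpartition("@")
    if not at:
        repo_path, ref = parts.path, None
    if project_name is None:
        eggs = parse_qs(parts.fragment).get("egg")
        project_name = eggs[0] if eggs else None
    return VcsRequirementSketch(
        vcs=vcs,
        url=f"{transport}://{parts.netloc}{repo_path}",
        ref=ref or None,
        project_name=project_name,
    )
```

With something like this, a locker could either reject every requirement for which the parser returns a value, or accept only those where `pinned_to_commit` is true.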
One reality bomb on handling VCS requirements in locks: top-level requirements are ~easy in that you know them up front. You could take the tack of pulling down all the top-level VCS repos and changing the requirements to local project requirements, which you either pre-hash the project tree of or use the commit hash of. Although I've never seen one in the wild, you might have a distribution whose own requirements include VCS requirements; in other words, a transitive VCS requirement Pex will know nothing about in advance. In that case Pex cannot pre-pull the repo at all, and the only hope is that there is enough structured information in the Pip logs to post-pull the repo and determine its hash.
@Eric-Arellano ping on desired outcome. Pants is the customer here driving the initial feature and you're the proxy. To recap, there seem to be only 2 viable paths:
Note that 1 can be delivered and it does not preclude circling back around and adding support for VCS requirements and local projects later.
Once #1563 lands I'll clear the bug label but not close the issue unless that's deemed good enough for now. If it's not, this will get re-labelled as a feature request / enhancement.
So, if Pex refuses to create the lock file when VCS requirements are used, does that mean projects with VCS requirements will be unable to use lockfiles in Pants?
The project I'm trying to add Pants + lockfiles to is a partial mono-repo. It has a variety of Python packages in one main repo, but a selection of packages are developed in separate repos. Luckily the requirements for those external repos are all much more minor and none of them include VCS requirements. If I can't use lockfiles, then Pants loses a lot of its appeal. I want to replace the homegrown/painful mono-repo "lock" file (a list of manually pinned dependencies) with something that makes it easier to bump dependencies. I really don't like resolving the dependency list by hand so I can update the manually maintained "lock" file. (Yes, I have downloaded/opened wheels to inspect the metadata and figure out how far I can bump deps, or how far back I have to go to find something that works.) So, I hope that you can include VCS requirements (or at least git VCS requirements) in the lock. I.e., I hope you'll do:
PS - I'm the Pants user who reported issues with VCS locking to Eric.
Yes. Today you ~can use lockfiles with VCS requirements in Pants, but they are a lie. See: pantsbuild/pants#14020
OK. It's nice to have this sort of feedback written down, particularly in issues, since everyone can see them and they are not ephemeral like Slack conversations, which we lose after N messages.
The current Pex lock infrastructure cannot handle VCS requirements, in the same way it can't handle local project requirements. Unlike local project requirements though, the failure to handle these requirements was uncontrolled. Add explicit parsing of VCS requirements to allow them to be singled out and rejected alongside local project requirements. This addresses the UX for the failure noted in #1556 but may not be the final word there. If there is a desire to actually handle VCS requirements in locks, the `VCSRequirement` parsing added here can be expanded to extract the needed extra data (commit id certainly) to build that support. Closes #1562
Alright, bug fixed but leaving open as a feature request.
Apologies for my delay in replying, John. As discussed in the meeting today: Pants needs to support VCS requirements and local requirements - we know we have several users relying on this feature. We also would like to (at least eventually) require using lockfiles because of their security importance; we're already doing that w/ JVM, and other tools like Poetry/Yarn/Cargo also have this requirement. At first I incorrectly thought that a partial lock would be feasible - if you're using a VCS requirement or local requirement, it's not fully locked down and the best we can do is treat it like a normal requirement. You've explained that adding partial lock support would be a big effort, and it of course opens a big hole for security. So, your state-of-the-art proposal on how to properly lock VCS + local projects seems like the best way forward.
We'll see how tricky this is. As a 1st cut I'll try just supporting git and maybe local projects that are git controlled as well.
@cognifloyd in your example, you have VCS requirements that are, on the face of it, mutable; i.e.: they do not reference commit ids, just branches or tags. Pex can handle this and pre-clone repos, grab commit ids and then hand off to Pip as a local project directory, writing down the original VCS url in the lock file and using the commit id for the hash. That, though, falls apart if there are interior nodes in the dependency graph that likewise use VCS urls without commit ids. All Pex can do with these is find out about them after Pip has run and done the lock resolve. At that point Pex could clone the newly discovered interior node VCS urls to find out commit ids, but that is broken since the commit id for a branch may have changed in the time between the pip run and the lock post-processing. It seems to me there are 3 choices here:
Do you have opinions here?
It sounds like these 3 options are ordered in terms of required work to implement? I'm wondering if it makes sense to start with 1, along with requiring those roots to be pinned. Fwict, that's the easiest to implement, and it's also the strictest / least magical. Then, if it turns out there are real-world use cases that either need unpinned roots or pinned interior nodes, add that as a follow-up enhancement.
From Pants's perspective, I am comfortable requiring that Pants users pin by commit. I can't think of an instance where that will block a user - they can simply update the requirement when they want to change.
I would like option 3:
The VCS requirements we have go to smaller repos with fewer deps. The sub repos do not have any VCS requirements. So I think raising an error for any internal VCS nodes, or requiring commit IDs for them, would be fine. At least in StackStorm's code, targeting a branch but locking to a commit is exactly what I'd like to do, so I would prefer that to work for top-level VCS dependencies if possible.
Alright, it turns out option 4 works: VCS URLs of any form can be anywhere in the dependency graph and things work. It turns out I could just rely on Pip to handle VCS downloading and post-process the resulting VCS zips it creates to get a hash of the contained source tree, for a reproducible strong fingerprint regardless of whether there is a commit id or not. Some cleanup work is left, but creating VCS locks and then consuming them is working end to end for all 4 VCS systems supported by Pip: https://github.com/jsirois/pex/tree/issues/1556/end-run
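A minimal sketch of fingerprinting the source tree contained in a zip, as opposed to hashing the zip bytes themselves. This is an illustration under assumptions, not the code on the branch above: `fingerprint_source_zip` is a hypothetical name, and hashing sorted entry names plus contents is one plausible way to stay insensitive to entry order and timestamps.

```python
import hashlib
import io
import zipfile


def fingerprint_source_zip(zip_bytes: bytes) -> str:
    """Return a SHA-256 hex digest over the file names and contents inside a zip."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Sorting names makes the digest independent of archive entry order;
        # reading only contents ignores per-entry timestamps and metadata.
        for name in sorted(zf.namelist()):
            if name.endswith("/"):
                continue  # directory entries carry no content
            digest.update(name.encode("utf-8"))
            digest.update(b"\0")
            digest.update(zf.read(name))
            digest.update(b"\0")
    return digest.hexdigest()
```

Because the digest depends only on the files in the archive, two fetches of the same tree should agree even if the archives differ byte for byte, while a branch that has moved to different content would produce a different fingerprint.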
This supports all forms of VCS requirements Pip supports as direct or transitive requirements. VCS source archives fetched by Pip are sha256 hashed by Pex providing a stable & secure fingerprint even for VCS requirements pointing to possibly mutable tags or branches or even insecure commit ids. Fixes #1556
This works:
But this fails: