Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pex3 lock create does not work with VCS requirements as input #1556

Closed
Eric-Arellano opened this issue Dec 22, 2021 · 18 comments · Fixed by #1687
Closed

pex3 lock create does not work with VCS requirements as input #1556

Eric-Arellano opened this issue Dec 22, 2021 · 18 comments · Fixed by #1687

Comments

@Eric-Arellano
Copy link
Contributor

This works:

pex 'ansicolors @ git+https://github.com/jonathaneunice/colors.git@c965f5b9103c5bd32a1572adb8024ebe83278fb0'

But this fails:

❯ pex3 lock create 'ansicolors @ git+https://github.com/jonathaneunice/colors.git@c965f5b9103c5bd32a1572adb8024ebe83278fb0'
Traceback (most recent call last):
  File "/Users/ericarellano/.local/bin/pex3", line 8, in <module>
    sys.exit(main())
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/cli/pex.py", line 29, in main
    result = catch(command.run)
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/commands/command.py", line 130, in catch
    return func(*args, **kwargs)
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/cli/command.py", line 71, in run
    return subcommand_func(self)
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/cli/commands/lock.py", line 199, in _create
    create(
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/cli/commands/lockfile/__init__.py", line 113, in create
    downloaded = resolver.download(
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/resolver.py", line 1097, in download
    build_requests, download_results = _download_internal(
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/resolver.py", line 1002, in _download_internal
    download_results = download_request.download_distributions(
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/resolver.py", line 134, in download_distributions
    return list(
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/jobs.py", line 510, in execute_parallel
    yield spawn_result.spawned_job.await_result()
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/jobs.py", line 214, in await_result
    job.wait()
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/jobs.py", line 73, in wait
    self._check_returncode(stderr)
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/pip.py", line 519, in _check_returncode
    result = analyzer.analyze(line)
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/pip.py", line 442, in analyze
    project_name_and_version, partial_artifact = self._extract_resolve_data(
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/pip.py", line 425, in _extract_resolve_data
    pin = Pin.canonicalize(ProjectNameAndVersion.from_filename(urlparse.urlparse(url).path))
  File "/Users/ericarellano/.local/pipx/venvs/pex/lib/python3.9/site-packages/pex/dist_metadata.py", line 237, in from_filename
    raise UnrecognizedDistributionFormat(
pex.dist_metadata.UnrecognizedDistributionFormat: The distribution at path '/jonathaneunice/colors.git@c965f5b9103c5bd32a1572adb8024ebe83278fb0' does not have a file name matching known sdist or wheel file name formats.
@jsirois
Copy link
Member

jsirois commented Dec 23, 2021

@Eric-Arellano the question is what is the desired outcome? I think a valid one is to more cleanly reject creating the lock since, afaict, the state of the art is no one does this currently. Of course another one, which will require more effort, is to advance the state of the art and hash cloned repo contents and verify against that. The final one I can think of is status quo for other lockers - don't actually record a hash for vcs urls at all and don't verify them as a consequence when using the lock later.

@cognifloyd
Copy link

Somehow the lock file should record that a package was installed from VCS. With git, branches and tags should be resolved to the commit hash. Otherwise the lockfile does not accurately represent the full set of dependencies.

@jsirois
Copy link
Member

jsirois commented Dec 25, 2021

@cognifloyd I agree with this from a Pex perspective. It should lock all or nothing. A partially repeatable / partially secure lock is no lock at all.

@jsirois
Copy link
Member

jsirois commented Dec 25, 2021

Note that git is only part of the story as well. Currently locking outright fails purposefully for local projects. Missing a similar outright failure for git was really just an oversight:
https://github.com/pantsbuild/pex/blob/9f4d40989f9213cfbe535fb48e6dd76e22a46738/pex/cli/commands/lockfile/__init__.py#L78-L105

So the current bias is to fail these locks outright. That said, local projects could work by hashing the contents of the local project source tree and requiring future lock consumption to observe the local project with the same project tree content hash.

@jsirois
Copy link
Member

jsirois commented Dec 25, 2021

Either way though, refusing to lock or accomodating project trees from vcs or local path, the current set of PyPIRequirement, URLRequirement, and LocalProjectRequirement will probably need to grow to support a VCSRequirement or some such.

@jsirois
Copy link
Member

jsirois commented Dec 25, 2021

One reality bomb on handling vcs requirements in locks is that top level requirements are ~easy in that you know them and could take the tack of pulling down all the top-level vcs repos and changing the requirements to local project requirements you pre-hash the project tree of or use the commit hash of. Although I've never seen one in the wild, you might have a distribution though whose requirements include vcs requirements, in other words a transitive vcs requirement Pex will know nothing about in advance. In that case Pex cannot pre-pull the repo at all and the only hope is enough structured information in the pip logs to post-pull the repo to determine its hash.

@jsirois
Copy link
Member

jsirois commented Jan 8, 2022

@Eric-Arellano ping on desired outcome. Pants is the customer here driving the initial feature and you're the proxy. To re-cap, there seem only 2 viable paths:

  1. Fail fast (with a dedicated error message) and reject locking vcs requirements just like local projects are currently rejected fast up front.
  2. Advance the state of the art and lock all VCS requirements (floating or not - and handle local projects as well along the way).

Note that 1 can be delivered and it does not preclude circling back around and adding support for VCS requirements and local projects later.

@jsirois
Copy link
Member

jsirois commented Jan 10, 2022

Once #1563 lands I'll clear the bug label but not close the issue unless that's deemed good enough for now. If it's not, this will get re-labelled as a feature request / enhancement.

@cognifloyd
Copy link

So, if pex refuses to create the lock file when VCS requirements are used, does that mean projects with VCS requirements will be unable to use lockfiles in pants?

@cognifloyd
Copy link

The project I'm trying to add pants + lockfiles is a partial-mono-repo. It has a variety of python packages in one main repo, but a selection of packages are developed in separate repos.
https://github.com/st2sandbox/st2/blob/pants/requirements.txt#L19-L23

Luckily the requirements for those external repos are all much more minor and none of them include VCS requirements.
For one repo, orquesta, the "lock" file uses a git commit hash during development. Then, just before a release the requirement is updated to use a tag.

If I can't use lockfiles, then pants loses a lot of its appeal. I want to replace the homegrown/painful mono-repo "lock" file (a list of manually pinned dependencies) with something that makes it easier to bump dependencies. I really don't like resolving the dependency list by hand so I can update the manually maintained "lock" file. (Yes, I have downloaded/opened wheels to inspect the metadata and figure out how far I can bump deps, or how far back I have to go to find something that works).

So, I hope that you can include VCS requirements (or at least git VCS requirements) in the lock. ie I hope you'll do:

Advance the state of the art and lock all VCS requirements

PS - I'm the pants user who reported issues with VCS locking to Eric.

@jsirois
Copy link
Member

jsirois commented Jan 10, 2022

So, if pex refuses to create the lock file when VCS requirements are used, does that mean projects with VCS requirements will be unable to use lockfiles in pants?

Yes. Today you ~can use lockfiles with VCS requirements in Pants, but they are a lie. See: pantsbuild/pants#14020

PS - I'm the pants user who reported issues with VCS locking to Eric.

OK. Its nice to have this sort of feedback written down. Particularly in issues since everyone can see them and they are not ephemeral like slack conversations which we lose after N messages.

jsirois added a commit that referenced this issue Jan 11, 2022
The current pex lock infrastructure cannot handle VCS requirements in
the same way it can't handle local project requirements. Unlike local
project requirements though, the failure to handle these requirements
was uncontrolled. Add explicit parsing of VCS requirements to allow them
to be singled out and rejected along side local project requirements.

This addresses the UX for the failure noted in #1556 but may not be the
final word there. If there is a desire to actually handle VCS
requirements in locks, the `VCSRequirement` parsing added here can be
expanded to extract the needed extra data (commit id certainly) to build
that support.

Closes #1562
@jsirois
Copy link
Member

jsirois commented Jan 11, 2022

Alright, bug fixed but leaving open as a feature request.

@jsirois jsirois removed their assignment Jan 11, 2022
@Eric-Arellano
Copy link
Contributor Author

Apologies for my delay in replying, John.

As discussed in meeting today: Pants needs to support VCS requirements and local requirements - we know we have several users relying on this feature. We also would like to (at least eventually) require using lockfiles because of their security importance; we're already doing that w/ JVM, and other tools like Poetry/Yarn/Cargo also have this requirement.

I at first thought incorrectly that a partial lock would be feasible - if you're using a VCS requirement or local requirement, it's not fully locked down and the best we can do is treat it like a normal requirement. You've explained that adding partial lock support would be a big effort, and it of course opens a big hole for security.

So, your state-of-the-art proposal on how to properly lock VCS + local projects seems like the best way forward.

@jsirois
Copy link
Member

jsirois commented Mar 7, 2022

We'll see how tricky this is. As a 1st cut I'll try just supporting git and maybe local projects that are git controlled as well.

@jsirois
Copy link
Member

jsirois commented Mar 7, 2022

@cognifloyd in your example, you have VCS requirements that are, on the face of it, mutable; i.e.: they do not reference commit ids, just branches or tags. Pex can handle this and pre-clone repos, grab commit ids and then hand off to Pip as a local project directory, writing down the original VCS url in the lock file and using the commit id for the hash.

That, though, falls apart if there are interior nodes in the dependency graph that likewise use VCS urls without commit ids. All Pex can do with these is find out about them after Pip has run and done the lock resolve. At that point Pex could clone the newly discovered interior node VCS urls to find out commit ids, but that is broken since the commit id for a branch may have changed in the time between the pip run and the lock post-processing.

It seems to me there are 3 choices here:

  1. Pex disallows interior node VCS requirements in locks completely.
  2. Pex allows VCS requirements anywhere in a lock, but only if the VCS url has a commit id.
  3. A hybrid of 1 allowing root VCS url nodes to be un-pinned and 2 requiring commit id pins for any interior nodes encountered.

Do you have opinions here?

@Eric-Arellano
Copy link
Contributor Author

It sounds like these 3 options are ordered in terms of required work to implement? I'm wondering if it makes sense to start with 1, along with requiring those roots to be pinned. Fwict, that's the easiest to implement, and it's also the strictest / least magical.

Then, if it turns out there are real-world use cases that either need unpinned roots or pinned interior nodes, add that as a follow-up enhancement.

along with requiring those roots to be pinned.

From Pants's perspective, I am comfortable requiring that Pants users pin by commit. I can't think of an instance where that will block a user - they can simply update the requirement when they want to change.

@cognifloyd
Copy link

I would like option 3:

A hybrid of 1 allowing root VCS url nodes to be un-pinned and 2 requiring commit id pins for any interior nodes encountered.

The VCS requirements we have go to smaller repos with fewer deps. The sub repos do not have any VCS requirement. So I think raising an error for any internal VCS nodes, or requiring commit IDs for them would be fine.

At least in StackStorm's code, targeting a branch but locking to a commit is exactly what I'd like to do, so I would prefer that work for top-level VCS dependencies if possible.

@jsirois
Copy link
Member

jsirois commented Mar 17, 2022

Alright, it turns out option 4 works, VCS urls of any form can be anywhere in the dependency graph and things work. It turns out I could just rely on Pip to handle vcs downloading and post process the resulting VCS zips it creates to get a hash of the contained source tree for a reproducible strong fingerprint regardless of commit id or no. Some cleanup work left, but end to end create VCS locks and then consume them is working for all 4 VCS systems supported by Pip: https://github.com/jsirois/pex/tree/issues/1556/end-run

jsirois added a commit that referenced this issue Mar 25, 2022
This supports all forms of VCS requirements Pip supports as direct or 
transitive requirements. VCS source archives fetched by Pip are sha256
hashed by Pex providing a stable & secure fingerprint even for VCS
requirements pointing to possibly mutable tags or branches or even
insecure commit ids.

Fixes #1556
Repository owner moved this from In Progress to Done in Python multiple user lockfiles Mar 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants