Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lockfiles: confirm Poetry can robustly handle --platform #12557

Closed
Eric-Arellano opened this issue Aug 12, 2021 · 4 comments
Closed

Lockfiles: confirm Poetry can robustly handle --platform #12557

Eric-Arellano opened this issue Aug 12, 2021 · 4 comments

Comments

@Eric-Arellano
Copy link
Contributor

Background

When --platform is used, every distribution must have pre-built wheels for the requested platforms. sdists don't work.

Ideally, the lockfile generation will fail to lock if there are any deps w/out compatible wheels. It would also only include hashes for the requested platforms (#12458).

Related: this requirement for pre-built wheels has caused issues when paired with [python-setup].resolve_all_constraints. If set, we first try to resolve the entire constraints file, even if it is a superset of the pex_binary or python_awslamda. Often, that fails because there will be deps without wheels, even though they are used. See #12222.

How Poetry handles platforms

Poetry lockfiles are intended to work on any platform (e.g. Windows, Linux, macOS), regardless of what platform the lockfile was generated on. This is done via static analysis of dependencies' environment markers, noting platform-specific dependencies like when ; sys_platform = 'darwin'.

As noted in #12458, this means that Poetry lockfiles may naively include hashes for files that do not get used at installation time.

Poetry does not currently have a way to force the platform for the project to a certain value.

Proposal

When generating a lockfile that uses --platform, post-process the poetry.lock as proposed in #12458 (comment) to a) validate that all required wheels are present, and b) remove any unused distributions, like sdists. Then, we'll either error eagerly or generate a lockfile that should work on the requested platforms, but nothing else.

To address #11222 ([python-setup].resolve_all_requirements), we will have multiple user lockfiles via #11165. For example, we could have a dedicated lockfile for each pex_binary that uses python_binary. (It's likely there's a more optimal design here, but as a baseline, one-lockfile-per-binary means #11222 should be moot.)

@Eric-Arellano
Copy link
Contributor Author

Eric-Arellano commented Aug 12, 2021

@jsirois identified one edge case where this scheme would "cry wolf" when there really is a solution. If:

  1. you're using a dep w/ loose constraints,
  2. and that dep's latest version does not support your platform,
  3. and that dep's earlier releases within the range from 1 do support your platform

then we will error when we could have instead handled it gracefully. Pip's --only-binary :all: would have known to use an earlier version in the constraint.

Fortunately, there's a workaround: our error message can suggest the person either build and host the wheel themselves or see if the project supports the platform in other release versions. This is a worse UX, but small enough of an edge case it's likely okay.

@jsirois
Copy link
Contributor

jsirois commented Aug 13, 2021

It doesn't change this analysis, but it does prevent you from finding bugs, so I'd like to point out this assertion is almost certainly wrong:

This is done via static analysis of dependencies' environment markers, noting platform-specific dependencies like when ; sys_platform = 'darwin'.

If I were to write a platform agnostic locker I'd do two main things:

  1. Assume the metadata in one dist is the same as in every other dist published for a given version.
  2. Run a single resolve evaluating any environment markers encountered as True (i.e.: ignore environment markers) *.

Given the results of step 2, I'd fetch all other dist hashes at a given version (assumption 1 underwrites this) and finally write down pinned requirements adding merged environment markers found walking dist metadata from step 2.

The main thrust being step 2 - ignore environment markers when resolving to get a platform agnostic maximal resolve. The notion that underwrites this is that dependencies with an environment markers clause are purely additive. Only when in the matching environment, add this requirement.

* That notion though is false! Most of the time it's true, but, for example:

My resolve encounters a dist with a requirement on pytest. For Python 2.7 and 3.5, say, it's on the pytest 4 series, but for Python 3.6+ it's on the 6 series:

pytest==4.6.11; python_version == '2.7' or python_version == '3.5'
pytest==6.2.4; python_version >= '3.6' and python_version < '3.10'

Here the requirement is not additive but bifurcates the resolve. I tested out poetry's handling of this case and it has a bug where the export of the resulting lock file contains just the 6.2.4 req with the merged env markers from the 4.6.11 req and the 6.2.4 req, which is incorrect.

@jsirois
Copy link
Contributor

jsirois commented Aug 13, 2021

The Poetry issue for the above: python-poetry/poetry#4381

Eric-Arellano added a commit that referenced this issue Aug 13, 2021
…-compile (#12549)

**Disclaimer**: This is not a formal commitment to Poetry, as we still need a more rigorous assessment it can handle everything we need. Instead, this is an incremental improvement in that Poetry handles things much better than pip-compile. 

It gets us closer to the final result we want, and makes it much more ergonomic to use the experimental feature—like `generate_all_lockfiles.sh` now not needing any manual editing. But we may decide to switch from Poetry to something like pdb or Pex.

--

See #12470 for why we are not using pip-compile. 

One of the major motivations is that Poetry generates lockfiles compatible with all requested Python interpreter versions, along with Linux, macOS, and Windows. Meaning, you no longer need to generate the lockfile in each requested environment and manually merge like we used to. This solves #12200 and obviates the need for #12463.

--

This PR adds only basic initial support. If we do decide to stick with Poetry, some of the remaining TODOs:

- Handle PEP 440-style requirements.
- Hook up to `[python-setup]` and `[python-repos]` options.
- Hook up to caching.
- Support `--platform` via post-processing `poetry.lock`: #12557
- Possibly remove un-used deps/hashes to reduce supply chain attack risk: #12458

--

Poetry is more rigorous than pip-compile in ensuring that all interpreter constraints are valid, which prompted needing to tweak a few of our tools' constraints.
@Eric-Arellano
Copy link
Contributor Author

Superseded by #12568. It seems that yes, Poetry could handle --platform if we did post-processing, but it would suffer from the above issue with "crying wolf" when not necessary, and several other Poetry issues like the bifurcated requirements bug found by John above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants