
Lockfiles: determine if we need to post-process a resolve #12463

Closed
Eric-Arellano opened this issue Jul 29, 2021 · 13 comments
@Eric-Arellano (Contributor) commented Jul 29, 2021

Given that Python resolves can vary by environment (interpreter constraints, OS/arch), it's sometimes necessary to generate the lockfile in each environment that will be used: #12200

@jsirois proposed post-processing as a way to know when it is safe to skip multiple resolves:

We should do post-resolve processing of dist-info/METADATA and use Requires-Python and Requires-Dist metadata to exactly determine the breadth of validity of a lock.

If the resolve is not fully valid for the IC range and platforms, we can warn or error, such as instructing that you must generate on macOS and Linux and then merge.

--

Alternatively, we could pessimistically perform the resolve against every valid interpreter in the IC range (#12389), and merge into a single lockfile. For example, if the IC range is 3.6-3.8, then we know to generate w/ 3.6, 3.7, and 3.8; then we merge into a single lockfile by using env markers to handle conditional dependencies.
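A toy sketch of that merge step (hypothetical helper and data shapes; it assumes each per-interpreter resolve yields a simple name-to-version pin map, whereas a real tool would merge full dependency graphs):

```python
def merge_resolves(resolves):
    """Merge per-interpreter pins into one requirements list.

    resolves: e.g. {"3.6": {"dataclasses": "0.8", "requests": "2.26.0"},
                    "3.7": {"requests": "2.26.0"},
                    "3.8": {"requests": "2.26.0"}}
    """
    lines = []
    all_projects = sorted({p for pins in resolves.values() for p in pins})
    for project in all_projects:
        versions = {py: pins.get(project) for py, pins in resolves.items()}
        if len(set(versions.values())) == 1 and None not in versions.values():
            # Same pin under every interpreter: no marker needed.
            lines.append(f"{project}=={next(iter(set(versions.values())))}")
        else:
            # Conditional dep: guard each pin with a python_version env marker.
            for py, version in sorted(versions.items()):
                if version is not None:
                    lines.append(
                        f'{project}=={version}; python_version == "{py}"'
                    )
    return lines
```

With the example input above, `dataclasses` (a 3.6-only backport) gets a `python_version == "3.6"` marker while `requests` is emitted unconditionally.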

Pros and cons

Post-processing has two advantages over pessimistic generation:

  1. It handles all env markers, including platform_version.
    • We cannot pessimistically generate for all platforms because a resolve can only run on the current machine. NB: I am not talking about the PEX flag --platform, where cross-machine resolves are possible by only using wheels instead of sdists. I'm talking about generating a lockfile when --platform is not used, while accounting for any of the deps having platform-specific env markers.
    • Without eagerly detecting platform validity, the user gets a worse UX: a lockfile generated on, say, macOS won't work for a Linux user, and the user has to figure out why that happened and know to regenerate it on the other platform.
  2. It does not "cry wolf" unnecessarily.
    • For example, say you have an IC range of 3.6-3.10 but only 3.10 installed on your machine. If post-processing shows the 3.10 resolve is valid for the whole range, there is no need to error about missing 3.6-3.9 like pessimistic generation would.

The main downside is the development and maintenance burden:

  • E.g. robustly translating Requires-Python and Requires-Dist metadata into actions like "Regenerate for 3.6 and 3.7"
  • pip-compile does not currently have a way to inspect the .whl files for this post-analysis. We'd either need to improve pip-compile or start using Pex for lockfile generation. We may want to do that anyways, see Lockfiles: Should we use pip-compile or teach Pex to generate lockfiles? #12470.

I first posited that there may be performance regressions, but @jsirois clarified this is not so. See the edit history + below discussion.
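To illustrate what "translating Requires-Python metadata into actions" might look like, here is a deliberately simplified sketch (hypothetical helper; it only understands `>=` and `<` clauses, whereas a real implementation would use a full PEP 440 specifier parser) of checking whether one dist's Requires-Python covers every minor version in the IC range:

```python
def covers_ic_range(requires_python, ic_minors):
    """Naive check that a Requires-Python string covers all IC minor versions.

    requires_python: e.g. ">=3.6,<3.10"
    ic_minors: e.g. [(3, 6), (3, 7), (3, 8)]
    """
    clauses = [c.strip() for c in requires_python.split(",") if c.strip()]

    def ok(minor):
        for clause in clauses:
            if clause.startswith(">="):
                bound = tuple(int(p) for p in clause[2:].split("."))
                if minor < bound:
                    return False
            elif clause.startswith("<"):
                bound = tuple(int(p) for p in clause[1:].split("."))
                if not minor < bound:
                    return False
        return True

    return all(ok(m) for m in ic_minors)
```

If any dist in the lock fails this check for some interpreter in the IC range, that is the signal to either resolve again with that interpreter or warn the user.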

Possible actions

  1. Implement now as a blocker to lockfile project.
  2. Punt for after the lockfile project is complete.
  3. Don't plan to implement.
@jsirois (Contributor) commented Jul 29, 2021

Would require installing deps, rather than only downloading and resolving like pip-compile and Poetry both do.

This is false. You can read the metadata straight out of an sdist or whl. Where did the confidence for this assertion come from?

@jsirois (Contributor) commented Jul 29, 2021

Follow up resolves have to happen sequentially, after the first resolve was processed.

Why? They can happen in parallel to save time at the cost of trees if their results turn out to not be needed in the end after post resolve processing.

@Eric-Arellano (Author)

This is false.

Oh, that's great! Thank you for clarifying.

Where did the confidence for this assertion come from?

I wasn't very confident in it, which is why I used a ? in "Possible performance regressions?", but I see how that didn't express my lack of confidence very explicitly.

Is this the PKG-INFO file for sdists, whereas it's METADATA for .whl files?

--

They can happen in parallel to save time at the cost of trees if their results turn out to not be needed in the end after post resolve processing.

True, although that would somewhat nullify this pro you had mentioned in review of #12389, right?

It's optimal in generating fewer lockfiles overall: we only need to resolve against other interpreter versions if actually necessary.

@jsirois (Contributor) commented Jul 30, 2021

That second quote is of yourself above in the description. I never said that.

@Eric-Arellano (Author)

You said:

For some resolves, iterating the IC range won't be needed and post processing will prove that and avoid invalid failures / warnings.

@jsirois (Contributor) commented Jul 30, 2021

In the mentioned review I said:

Is post-processing resolves for wheel tags and environment markers actually triggered part of the plan? For some resolves, iterating the IC range won't be needed and post processing will prove that and avoid invalid failures / warnings.

The point there is avoiding invalid failures / warnings.

Say the range is covered by 3.7, 3.8 and 3.9.
I only have 3.8 and 3.9. I run the two resolves and post-process. That post-process confirms I don't need to have run 3.7 to generate a complete lockfile with no holes.

@Eric-Arellano (Author)

The point there is avoiding invalid failures / warnings.

I see. I think I understand now: if we want, we can still pessimistically generate for every Python interpreter we discover. That need not change. What changes is crying wolf when we didn't need to, as right now the error in #12389 is overly eager.

Thank you for clarifying all of these points. I've updated the issue description to more accurately reflect the pros and cons. PTAL.

@Eric-Arellano (Author)

So, now this is simply a prioritization question, it seems. I'm now convinced that post-processing is both important for correctness and a better UX of not crying wolf. The question becomes how important are those things, e.g. important enough that we either contribute to pip-compile or give up on it in favor of PEX generating lockfiles (#12470 (comment))

@jsirois (Contributor) commented Jul 30, 2021

The description edits lgtm, but there's still an incorrect bit. Perhaps this was the bit that got things off the rails in the 1st place:

@jsirois proposed post-processing as a way to know when we need to do multiple resolves:

The post-processing is a way to know when we don't need to do any more resolves. I think you understand this now with the cry-wolf verbiage you're using, which is correct.

@jsirois (Contributor) commented Jul 30, 2021

If you want to avoid confusing collisions, you might kill mention of sys_platform and instead use platform_version as your environment marker example. That can depend on the date the OS kernel was compiled! That would totally avoid the platform overloading here.

@Eric-Arellano (Author)

Talked more about this w/ @jsirois in Slack. I've updated the issue again.

tl;dr: "pessimistic generation" should correctly handle IC ranges, but with two downsides:

  • It cannot eagerly error when there are platform-related env markers, i.e. when the resolve was platform-specific.
  • It will cry wolf, i.e. warn/error when certain interpreters are not installed on the machine, even when that's unnecessary.

We need to decide how to prioritize our engineering resources vs. the better UX and correctness we'd get.

If we stick with pip-compile, this is much harder to implement. If Pex generates lockfiles, it's more feasible. See #12470 for that discussion.

@Eric-Arellano (Author)

Excellent. @asherf tried to use lockfiles for Toolchain and was confused why a lockfile generated on macOS failed in Linux CI. The dep keyring declares the dependencies SecretStorage>=3.2; sys_platform == "linux" and jeepney>=0.4.2; sys_platform == "linux". Of course, it was in no way intuitive to Asher that the solution was to generate on Linux instead of macOS.

So, yes, we technically could punt on post-processing, but we really shouldn't if we want to call this project complete.
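A toy illustration of why this bit Asher (extremely simplified marker handling, equality on sys_platform only; real tools should use a full PEP 508 marker evaluator): a resolver that only evaluates markers against the generating machine silently drops the Linux-only pins.

```python
# keyring's conditional deps, as quoted above.
requirement_markers = {
    "SecretStorage>=3.2": 'sys_platform == "linux"',
    "jeepney>=0.4.2": 'sys_platform == "linux"',
}

def active_requirements(sys_platform):
    """Return the requirements whose marker holds for the given platform."""
    active = []
    for req, marker in requirement_markers.items():
        # Naive evaluation: pull out the quoted value and compare.
        expected = marker.split("==")[1].strip().strip('"')
        if sys_platform == expected:
            active.append(req)
    return active
```

On a macOS machine (`sys_platform == "darwin"`) both markers evaluate False, so neither pin enters the lockfile, and the lock then fails on Linux. A correct generator must evaluate markers per target environment, not just the current one.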

@Eric-Arellano (Author)

Because Poetry already robustly handles environment markers (#12470 (comment)), I think post-processing is moot.

Eric-Arellano added a commit that referenced this issue Aug 7, 2021
…inor_versions()` (#12515)

Because Poetry already robustly handles generating a single lockfile that works with all requested Python interpreters (#12470 (comment)), there is no need for the "pessimistic generation" proposed in #12463.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Aug 13, 2021
…-compile (#12549)

**Disclaimer**: This is not a formal commitment to Poetry, as we still need a more rigorous assessment that it can handle everything we need. Instead, this is an incremental improvement: Poetry handles things much better than pip-compile.

It gets us closer to the final result we want, and it makes the experimental feature much more ergonomic to use, e.g. `generate_all_lockfiles.sh` no longer needs any manual editing. But we may decide to switch from Poetry to something like PDM or Pex.

--

See #12470 for why we are not using pip-compile. 

One of the major motivations is that Poetry generates lockfiles compatible with all requested Python interpreter versions, along with Linux, macOS, and Windows. Meaning, you no longer need to generate the lockfile in each requested environment and manually merge like we used to. This solves #12200 and obviates the need for #12463.

--

This PR adds only basic initial support. If we do decide to stick with Poetry, some of the remaining TODOs:

- Handle PEP 440-style requirements.
- Hook up to `[python-setup]` and `[python-repos]` options.
- Hook up to caching.
- Support `--platform` via post-processing `poetry.lock`: #12557
- Possibly remove unused deps/hashes to reduce supply chain attack risk: #12458

--

Poetry is more rigorous than pip-compile in ensuring that all interpreter constraints are valid, which prompted needing to tweak a few of our tools' constraints.