
Lockfiles: determine if we need to post-process a resolve #12463

Closed
Eric-Arellano opened this issue Jul 29, 2021 · 13 comments
@Eric-Arellano (Contributor) commented Jul 29, 2021

Given that Python resolves can vary by environment (interpreter constraints, OS/arch), it's sometimes necessary to generate the lockfile in each environment that will be used: #12200

@jsirois proposed post-processing as a way to know when it is safe to skip multiple resolves:

We should do post-resolve processing of dist-info/METADATA and use Requires-Python and Requires-Dist metadata to exactly determine the breadth of validity of a lock.

If the resolve is not fully valid for the IC range and platforms, we can warn or error, such as instructing that you must generate on macOS and Linux and then merge.

--

Alternatively, we could pessimistically perform the resolve against every valid interpreter in the IC range (#12389), and merge into a single lockfile. For example, if the IC range is 3.6-3.8, then we know to generate w/ 3.6, 3.7, and 3.8; then we merge into a single lockfile by using env markers to handle conditional dependencies.
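A toy sketch of that merge step (hypothetical helper and data shapes; it assumes each per-interpreter resolve yields a simple name-to-version pin map, whereas a real tool would merge full dependency graphs):

```python
def merge_resolves(resolves):
    """Merge per-interpreter pins into one requirements list.

    resolves: e.g. {"3.6": {"dataclasses": "0.8", "requests": "2.26.0"},
                    "3.7": {"requests": "2.26.0"},
                    "3.8": {"requests": "2.26.0"}}
    """
    lines = []
    all_projects = sorted({p for pins in resolves.values() for p in pins})
    for project in all_projects:
        versions = {py: pins.get(project) for py, pins in resolves.items()}
        if len(set(versions.values())) == 1 and None not in versions.values():
            # Same pin under every interpreter: no marker needed.
            lines.append(f"{project}=={next(iter(set(versions.values())))}")
        else:
            # Conditional dep: guard each pin with a python_version env marker.
            for py, version in sorted(versions.items()):
                if version is not None:
                    lines.append(
                        f'{project}=={version}; python_version == "{py}"'
                    )
    return lines
```

With the example input above, `dataclasses` (a 3.6-only backport) gets a `python_version == "3.6"` marker while `requests` is emitted unconditionally.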

Pros and cons

Post-processing has two advantages over pessimistic generation:

  1. It handles all env markers, including platform_version.
    • We cannot pessimistically generate for all platforms because a resolve can only run on the current machine. NB: I am not talking about the PEX flag --platform, where cross-machine resolves are possible by only using wheels instead of sdists. I'm talking about generating a lockfile when --platform is not used, while accounting for any of the deps having platform-specific env markers.
    • Without eagerly detecting platform validity, the user gets a worse UX: a lockfile generated on, say, macOS won't work for a Linux user, and the user has to figure out why that happened and know to regenerate it on the other platform.
  2. It does not "cry wolf" unnecessarily.
    • For example, say you have an IC range of 3.6-3.10 but only 3.10 installed on your machine. If post-processing shows the 3.10 resolve is valid for the whole range, there is no need to error about missing 3.6-3.9 like pessimistic generation would.

The main downside is the development and maintenance burden:

  • E.g. robustly translating Requires-Python and Requires-Dist metadata into actions like "Regenerate for 3.6 and 3.7"
  • pip-compile does not currently have a way to inspect the .whl files for this post-analysis. We'd either need to improve pip-compile or start using Pex for lockfile generation. We may want to do that anyways, see Lockfiles: Should we use pip-compile or teach Pex to generate lockfiles? #12470.

I first posited that there may be performance regressions, but @jsirois clarified this is not so. See the edit history + below discussion.
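To illustrate what "translating Requires-Python metadata into actions" might look like, here is a deliberately simplified sketch (hypothetical helper; it only understands `>=` and `<` clauses, whereas a real implementation would use a full PEP 440 specifier parser) of checking whether one dist's Requires-Python covers every minor version in the IC range:

```python
def covers_ic_range(requires_python, ic_minors):
    """Naive check that a Requires-Python string covers all IC minor versions.

    requires_python: e.g. ">=3.6,<3.10"
    ic_minors: e.g. [(3, 6), (3, 7), (3, 8)]
    """
    clauses = [c.strip() for c in requires_python.split(",") if c.strip()]

    def ok(minor):
        for clause in clauses:
            if clause.startswith(">="):
                bound = tuple(int(p) for p in clause[2:].split("."))
                if minor < bound:
                    return False
            elif clause.startswith("<"):
                bound = tuple(int(p) for p in clause[1:].split("."))
                if not minor < bound:
                    return False
        return True

    return all(ok(m) for m in ic_minors)
```

If any dist in the lock fails this check for some interpreter in the IC range, that is the signal to either resolve again with that interpreter or warn the user.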

Possible actions

  1. Implement now as a blocker to lockfile project.
  2. Punt for after the lockfile project is complete.
  3. Don't plan to implement.
@jsirois (Contributor) commented Jul 29, 2021

Would require installing deps, rather than only downloading and resolving like pip-compile and Poetry both do.

This is false. You can read the metadata straight out of an sdist or whl. Where did the confidence for this assertion come from?

@jsirois (Contributor) commented Jul 29, 2021

Follow up resolves have to happen sequentially, after the first resolve was processed.

Why? They can happen in parallel to save time at the cost of trees if their results turn out to not be needed in the end after post resolve processing.

@Eric-Arellano (Author)

This is false.

Oh, that's great! Thank you for clarifying.

Where did the confidence for this assertion come from?

I wasn't very confident in it, which is why I used a ? in "Possible performance regressions?", but I see how that didn't express my lack of confidence very explicitly.

Is this the PKG-INFO file for sdists, whereas it's METADATA for .whl files?

--

They can happen in parallel to save time at the cost of trees if their results turn out to not be needed in the end after post resolve processing.

True, although that would somewhat nullify this pro you had mentioned in review of #12389, right?

It's optimal in generating fewer lockfiles overall: we only need to resolve against other interpreter versions if actually necessary.

@jsirois (Contributor) commented Jul 30, 2021

That second quote is of yourself above in the description. I never said that.

@Eric-Arellano (Author)

You said:

For some resolves, iterating the IC range won't be needed and post processing will prove that and avoid invalid failures / warnings.

@jsirois (Contributor) commented Jul 30, 2021

In the mentioned review I said:

Is post-processing resolves for wheel tags and environment markers actually triggered part of the plan? For some resolves, iterating the IC range won't be needed and post processing will prove that and avoid invalid failures / warnings.

The point there is avoiding invalid failures / warnings.

Say the range is covered by 3.7, 3.8 and 3.9.
I only have 3.8 and 3.9. I run the two resolves and post-process. That post-process confirms I don't need to have run 3.7 to generate a complete lockfile with no holes.

@Eric-Arellano (Author)

The point there is avoiding invalid failures / warnings.

I see. I think I understand now: if we want, we can still pessimistically generate for every Python interpreter we discover. That need not change. What changes is crying wolf when we didn't need to, as right now the error in #12389 is overly eager.

Thank you for clarifying all of these points. I've updated the issue description to more accurately reflect the pros and cons. PTAL.

@Eric-Arellano (Author)

So, now this is simply a prioritization question, it seems. I'm now convinced that post-processing is both important for correctness and a better UX of not crying wolf. The question becomes how important are those things, e.g. important enough that we either contribute to pip-compile or give up on it in favor of PEX generating lockfiles (#12470 (comment))

@jsirois (Contributor) commented Jul 30, 2021

The description edits lgtm, but there's still an incorrect bit. Perhaps this was the bit that got things off the rails in the 1st place:

@jsirois proposed post-processing as a way to know when we need to do multiple resolves:

The post-processing is a way to know when we don't need to do any more resolves. I think you understand this now with the cry-wolf verbiage you're using, which is correct.

@jsirois (Contributor) commented Jul 30, 2021

If you want to avoid confusing collisions, you might kill mention of sys_platform and instead use platform_version as your environment marker example. That can depend on the date the OS kernel was compiled! That would totally avoid the platform overloading here.

@Eric-Arellano (Author)

Talked more about this w/ @jsirois in Slack. I've updated the issue again.

tl;dr: "pessimistic generation" should correctly handle IC ranges, but with two downsides:

  • It cannot eagerly error when there are platform-related env markers, i.e. when the resolve was platform-specific.
  • It will cry wolf, i.e. warn/error when certain interpreters are not installed on the machine, even when that's unnecessary.

We need to decide how to prioritize our engineering resources vs. the better UX and correctness we'd get.

If we stick with pip-compile, this is much harder to implement. If Pex generates lockfiles, it's more feasible. See #12470 for that discussion.

@Eric-Arellano (Author)

Excellent. @asherf tried to use lockfiles for Toolchain and was confused why a lockfile generated on macOS failed in Linux CI. The dep keyring declares the dependencies SecretStorage>=3.2; sys_platform == "linux" and jeepney>=0.4.2; sys_platform == "linux". Of course, it was in no way intuitive to Asher that the solution was to generate on Linux instead of macOS.

So, yes, we technically could punt on post-processing, but we really shouldn't if we want to call this project complete.
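A toy illustration of why this bit Asher (extremely simplified marker handling, equality on sys_platform only; real tools should use a full PEP 508 marker evaluator): a resolver that only evaluates markers against the generating machine silently drops the Linux-only pins.

```python
# keyring's conditional deps, as quoted above.
requirement_markers = {
    "SecretStorage>=3.2": 'sys_platform == "linux"',
    "jeepney>=0.4.2": 'sys_platform == "linux"',
}

def active_requirements(sys_platform):
    """Return the requirements whose marker holds for the given platform."""
    active = []
    for req, marker in requirement_markers.items():
        # Naive evaluation: pull out the quoted value and compare.
        expected = marker.split("==")[1].strip().strip('"')
        if sys_platform == expected:
            active.append(req)
    return active
```

On a macOS machine (`sys_platform == "darwin"`) both markers evaluate False, so neither pin enters the lockfile, and the lock then fails on Linux. A correct generator must evaluate markers per target environment, not just the current one.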

@Eric-Arellano (Author)

Because Poetry already robustly handles environment markers (#12470 (comment)), I think post-processing is moot.

Eric-Arellano added a commit that referenced this issue Aug 7, 2021
…inor_versions()` (#12515)

Because Poetry already robustly handles generating a single lockfile that works with all requested Python interpreters (#12470 (comment)), there is no need for the "pessimistic generation" proposed in #12463.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Aug 13, 2021
…-compile (#12549)

**Disclaimer**: This is not a formal commitment to Poetry, as we still need a more rigorous assessment that it can handle everything we need. Instead, this is an incremental improvement: Poetry handles things much better than pip-compile.

It gets us closer to the final result we want, and it makes the experimental feature much more ergonomic to use, e.g. `generate_all_lockfiles.sh` no longer needs any manual editing. But we may decide to switch from Poetry to something like PDM or Pex.

--

See #12470 for why we are not using pip-compile. 

One of the major motivations is that Poetry generates lockfiles compatible with all requested Python interpreter versions, along with Linux, macOS, and Windows. Meaning, you no longer need to generate the lockfile in each requested environment and manually merge like we used to. This solves #12200 and obviates the need for #12463.

--

This PR adds only basic initial support. If we do decide to stick with Poetry, some of the remaining TODOs:

- Handle PEP 440-style requirements.
- Hook up to `[python-setup]` and `[python-repos]` options.
- Hook up to caching.
- Support `--platform` via post-processing `poetry.lock`: #12557
- Possibly remove unused deps/hashes to reduce supply chain attack risk: #12458

--

Poetry is more rigorous than pip-compile in ensuring that all interpreter constraints are valid, which prompted needing to tweak a few of our tools' constraints.