Proposal: centralize Pants lockfile validation metadata in a single file #14281
Comments
tl;dr: I think comments are essential in human-editable configuration files and would prefer to do the work to allow that, but could live with this proposal with some modifications if we have to go down this route. First, on the assumptions here:
Generally speaking, we should allow for comments in our lockfiles, either by making the So my summary of the problem is that PEX doesn't allow comments in lockfiles. I'd recommend point 2 (changing out the JSON parser in PEX to allow comments), but I also don't feel bad about implementing point 1, either as the solution to this or as future work.
Secondly, re the proposal itself: if we decide we need to separate the metadata and the lockfile itself, the metadata should also ensure that the lockfile was in fact generated from the input requirements specified in the separate config file. The status quo is that we validate that the inputs are correct; this would be validating that the lockfile content is based on those inputs. Probably the most reliable way to do that is to ensure that the lockfile is the same one that Pants generated. So we'd want to store a SHA digest of the lockfile output in the new metadata file, and have Pants issue a warning or error about the dangers of manually editing lockfiles. |
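To make the digest idea concrete, here is a minimal sketch, assuming SHA-256 as the digest and treating the lockfile as raw bytes. The function names and the sample lockfile content are hypothetical and are not taken from Pants or PEX.

```python
import hashlib


def lockfile_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the raw lockfile bytes."""
    return hashlib.sha256(content).hexdigest()


def is_unmodified(content: bytes, recorded_digest: str) -> bool:
    """True if the lockfile bytes still match the digest stored in the metadata."""
    return lockfile_digest(content) == recorded_digest


# Hypothetical lockfile content, standing in for whatever the tool generated.
generated = b'{"locked": ["flake8==4.0.1"]}\n'
recorded = lockfile_digest(generated)  # this value would live in the metadata file

# A manual edit changes the bytes, so the recorded digest no longer matches.
edited = generated.replace(b"4.0.1", b"4.0.0")

if not is_unmodified(edited, recorded):
    print("warning: lockfile was modified after generation")
```

Hashing the bytes rather than the parsed content means that even whitespace-only edits are flagged, which may or may not be the desired strictness.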
How do you propose handling the js world and package.json? |
My question: is this actually a big deal? And are there other downsides I'm missing to writing-then-stripping the header from the lockfile? ("wasted computation" seems irrelevant, this all should be fast.) |
We're definitely already reading the file twice in Python-land: once in Pants to extract the metadata, and once when pip consumes the lockfile. It wouldn't be the worst thing in the world to output a pre-processed lockfile as an intermediate build step. There'd be no extra read operations, and only one extra write. |
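A rough sketch of that pre-processing step, assuming the metadata header is a run of leading comment lines with a known prefix (`//` here). The function name, the prefix, and the sample content are assumptions for illustration, not the actual Pants header format.

```python
import json
from typing import List, Tuple


def split_header(text: str, comment_prefix: str = "//") -> Tuple[List[str], str]:
    """Split leading comment lines (the header) from the rest of the lockfile."""
    lines = text.splitlines(keepends=True)
    i = 0
    # Consume only the leading run of comment lines; stop at the first other line.
    while i < len(lines) and lines[i].lstrip().startswith(comment_prefix):
        i += 1
    header = [line.strip()[len(comment_prefix):].strip() for line in lines[:i]]
    body = "".join(lines[i:])
    return header, body


# Hypothetical lockfile: two header lines followed by plain JSON.
raw = (
    "// generated by a hypothetical tool\n"
    '// {"version": 1}\n'
    '{"locked": ["flake8==4.0.1"]}\n'
)

header, body = split_header(raw)
lock = json.loads(body)  # the body is now valid JSON for a strict parser
```

The stripped `body` is what would be written out as the intermediate file and handed to the consuming tool, while `header` is what Pants would inspect for validation.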
@chrisjrn if the answer for package.json is 1 then I'd need better justification to change the Pex format to a non-stdlib one since pants needs to support 1 or Eric's proposal to handle the js ecosystem anyhow. |
That makes sense to me for Pex to not consider Pants in determining which lockfile format to use. I personally find TOML more readable (less nesting), and I think it's useful it allows comments. So I'd encourage its use. But I know vendoring What I'm more interested in is confirming that it is acceptable for Pants to pre-process the lockfile before passing it to Pex. It seems fine to me to do. |
I really don't buy the comments / readability arguments since a lock file is not for editing. You do have to be able to read it, but you can honestly read the current Pex lock files just fine. N.B. tomli / tomli-w are not an option, Python 3 only for both. It would have to be toml. |
I think it's useful to have nice Git diffs with lockfiles. When you update a lockfile, you should be looking at what transitive deps have changed. TOML is particularly good at this imo because of how it handles nesting & whitespace. If we stick with json for PEX, we could use
Which is still unmaintained, although might get new life soon: uiri/toml#361 (comment) |
Yes, Pex already supports |
I don't like this idea of a centralized metadata file due to Chris's concerns about bad coupling. It also makes total sense why Pex wants to stick with JSON. Meaning that Pants will need to pre-process the lockfile. Not a very big deal. |
I don't feel comfortable using a file that's generateable and consumable by another tool without the right guardrails in place. Scenario: Pants supports JS (:tada:) and is managing the lockfile. Some Pants-ignorant user runs an |
Ahh I missed this:
That solves my concern. I'm +1 on this proposal. It ends all discussions about any particular tool's compatibility with Pants' needs. |
Here's a light brain dump on what this would entail:
|
I added a note about this to #18326. The support in that PR bypasses the problems here by only supporting dependencies defined in As soon as there are multiple inputs (or at least a pants-specific input beyond the ecosystem-native dep definition) that should be covered by the lockfile, that is when pants has to step in and start tracking metadata about the lockfile's generation to ensure manual regeneration does not exclude those pants-specific dependencies. This is an aspect about lockfiles that I hadn't considered, so I'm posting it as food for thought for others. Hopefully it can help us reach a resolution on the ideal cross-language way to handle the lockfile metadata and verification of the lockfile based on that metadata before we can trust that the lockfile is in a sane state / ready for use. |
Status quo
Currently, we store lockfile metadata in headers like this:
pants/3rdparty/python/lockfiles/flake8.txt
Lines 1 to 18 in 94cef33
@jsirois points out that this will be an issue with PEX and package-lock.json, which both use JSON. We will have to strip the headers at consumption time.
There are two concerning downsides to this:
.jsonc to work around that, but that's invalid to call package-lock.json a .jsonc file.
While we switched our proprietary JVM lockfile format to TOML to accommodate comments (#14175), we cannot strongarm every lock we support to be able to do this.
Proposal
Instead, we would need to start storing Pants metadata in dedicated, first-class file(s).
We could store one metadata file per lock, like black.lock and black.lock.metadata, but that would double the number of files you have to generate, where the metadata ones would also be very small. I believe this is more boilerplate than we want.
Instead, I propose a single file used for lockfile validation, something like this:
(Note that we already check for ambiguity amongst all resolve names, so the keys will be unique.)
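As a rough illustration of validating against one shared file keyed by resolve name, here is a hypothetical sketch. The JSON layout, the `requirements` field, the pinned versions, and the function name are all assumptions made for the example; they are not the format proposed in this issue.

```python
import json
from typing import Dict, Iterable

# Hypothetical contents of a single metadata file: one entry per resolve name.
METADATA_JSON = """
{
  "black": {"requirements": ["black==22.1.0"]},
  "flake8": {"requirements": ["flake8==4.0.1"]}
}
"""


def lock_matches_inputs(
    metadata: Dict[str, dict], resolve: str, requirements: Iterable[str]
) -> bool:
    """True if the recorded inputs for `resolve` equal the current requirements."""
    entry = metadata.get(resolve)
    if entry is None:
        # No entry at all: the lock for this resolve was never recorded.
        return False
    return set(entry["requirements"]) == set(requirements)


metadata = json.loads(METADATA_JSON)

for name, current in [("black", ["black==22.1.0"]), ("flake8", ["flake8==5.0.0"])]:
    status = "ok" if lock_matches_inputs(metadata, name, current) else "stale"
    print(f"{name}: {status}")
```

Because resolve names are unique, a flat mapping is enough and a lookup never needs to disambiguate between locks.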