Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't use triplet file hash. Hash variables instead #596

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

autoantwort
Copy link
Contributor

Comment on lines +159 to +161
print_variables("15fe3605-2429-46b9-b6b1-c5c008621f79")
vcpkg_triplet_file(${VCPKG_TRIPLET_ID})
print_variables("9dbc61d9-e274-40b4-b38a-4d2d3fe91341")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also need to probe the environment before and after the triplet.

Copy link
Contributor Author

@autoantwort autoantwort Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but for example you don't want to have the PATH in the abi. Another solution would be that you have to set a local variable to the value of the environment variable.

Do you have an example in which a environment variable should be included?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to include whatever the triplet sets. So the only way to do that is comparing before/after and filtering unrelevant variables out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A problem would be that if a user sets a env variable outside of the triplet and uses this variable inside a port, this would be not caught (But this is probably impossible to detect. Except we include all env vars in the hash, but that would make the binarycache useless).

You need to include whatever the triplet sets. So the only way to do that is comparing before/after and filtering unrelevant variables out.

And if someone appends something to the PATH we should only hash the newly added stuff.

And if I look at
https://github.com/Neumann-A/my-vcpkg-triplets/blob/99d32a61fe890db22cc668fb4327891aa42f2699/armabi-v7a.cmake#L6
we should remove ${CMAKE_CURRENT_LIST_DIR} from the variable values before hashing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/microsoft/vcpkg/pull/15053/files#diff-e1605373cd9d4822d5c5db0dbeef7008e0657895640b0639f4a3ca5e72d40407 has code to extract changes in the environment from a batch script. You basically need something similar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can use the code I already have. What I mean is that for example here:
https://github.com/Neumann-A/my-vcpkg-triplets/blob/99d32a61fe890db22cc668fb4327891aa42f2699/x64-windows-llvm.port.cmake#L22
you only want ${POSSIBLE_LLVM_BIN_DIR} gets hashed and not the whole ENV{PATH}

Copy link
Contributor Author

@autoantwort autoantwort Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And POSSIBLE_LLVM_BIN_DIR gets hashed anyways. Hm, do we have an example where it would be really necessary to hash changes of the env?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need something similar to VCPKG_ENV_PASSTHROUGH_UNTRACKED.

Something like VCPKG_TRIPLET_VARS_UNTRACKED and VCPKG_TRIPLET_ENV_VARS_(UN)TRACKED to explicitly enable/disable tracking of some (env) vars. Currently vcpkg just tracks everything from the triplet

really necessary to hash changes to the env?

Setting any variable which is related to building but not detected by the compiler detection: AR/ASM/FC

Copy link
Member

@BillyONeal BillyONeal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than my ask for testing of a new parser, I think this from a code perspective is fine.

I would like to see @ras0219 / @ras0219-msft comment on what the product / ux implications may be before merging.

@@ -318,6 +344,51 @@ endfunction()
block_end = std::find(block_start, port_end, BLOCK_END_GUID);
}

if (opt_triplet_hashes)
{
auto parse_variables = [port_start, port_end](StringView start_guid, auto callback) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New parser should be extracted and tested. (And maybe use ParserBase et al. but since this is parsing our own content I don't think that's strictly necessary)

@@ -318,6 +344,51 @@ endfunction()
block_end = std::find(block_start, port_end, BLOCK_END_GUID);
}

if (opt_triplet_hashes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (opt_triplet_hashes)
if (auto triplet_hashes = opt_triplet_hashes.get())

hasher->add_bytes(key.begin(), key.end());
hasher->add_bytes(value.begin(), value.end());
});
opt_triplet_hashes.get()->push_back(hasher->get_hash());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
opt_triplet_hashes.get()->push_back(hasher->get_hash());
triplet_hashes->push_back(hasher->get_hash());

@ras0219-msft
Copy link
Contributor

I haven't dug 100% into the specifics, but I see one very large obstacle blocking this direction: paths in variables. Similar to my comment at microsoft/vcpkg#19984 (comment), I don't see how this approach can be implemented in a way that doesn't make the triplets incredibly machine- and project-location-specific.

I'm also concerned about making the binary caching system even more opaque for users.

Consider values like VCPKG_CXX_FLAGS or VCPKG_VISUAL_STUDIO_PATH, which are almost certain to contain paths. With the current system, it's very simple to understand whether or not those will affect the package abi -- is the triplet file the same or different? It's also therefore reasonably simple to understand how to choose whether the absolute path should affect the abi or not -- just use VCPKG_ENV_PASSTHROUGH_UNTRACKED to bring in an environment variable and expand that if you want it "hidden".

My counter-proposal to solve #19984 is to eventually introduce a more declarative (json) format for triplets with a narrower set of capabilities than CMake that can be precisely analyzed. It would be structurally clear which parts of the document apply to what ports and it would be possible to "see through" variable expansions to determine semantically significant substitions.

Unfortunately, triplet file invalidation is not very high on our (the vcpkg core team) backlog of problems to solve, compared to bug fixing, docs improvements, artifacts, and PR review.

@Neumann-A
Copy link
Contributor

With the current system, it's very simple to understand whether or not those will affect the package abi -- is the triplet file the same or different?

No its not. It doesn't consider include() which I actively use to undermine the triplet hash to deal with per port customization in manifest mode without nuking everything which is unaffected. Of course this leads to inconsistencies in the binary hashes which is kind of bad but vcpkg doesn't provide a proper solution to deal with that.
VCPKG_VISUAL_STUDIO_PATH is also a bad example since the compiler hash will be affected by that any way so hashing that is also kind of unnecessary.

For me the problem seems to be string(APPEND z_vcpkg_output "${z_vcpkg_variable_name}=${${z_vcpkg_variable_name}}56732a63-363f-4326-8eac-738ed6390780") which returns the actual value of the variable instead of the "intended" or "set" value. If we however run cmake in trace mode (--trace --trace-format=json-v1) we can get output like:

{"args":["BLA","somevalue"],"cmd":"set","file":"E:/vcpkg_folders/bla.cmake","frame":1,"line":2,"time":1660201353.3553562}
{"args":["BLA_PATH","$ENV{PATH}"],"cmd":"set","file":"E:/vcpkg_folders/bla.cmake","frame":1,"line":3,"time":1660201353.3563538}

Tracing all set/unset commands a list of triplet variables can be obtained without absolute paths and the correct hashing even if other files are included. There are no absolute paths since those don't get expanded from the variables and if there is an absolute path vcpkg can at least warn about it and its consequences (since then it is explicitly set as an absolute path).

vcpkg could use --trace-redirect= to already output the trace into another file.

I think this way is way better than the current approach since e.g. simple formatting changes in an if statement within the triplet won't affect everything anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[vcpkg|binarycache] don't use triplet file hash. Hash variables instead.
4 participants