Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pycross: Add patching support to py_wheel_library #1436

Merged
merged 27 commits into from
Dec 19, 2023

Conversation

philsc
Copy link
Contributor

@philsc philsc commented Sep 28, 2023

This patch adds a few arguments to py_wheel_library to simulate how
http_archive accepts patch-related arguments.

I also amended the existing test to validate the behaviour at a very
high level.

References: #1360

This patch adds a few arguments to `py_wheel_library` to simulate how
`http_archive` accepts patch-related arguments.

I also amended the existing test to validate the behaviour at a very
high level.

References: bazelbuild#1360
@philsc philsc requested a review from rickeylev as a code owner September 28, 2023 16:24
doc =
"A list of files that are to be applied as patches after " +
"extracting the archive. This will use the patch command line tool.",
),
"python_version": attr.string(
doc = "The python version required for this wheel ('PY2' or 'PY3')",
values = ["PY2", "PY3", ""],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we should probably remove support for PY2, which is probably outside the scope of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can submit a follow-up PR.

@philsc philsc requested a review from aignas November 1, 2023 14:57
Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

patch_args = [
"-p1",
],
patch_tool_target = ":unix_patcher",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we instead just use patch_tool? Right now we are spawning python with ctx.actions which spawns Python which spawns Python which spawns patch. If there is a reason for using Python, maybe it is worth a comment?

I see you want to test on Windows as well as UNIX and now Windows just does nothing for now. Consider just adding another py_wheel_library target, which does patching which is incompatible with windows. Then you would need an extra test file that would be checking the patched contents. IMHO the lines of code might be the same as having unix_patcher.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but wanted to mirror the arguments for http_archive here. In http_archive, the patch_tool argument behaves the same as I have it here. Since it's a repository rule, it cannot support a label for a binary target. So I decided to create a new attribute called patch_tool_target. At least that was my original reasoning for it. If you feel strongly, I could be convinced to rename this to patch_tool and rename the string-based version to patch_tool_cmd or something like that.

Comment on lines 144 to 151
doc = "The patch(1) utility from the host to use. " +
"If set, overrides `patch_tool_target`.",
),
"patch_tool_target": attr.label(
executable = True,
cfg = "exec",
doc = "The label of the patch(1) utility to use. " +
"Only used if `patch_tool` is not set.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we restrict users to only use patch(1) tool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This is simply copied from the http_archive implementation. I'm happy to change the wording, but keeping it similar to http_archive seemed reasonable at the time at least. I think part of the reason for writing it as patch(1) explicitly is to imply the API (i.e. patch on stdin).
WDYT?

https://github.com/bazelbuild/bazel/blob/0226327c0ac03d818c211a718494c7c7cdc9bafc/tools/build_defs/repo/http.bzl#L327-L330

@philsc philsc requested a review from aignas December 7, 2023 18:36
Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that having patching as part of the build action is the best way. Patches in general can modify the METADATA that is used to generate dependencies and as you mentioned, if we split patching into a separate action, then we may end up with double the space in the build action cache.

If we introduce patching at this layer, then we will have to have a different type of patching for the dependencies. What is more, making this hermetic requires the user to go through extra hoops and just leaving it in the repository rule world and non-hermetic by construction segregates the responsibilities slightly better, IMHO.

That said, let's discuss as I am happy to be convinced that my view is wrong. :)

@philsc
Copy link
Contributor Author

philsc commented Dec 17, 2023

I am not sure that having patching as part of the build action is the best way. Patches in general can modify the METADATA that is used to generate dependencies and as you mentioned, if we split patching into a separate action, then we may end up with double the space in the build action cache.

If we introduce patching at this layer, then we will have to have a different type of patching for the dependencies.

You mean patching what the dependencies themselves are? If so, you are correct. That can be done by modifying the intermediate file itself. The same place where you introduce what patches to apply.
https://github.com/philsc/rules_python/blob/unreviewed/phil/pypi_install/examples/pypi_install/intermediate_file_patcher.py

What is more, making this hermetic requires the user to go through extra hoops and just leaving it in the repository rule world and non-hermetic by construction segregates the responsibilities slightly better, IMHO.

That does sound nice, but that's not really what my goal is. My goal is to cut the logic in repository rules to an absolute minimum. Everything else should go into actions. I.e. I really just want to download the wheel, set up some minimal BUILD files that specify the dependencies and such, and leave everything else to actions.

Having a non-hermetic patch tool in an action is really no worse than having a non-hermetic patch tool in repository context. Making the patching hermetic is easier in action context because you can depend on the output of other actions (e.g. a bazel-compiled patch binary).

Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that having patching as part of the build action is the best way. Patches in general can modify the METADATA that is used to generate dependencies and as you mentioned, if we split patching into a separate action, then we may end up with double the space in the build action cache.
If we introduce patching at this layer, then we will have to have a different type of patching for the dependencies.

You mean patching what the dependencies themselves are? If so, you are correct. That can be done by modifying the intermediate file itself. The same place where you introduce what patches to apply. https://github.com/philsc/rules_python/blob/unreviewed/phil/pypi_install/examples/pypi_install/intermediate_file_patcher.py

Thanks for the link, this will give me a little bit of more context.

What is more, making this hermetic requires the user to go through extra hoops and just leaving it in the repository rule world and non-hermetic by construction segregates the responsibilities slightly better, IMHO.

That does sound nice, but that's not really what my goal is. My goal is to cut the logic in repository rules to an absolute minimum. Everything else should go into actions. I.e. I really just want to download the wheel, set up some minimal BUILD files that specify the dependencies and such, and leave everything else to actions.

Having a non-hermetic patch tool in an action is really no worse than having a non-hermetic patch tool in repository context. Making the patching hermetic is easier in action context because you can depend on the output of other actions (e.g. a bazel-compiled patch binary).

Having non-hermetic tools within the action leads to the results of that action go into the action cache, which may make the cached entries harder to trust. That said, we can mark the actions with tags like no-cache to ensure that they don't get cached, whereas in the repository_ctx we don't have this choice.

Anyway, I am pro having this merged and discussing more as we go.

@aignas aignas added this pull request to the merge queue Dec 19, 2023
Merged via the queue into bazelbuild:main with commit 6246b8e Dec 19, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants