Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a peek goal to easily view BUILD metadata from command line #11347

Merged
merged 16 commits into from
Jul 20, 2021

Conversation

jriddy
Copy link
Contributor

@jriddy jriddy commented Dec 19, 2020

Problem

It would be convenient to have pants able to export build metadata, either for human consumption or programmatic consumption. This seems to be the essence of issue #4861, which the PR addresses.

Solution

This PR adds a new peek goal, which allows us to look at the BUILD files for given targets.

@jriddy jriddy marked this pull request as ready for review December 19, 2020 20:03
@jriddy jriddy changed the title WIP: Implement a peek goal Provide a peek goal to easily view BUILD metadata from command line Dec 19, 2020
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, really cool work!

--

For raw output, I agree with the decision to simply print the whole BUILD file. Way too tricky to otherwise try to identify which part is applicable. And we hope/expect BUILD files to be fairly small thanks to dep inference.

However, I could see it getting confusing when you specify multiple targets/files, and mapping what belongs to what. Thoughts on detecting the case of >1 "spec", and logging a warning to consider using the JSON format? We do that with ./pants py-constraints:

if len(addresses) > 1:
merged_constraints_warning = (
"(These are the constraints used if you were to depend on all of the input "
"files/targets together, even though they may end up never being used together in "
"the real world. Consider using a more precise query or running "
"`./pants py-constriants --summary`.)\n"
)
output_stdout(indent(fill(merged_constraints_warning, 80), " "))

--

The JSON output is excellent: it shows the user what Pants's view of a particular file/target actually is. That transparency is great for debugging. We might want to make JSON the default?

I'd recommend two tweaks:

  1. Rather than alias, call it target_type, or something similar. "Alias" is an implementation detail.
  2. I think we want to render default values? That way, we show a complete view of what Pants is doing. It also allows for safer JSON queries, as you're guaranteed the key will exist. If it gets too chatty, we could consider an option like --ignore-defaults.

--

In #4861, we mentioned adding a table format in a followup. I'm not sure that would really be necessary - the mix of Raw and JSON is very helpful already.

--

In #4861, I pondered

Should [we] evaluate the sources and dependencies fields, or show the raw values?

After seeing this PR, I think no, we should not. ./pants dependencies already exists for deps, and ./pants filedeps for sources. It's redundant imo, and also non-trivial, to include those here, including a performance hit.

@Eric-Arellano
Copy link
Contributor

If you have a chance, it'd also be great to add some tests. You'd want to use Approach #3: Rule Runner, maybe also some normal unit tests (Approach #1).

Here's something to get you started:

# Copyright 2020 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

import pytest
from textwrap import dedent

from pants.core.target_types import Files
from pants.backend.project_info import peek
from pants.backend.project_info.peek import Peek
from pants.testutil.rule_runner import RuleRunner


@pytest.fixture
def rule_runner() -> RuleRunner:
    return RuleRunner(rules=peek.rules(), target_types=[Files])


def test_example(rule_runner: RuleRunner) -> None:
    rule_runner.add_to_build_file("project", "# A comment\nfiles(sources=[])")
    result = rule_runner.run_goal_rule(Peek, args=["project"])
    assert result.stdout == dedent(
        """\
        -------------
        project/BUILD
        -------------
        # A comment
        files(sources=[])

        """
    )

Note that we likely do want to do exact string matches, as formatting matters for Pants goals and we don't want to regress.

It would be good to test running on both a BUILD target address (like project:app) and a file address, like project/app.py or project/app.py:lib.

Copy link
Contributor Author

@jriddy jriddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, really cool work!

Thanks! Glad to finally be able to contribute!

For raw output, I agree with the decision to simply print the whole BUILD file. Way too tricky to otherwise try to identify which part is applicable. And we hope/expect BUILD files to be fairly small thanks to dep inference.

I played (a very small bit) with some inspection tools to find the applicable part, but it looked complicated. It could be an issue in its own right if we wanted to pursue it more.

However, I could see it getting confusing when you specify multiple targets/files, and mapping what belongs to what. Thoughts on detecting the case of >1 "spec", and logging a warning to consider using the JSON format? We do that with ./pants py-constraints:

I don't think a warning is necessary if we make JSON the default output, but I'll defer to your judgement on this.

The JSON output is excellent: it shows the user what Pants's view of a particular file/target actually is. That transparency is great for debugging. We might want to make JSON the default?

I'd recommend two tweaks:

1. Rather than `alias`, call it `target_type`, or something similar. "Alias" is an implementation detail.

2. I think we want to render default values? That way, we show a complete view of what Pants is doing. It also allows for safer JSON queries, as you're guaranteed the key will exist. If it gets too chatty, we could consider an option like `--ignore-defaults`.

I agree with these suggestions.

  • Make JSON the default output style
  • Rename alias in output to target_type
  • Show default values by default
  • Add --exclude-defaults flag (optional)

In #4861, we mentioned adding a table format in a followup. I'm not sure that would really be necessary - the mix of Raw and JSON is very helpful already.

Agreed. It's human-readable enough, and machine readable with things like jq

In #4861, I pondered

Should [we] evaluate the sources and dependencies fields, or show the raw values?

After seeing this PR, I think no, we should not. ./pants dependencies already exists for deps, and ./pants filedeps for sources. It's redundant imo, and also non-trivial, to include those here, including a performance hit.

Agreed, and if you really want that, jq + xargs are your friends:

./pants peek ... | jq -r ... | xargs ./pants filedeps

Although I admit that trying to come up with a good example jq query made me realize that it doesn't seem to play well with arbitrary keys like we're using, so perhaps making a list of objects is a better way forward. Any opinion on this, @Eric-Arellano ?


def default(self, o):
"""Return a serializable object for o."""
if isinstance(o, Requirement):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is the best way to handle objects we find in in build metadata. Will they all be meaningfully serializable? Should we just do our best and str() what we can't encode to JSON directly?

@jriddy
Copy link
Contributor Author

jriddy commented Dec 20, 2020

If you have a chance, it'd also be great to add some tests. You'd want to use Approach #3: Rule Runner, maybe also some normal unit tests (Approach #1).

I always intended to add tests, just didn't get around to it on the first pass, as I wanted to make sure the behavior was right before I started baking it in. Thanks for the template.

  • rule runner tests
  • unit tests

@jsirois
Copy link
Contributor

jsirois commented Dec 20, 2020

Although I admit that trying to come up with a good example jq query made me realize that it doesn't seem to play well with arbitrary keys like we're using, ...

FWIW jq supports arbitrary keys by quoting them (you can also index an object with a quoted key - .["..."]) so I don't think jq needs to be a strong constraint on the output format you choose:

$ echo '
{"foo": "bar"}
{"//super/strange/pants:key": {"baz": 42}}
{"eggs": "spam"}
' | jq '."//super/strange/pants:key".baz'
null
42
null

@jriddy
Copy link
Contributor Author

jriddy commented Dec 20, 2020

I've addressed most of these issues, except for the tests, which I'll start adding.

I do have one question left, which is the encoding of build files. I get them as bytes in the digest, but how do I know what encoding to use to decode those bytes back to a string. The encoding could theoretically vary depending on how the file was saved right? Or do you normalize to a particular encoding like utf-8 when converting to a digest?

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, looking great!

@jriddy
Copy link
Contributor Author

jriddy commented Dec 22, 2020

Thanks for the feedback. I'll update according to the comments and add some real test cases, but I won't have time to do this until about the 24th or so. Since it's the holidays, I won't expect responses. Have a good one!

Base automatically changed from master to main March 19, 2021 19:20
jriddy added 3 commits July 17, 2021 08:03
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@jriddy
Copy link
Contributor Author

jriddy commented Jul 17, 2021

Sorry I left this dormant so long. I mostly just forgot about it (or had it at the very back of my mind) until @cczona pinged me to remind me about this, so thanks for that!

I've addressed the previous comments and added some tests. I couldn't find a much better way to test it than to just test raw output generation. although I could come up with some more scenarios with a generator maybe, but y'all don't seem to be using hypothesis or another generative testing tool, so I guess I'll leave it at some hand-written examples.

Let me know if there's anything left I need to do to make this cut the mustard.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff! One small change

@Eric-Arellano Eric-Arellano requested review from benjyw and stuhood July 19, 2021 17:56
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great: thanks a lot for iterating on this! Will merge on green.

@stuhood stuhood merged commit b5781b5 into pantsbuild:main Jul 20, 2021
@Eric-Arellano Eric-Arellano mentioned this pull request Aug 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants