Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[internal] Expanding cc dependency inference to support include_directories #17738

Merged
merged 3 commits into from
Dec 13, 2022

Conversation

sureshjoshi
Copy link
Member

@sureshjoshi sureshjoshi commented Dec 7, 2022

First of many smaller PRs extracted from #16424.

EDIT (Dec 9)

After discussions in this PR, most of the contents were reverted - with the new idea being that mylib/include should be a source root - which will effectively act as an include_directory for cc.

New LevelDB example is:

⏺ oss/leveldb % ./pants_from_sources --source-root-patterns="['/', '/include']" filedeps --transitive db/c.cc                                                                                                                                                                                              

db/BUILD
db/c.cc
include/leveldb/BUILD
include/leveldb/c.h
include/leveldb/cache.h
include/leveldb/comparator.h
include/leveldb/db.h
include/leveldb/env.h
include/leveldb/export.h
include/leveldb/filter_policy.h
include/leveldb/iterator.h
include/leveldb/options.h
include/leveldb/slice.h
include/leveldb/status.h
include/leveldb/write_batch.h

Leaving content below for posterity


This PR expands cc dependency inference to support relative paths that may not be specified in an #include statement. Using the integration_test addition as an example - a file sits in mylib/include/mylib/public1.h but is referenced via mylib/public1.h.

This is a pretty common library structure to split public headers from implementation, so it seems reasonable to automatically infer on it. An alternative option is to specify mylib/include as a source root, or to explicitly specify the public headers.

"mylib/include/mylib/BUILD": "cc_sources()",
"mylib/include/mylib/public1.h": "int foo1() { return 1; }",
"mylib/include/mylib/public2.h": "int foo2() { return 2; }",
"mylib/src/BUILD": "cc_sources()",
"mylib/src/private1.h": "int bar1() { return 1; }",
"mylib/src/private2.h": "int bar2() { return 2; }",
"mylib/src/main.c": dedent(
    """\
    #include "mylib/public1.h"
    #include "mylib/public2.h"
    #include "private1.h"
    #include "private2.h"
    """
),

For a non-SJ-created example, refer to LevelDB:

Before PR

oss/leveldb % ./pants_from_sources filedeps --transitive db/c.cc

db/BUILD
db/c.cc

Works for them because of this in their CMakeLists.txt

include_directories(
  "${PROJECT_BINARY_DIR}/include"
  "."
)

After PR

oss/leveldb % ./pants_from_sources filedeps --transitive db/c.cc

db/BUILD
db/c.cc
include/leveldb/BUILD
include/leveldb/c.h
include/leveldb/cache.h
include/leveldb/comparator.h
include/leveldb/db.h
include/leveldb/env.h
include/leveldb/export.h
include/leveldb/filter_policy.h
include/leveldb/iterator.h
include/leveldb/options.h
include/leveldb/slice.h
include/leveldb/status.h
include/leveldb/write_batch.h

[ci skip-rust]
[ci skip-build-wheels]

@sureshjoshi sureshjoshi added the category:internal CI, fixes for not-yet-released features, etc. label Dec 7, 2022
@sureshjoshi sureshjoshi added the backend: CC CC backend-related issues label Dec 7, 2022
Comment on lines 87 to 89
TODO: Would a more "correct" implementation be to grab SourceRoots and use those to determine the relative path? Currently,
this implementation progresses up the file's parent chain until it finds a directory that matches an include_directory, and then
uses that as the relative path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that include_directories are potentially equivalent in spirit to source roots. Currently dependency inference for Python operates on all source roots (IIRC).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so that's what I did in the original PR: https://github.com/pantsbuild/pants/pull/16424/files#diff-6c96e874c9a3590582c7988d29f8584d67fb5da102218a4ad5da320d8143f5bc

So, the preferred version is searching relative to {source_root}/{include} instead of going up the parent chain until we hit {include}? Perfectly happy either way, since I can contrive examples that would break both, so 6 of one 🤷🏽

The underlying assumption here is that (at least) the top-level folder of the library/module is a source_root. If not, the whole thing falls over. If that's a valid precondition to enforce, I'm good. C/C++ doesn't really have a standard marker that would define a lib root, unfortunately - so it would need to be explicitly placed by the user.

Copy link
Contributor

@tdyas tdyas Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source roots can also be used for namespacing decisions (at least in Python). How would C/C++ code be impacted by having a source root for the include directory but still have source code "above" an include directory in a parent directory referencing those headers? Such parent directories are outside the source root for the include directory and would need to be in their own source root, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you'd need to make foo and foo/include be source roots versus just making foo be the source root because the foo/include directory can be viewed as associated with the foo source root.

Copy link
Member Author

@sureshjoshi sureshjoshi Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the folder structure: root/include/mylib/foo.h and #include mylib/foo.h

A source root of only root requires the additional code in this PR. A source root in include would already be captured via your original dep infer rules (so long as it's also in the AllTargets call).

We could enforce something like a source root per lib structure, but it really only matters when passing include_directories to compilation, and the rest of a project would be pretty unaffected.

Having said that, either the visibility feature, or source roots would be nice to provide constraints, to make sure a given file isn't inferred willy-nilly across an entire repo.

e.g. This feels like it would be spiritually incorrect without some explicit dep on the mylib1 target (but possible if we didn't use visibility or source root constraints). Or maybe we would want this to work automagically?

2 libraries

root/mylib1/include/mylib/foo.h
root/mylib1/src/foo.c -> #include "mylib/foo.h"
root/mylib2/src/bar.c -> #include "mylib/foo.h"

Without an explicit constraint, bar.c could easily pull in mylib1's foo.h, but should it? Without compiling foo.c, linking will fail.

This is related to a Slack convo with Benjy, about what should happen in a situation like this - and I think it's related to what you're asking about regarding whether/how we need to use source roots above the include as well.

cc @benjyw

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh, okay okay - yeah, so that's what Swift would do to. Thanks for clearing that up.

This is sounding more and more like an unnecessary PR then, with the CC docs needing to point out that source roots will essentially operate the same as include_directories would, and then call it a day.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdyas You were right about the source files source root - I didn't realize StrippedFileNames requires a source root.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, that adds another wrinkle, I think - because relative paths shouldn't need the source roots (if they're acting as include directories), I don't think...

Going to need to re-think this a bit - so that gcc/clang semantics also line up with what Pants expects a bit better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand "relative paths shouldn't need the source roots"? if in foo/bar.cc you #include baz/qux.h the preprocessor will first look for foo/baz/qux.h, and then in dir/baz/qux.h for every dir provided via -I, and then for every dir in the system include dirs. If foo/baz/qux.h exists then it is easy to infer that it is the right file, no? If not, that's when you look on the source roots.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just a comment on the fact that the StrippedFileNames API uses source roots, which I didn't know. That API was used already in the dep inference. Tangential to the include_directories discussion.

@sureshjoshi sureshjoshi merged commit 4534618 into pantsbuild:main Dec 13, 2022
@sureshjoshi sureshjoshi deleted the cc-dep-inference-descendants branch December 13, 2022 02:25
@stuhood stuhood mentioned this pull request Dec 17, 2022
stuhood added a commit that referenced this pull request Dec 17, 2022
### Internal

* Get rid of `_ToolLockfileMixin`. ([#17823](#17823))

* Fix `build-support/bin/deploy_to_s3.py`. ([#17818](#17818))

* Upgrade setproctitle to 1.3.2 ([#17804](#17804))

* Document the release tag back-fill script. ([#17808](#17808))

* Add a release tag backfill script. ([#17806](#17806))

* Set up release tag mapping workflow. ([#17801](#17801))

* go: fix path lookup for copied header files ([#17798](#17798))

* go: set `replace_sandbox_root_in_args=True` since replacement is used ([#17797](#17797))

* go: set missing `Goal.environment_behavior` on a debug goal ([#17799](#17799))

* go: rename option to --go-test-coverage-packages ([#17795](#17795))

* A stub page for dep validation docs. ([#17788](#17788))

* Upgrade to newest remote execution proto ([#17786](#17786))

* Adding tests for `cc` dependency inference showing source roots as `include_directories` ([#17738](#17738))

* Fix validation of release PEXes. ([#17785](#17785))

* Unconditionally register export-codegen goal in pants.core ([#17782](#17782))

* go: allow exporting cgo codegen for third party packages ([#17770](#17770))

* Properly set `pex_root` when running python sources in-sandbox. ([#17750](#17750))

* Renames `output_files` and `output_directories` fields for `experimental_shell_command` ([#17744](#17744))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: CC CC backend-related issues category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants