Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow build plan to also emit file inputs per invocation #6213

Closed
wants to merge 15 commits into from

Conversation

Xanewok
Copy link
Member

@Xanewok Xanewok commented Oct 23, 2018

This adds another mode for the build plan (cargo build --build-plan=detailed ...), which differs from the current --build-plan such that:

  • Additionally collects file dependencies per each invocation (main motivation)
  • rustc invocations are altered to include built native deps or more build-script-time-detected args (e.g. serde rustc version-gated --cfg flags)
  • Actually has to perform work akin to cargo build if necessary

Example --build-plan and --build-plan=detailed diff on Cargo itself

The main motivation is make the build system integration easier. With file dependencies more clear it'll be easier to automatically translate the plan to other build system rules or to reason about about artifact freshness (currently RLS would also benefit by being able to accurately answer the 'given these dirty files, what do I need to rerun' question for a cargo check equivalent build plan).

(I imagine the next step would be to somehow (somewhat) accurately detect build script output (sandboxed? special API?) to better reason about build scripts in general.)

With regards to the actual changes, I couldn't shake off the feeling that the changes made are too bolt-on. While I think tracking which commands are being prepared for the actual work should be on the same code path as a regular build, it seems hacky to:

  • treat every work to be as Freshness::Dirty
  • pass is the work fresh at construction time status to the work-preparing function to possibly skip them while actually executing the work
    (especially since the work can turn out to be dirty if it depends on a dirty one during job execution)

which effectively acts as an another layer of freshness tracking (all of this applies only to the --build-plan=detailed mode).

r? @alexcrichton

@Xanewok
Copy link
Member Author

Xanewok commented Oct 24, 2018

Oops, didn't test on stable:

   Compiling foo v0.5.0 (/Users/travis/build/rust-lang/cargo/target/cit/t266/foo)
error[E0658]: mod statements in non-mod.rs files are unstable (see issue #44660)
 --> src/module1.rs:1:5
  |
1 | mod nested;
  |     ^^^^^^
  |
  = help: on stable builds, rename this file to module1/mod.rs
error: aborting due to previous error

I'll fix this today.

@Xanewok
Copy link
Member Author

Xanewok commented Oct 25, 2018

So this PR combines two things:

  • Detects file inputs to rustc calls
  • Executes build scripts to detect their side-effects

While both are somewhat orthogonal to each other, currently script execution is necessary for the first item.

Leaving environment variables as another form of input/output aside, if it'd be possible for:

  • rustc to correctly emit dep-info despite some missing files (and --extern libs?)
  • build scripts to declare their outputs in the manifest before running

then we wouldn't have to execute anything to learn about file input/output dependencies (sans extra -L ... or other flags added by build scripts but that's what the current --build-plan does atm as well).

The only edge case I can think of is when compiling a crate where rustc expects an entire (nested?) module missing at compile-time, which is actually generated only by the build script. However, if we treat direct dependency outputs as inputs to a given rule, then I believe we still end up with a correct freshness tracking.

I'll see if it'd be feasible to make rustc not error out on missing files in --emit=dep-info-only mode.

cc @jsgf wrt build system integration (+ build.rs shenanigans)

@nrc actually this run and report complete build plan for cargo check would be useful for RLS if we decide to stop using Cargo-the-library in-process

@bors
Copy link
Contributor

bors commented Oct 31, 2018

☔ The latest upstream changes (presumably #6236) made this pull request unmergeable. Please resolve the merge conflicts.

@Xanewok
Copy link
Member Author

Xanewok commented Nov 5, 2018

Got one error on OSX:

---- fix::fix_path_deps stdout ----
running `/Users/travis/build/rust-lang/cargo/target/debug/cargo fix --allow-no-vcs -p foo -p bar`
thread 'fix::fix_path_deps' panicked at '
Expected: execs
    but: differences:
  1 - |[FIXING] bar/src/lib.rs (1 fix)|
    + |    Checking foo v0.1.0 (/Users/travis/build/rust-lang/cargo/target/cit/t564/foo)|
  2 - |[CHECKING] foo v0.1.0 ([..])|
    + |      Fixing bar/src/lib.rs (1 fix)|

hopefully it's spurious and related to #6236; rebased again.

@Xanewok
Copy link
Member Author

Xanewok commented Nov 6, 2018

Hm, the time regression (1 hr AppVeyor timeout) seems unlikely? I believe cloning is the most expensive operation but I think it shouldn't impact it this much? I'll retrigger CI and will try to optimize it.

@Xanewok Xanewok closed this Nov 6, 2018
@Xanewok Xanewok reopened this Nov 6, 2018
@alexcrichton
Copy link
Member

Gah sorry for the delay here, I inended to take a closer look much sooner!

At a high level, it sounds like this is basically executing cargo check, right? Although instead of "Checking ..." we're just executing rustc --emit dep-info, but otherwise build scripts are always executed to completion (as well as procedural macros compiled and such).

Looking this over it does also feel sort of bolted on, and I'm not sure I was able to quite follow why a Dirty freshness needed to be forced in a few locations? I think this can be implemented at a high level by "execute cargo check" and then inspect all the internal data structures afterwards to figure out what all files were used as input.

I think after that it'd be a perf optimization to not run rustc --emit metadata but rather just rustc --emit dep-info.

Would something like that be possible? Or was that already attempted and I should ready the patch more closely? (which is a totally valid!)

@bors
Copy link
Contributor

bors commented Nov 7, 2018

☔ The latest upstream changes (presumably #6270) made this pull request unmergeable. Please resolve the merge conflicts.

@Xanewok
Copy link
Member Author

Xanewok commented Nov 14, 2018

I'm sorry as well, I intended to reply earlier.

At a high level, it sounds like this is basically executing cargo check, right?

Uh, to be fair it's worse than that - it's basically running entire cargo build (the --build-plan=detailed variant). The reason for that is, I wanted the plan to be accurate for the cargo build routine and was too hesitant to mess with output files and tricking Cargo into rmeta compilation, passing those updated crates as new --extern but still reporting the usual rlib files and so on.
I figured it'd be a good starting point and we could optimize this further later on.

Looking this over it does also feel sort of bolted on, and I'm not sure I was able to quite follow why a Dirty freshness needed to be forced in a few locations?

The reason for that is the current architecture - when preparing work, we build 2 Work closures to be executed if a given unit is determined to be either fresh or dirty. The problem is, if the work is detected to be fresh (e.g. a second run cargo build run, with --build-plan or not) then we currently won't even prepare an appropriate invocation and we'll have a default, empty invocation for fresh units.

To not regress the fresh case I didn't want to modify them to always build the invocation/work even if it's fresh and so what I did is follow the current solution of treating every work as Dirty if we build with --build-plan=detailed but optionally skip the real work (executing rustc, running build script binaries).

I think this can be implemented at a high level by "execute cargo check" and then inspect all the internal data structures afterwards to figure out what all files were used as input.

Yup, that's basically what I'm trying to do here for the input files! Moved the build plan emission after collecting necessary dep-info data (https://github.com/rust-lang/cargo/pull/6213/files#diff-29d98a4a0b448960d02ef89415610e9d, relevant call) but there's that small problem with fresh work, as per above.

I think after that it'd be a perf optimization to not run rustc --emit metadata but rather just rustc --emit dep-info.

Unfortunately, I've hit a wall with that - here's a small note on my progress with that. The main blocker is that I think we still need external metadata to fully resolve conditional compilation and that's needed to precisely determine the set of files being required for the compilation. If RFC #2523 is merged as-is with #[cfg(accessible($path))] being able to be used on paths not only originating from std then we'd need the dep metadata compiled before we can resolve the inputs.

I hope I'm not overthinking this, though!

@alexcrichton
Copy link
Member

Hm ok I'm trying to tease apart some of the various pieces here. One (probably longstanding at this point) problem is that the build plan is produced as a side effect of compilation rather than directly, which means that the freshness bits feel pretty wonky. There's a lot of logic in Cargo's production of units of work to figure out what to actually execute, but the current code is primarily intended for incremental compilation (in perhaps not the best fashion) rather than producing a full-fledged build plan which was sort of bolted on after the fact. I think we can probably fly with this, but this strategy of having the build plan bolted on the side of Cargo seems likely to build up technical debt quickly and probably expose a lot of various bugs in the build plan.

It sounds like there's also a discrepancy between --emit dep-info and --emit metadata in terms of the preciseness of the inputs? It sounds like you're saying (but correct me if I'm wrong) that metadata emission produces more accurate dependency information than the dep-info output. If that's the case it's a bug in the compiler that we should fix long term, and ideally wouldn't impact the design in Cargo here too much except for ideally at most a small workaround.

@Xanewok
Copy link
Member Author

Xanewok commented Nov 14, 2018

I'd be happy to implement a more complete solution to that as long as it doesn't require rewriting everything, if you could give me some pointers 😅

About discrepancy, I'm sorry, I should be more clear about it. Right now, the rustc --emit dep-info call only needs to go through expansion but the compilation has to have the correct rmetas or rlibs to read or otherwise it'll fail and not generate a .d file. This means that currently we have to have our dependencies compiled/checked in order for a bare rustc --emit dep-info to succeed (which means every node except leaves have to be compiled anyway). This might be possible to solve in the compiler but I'm not sure how far can we go through the expansion having only dummy extern crate paths in the AST (I can't think of any edge case but maybe there are some?).

Also, the reason I mentioned RFC #2523 is that it'd be possible to have:

#[cfg(accessible(serde_json::Value))]
const DATA: .. = include_bytes!(..);

meaning we couldn't possibly resolve the conditional compilation and thus determine inputs unless we compile/check the possible dependencies. Does that make sense?

This only populates `input` JSON value when appropriate .d. files exist
already (e.g. after a succesful `cargo check` run).
The detailed plan doesn't work fully yet - we try and mimick
actual `cargo build` but we only get inputs for the targets we execute
work for?
This is annoying, because we want to traverse the entire unit dependency
graph and to do so we treat every work as `Freshness::Dirty`.
However, for detailed build plan we do want to reuse the resulting
artifacts/work from previous runs, but we can't just blindly skip work dirty
work if it was fresh at construction time in the general case, since freshness
can be transitively altered at job dequeue time.

This makes us skip dirty work at construction time only if we are in a
detailed build plan mode, effectively adding another layer of freshness
tracking due to every node being considered as `Dirty`.
This distinguishes file inputs for running build script, building build
script and regular rustc compilations.
@alexcrichton
Copy link
Member

Oh ok I think I see what you're saying about dep-info/metadata, it's impossible to resolve dep-info (in the future) unless dependencies are fully compiled to at least metadata. Thanks for clarifying!

My main worry here is just that build plan generation will have a stream of bugs until we design Cargo around being able to include it first-class, whereas today it's much more bolted on after the fact. Most of the organization in src/cargo/core/compiler and src/cargo/ops/cargo_rustc is already a bit of a mess, and it's just a sprawling change across a bunch of files to generate a build plan. Naively it seems like we should basically be able to produce an internal data structure which is then used to either execute a build or otherwise generate a build plan, so the build plan generation is largely isolated in its own separate area as would build execution.

I don't think though that we should just inevitably block on this waiting for a possible refactoring. Cargo's not really all that big and is relatively easy to refactor, but I hesitate because I don't really see a way forward to actually get this refactored to a point where I could be confident in it. The basic problem that I see is that reading this PR seems fine but I have no idea if it's right or how many other possible forgotten little branches are here and there which also need to be addressed.

I wonder if we can perhaps remove the previous concept of a build plan entirely to ease this transition? It seems that you're discovering that it's only suited for very basic tasks, and this more rich build plan is required for at least much higher fidelity to match Cargo's build. Along those lines it seems like the previous build plan information shouldn't stick around as it'll eventually want to become this version anyway?

@bors
Copy link
Contributor

bors commented Nov 18, 2018

☔ The latest upstream changes (presumably #6328) made this pull request unmergeable. Please resolve the merge conflicts.

@Xanewok
Copy link
Member Author

Xanewok commented Nov 20, 2018

I wonder if we can perhaps remove the previous concept of a build plan entirely to ease this transition?

A big advantage of the regular --build-plan is that it's almost instant (albeit inaccurate wrt build script output, modifying package or deps invocation/envs), whereas the detailed mode is as fast only when cargo build cache is up to date.

I think it'd be good to ask possible stakeholds whether such a trade-off is acceptable
cc @luser @hobofan (do we know of any other use cases?)

@luser
Copy link
Contributor

luser commented Nov 20, 2018

For our current usage of --build-plan to build Firefox with tup it's a hard requirement that cargo not do any actual work. I suspect other users with similar requirements (bazel et. al) would say the same. /cc @mshal

@alexcrichton
Copy link
Member

Isn't the "regular --build-plan" wrong though? It'll never be able to cover things like build scripts, cfgs, etc?

@hobofan
Copy link

hobofan commented Nov 22, 2018

@alexcrichton I tried to make the same point in #5579 (comment), and I agree. I don't think that --build-plan has been sufficiently speced out in a RFC in terms of what it covers/doesn't cover, or at least that should be well documented before stabilization.

@alexcrichton
Copy link
Member

I discussed this a bit with @Xanewok recently at rustfest, and here's some thoughts that we had:

  • Cargo already has a pessimization of what the possible inputs to a build script can be (all files in a package), and typically a build script will simply inform us that it only cares about a subset of these files.
  • Similarly, without running rustc, Cargo could assume the same pessimistic input set for rustc invocations.
  • Crates can always override this pessimization with include and exclude keys in Cargo.toml

As a result, I think that we can solve both problems here of (a) generating a build plan by executing zero work and (b) generating a correct build plan handling build scripts and such.

The "trick" here is that instead of executing rustc we'd execute a wrapper around rustc. This wrapper would take, as input, the directory that the build script dumped output into. The wrapper would parse the build script's output looking for arguments and such, and then it would invoke the real rustc with appropriate arguments.

Cargo could presuambly ship with such a built-in command/wrapper so it'd still just be cargo/rustc, but that way we could build a static build plan describing exactly what Cargo would otherwise do, and then execute that. (and maybe Cargo itself could switch to doing something like this?) In theory it should handle build scripts and such (cfgs flying around, etc). The downside is that it wouldn't do incremental rebuilds perfectly, but in theory this is largely only intended for Buck/Gecko like solutions in the first go where all of crates.io isn't incrementally rebuilt but only built once.

@Xanewok does that sound about right? I likely missed things!

@Xanewok
Copy link
Member Author

Xanewok commented Nov 28, 2018

@alexcrichton I think the division between lazy+dynamic and eager+static (precise?) is definitely good direction here!

I'm somewhat concerned with correctness - in general, pessimization applies to uploaded crates, however local crates can technically include/depend on files outside of the package directory (e.g. user home directory), so I don't think we can count on the pessimistic input file set to be correct but possibly suboptimal :(
I think people integrating cargo with an external build system will likely have locally developed or vendored packages and so it'd be risky to make this assumption.
Maybe we shouldn't just include the source file set in this case? Or include the pessimization as you're saying but somehow signal to the user that this is a best-effort (e.g. make it a --build-plan=crude or similar?).

Wrapping rustc sounds good, yeah! Having a lazy, but correct plan would be an improvement what we have now. However, I think at some point the consumers would like sitll to have a static, resolved version of the rustc calls (to translate the rules to other system or analyze the compilation process).
Also there's the case of DEP_X_Y keys where build scripts are not only isolated to one package but can pass information further to dependend packages - would we need to persist that information or execute these wrappers referring to the ones from depending crates?

Do you think it might be a good idea to have the build plan execution with more fine-grained accuracy?
For example:

  • crude, which is correct and almost instant but is still dynamic (in the sense that wrappers analyze the build script execution and execute rustc with appropriate flags etc.)
  • moderate, which is mostly static (possibly only runs build scripts and includes necessary cfgs, flags. etc. in the rustc executions)
  • detailed, which potentially also compiles dependent crates, but additionally includes input files?

Generally, it'd also be good if we could include the build scripts output (e.g. @luser's post on hardcoding outputs in tup in the Firefox build).
In the crude mode we could point to the OUT_DIR directory as 'output', but in the modes where we execute the build script we could enumerate those files or possibly also compare timestamps pre- and post-build script to see if it may have modified/created any files inside the package itself (e.g. lalrpop generated .rs files in-directory, prior to 0.15) (this still may potentially modify/create files in arbitrary places but we can't possibly scan the entire filesystem 😆 )

Does this sound reasonable?

@alexcrichton
Copy link
Member

I don't think we can count on the pessimistic input file set to be correct but possibly suboptimal :(

Oh this is what I meant by include and exclude keys in Cargo.toml. It's definitely true that the default heuristic is wrong, but I figured it's ok so long as there's a way to specify the correct dependencies.

Do you think it might be a good idea to have the build plan execution with more fine-grained accuracy?

I think one thing that's unsettling me is that it seems weird to have multiple build plans. In theory there's only one correct build plan and we should largely just make that as easy to use as we can. It seems odd to special case "fast" or "incorrect" build plans :(

However, I think at some point the consumers would like sitll to have a static, resolved version of the rustc calls (to translate the rules to other system or analyze the compilation process).
Also there's the case of DEP_X_Y keys where build scripts are not only isolated to one package but can pass information further to dependend packages - would we need to persist that information or execute these wrappers referring to the ones from depending crates?

I think it's fine to tell users that they won't literally run rustc though but will rather run a rustc-lookalike. Each compilation will be pretty isolated and anything that isn't rustc is what Cargo is already doing internally, only shifted to a different point in time.

I'm also imagining that these wrappers would have enough support to handle things like DEP_X_Y keys, that'd all be slurped up from various files and be located as inputs to all the various subcommands.

What I think we should go towards is basically the speed/immediateness of the current --build-plan plus the accuracy of the build plan in this PR. That'll require a number of wrappers here and there and some things which look somewhat odd in the build plan, but I don't think there's any reason why it wouldn't be able to fundamentally work.

@Xanewok
Copy link
Member Author

Xanewok commented Nov 28, 2018

Oh this is what I meant by include and exclude keys in Cargo.toml. It's definitely true that the default heuristic is wrong, but I figured it's ok so long as there's a way to specify the correct dependencies.

Ah, makes sense! Do you think we could warn against rustc/build script depending on stuff (when calculating fingerprint) that's outside the package but not specified in the include keys? I think noone uses that key to do that now 🙊

@alexcrichton
Copy link
Member

Certainly!

@Xanewok
Copy link
Member Author

Xanewok commented Dec 10, 2018

Conflicts have amassed, sorry about that. Can we keep this PR as related discussion or should I open an issue and close this for now?

We talked with @nrc briefly about some ideas and possible direction we can take.
In general, hiding implicit dependencies behind those wrappers might not seem like the best approach, since generally tools want to have direct knowledge to begin with.
To solve the fast vs correct problem, with each invocation we could attach a 'possibly incorrect' information if it transitively depends on a build script unit that's not already executed.

With this, the current instant build plan could be generated as-is with that info attached or users could execute the plan with --dry-run which would build and execute build scripts if not done before and cache the result (here --build-plan=detailed?).

@nrc @alexcrichton thoughts?

@alexcrichton
Copy link
Member

Oh I'm fine either way in terms of an issue vs a PR discussion. I talked with @nrc a bit on the bus at Orlando, and one thing we realized is that as more and more procedural macros come about the "slow" part here is likely to get slower and slower (because it's running cargo check). I feel like I remember convincing him that the fast-but-full-detail method was possible and would work, but I don't remember being super convincing at the same time. @nrc may be able to help fill in the gaps!

I'm personally still a fan (if we don't have time pressure, which I don't think we do) to produce a full build plan which uses Cargo shims as necessary. Such a plan would be produced instantaneously (run no work) and would have the downside of otherwise not executing literally rustc itself, but I think that's one which can be solved in time

@alexcrichton
Copy link
Member

I'm gonna close this to help clear out the queue a bit and I think it's a bit stale, but I'm sure we'll continue to converse about this!

@Xanewok
Copy link
Member Author

Xanewok commented Apr 26, 2019

No worries, I'll try and come back to this in the nearest future :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants