Allow build plan to also emit file inputs per invocation #6213
Oops, didn't test on stable:
I'll fix this today. |
So this PR combines two things:
While both are somewhat orthogonal to each other, currently script execution is necessary for the first item. Leaving environment variables as another form of input/output aside, if it'd be possible for:
then we wouldn't have to execute anything to learn about file input/output dependencies (sans extra The only edge case I can think of is when compiling a crate where I'll see if it'd be feasible to make cc @jsgf wrt build system integration (+ @nrc actually this run and report complete build plan for |
☔ The latest upstream changes (presumably #6236) made this pull request unmergeable. Please resolve the merge conflicts. |
Got one error on OSX:
hopefully it's spurious and related to #6236; rebased again. |
Hm, the time regression (1 hr AppVeyor timeout) seems unlikely? I believe cloning is the most expensive operation but I think it shouldn't impact it this much? I'll retrigger CI and will try to optimize it. |
Gah sorry for the delay here, I intended to take a closer look much sooner! At a high level, it sounds like this is basically executing Looking this over it does also feel sort of bolted on, and I'm not sure I was able to quite follow why a I think after that it'd be a perf optimization to not run Would something like that be possible? Or was that already attempted and I should read the patch more closely? (which is totally valid!) |
☔ The latest upstream changes (presumably #6270) made this pull request unmergeable. Please resolve the merge conflicts. |
I'm sorry as well, I intended to reply earlier.
Uh, to be fair it's worse than that - it's basically running entire
The reason for that is the current architecture - when preparing work, we build 2 To not regress the fresh case I didn't want to modify them to always build the invocation/work even if it's fresh and so what I did is follow the current solution of treating every work as
Yup, that's basically what I'm trying to do here for the input files! Moved the build plan emission after collecting necessary dep-info data (https://github.com/rust-lang/cargo/pull/6213/files#diff-29d98a4a0b448960d02ef89415610e9d, relevant call) but there's that small problem with fresh work, as per above.
Unfortunately, I've hit a wall with that - here's a small note on my progress with that. The main blocker is that I think we still need external metadata to fully resolve conditional compilation and that's needed to precisely determine the set of files being required for the compilation. If RFC #2523 is merged as-is with I hope I'm not overthinking this, though! |
Hm ok I'm trying to tease apart some of the various pieces here. One (probably longstanding at this point) problem is that the build plan is produced as a side effect of compilation rather than directly, which means that the freshness bits feel pretty wonky. There's a lot of logic in Cargo's production of units of work to figure out what to actually execute, but the current code is primarily intended for incremental compilation (in perhaps not the best fashion) rather than producing a full-fledged build plan which was sort of bolted on after the fact. I think we can probably fly with this, but this strategy of having the build plan bolted on the side of Cargo seems likely to build up technical debt quickly and probably expose a lot of various bugs in the build plan. It sounds like there's also a discrepancy between |
I'd be happy to implement a more complete solution to that as long as it doesn't require rewriting everything, if you could give me some pointers 😅 About discrepancy, I'm sorry, I should be more clear about it. Right now, the Also, the reason I mentioned RFC #2523 is that it'd be possible to have:
#[cfg(accessible(serde_json::Value))]
const DATA: .. = include_bytes!(..);
meaning we couldn't possibly resolve the conditional compilation and thus determine inputs unless we compile/check the possible dependencies. Does that make sense? |
This only populates the `input` JSON value when the appropriate `.d` files already exist (e.g. after a successful `cargo check` run).
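To make the `.d` dependency concrete, here is a minimal sketch of reading inputs out of a Makefile-style dep-info file of the kind rustc emits. The names `parse_dep_info` and `split_escaped` are hypothetical and this is not Cargo's own parser; it assumes one `target: dep dep ...` rule per line and only understands backslash-escaped spaces.

```rust
/// Split a whitespace-separated dependency list, treating `\ ` as a
/// literal space inside a path. Hypothetical helper for illustration.
fn split_escaped(s: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // An escaped space belongs to the current path.
            '\\' if chars.peek() == Some(&' ') => {
                chars.next();
                cur.push(' ');
            }
            // Unescaped whitespace ends the current path.
            c if c.is_whitespace() => {
                if !cur.is_empty() {
                    out.push(std::mem::take(&mut cur));
                }
            }
            c => cur.push(c),
        }
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

/// Parse dep-info text into (target, inputs) pairs.
/// Blank lines and `#` comment lines are skipped.
fn parse_dep_info(contents: &str) -> Vec<(String, Vec<String>)> {
    let mut rules = Vec::new();
    for line in contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // "target: deps" or a bare "target:" with no dependencies.
        let (target, deps) = match line.find(": ") {
            Some(idx) => (&line[..idx], &line[idx + 2..]),
            None => match line.strip_suffix(':') {
                Some(t) => (t, ""),
                None => continue,
            },
        };
        rules.push((target.replace("\\ ", " "), split_escaped(deps)));
    }
    rules
}
```

A build plan emitter could call something like this per unit and attach the second element of each pair as that invocation's inputs, which is why the value stays empty until a previous run has produced the files.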
The detailed plan doesn't work fully yet - we try to mimic the actual `cargo build` but we only get inputs for the targets we execute work for?
This is annoying, because we want to traverse the entire unit dependency graph and to do so we treat every work as `Freshness::Dirty`. However, for the detailed build plan we do want to reuse the resulting artifacts/work from previous runs, but we can't just blindly skip dirty work if it was fresh at construction time in the general case, since freshness can be transitively altered at job dequeue time. This makes us skip dirty work at construction time only if we are in detailed build plan mode, effectively adding another layer of freshness tracking due to every node being considered as `Dirty`.
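The transitive part is the crux, so here is a small sketch of it under simplifying assumptions. The `Unit` struct, the `effective` function and the string-keyed graph are hypothetical stand-ins, not Cargo's job queue types; the graph is assumed to be acyclic.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Freshness {
    Fresh,
    Dirty,
}

/// Hypothetical unit: its own freshness as computed at construction
/// time, plus the names of the units it depends on.
struct Unit {
    own: Freshness,
    deps: Vec<&'static str>,
}

/// Freshness as seen at dequeue time: a unit is dirty if it was dirty
/// at construction time or if any (transitive) dependency is dirty.
fn effective(
    name: &str,
    graph: &HashMap<&'static str, Unit>,
    memo: &mut HashMap<String, Freshness>,
) -> Freshness {
    if let Some(f) = memo.get(name) {
        return *f;
    }
    let unit = &graph[name];
    let mut result = unit.own;
    for dep in &unit.deps {
        if effective(dep, graph, memo) == Freshness::Dirty {
            result = Freshness::Dirty;
        }
    }
    memo.insert(name.to_string(), result);
    result
}
```

With a `bin` unit that is `Fresh` on its own but depends on a `Dirty` `lib`, `effective("bin", ...)` comes out `Dirty`, which is the case where skipping work based only on construction-time freshness would be wrong.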
This distinguishes file inputs for running build script, building build script and regular rustc compilations.
Oh ok I think I see what you're saying about dep-info/metadata, it's impossible to resolve dep-info (in the future) unless dependencies are fully compiled to at least metadata. Thanks for clarifying! My main worry here is just that build plan generation will have a stream of bugs until we design Cargo around being able to include it first-class, whereas today it's much more bolted on after the fact. Most of the organization in I don't think though that we should just inevitably block on this waiting for a possible refactoring. Cargo's not really all that big and is relatively easy to refactor, but I hesitate because I don't really see a way forward to actually get this refactored to a point where I could be confident in it. The basic problem that I see is that reading this PR seems fine but I have no idea if it's right or how many other possible forgotten little branches are here and there which also need to be addressed. I wonder if we can perhaps remove the previous concept of a build plan entirely to ease this transition? It seems that you're discovering that it's only suited for very basic tasks, and this more rich build plan is required for at least much higher fidelity to match Cargo's build. Along those lines it seems like the previous build plan information shouldn't stick around as it'll eventually want to become this version anyway? |
☔ The latest upstream changes (presumably #6328) made this pull request unmergeable. Please resolve the merge conflicts. |
A big advantage of the regular I think it'd be good to ask possible stakeholders whether such a trade-off is acceptable |
For our current usage of |
Isn't the "regular |
@alexcrichton I tried to make the same point in #5579 (comment), and I agree. I don't think that |
I discussed this a bit with @Xanewok recently at rustfest, and here's some thoughts that we had:
As a result, I think that we can solve both problems here of (a) generating a build plan by executing zero work and (b) generating a correct build plan handling build scripts and such. The "trick" here is that instead of executing rustc we'd execute a wrapper around rustc. This wrapper would take, as input, the directory that the build script dumped output into. The wrapper would parse the build script's output looking for arguments and such, and then it would invoke the real rustc with appropriate arguments. Cargo could presumably ship with such a built-in command/wrapper so it'd still just be cargo/rustc, but that way we could build a static build plan describing exactly what Cargo would otherwise do, and then execute that. (and maybe Cargo itself could switch to doing something like this?) In theory it should handle build scripts and such (cfgs flying around, etc). The downside is that it wouldn't do incremental rebuilds perfectly, but in theory this is largely only intended for Buck/Gecko like solutions in the first go where all of crates.io isn't incrementally rebuilt but only built once. @Xanewok does that sound about right? I likely missed things! |
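As a rough illustration of the parsing half of such a wrapper, the sketch below turns a build script's captured stdout into extra rustc arguments. The function name `rustc_args_from_build_output` is hypothetical and no such wrapper ships with Cargo; only three of the documented `cargo:` directives are handled (`rustc-cfg`, `rustc-link-lib`, `rustc-link-search`), and everything else is ignored.

```rust
/// Translate build-script stdout into extra rustc arguments.
/// Hypothetical helper; a real wrapper would also need to handle
/// directives such as `cargo:rustc-env` and `cargo:rustc-flags`.
fn rustc_args_from_build_output(output: &str) -> Vec<String> {
    let mut args = Vec::new();
    for line in output.lines() {
        // Only lines of the form `cargo:KEY=VALUE` are directives;
        // anything else is ordinary build script logging.
        let directive = match line.trim().strip_prefix("cargo:") {
            Some(d) => d,
            None => continue,
        };
        let (key, value) = match directive.split_once('=') {
            Some(kv) => kv,
            None => continue,
        };
        match key {
            "rustc-cfg" => {
                args.push("--cfg".to_string());
                args.push(value.to_string());
            }
            "rustc-link-lib" => {
                args.push("-l".to_string());
                args.push(value.to_string());
            }
            "rustc-link-search" => {
                args.push("-L".to_string());
                args.push(value.to_string());
            }
            // Directives that do not map to rustc flags are skipped here.
            _ => {}
        }
    }
    args
}
```

The wrapper would then append these to the statically planned arguments and spawn the real compiler, for example via `std::process::Command::new("rustc")`, so the plan itself never has to run the build script to be written down.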
@alexcrichton I think the division between lazy+dynamic and eager+static (precise?) is definitely a good direction here! I'm somewhat concerned with correctness - in general, pessimization applies to uploaded crates, however local crates can technically include/depend on files outside of the package directory (e.g. user home directory), so I don't think we can count on the pessimistic input file set to be correct but possibly suboptimal :( Wrapping Do you think it might be a good idea to have the build plan execution with more fine-grained accuracy?
Generally, it'd also be good if we could include the build scripts output (e.g. @luser's post on hardcoding outputs in tup in the Firefox build). Does this sound reasonable? |
Oh this is what I meant by
I think one thing that's unsettling me is that it seems weird to have multiple build plans. In theory there's only one correct build plan and we should largely just make that as easy to use as we can. It seems odd to special case "fast" or "incorrect" build plans :(
I think it's fine to tell users that they won't literally run rustc though but will rather run a rustc-lookalike. Each compilation will be pretty isolated and anything that isn't rustc is what Cargo is already doing internally, only shifted to a different point in time. I'm also imagining that these wrappers would have enough support to handle things like What I think we should go towards is basically the speed/immediateness of the current |
Ah, makes sense! Do you think we could warn against rustc/build script depending on stuff (when calculating fingerprint) that's outside the package but not specified in the |
Certainly! |
Conflicts have amassed, sorry about that. Can we keep this PR as related discussion or should I open an issue and close this for now? We talked with @nrc briefly about some ideas and possible direction we can take. With this, the current instant build plan could be generated as-is with that info attached or users could execute the plan with @nrc @alexcrichton thoughts? |
Oh I'm fine either way in terms of an issue vs a PR discussion. I talked with @nrc a bit on the bus at Orlando, and one thing we realized is that as more and more procedural macros come about the "slow" part here is likely to get slower and slower (because it's running I'm personally still a fan (if we don't have time pressure, which I don't think we do) to produce a full build plan which uses Cargo shims as necessary. Such a plan would be produced instantaneously (run no work) and would have the downside of otherwise not executing literally |
I'm gonna close this to help clear out the queue a bit and I think it's a bit stale, but I'm sure we'll continue to converse about this! |
No worries, I'll try and come back to this in the nearest future :) |
This adds another mode for the build plan (`cargo build --build-plan=detailed ...`), which differs from the current `--build-plan` such that:
- `rustc` invocations are altered to include built native deps or more build-script-time-detected args (e.g. serde rustc version-gated `--cfg` flags)
- ... `cargo build` if necessary
Example `--build-plan` and `--build-plan=detailed` diff on Cargo itself
The main motivation is to make the build system integration easier. With file dependencies more clear it'll be easier to automatically translate the plan to other build system rules or to reason about artifact freshness (currently RLS would also benefit by being able to accurately answer the 'given these dirty files, what do I need to rerun' question for a `cargo check` equivalent build plan).
(I imagine the next step would be to somehow (somewhat) accurately detect build script output (sandboxed? special API?) to better reason about build scripts in general.)
With regards to the actual changes, I couldn't shake off the feeling that the changes made are too bolt-on. While I think tracking which commands are being prepared for the actual work should be on the same code path as a regular build, it seems hacky to:
- ... `work` to be as `Freshness::Dirty` (especially since the work can turn out to be dirty if it depends on a dirty one during job execution)
which effectively acts as another layer of freshness tracking (all of this applies only to the `--build-plan=detailed` mode).
r? @alexcrichton