Refactor the backend and improve fingerprint diagnostics #2022
Conversation
r? @brson
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override)
This performs a pass over the docs, touching up sections in a few places and also moving everything to using inline table syntax by default (the favored way to add a dependency).
This commit started out identifying a relatively simple bug in Cargo. A recent change made it such that the resolution graph included all target-specific dependencies, relying on the structure of the backend to filter out those which don't need to get built. This was unfortunately not accounted for in the portion of the backend that schedules work, mistakenly causing spurious rebuilds if different runs of the graph pulled in new crates. For example, if `cargo build` didn't build any target-specific dependencies but a later `cargo test` did (e.g. a dev-dep pulled in a target-specific dep unconditionally), then it would cause a rebuild of the entire graph.

This class of bug is certainly not the first in a long and storied history of the backend having multiple points where dependencies are calculated, and those often don't quite agree with one another. The purpose of this rewrite is twofold:

1. The `Stage` enum in the backend for scheduling work and ensuring that maximum parallelism is achieved is removed entirely. There is already a function on `Context` which expresses the dependencies between targets (`dep_targets`); it takes a much finer grain of dependencies into account and already has all the logic for what-depends-on-what. This duplication has caused numerous problems in the past, and unifying these two will truly grant maximum parallelism while ensuring that everyone agrees on what their dependencies are.

2. A large number of locations in the backend have grown to take a (Package, Target, Profile, Kind) tuple, or some subset of this tuple. In general this represents a "unit of work" and is much easier to pass around as one variable, so a `Unit` was introduced which references all of these variables. Almost the entire backend was altered to take a `Unit` instead of these variables specifically, typically providing all of the contextual information necessary for an operation (a sketch of what such a `Unit` could look like appears after this commit message).

A crucial part of this change is the inclusion of `Kind` in a `Unit` to ensure that everyone *also* agrees on what architecture they're compiling everything for. There have been many bugs in the past where one part of the backend determined that a package was built for one architecture while another part thought it was built for another. With the inclusion of `Kind` in dependency management this is handled in a much cleaner fashion, as it's only calculated in one location.

Some other miscellaneous changes made were:

* The `Platform` enumeration has finally been removed. This has been entirely subsumed by `Kind`.
* The hokey logic for "build this crate once", even though it may be depended on by both the host and target kinds, has been removed. This is now handled in a much nicer fashion: if there's no target then `Kind::Target` is simply never used, and multiple requests for a package are naturally deduplicated.
* There's no longer any need to build up the "requirements" for a package in terms of what platforms it's compiled for; this now just naturally falls out of the dependency graph.
* If a build script is overridden then its entire tree of dependencies is not compiled, not just the build script itself.
* The `threadpool` dependency has been replaced with one on `crossbeam`. The method of calculating dependencies has quite a few non-static lifetimes, so the scoped threads of `crossbeam` are now used instead of a thread pool.
* Once any thread fails to execute a command, no further work is scheduled, unlike before where some extra pending work could continue to start.
* Many functions used early on, such as `compile` and `build_map`, have been massively simplified by farming out dependency management to `Context::dep_targets`.
* There is now a new profile to represent running a build script. This is used to inject dependencies as well as to represent that a library depends on running a build script, not just building it.

This change has currently been tested against cross-compiling Servo to Android and passes the test suite (which has quite a few corner cases for build scripts and such), so I'm pretty confident that this refactoring won't have too many regressions!
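To make the `Unit` idea concrete, here is a minimal sketch of the shape such a type could take. This is illustrative only: the `Package`, `Target`, and `Profile` placeholders are hypothetical stand-ins, and the field names are paraphrased from the commit message rather than copied from Cargo's source.

```rust
#![allow(dead_code)]

/// Whether a unit is compiled for the host (build scripts, plugins)
/// or for the requested target architecture.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Kind {
    Host,
    Target,
}

/// One "unit of work" in the backend: a single (package, target,
/// profile, kind) combination. Passing this one value around replaces
/// the loose 4-tuples that different parts of the backend previously
/// reconstructed on their own.
#[derive(Clone, Copy)]
struct Unit<'a> {
    pkg: &'a Package,
    target: &'a Target,
    profile: &'a Profile,
    kind: Kind,
}

// Hypothetical stand-ins for Cargo's real types.
struct Package;
struct Target;
struct Profile;

fn main() {
    let (pkg, target, profile) = (Package, Target, Profile);

    // Build this package's target with this profile, for the target
    // architecture (as opposed to the host).
    let unit = Unit { pkg: &pkg, target: &target, profile: &profile, kind: Kind::Target };
    let _ = unit;
}
```

Because `Kind` travels inside the unit, the "which architecture is this for?" question is answered once, where the unit is created, instead of being re-derived in every corner of the backend.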
This also adds some helpful error text if an error happens.
This commit overhauls how a `Fingerprint` is stored on the filesystem and in memory to help provide much better diagnostics as to why crates are being rebuilt. This involves storing more structured data on the filesystem in order to have a finer-grained comparison with the previous state. This is not currently surfaced in the output of Cargo and still requires `RUST_LOG=cargo::ops::cargo_rustc::fingerprint=info`, but if it turns out to be useful we can perhaps surface the output.

There are performance considerations here to ensure that a noop build is still quite speedy, for a few reasons:

1. JSON decoding is slow (these are just big structures to decode).
2. Each fingerprint stores all recursive fingerprints, so we can't just "vanilla decode" as it would decode O(n^2) items.
3. Hashing is actually somewhat nontrivial for this many items here and there, so we still need as much memoization as possible.

To ensure that builds are just as speedy tomorrow as they are today, a few strategies are taken (the first is sketched below this commit message):

* The same fingerprint strategy is used today as a "first line of defense", where a small text file contains the "total fingerprint" hash as a string. A separately stored file then contains the more detailed JSON structure of the old fingerprint, and that's only decoded if there's a mismatch of the short hashes. The rationale here is that most crates don't need to be rebuilt, so we shouldn't decode JSON; and if a crate does need to be rebuilt, then the work of compiling far dwarfs the work of decoding the JSON.
* When encoding a full fingerprint as JSON we don't actually include any dependencies, just their resolved u64 hashes. This helps the O(n^2) problem in terms of both decoding time and storage space on the filesystem.
* Short hashes continue to be memoized to ensure we don't recompute a hash if we've already done so (e.g. for shared dependencies).

Overall, when profiling with Servo, this commit does not regress noop build times, and it should hopefully help diagnose why crates are being rebuilt!

Closes rust-lang#2011
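The "first line of defense" described above can be illustrated with a small sketch. This is not Cargo's actual code: the file names, the `is_dirty` function, and the `DetailedFingerprint` type are hypothetical, and the real comparison covers far more state. It only shows the two-tier idea of checking a short hash before ever touching the JSON.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the detailed, structured fingerprint stored as JSON.
#[allow(dead_code)]
struct DetailedFingerprint {
    json: String,
}

/// Returns true if the unit needs to be rebuilt.
fn is_dirty(fingerprint_dir: &Path, current_short_hash: &str) -> io::Result<bool> {
    // 1. First line of defense: a tiny text file holding the combined hash.
    let short_path = fingerprint_dir.join("short-hash");
    let previous = match fs::read_to_string(&short_path) {
        Ok(s) => s,
        // No previous build recorded: definitely dirty.
        Err(ref e) if e.kind() == io::ErrorKind::NotFound => return Ok(true),
        Err(e) => return Err(e),
    };

    if previous.trim() == current_short_hash {
        // Fast path: nothing changed, and the JSON was never read.
        return Ok(false);
    }

    // 2. Slow path: only now pay for reading (and, in real Cargo, decoding)
    // the detailed fingerprint, purely to explain *why* something changed.
    // The cost of compiling the crate will dwarf this anyway.
    if let Ok(json) = fs::read_to_string(fingerprint_dir.join("fingerprint.json")) {
        let _old = DetailedFingerprint { json };
        // This is where a field-by-field comparison would be logged.
    }
    Ok(true)
}

fn main() -> io::Result<()> {
    let dirty = is_dirty(Path::new("target/.fingerprint/example"), "deadbeef")?;
    println!("needs rebuild: {dirty}");
    Ok(())
}
```

The key property, as the commit puts it, is that the happy path never decodes JSON at all; the detailed structure only earns its keep when a rebuild is already going to happen.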
Turns out the tests don't actually need this "more deterministic" output, and it was also showing up relatively high in profiles (e.g. ~15% of a noop runtime), so if this is necessary let's find a better way!
To parse at least
Due to the use of crossbeam this also means we drop 1.1.0 support, as crossbeam needs at least
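For context on why `crossbeam` entered the picture at all: the backend's dependency data borrows non-`'static` values, which scoped threads can handle while a detached thread pool cannot. Below is a minimal, hedged sketch of that pattern using a recent `crossbeam` release (the 0.8-era API, not necessarily the version available when this PR landed); it is not Cargo's actual job queue.

```rust
// Assumes a Cargo.toml dependency along the lines of: crossbeam = "0.8"
fn main() {
    // Borrowed, non-'static data shared with the worker threads. The
    // scope guarantees the threads are joined before `units` goes away.
    let units = vec!["unit-a", "unit-b", "unit-c"];

    crossbeam::scope(|s| {
        for unit in &units {
            s.spawn(move |_| {
                // Stand-in for "run the work for this unit".
                println!("building {unit}");
            });
        }
        // Every spawned thread is joined here, before `scope` returns.
    })
    .unwrap();
}
```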
@bors r+
📌 Commit 47ea6d6 has been approved by
This series of commits performs two primary tasks:

* The backend is refactored (again!) to have a much better understanding of what's used as input to everything else in terms of parallelization and dependency tracking. The major goal here was to completely unify all parts of the backend which sort of ad-hoc track dependencies and deal with cross compilation. More details can be found in the first commit message.
* Greatly improving diagnostics for why a crate is being rebuilt. This still requires setting `RUST_LOG`, but it was spurred on by #2011 and should help at least my own personal internal debugging of "why is this crate rebuilding?", which seems to come up quite often!
☀️ Test successful - cargo-linux-32, cargo-linux-64, cargo-mac-32, cargo-mac-64, cargo-win-gnu-32, cargo-win-gnu-64, cargo-win-msvc-32, cargo-win-msvc-64