From 80dc9b9e8eec15aea721550d8c112ae31b457cc3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Re=CC=81my=20Rakic?= Date: Tue, 17 Sep 2019 22:04:18 +0200 Subject: [PATCH] book: add a chapter describing the input relations --- book/src/SUMMARY.md | 1 + book/src/rules/relations.md | 137 ++++++++++++++++++++++++++++++++++++ 2 files changed, 138 insertions(+) create mode 100644 book/src/rules/relations.md diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index 466cbc9b14e..036c5cb4c3b 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -6,5 +6,6 @@ - [Generating inputs from rustc](./generate_inputs.md) - [Rules](./rules.md) - [Atoms](./rules/atoms.md) + - [Relations](./rules/relations.md) - [Initialization](./rules/initialization.md) - [See also](./see_also.md) diff --git a/book/src/rules/relations.md b/book/src/rules/relations.md new file mode 100644 index 00000000000..64c27cb8ffc --- /dev/null +++ b/book/src/rules/relations.md @@ -0,0 +1,137 @@ +# Input relations + +Polonius computes its analyses starting from "input facts", which can be seen as a little database of information about a piece of Rust code (most often: a function). + +In this analogy, the database is the [`AllFacts` struct](https://github.com/rust-lang/polonius/blob/master/polonius-engine/src/facts.rs#L6-L65), which contains all the data in tables (or relations), here as a handful of `Vec`s of rows. The table rows are these "facts": this terminology comes from Datalog, which Polonius uses to do its computations (and the reason for the `rustc` flag [outputting this data](../generate_inputs.md) being named `-Znll-facts`, and the files themselves `*.facts`). + +In order to be used in various contexts (mainly: in-memory from `rustc`, and from on-disk test files in the Polonius repository) this structure is generic over the types of facts, only requiring them to be [`Atom`](https://github.com/rust-lang/polonius/blob/master/polonius-engine/src/facts.rs#L90-L94)s. The goal is to use interned values, represented as numbers, in the polonius computations. + +These generic types of facts are the concepts Polonius manipulates: abstract `origins` containing `loans` at `nodes` in the CFG (in the liveness computation, and move/overwrite analysis, there are also: `variables` and `paths`), and the relations are their semantics (including the specific relationships between the different facts). More details about these atoms can be found in their [dedicated chapter](./atoms.md). + +Let's start with the simplest relation: the representation of the Control Flow Graph, in the `cfg_edge` relation. + +Note: some of these relations are in the process of being renamed. This document will be updated when that happens. + +### 1. `cfg_edge` + +`cfg_edge(node1, node2)`: as its name suggests, this relation stores that there's a CFG edge between the node `node1` and the node `node2`. + +For each MIR statement location, 2 Polonius nodes are generated: the "Start" nodes and the "Mid" nodes (some of the other Polonius inputs will later be recorded at each of the nodes). These 2 nodes are linked by an edge recorded in this relation. + +Then, another edge will be recorded, linking this MIR statement to its successor statement(s): from the mid node of the current location to the start node of the successor location. Even though it's encoded differently in MIR, this will similarly apply when the successor location is in another block, linking the mid node of the current location to the start node of the successor block's starting location. + +For example, for this MIR (edited from the example for clarity, and to only show the parts related to the CFG): + +```rust +bb0: { + ... // bb0[0] + ... // bb0[1] + goto -> bb3; // bb0[2] +} + +... + +bb3: { + ... // bb3[0] +} + +``` + +we will record these input facts (as mentioned before, they'll be interned) in the `cfg_edge` relation, shown here as pseudo Rust: + +```rust +cfg_edge = vec![ + // statement at location bb0[0]: + (bb0-0-start, bb0-0-mid), + (bb0-0-mid, bb0-1-start), + + // statement at location bb0[1]: + (bb0-1-start, bb0-1-mid), + (bb0-1-mid, bb0-2-start), + + // terminator at location bb0[2]: + (bb0-2-start, bb0-2-mid), + (bb0-2-mid, bb3-0-start), +]; +``` + +### 2. `borrow_region` + +`borrow_region(origin, loan, node)`: this relation stores that the origin `origin` may refer to data from loan `loan` from the node `node` and onwards. + +For every borrow expression, a loan will be created and there will be a fact stored in this relation to link this loan to the origin of the borrow expression. + +For example + +```rust +let mut a = 0; +let r = &mut a; // this creates the loan L0 +// ^ let's call this 'a +``` + +there will be a `borrow_region` fact linking `L0` to `'a` at this node. This loan will flow along the CFG and the subset relationships between origins, and the computation will require that its terms are respected or it will generate an illegal access error. + +### 3. `universal_region` + +`universal_region(origin)`: this relation stores that the origin `origin` is a "universal region"/"free region"/"placeholder region" and not defined in the MIR body we're checking. + +Those are the default universal origins (`'static`) and the ones defined on functions which are generic over a lifetime. For example, with + +```rust +fn my_function<'a, 'b>(x: &'a u32, y: &'b u32) { + ... +} +``` + +the `universal_region` relation will also contain facts for `'a`, and `'b`. + +### 4. `killed` + +`killed(loan, node)`: this relation stores that a prefix of the path borrowed in loan `loan` is overwritten at the node `node`. + +For example, with + +```rust +let mut a = 1; +let mut b = 2; +let mut q = &mut a; +let r = &mut *q; // loan L0 of `*q` +// `q` can't be used here, one has to go through `r` +q = &mut b; // killed(L0) +// `q` and `r` can be used here +``` + +the loan `L0` will be `killed` by the assignment, and this fact stored in the relation. When we compute which loans origins contain along the CFG, the `killed` nodes will stop this loan's propagation to the next CFG node. + +### 5. `outlives` + +`outlives(origin1, origin2, node)`: this relation stores that the origin `origin1` outlives origin `origin2` at the node `node`. + +This is the standard Rust syntax `'a: 'b` where the *lifetime* `'a` outlives the lifetime `'b`. From the node of view of origins as sets of loans, this is seen as a subset-relation: with all the loans in `'a` flowing into `'b`, `'a` contains a subset of the loans `'b` contains. + +The type system defines subtyping rules for references, which will create `outlives` facts to relate the reference type to the referent type. + +For example, + +```rust +let a: u32 = 1; +let b: &u32 = &a; +// ^ let's call this 'a +// ^ and let's call this 'b +``` + +To be valid, this last expression requires that the type `&'a u32` is a subtype of `&'b u32`. This requires `'a: 'b` and the `outlives` relation will contain this basic fact that `'a` outlives / is a subset of / flows into `'b` at this node. + +### 6. `region_live_at` + +`region_live_at(origin, node)`: this relation stores that the origin `origin` appears in a live variable at the node `node`. + +These facts are created by the liveness computation, and its facts and relations will be described later in a lot more detail. + +### 7. `invalidates` + +`invalidates(node, loan)`: this relation stores that a loan `loan` is invalidated at the node `node`. + +Loans have terms which must be respected: ensuring shared loans are only used to read and not write or mutate, or that a mutable loan is the only way to access a referent. An illegal access of the path borrowed by the loan is said to *invalidate* the terms of the loan, and this fact will be recorded in the `invalidates` relation. + +Since the goal of the borrow checking analysis is to find these possible errors, this relation is important to the computation. Any loans it contains, and in turn, any origin containing those loans, are the key facts the computation tracks.