From 61998d52a2977924dd880b7a5da59ee629e964e9 Mon Sep 17 00:00:00 2001
From: Nadrieril <nadrieril+git@gmail.com>
Date: Mon, 4 Dec 2023 03:25:56 +0100
Subject: [PATCH] Explain the important concepts of exhaustiveness checking

---
 src/pat-exhaustive-checking.md | 146 +++++++++++++++++++++++++++++++--
 1 file changed, 138 insertions(+), 8 deletions(-)

diff --git a/src/pat-exhaustive-checking.md b/src/pat-exhaustive-checking.md
index a3103de8f..9d20d81ba 100644
--- a/src/pat-exhaustive-checking.md
+++ b/src/pat-exhaustive-checking.md
@@ -7,7 +7,7 @@ are exhaustive.
 ## Pattern usefulness
 
 The central question that usefulness checking answers is:
-"in this match expression, is that branch reachable?".
+"in this match expression, is that branch redundant?".
 More precisely, it boils down to computing whether,
 given a list of patterns we have already seen,
 a given new pattern might match any new value.
@@ -42,10 +42,8 @@ because a match expression can return a value).
 
 ## Where it happens
 
-This check is done to any expression that desugars to a match expression in MIR.
-That includes actual `match` expressions,
-but also anything that looks like pattern matching,
-including `if let`, destructuring `let`, and similar expressions.
+This check is done anywhere you can write a pattern: `match` expressions, `if let`, `let else`,
+plain `let`, and function arguments.
 
 ```rust
 // `match`
@@ -80,9 +78,141 @@ fn foo(Foo { x, y }: Foo) {
 
 ## The algorithm
 
-Exhaustiveness checking is implemented in [`check_match`].
-The core of the algorithm is in [`usefulness`].
+Exhaustiveness checking is run before MIR building in [`check_match`].
+It is implemented in the [`rustc_pattern_analysis`] crate,
+with the core of the algorithm in the [`usefulness`] module.
 That file contains a detailed description of the algorithm.
 
+## Important concepts
+
+### Constructors and fields
+
+In the value `Pair(Some(0), true)`, `Pair` is called the constructor of the value, and `Some(0)` and
+`true` are its fields. Every matcheable value can be decomposed in this way. Examples of
+constructors are: `Some`, `None`, `(,)` (the 2-tuple constructor), `Foo {..}` (the constructor for
+a struct `Foo`), and `2` (the constructor for the number `2`).
+
+Each constructor takes a fixed number of fields; this is called its arity. `Pair` and `(,)` have
+arity 2, `Some` has arity 1, `None` and `42` have arity 0. Each type has a known set of
+constructors. Some types have many constructors (like `u64`) or even an infinitely many (like `&str`
+and `&[T]`).
+
+Patterns are similar: `Pair(Some(_), _)` has constructor `Pair` and two fields. The difference is
+that we get some extra pattern-only constructors, namely: the wildcard `_`, variable bindings,
+integer ranges like `0..=10`, and variable-length slices like `[_, .., _]`. We treat or-patterns
+separately.
+
+Now to check if a value `v` matches a pattern `p`, we check if `v`'s constructor matches `p`'s
+constructor, then recursively compare their fields if necessary. A few representative examples:
+
+- `matches!(v, _) := true`
+- `matches!((v0,  v1), (p0,  p1)) := matches!(v0, p0) && matches!(v1, p1)`
+- `matches!(Foo { a: v0, b: v1 }, Foo { a: p0, b: p1 }) := matches!(v0, p0) && matches!(v1, p1)`
+- `matches!(Ok(v0), Ok(p0)) := matches!(v0, p0)`
+- `matches!(Ok(v0), Err(p0)) := false` (incompatible variants)
+- `matches!(v, 1..=100) := matches!(v, 1) || ... || matches!(v, 100)`
+- `matches!([v0], [p0, .., p1]) := false` (incompatible lengths)
+- `matches!([v0, v1, v2], [p0, .., p1]) := matches!(v0, p0) && matches!(v2, p1)`
+
+This concept is absolutely central to pattern analysis. The [`constructor`] module provides
+functions to extract, list and manipulate constructors. This is a useful enough concept that
+variations of it can be found in other places of the compiler, like in the MIR-lowering of a match
+expression and in some clippy lints.
+
+### Constructor grouping and splitting
+
+The pattern-only constructors (`_`, ranges and variable-length slices) each stand for a set of
+normal constructors, e.g. `_: Option<T>` stands for the set {`None`, `Some`} and `[_, .., _]` stands
+for the infinite set {`[,]`, `[,,]`, `[,,,]`, ...} of the slice constructors of arity >= 2.
+
+In order to manage these constructors, we keep them as grouped as possible. For example:
+
+```rust
+match (0, false) {
+    (0 ..=100, true) => {}
+    (50..=150, false) => {}
+    (0 ..=200, _) => {}
+}
+```
+
+In this example, all of `0`, `1`, .., `49` match the same arms, and thus can be treated as a group.
+In fact, in this match, the only ranges we need to consider are: `0..50`, `50..=100`,
+`101..=150`,`151..=200` and `201..`. Similarly:
+
+```rust
+enum Direction { North, South, East, West }
+# let wind = (Direction::North, 0u8);
+match wind {
+    (Direction::North, 50..) => {}
+    (_, _) => {}
+}
+```
+
+Here we can treat all the non-`North` constructors as a group, giving us only two cases to handle:
+`North`, and everything else.
+
+This is called "constructor splitting" and is crucial to having exhaustiveness run in reasonable
+time.
+
+### Usefulness vs reachability in the presence of empty types
+
+This is likely the subtlest aspect of exhaustiveness. To be fully precise, a match doesn't operate
+on a value, it operates on a place. In certain unsafe circumstances, it is possible for a place to
+not contain valid data for its type. This has subtle consequences for empty types. Take the
+following:
+
+```rust
+enum Void {}
+let x: u8 = 0;
+let ptr: *const Void = &x as *const u8 as *const Void;
+unsafe {
+    match *ptr {
+        _ => println!("Reachable!"),
+    }
+}
+```
+
+In this example, `ptr` is a valid pointer pointing to a place with invalid data. The `_` pattern
+does not look at the contents of the place `*ptr`, so this code is ok and the arm is taken. In other
+words, despite the place we are inspecting being of type `Void`, there is a reachable arm. If the
+arm had a binding however:
+
+```rust
+# #[derive(Copy, Clone)]
+# enum Void {}
+# let x: u8 = 0;
+# let ptr: *const Void = &x as *const u8 as *const Void;
+# unsafe {
+match *ptr {
+    _a => println!("Unreachable!"),
+}
+# }
+```
+
+Here the binding loads the value of type `Void` from the `*ptr` place. In this example, this causes
+UB since the data is not valid. In the general case, this asserts validity of the data at `*ptr`.
+Either way, this arm will never be taken.
+
+Finally, let's consider the empty match `match *ptr {}`. If we consider this exhaustive, then
+having invalid data at `*ptr` is invalid. In other words, the empty match is semantically
+equivalent to the `_a => ...` match. In the interest of explicitness, we prefer the case with an
+arm, hence we won't tell the user to remove the `_a` arm. In other words, the `_a` arm is
+unreachable yet not redundant. This is why we lint on redundant arms rather than unreachable
+arms, despite the fact that the lint says "unreachable".
+
+These considerations only affects certain places, namely those that can contain non-valid data
+without UB. These are: pointer dereferences, reference dereferences, and union field accesses. We
+track during exhaustiveness checking whether a given place is known to contain valid data.
+
+Having said all that, the current implementation of exhaustiveness checking does not follow the
+above considerations. On stable, empty types are for the most part treated as non-empty. The
+[`exhaustive_patterns`] feature errs on the other end: it allows omitting arms that could be
+reachable in unsafe situations. The [`never_patterns`] experimental feature aims to fix this and
+permit the correct behavior of empty types in patterns.
+
 [`check_match`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_build/thir/pattern/check_match/index.html
-[`usefulness`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_build/thir/pattern/usefulness/index.html
+[`rustc_pattern_analysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_pattern_analysis/index.html
+[`usefulness`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_pattern_analysis/usefulness/index.html
+[`constructor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_pattern_analysis/constructor/index.html
+[`never_patterns`]: https://github.com/rust-lang/rust/issues/118155
+[`exhaustive_patterns`]: https://github.com/rust-lang/rust/issues/51085