diff --git a/docs/RFCS/20180603_follower_reads.md b/docs/RFCS/20180603_follower_reads.md
new file mode 100644
index 000000000000..6ec979481476
--- /dev/null
+++ b/docs/RFCS/20180603_follower_reads.md
@@ -0,0 +1,629 @@

- Feature Name: Follower Reads
- Status: accepted
- Start Date: 2018-06-03
- Authors: Spencer Kimball, Tobias Schottdorf
- RFC PR: #21056
- Cockroach Issue: #16593

# Summary

Follower reads are consistent reads at historical timestamps from follower replicas. They make the non-leader replicas in a range suitable sources for historical reads. Historical reads include both `AS OF SYSTEM TIME` queries as well as transactions with a read timestamp sufficiently in the past (for example, long-running analytics queries).

The key enabling technology is the exchange of **closed timestamp updates** between stores. A closed timestamp update (*CT update*) is a store-wide timestamp together with (sparse) per-range information on the Raft progress. Follower replicas use local state built up from successive received CT updates to ascertain that they have all the state necessary to serve consistent reads at and below the leaseholder store's closed timestamp.

Follower reads are only possible for epoch-based leases, which includes all user ranges but excludes some system ranges (such as the addressing metadata ranges). In what follows, all mentioned leases are epoch-based.

# Motivation

Consistent historical reads are useful for analytics queries and in particular allow such queries to be carried out more efficiently and, with appropriate configuration, away from foreground traffic. But historical reads are also key to a proposal for [reference-like tables](https://github.com/cockroachdb/cockroach/issues/26301) aimed at cutting down on foreign key check latencies, particularly in geo-distributed clusters; they help recover a reasonably recent consistent snapshot of a cluster after a loss of quorum; and they are one of the ingredients for [Change Data Capture](https://github.com/cockroachdb/cockroach/pull/25229).

# Guide-level explanation

Fundamentally, the idea is that we already keep multiple consistent copies of all data via replication, and that we want to utilize all of these copies to serve reads. Morally speaking, a read which only cares to access data that was written at some timestamp well in the past *should* be servable from all replicas (assuming normal operation), as replication typically catches up all the followers quickly, and most writes happen at "newer" timestamps. Clearly neither of these two properties is guaranteed, though, so replicas have to be provided with a way of deciding whether a given read request can be served consistently.

The closed timestamp mechanism provides, between each pair of stores, a regular (on the order of seconds) exchange of information to that end. At a high level, these updates contain what one might intuitively expect:

A follower trying to serve a read needs to know that a given timestamp is "safe" (in the parlance of this RFC, "closed") to serve reads for; there must not be some in-flight or future write that would invalidate a follower read retroactively. Each store maintains a data structure, the **min proposal tracker** (*MPT*) described later, to establish this timestamp.

Similarly, if a range's leaseholder commits a write into its Raft log at index `P` before announcing a *closed timestamp*, then the follower must wait until it has caught up to that index `P` before serving reads at the closed timestamp. To provide this information, each store also includes with each closed timestamp an updated minimum log index that the follower must reach before "activating" the associated closed timestamp on that replica.

Providing the information only when there has been write activity on a given range since the last closed timestamp is key to performance, as a store can house upwards of 50,000 replicas, and including information about every single one of them in each update is prohibitive due to the overhead of visiting them.

This is similar to *range quiescence*, which avoids Raft heartbeats between inactive ranges. It's worth pointing out that quiescent ranges are able to serve follower reads, and that there is no architectural connection between follower reads and quiescence, though a range that is quiescent is typically one that requires no per-range CT update.

As we've seen above, this RFC deals in "log positions" (and closed timestamps). For technical reasons, the "log position" is not the Raft log position but the **Lease Applied Index**, a concept introduced by us on top of Raft to handle Raft-internal reproposals and reorderings. Ultimately, what we're after is a promise of the form

> no more proposals writing to timestamps less than or equal to `T` are going to apply after log index `I`.

This guarantee is tricky to extract from the Raft log index since proposing a command at log index `I` does not restrict it from showing up at higher log indices later, especially in leader-not-leaseholder situations. The *Lease Applied Index* was introduced precisely to have better control, and it allows us to make the above promise.

# Reference-level explanation

This section will focus on the technical details of the closed timestamp mechanism, with an emphasis on correctness.

A closed timestamp update contains the following information (sent by an origin `Store`):

- the **liveness epoch** (of the origin `Store`)
- a **closed timestamp** (`hlc.Timestamp`, typically trails "real time" by at least a constant target duration)
- a **sequence number** (to allow discarding built-up state on missed updates)
- a map from `RangeID` to **minimum lease applied index** (*MLAI*) that specifies updates to the map the recipient has accumulated from all previous updates.

The accumulated per-range state together with the closed timestamp serve as a guarantee of the form

> Every Raft command proposed after the min lease applied index (MLAI) will be ahead of the closed timestamp (CT).

Each store starts out with an empty state for each peer store and epoch, and merges the *MLAI* updates into the state (overwriting existing *MLAI*s). Whenever the sequence number received in an update from a peer store displays a gap, the state for that peer store is reset, and the current update is merged into the empty state: this means that all information regarding ranges not explicitly mentioned in the current update is lost. Similarly, if the epoch changes, the state for any prior epoch is discarded and the update is applied to an empty state for the new epoch.

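To make this bookkeeping concrete, here is a minimal sketch of the per-peer state a recipient store could keep and of how an incoming update might be merged into it. The type and field names (`ctUpdate`, `peerState`, `applyUpdate`) are illustrative and not the actual implementation.

```go
// Hypothetical sketch of the state a store keeps per peer store, and of how an
// incoming CT update is folded into it.
type ctUpdate struct {
	Epoch  int64
	Closed hlc.Timestamp
	Seq    int64
	MLAI   map[roachpb.RangeID]int64 // per-range minimum lease applied indexes
}

type peerState struct {
	epoch  int64
	seq    int64
	closed hlc.Timestamp
	mlai   map[roachpb.RangeID]int64 // accumulated from previous updates
}

func (s *peerState) applyUpdate(u ctUpdate) {
	if u.Epoch != s.epoch || u.Seq != s.seq+1 {
		// Epoch change or sequence gap: discard the accumulated state and
		// start over from what this update alone contains.
		s.epoch, s.mlai = u.Epoch, map[roachpb.RangeID]int64{}
	}
	s.seq, s.closed = u.Seq, u.Closed
	for rangeID, mlai := range u.MLAI {
		s.mlai[rangeID] = mlai // overwrite existing entries
	}
}
```
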
At a high level, the design splits into three parts:

1. How are the outgoing updates assembled? This will mainly live in the Replica write path: whenever something is proposed to Raft, action needs to be taken to reflect this proposal in the next CT update.
2. How are the received updates used, and which reads can be served? This lives mostly in the read path.
3. How are reads routed to eligible follower replicas? This lives both in `DistSender` and the DistSQL physical planner.

We will talk about how the updates are used first, as that is the most natural starting point for correctness.

To serve a read request at timestamp `T` via follower reads, a replica

1. looks up the lease, noting the store (and thus node) and epoch it belongs to;
2. looks up the CT state known for this node and epoch;
3. checks whether the read timestamp `T` is less than or equal to the closed timestamp;
4. checks whether its *Lease Applied Index* matches or exceeds the *MLAI* for the range (in the absence of an *MLAI*, this check fails by default).

If the checks succeed, the follower serves the read (no update to the timestamp cache is necessary). If they don't, a `NotLeaseHolderError` is returned.

Note that if the read fails because no *MLAI* is known for that range, there needs to be some proactive action to prompt re-sending of the *MLAI*. This is because without write activity on the range (which is not necessarily going to happen any time soon) the origin store will not send an update. Strategies to mitigate this are discussed in a dedicated section below.

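Put together, the eligibility check might look roughly like the following sketch; the helpers (`currentLease`, `ctState`, `leaseAppliedIndex`) are hypothetical stand-ins for the corresponding replica and store state, not actual APIs.

```go
// Hypothetical sketch of the check a follower performs before serving a
// historical read at timestamp ts.
func (r *Replica) canServeFollowerRead(ts hlc.Timestamp) bool {
	lease := r.currentLease()
	// Per-store, per-epoch state accumulated from received CT updates.
	state := r.store.ctState(lease.Replica.StoreID, lease.Epoch)
	if state == nil || state.closed.Less(ts) {
		return false // timestamp not closed (or no CT state received yet)
	}
	mlai, ok := state.mlai[r.RangeID]
	if !ok {
		return false // no MLAI known for this range: fail (and ask for one)
	}
	return r.leaseAppliedIndex() >= mlai // must have caught up past the MLAI
}
```
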
## Implied guarantees

Implicitly, a received update represents the following essential promises:

- The origin node was, at any point in time, live for the given epoch and closed timestamp. Concretely, this means that the origin node had a liveness update (for the epoch) with the closed timestamp falling *before* the stasis period.

  This guarantees that no other node could forcibly take over the lease at a timestamp less than or equal to the closed timestamp, and consequently, for any lease (as seen on a follower) for that origin store and epoch, the origin store knows about all relevant Raft proposals that need to be applied before serving follower reads.

  In other words, **the ranges map in the update is authoritative** as long as:
- The *MLAI map* contains an update for any range for which a command has been proposed since the last update.

  This guarantee is hopefully not a surprise, but implicit in it is the requirement that any relevant write actually increments the lease applied index. Luckily, all commands do, except for lease requests (not transfers -- see below for those), which don't mutate user-visible state.
- The origin store won't (ever) initiate a lease transfer that would allow another node to write at or below the closed timestamp. In other words, in the case of a lease transfer, the next lease would start at a timestamp greater than the closed timestamp. This is likely impossible in practice since the transfer timestamps and proposed closed timestamps are taken from the same hybrid logical clock, but an explicit safeguard will be added just in case.

  If this rule were broken, another leaseholder could propose commands that violate the closed timestamp sent by the original node (and a lagging follower would continue seeing the old lease and convince itself that it was fine to serve reads).

  Lease transfers also require an update in the *MLAI map*; they need to essentially force the follower to see the new lease before it serves further follower reads (at which point it will turn to the new leaseholder's store for guidance). Nothing special is required to get this behavior; a lease transfer requires a valid *Lease Applied Index*, so the same mechanism that forces followers to catch up on the Raft log for writes also makes them observe the new lease. This requires that we wait until reaching the MLAI for a closed timestamp before we decide which node's state to query.

  Note that a node restart implies a change in the liveness epoch, which in turn invalidates all of the information sent before the restart.

## Recovering from missed updates

To regain a fully populated *MLAI* map when first receiving updates (or after resetting the state for a peer node), there are two strategies:

1. Special-case sequence number zero so that it includes an *MLAI* for all ranges for which the lease is held. When an update is missed, the recipient notifies the sender, and the sender resets its sequence number to zero (thus sending a full update next).
2. Ask for updates on individual ranges whenever a follower read request fails because of a missing *MLAI*.

We opt to implement both strategies, with the first doing the bulk of the work. The first strategy is worthwhile because

1. the payload is essentially two varints for each range, amounting to no more than 20 bytes on the wire, adding up to a 1 MB payload at 50,000 leaseholder replicas (but likely much less in practice). Even with 10x as many, a rare enough 10 MB payload seems unproblematic, especially since it can be streamed.
2. without an eager catch-up, followers will have to warm up "on demand", but the routing layer has no insight into this process and will blindly route reads to followers, which makes for a poor experience after a node restart.

But this strategy can miss necessary updates as leases get transferred to otherwise inactive ranges. To guard against these rare cases, the second strategy serves as a fallback: recipients of updates can specify ranges they would like to receive an MLAI for in the next update. They do this when they observe a range state that suggests that an update has been missed, in particular when a replica has no known MLAI stored for the (non-recent) lease.

## Constructing outgoing updates

To get in the right mindset, consider the simplified situation of a `Store` without any pending or (near) future write activity, that is, there are (and will be) no in-flight Raft proposals. Now, we want to send an initial CT update to another store. This means two things:

1. we need to "close" a timestamp, i.e. prevent any future write activity visible at this timestamp, for any write proposed by this store as a leaseholder (for the current epoch), and
2. we need to track an *MLAI* for each replica (for which the lease for the epoch is held).

The first requirement is roughly equivalent to bumping the low water mark of the timestamp cache to one logical tick above the desired closed timestamp (though doing that in practice would perform poorly).

The second one is also straightforward: simply read the *Lease Applied Index* of each replica; since nothing is in-flight, that's all the followers need to know about.

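Under these simplifying assumptions, assembling an update could look roughly like the sketch below, reusing the `ctUpdate` sketch from above; the helper names (`livenessEpoch`, `nextCTSeq`, `visitReplicas`, `ownsLease`, `closedTimestampTargetDuration`) are placeholders rather than actual APIs.

```go
// Hypothetical sketch of building a CT update on a store with no in-flight
// proposals: close out a recent timestamp and read off each leaseholder
// replica's current Lease Applied Index.
func (s *Store) buildQuiescentUpdate(now hlc.Timestamp) ctUpdate {
	u := ctUpdate{
		Epoch:  s.livenessEpoch(),
		Closed: now.Add(-closedTimestampTargetDuration.Nanoseconds(), 0),
		Seq:    s.nextCTSeq(),
		MLAI:   map[roachpb.RangeID]int64{},
	}
	s.visitReplicas(func(r *Replica) {
		if r.ownsLease() { // only ranges whose lease this store holds
			u.MLAI[r.RangeID] = r.leaseAppliedIndex()
		}
	})
	return u
}
```
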
In reality, there will sometimes be ongoing writes on a replica for which we want to obtain an *MLAI*, and so 1) and 2) become more complicated.

Instead of adjusting the timestamp cache, we introduce a dedicated data structure, the **minimum proposal tracker** (*MPT*), which tracks (at coarse granularity) the timestamps for which proposals are still ongoing. In particular, it can decide when it is safe to close out a higher timestamp than before. This replaces 1), but retrieving an *MLAI* is also less straightforward than before.

Assume the replica shows a *Lease Applied Index* of 12, but three proposals are in flight while another two have cleared the command queue but are still evaluating. Presumably the in-flight proposals were assigned *Lease Applied Indexes* 13 through 15, and the ones being evaluated will receive 16 and 17 (depending on the order in which they enter Raft). This is where the *MPT*'s second function comes in: it tracks writes until they are assigned a (provisional) *Lease Applied Index*, and makes sure that an authoritative *MLAI* delta is returned with each closed timestamp. This delta is *authoritative* in the sense that it reflects the largest **proposed** *MLAI* seen relevant to the newly closed timestamp (relative to the previous one).

Consequently, when we say that a proposal is tracked, we're talking about the interval between determining the request timestamp (which happens after clearing the command queue) and determining the proposal's *Lease Applied Index*.

It's natural to ask whether there may be "false positives", i.e. whether a command proposed for some *Lease Applied Index* may never actually return from Raft with a corresponding update to the range state. The answer is that this isn't possible: a command proposed to Raft is either retried until it's clear that the desired *Lease Applied Index* has already been surpassed (in which case there is no problem) or the leaseholder process exits (in which case there will be a new leaseholder, and previous in-flight commands that never made it into the log are irrelevant).

The naive approach of tracking the maximum assigned lease applied index is problematic. To see this, consider the relevant example of a store that wants to close out a timestamp around five seconds in the past, but which has high write throughput on some range. Tracking the maximum proposed lease applied index until we close out the timestamp `now()-5s` means that a follower won't be able to serve reads until it has caught up on the last five seconds as well, even though they are likely not relevant to the reads it wants to serve. This motivates the precise form of the *MPT*, which has two adjacent "buckets" that it moves forward in time: one tracking proposals relevant to the next closed timestamp, and one with proposals relevant to the one after that.

The MPT consists of the previously emitted closed timestamp (zero initially) and a prospective next closed timestamp aptly named `next` (always strictly larger than `closed`), at or below which new writes are not accepted. It also contains two ref counts and *MLAI* maps associated with the timestamps below and above `next`, respectively.

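Concretely, the tracker's state might be sketched as follows (illustrative names, not the actual implementation):

```go
// Hypothetical sketch of the minimum proposal tracker's state. Proposals with
// timestamps at or below next are counted (and contribute their MLAIs) on the
// "left"; proposals above next are counted on the "right". next can only be
// closed out once the left refcount has drained to zero.
type minProposalTracker struct {
	mu struct {
		syncutil.Mutex
		closed hlc.Timestamp // most recently emitted closed timestamp
		next   hlc.Timestamp // prospective next closed timestamp (> closed)

		leftRef, rightRef   int                       // proposals still tracked on either side of next
		leftMLAI, rightMLAI map[roachpb.RangeID]int64 // max proposed lease applied index seen on either side
	}
}
```

Roughly, `Track` forwards a new proposal above `next` and bumps the right-hand refcount, the returned `done` callback records the assigned index and releases the count, and closing out `next` rotates the right-hand state to the left.
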
Its API is roughly the following:

```go
// t := NewTracker()

// In Replica write path:
waitCmdQueue(ba)
applyTimestampCache(ba)
ts, done := t.Track(ba.Timestamp)
ba.ForwardTimestamp(ts)
proposal := evaluate(ba)
proposal.LeaseAppliedIndex = assignLeaseAppliedIndex() // (pseudocode) pick the next lease applied index
done(proposal.LeaseAppliedIndex)
propose(proposal)

// In periodic store-level loop:
closedTS, mlaiMap := t.CloseWithNext(clock.Now() - TargetDuration)
sendUpdateToPeers(closedTS, mlaiMap)
```

Note that by using this API for *any* proposal, it is guaranteed that we produce all the updates promised to consumers of the CT updates. A few redundant pieces of information may be sent (i.e. for lease requests triggering on a follower range), but these are infrequent and cause no harm.

In what follows we'll go through an example, which for simplicity assumes that all writes relate to the same range (thus reducing the *MLAI* maps to scalars). The state of the *MPT* is laid out as in the diagram below. You see a previously closed timestamp as well as a prospective next closed timestamp. There are three proposals tracked at timestamps strictly less than `next`, and one proposal at `next` or higher. Additionally, for proposals strictly less than `next`, the *MLAI* `8` was recorded, while that for the other side is `17`.

```
              closed            next
                |  @8             |  @17
                |  #3             |  #1
                |                 |
                v                 v
---------------------------------------------------------> time
```

Let's walk through an example of how the MPT works. For ease of illustration, we restrict it to activity on a single replica (which avoids having a *map* of *MLAI*s; now it's just one). Initially, `closed` and `next` demarcate some time interval. Three commands arrive; `next`'s right side picks up a refcount of three (new commands are forced above `next`, though in this case they were there to begin with):

```
   closed           next      commands
      |  @0           |  @0    /\    \______
      |  #0           |  #3   /  \          \
      v               v       v  v          v
------------------------------x--x----------x------------> time
```

Next, it's time to construct a CT update. Since `next`'s left has a refcount of zero, we know that nothing is in progress for timestamps below `next`, which will now officially become a closed timestamp. To do so, `next` is returned to the client along with the *MLAI* for its left (there is none this time around). Additionally, the data structure is set up for the next iteration: `closed` is forwarded to `next`, and `next` is forwarded to a suitable timestamp some constant target duration away from the current time. The commands previously tracked ahead of `next` are now on its left. Note that even though one of the commands has a timestamp ahead of `next`, it is now tracked to its left. This is fine; it just means that we're taking a command into account earlier than required for correctness.

```
                                      next
                                    @0  |  @0
                   closed  commands #3  |  #0
                      |        /\ \_____|__
                      |       /  \      |   |
                      v       v  v      v   v
------------------------------x--x----------x------------> time
```

Two of the commands get proposed (at *LAI*s, say, 10 and 11), decrementing the left refcount and adding an *MLAI* entry of 11 (the max of the two) to it. Additionally, two new commands arrive, this time at timestamps below `next`. These commands are forced above `next` first, so the refcount goes to the right. These new commands get proposed quickly (so they don't show up again), and the right refcount will drop back to zero (though it will retain the max *MLAI* seen, likely 13).

```
                            in-flight
                  closed     command     next
                     |         \     @11  |  @0
                     |          \     #1  |  #2
                     v            v       v
---------------------------------x-----------------------> time
                                            ^
                                            |
            _______________________________/
           |            forwarding          |
           |                                |
      new command                       new command
 (finishes quickly @13)           (finishes quickly @12)
```

The remaining command sticks around in the evaluation phase. This is unfortunate; it's time for another CT update, but we can't send a higher closed timestamp than before (and must stick to the same one with an empty *MLAI* map).

```
   (blocked)                            (blocked)
                            in-flight
                  closed     command     next
                     |         \     @11  |  @13
                     |          \     #1  |  #0
                     v            v       v
---------------------------------x-----------------------> time
```

Finally the command gets proposed at LAI 14. A new command comes in at some reasonable timestamp and the right side picks up a ref. Note the resulting odd-looking situation in which the left is @14 and the right @13 (this is fine; the client tracks the maximum seen):

```
       closed                      next         in-flight
         |                     @14   |  @13      proposal
         |                     #0    |  #1           |
         v                           v               v
----------------------------------------------------x----> time
```

Time for the next CT update. We can finally close `next` (emitting @14) and move it to `now - target duration`, moving the right side refcount and *MLAI* to the left in the process.

```
       closed                              in-flight @13 |  @0
         |                                  proposal  #1 |  #0
         |                                      |       __/
         |                                      |      /
         v                                      v     v
----------------------------------------------------x----> time
```

## Initial catch-up

The main mechanism for propagating *MLAI*s is triggered by proposals. When an initial update is created, valid *MLAI*s have to be obtained for all ranges for which followers are supposed to be able to serve reads. This raises two practical questions: for which replicas should an *MLAI* be produced, and how to produce one.

We create an *MLAI* for all ranges for which (at the time of checking) the current state indicates that the lease is held by the local store (this can have both false positives and false negatives, but a missed follower read will trigger a proactive upgrade for the range it occurred on).

The initial catch-up is simple: before closing a timestamp (via the MPT), iterate through all ranges and (if they show the store as holding the lease) feed the MPT a proposal that lets it know the most recent *Lease Applied Index* on that replica:

```go
_, done := t.Track(hlc.Timestamp{})
repl.mu.Lock()
lai := repl.mu.lastAssignedLeaseIndex
repl.mu.Unlock()
done(lai)
```

This can race with other proposals, but the MPT will track the maximum seen.

## Timestamp forwarding and intents

We forward commands' timestamps in order to guarantee that they don't produce visible data at timestamps below the CT. A case in which that is less obvious is that of an intent.

To see this, consider that a transaction has two relevant timestamps: `OrigTimestamp` (also known as its read timestamp) and `Timestamp` (also known as its commit timestamp). While the timestamp we forward is `Timestamp`, the transaction internally will in fact attempt to write at `OrigTimestamp` (but relies on moving these intents to their actual timestamp later, when they are resolved). This prevents certain anomalies, particularly with `SNAPSHOT` isolation.

Naively, this violates the guarantee: we promise that no more data will appear below a certain timestamp.
Note, however, that this data isn't visible at timestamps below the commit timestamp (which was forwarded): to read the value, the intent has to be resolved first, which implies that it will move at least to `Timestamp` in the process, restoring the required guarantee.

Similarly, this does not impede the usefulness of the CT mechanism for recovery: the restored consistent state may contain intents. But the restored consistent state also allows resolving all of the intents in the same way, since what matters is the transaction record. The result will be that the intents are simply dropped, unless there is a committed transaction record, in which case they will commit.

Note that for the CDC use case, this closed timestamp mechanism is a necessary, but not sufficient, solution. In particular, a CDC consumer must first find (or track) and resolve all intents at timestamps below a given closed timestamp.

## Splits/Merges

No action is necessary for splits: the leaseholders of the LHS and RHS are colocated and so share the same closed timestamp mechanism. For convenience, an update for the RHS is added to the next round of outgoing updates; otherwise follower reads for the RHS would cut out for a moment.

Merges are more interesting since the leaseholders of the RHS and the LHS are not necessarily colocated. If the RHS's store has closed a higher timestamp, say 1000, while the LHS's store is only at 500, then after the merge, commands might be accepted on the combined range under the closed timestamp 500 that violate the closed timestamp 1000. To counteract this, the `GetSnapshotForMerge` operation returns the closed timestamp on the origin store, and the merging replica takes it into account. Initially, the merge trigger will populate the timestamp cache for the right side of the merge; if this has too big an impact on the timestamp cache (especially as merges are rolled out, we might merge away large swaths of empty ranges), we can also store the timestamp on the replica and use it to forward proposals manually.

## Routing layer

This RFC proposes a somewhat simplistic implementation at the routing layer: at `DistSender` and its DistSQL counterpart, if a read is for a timestamp earlier than the current time less a target duration (which adds comfortable padding to when followers are ideally able to serve these reads), it is sent to the nearest replica (as measured by health, latency, locality, and perhaps a jitter), instead of to the leaseholder.

When a read is handled by a replica not equipped to serve it via a regular or follower read, a `NotLeaseHolderError` is returned and future requests for that same (chunk of) batch will make no attempt to use follower reads; this avoids getting stuck in an endless loop when followers lag significantly. Similarly, follower reads are never attempted for ranges known not to use epoch-based leases.

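A minimal sketch of that routing decision is shown below; `followerReadTargetDuration` is a placeholder for the configured target duration (plus padding), and the real check would additionally skip ranges known to use expiration-based leases as well as batches that already failed a follower-read attempt.

```go
// Hypothetical sketch of the DistSender-level decision: only read-only batches
// whose timestamp is sufficiently far in the past are candidates for routing
// to the nearest replica; everything else keeps going to the leaseholder.
func canUseFollowerRead(ba roachpb.BatchRequest, now hlc.Timestamp) bool {
	if !ba.IsReadOnly() {
		return false
	}
	threshold := now.Add(-followerReadTargetDuration.Nanoseconds(), 0)
	return ba.Timestamp.Less(threshold)
}
```
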
## Further work

While the design outlined so far should give a reasonably performant baseline, it has several shortcomings that will need to be addressed in follow-up work.

### Lagging followers

Assume that timestamps are closed out every second, trailing real time by a 5s target duration, and that the last proposal taken into account for each closed timestamp finishes evaluating just before the timestamp is closed out. In that case, the *MLAI* check on the followers is more likely to fail for a short moment, until the Raft log has caught up with the very recent proposal; if the catch-up takes longer than the interval at which the timestamps are closed out, no follower read will ever be possible. A similar scenario applies to followers far removed from the usual commit quorum or lagging for any other reason. This should be fairly rare, but seems important enough to be tackled in follow-up work.

The fundamental problem here is that older closed timestamps are discarded when a new one is received, resulting in the follower never catching up to the current closed timestamp. If it remembered the previous CT updates, it could at least serve reads for those timestamps. This calls for a mechanism that holds on to previous *CT*s and *MLAI*s so that reads further in the past can be served. This won't be implemented initially, to keep the complexity of the first version to a minimum.

One way to address the problem is the following: on receipt of a CT update, copy the CT and MLAI into the range state if the Raft log has caught up to the MLAI (keeping the most recently overwritten value around to serve reads for). This means that the replica will always have a valid CT during normal operation, though one that lags the received updates (various variations on this theme exist). However, note the strong connection to the following section.

### Recovery from insufficient quorum

As mentioned in the initial paragraphs, follower reads can help recover a recent consistent state of an unavailable cluster, by determining the maximum timestamp at which every range has a surviving replica that can serve a follower read (if all replicas of a range are lost, there is obviously no hope of consistent recovery). At this timestamp, a consistent read of the entire keyspace (excluding expiration-based ranges) can be carried out and used to construct a backup. Note that if expiration-based replicas persisted the last lease they held, the timestamp could be lowered to the minimum over all surviving expiration-based replicas' last leases, for a consistent (but less recent) read of the *whole* keyspace.

For maximum generality, it is desirable to, in principle, be able to recover without relying on in-memory state, so that a termination of the running process does not bar a subsequent recovery.

Naively, this can be achieved by persisting all received *CT* updates (with some eviction policy that rolls up old updates into a more recent initial state), though the eventual implementation may opt to persist at the Replica level instead (where updates caught up to can more easily be pruned).

### Range feeds

[Range feeds][RangeFeed] are a range-level mechanism to stream updates to an upstream Change Data Capture processor. Range feeds will rely on closed timestamps and will want to relay them to an upstream consumer as soon as possible. This suggests a reactive mechanism that notifies the replicas with an active Range feed on receipt of a CT update; given a registry of such replicas, this is easy to add.

### `AS OF SYSTEM TIME RECENT`

With the advent of closed timestamps, we can also simplify `AS OF SYSTEM TIME` by allowing users to let the server choose a reasonable "recent" timestamp in the past for which reads can be distributed better. Note that, other than what was requested in [this issue][autoaost], there is no guarantee about blocking on conflicting writers.
However, since a transaction that has `PENDING` status with a timestamp that has since been closed out is likely to have to restart (or ideally refresh) anyway, we could consider allowing it to be pushed.

## Rationale and Alternatives

This design appears to be the sane solution given the boundary conditions.

## Unresolved questions

### Configurability

For now, the min proposal timestamp roughly trails real time by five seconds. This can be made configurable, for example via a cluster setting or, if more granularity is required, via zone configs (which in turn requires being able to retrieve the history of the settings value, or a mechanism that smears out the change over some period of time, to avoid failed follower reads).

Transactions which exceed the lag are usually forced to restart, though this will often happen through a refresh (which is comparatively cheap, though it needs to be tested).

[RangeFeed]: https://github.com/cockroachdb/cockroach/pull/26782
[autoaost]: https://github.com/cockroachdb/cockroach/issues/25405