rfc: SELECT FOR UPDATE
RFC regarding support of SELECT ... FOR UPDATE SQL syntax.
See Issue #6583.
rytaft committed Oct 31, 2017
1 parent 2d07ef4 commit dd3d9ee
Showing 1 changed file with 286 additions and 0 deletions.
286 changes: 286 additions & 0 deletions docs/RFCS/20171024_select_for_update.md
- Feature Name: Support `SELECT FOR UPDATE`
- Status: draft
- Start Date: 2017-10-17
- Authors: Rebecca Taft
- RFC PR: (PR # after acceptance of initial draft)
- Cockroach Issue: [#6583](https://github.com/cockroachdb/cockroach/issues/6583)

# Summary

Support the `SELECT ... FOR UPDATE` SQL syntax, which locks rows returned by the `SELECT`
statement. This pessimistic locking feature prevents concurrent transactions from updating
any of the locked rows until the locking transaction commits or aborts. Several potential customers
have asked for this feature, and it would also get us closer to feature parity with Postgres.
The easiest way to implement this in CockroachDB would be by setting row-level "dummy" intents,
but another option would be to set span-level locks.

# Motivation

As described in [this issue](https://github.com/cockroachdb/cockroach/issues/6583),
`SELECT ... FOR UPDATE` is not standard SQL, but many databases now support it, including
Postgres. Several third-party products such as the [Quartz Scheduler](http://www.quartz-scheduler.org),
[OpenJPA](http://openjpa.apache.org) and [Liquibase](http://www.liquibase.org)
also rely on this feature, preventing some potential customers from switching to CockroachDB.

In some cases, `SELECT ... FOR UPDATE` is required to maintain correctness when running CockroachDB
in `SNAPSHOT` mode. In particular, `SELECT ... FOR UPDATE` can be used to prevent write skew anomalies.
Write skew anomalies occur when two concurrent transactions read an overlapping set of
rows but update disjoint sets of rows. Since the transactions each operate on private snapshots of
the database, neither one will see the updates from the other.

The [Wikipedia entry on Snapshot Isolation](https://en.wikipedia.org/wiki/Snapshot_isolation) has a
useful concrete example:

> ... imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either
V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0).
Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200
from V1, and T2 withdrawing $200 from V2. .... T1 and T2 operate on private snapshots of the database:
each deducts $200 from an account, and then verifies that the new total is zero, using the other account
value that held when the snapshot was taken. Since neither update conflicts, both commit successfully,
leaving V1 = V2 = -$100, and V1 + V2 = -$200.
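
To make the anomaly concrete, here is a minimal Python sketch of the example above. It is a toy model, not CockroachDB code: the helper names `withdraw`, `run_snapshot`, and `run_with_row_locks` are hypothetical, and locking both rows is modeled simply by running the two transactions one after the other.

```python
def withdraw(view, account, amount):
    """Return the new balance if V1 + V2 >= 0 still holds in `view`, else None."""
    new_balance = view[account] - amount
    others = sum(v for k, v in view.items() if k != account)
    return new_balance if new_balance + others >= 0 else None


def run_snapshot(balances):
    """T1 and T2 each read a private snapshot, then write disjoint rows."""
    snap1, snap2 = dict(balances), dict(balances)
    v1 = withdraw(snap1, "V1", 200)  # T1 sees V2 = 100
    v2 = withdraw(snap2, "V2", 200)  # T2 sees V1 = 100
    result = dict(balances)
    # The write sets {V1} and {V2} do not overlap, so both commits succeed.
    if v1 is not None:
        result["V1"] = v1
    if v2 is not None:
        result["V2"] = v2
    return result


def run_with_row_locks(balances):
    """Each transaction locks both rows before reading, so T2 runs after T1 commits."""
    result = dict(balances)
    for account in ("V1", "V2"):  # T1 withdraws from V1, then T2 from V2
        new_balance = withdraw(result, account, 200)  # reads latest committed values
        if new_balance is not None:
            result[account] = new_balance
    return result
```

In this model `run_snapshot` lets both withdrawals through, while `run_with_row_locks` rejects the second one because it observes the first transaction's write.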

`SELECT ... FOR UPDATE` is not needed for correctness when running in `SERIALIZABLE` mode,
but it may still be useful for controlling lock ordering and avoiding deadlocks. For example,
consider the following schedule:

```
T1: Starts transaction
T2: Starts transaction
T1: Updates row A
T2: Updates row B
T1: Wants to update row B (blocks)
T2: Wants to update row A (deadlock)
```

This sort of scenario can happen in any database that tries to maintain some level of correctness.
It is especially common in databases that use pessimistic two-phase locking (2PL), since transactions
must acquire shared locks for reads in addition to exclusive locks for writes. But deadlocks like the
one shown above also happen in databases that use MVCC, like PostgreSQL and CockroachDB, since writes must
acquire locks on all rows that will be updated. Postgres and many other systems detect deadlocks by
identifying cycles in a "waits-for" graph, where nodes represent transactions, and directed edges represent
transactions waiting on each other to release locks. If a cycle (deadlock) is detected, transactions will be
selectively aborted until the cycle(s) are removed. Some other systems use a timeout mechanism, where
transactions will abort after waiting a certain amount of time to acquire a lock. In either case, the
deadlock causes delays and aborted transactions.
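
As a rough illustration of the "waits-for" approach, the following Python sketch finds a cycle with a depth-first search. The function name `find_cycle` and the dictionary representation of the graph are assumptions made for this example; they do not describe how Postgres or CockroachDB implement deadlock detection.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    `waits_for` maps each transaction to the set of transactions it waits on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for other in sorted(waits_for.get(txn, ())):
            state = color.get(other, WHITE)
            if state == GREY:
                # `other` is already on the current path: we found a cycle.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(waits_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```

For the schedule above, T1 waits on T2 and T2 waits on T1, so `find_cycle({"T1": {"T2"}, "T2": {"T1"}})` reports both transactions, and a system would abort one of them to break the cycle.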

`SELECT ... FOR UPDATE` will help avoid deadlocks by allowing transactions to acquire all of their locks
up front. For example, the above schedule would change to the following:

```
T1: Starts transaction
T2: Starts transaction
T1: Locks rows A and B
T1: Updates row A
T2: Wants to update row B (blocks)
T1: Updates row B
T1: Commits
T2: Updates row B
T2: Updates row A
T2: Commits
```

Since `T1` locked rows A and B at the start of the transaction, the deadlock was prevented.
`SELECT ... FOR UPDATE` won't eliminate deadlocks, but it will make them less likely.
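
The difference between the two schedules can be sketched by replaying their lock requests against a tiny lock table. This is a simplified model with hypothetical names (`replay`, `has_deadlock`): locks are never released, a blocked request just records who it waits on, and only two-transaction mutual waits are checked.

```python
def replay(schedule):
    """Replay (txn, keys) lock requests and return the waits-for edges that arise."""
    owner = {}     # key -> transaction holding the exclusive lock
    waits = set()  # (waiter, holder) pairs
    for txn, keys in schedule:
        for key in keys:
            holder = owner.setdefault(key, txn)  # take the lock if it is free
            if holder != txn:
                waits.add((txn, holder))
    return waits


def has_deadlock(waits):
    """True if two transactions wait on each other."""
    return any((holder, waiter) in waits for waiter, holder in waits)


# First schedule: each transaction locks one row, then asks for the other's row.
interleaved = [("T1", ["A"]), ("T2", ["B"]), ("T1", ["B"]), ("T2", ["A"])]
# Second schedule: T1 locks A and B up front, so T2 only ever waits on T1.
up_front = [("T1", ["A", "B"]), ("T2", ["B"]), ("T2", ["A"])]
```

In this model the interleaved schedule produces a wait in both directions, while the up-front schedule produces only a one-way wait, which resolves once T1 commits.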

Many implementations of this feature also include options to control whether or not to wait on locks.
`SELECT ... FOR UPDATE NOWAIT` is one option, which causes the query to return an error if it is
unable to immediately lock all target rows. This is useful for latency-critical situations,
and could also be useful for auto-retrying transactions in CockroachDB. `SELECT ... FOR UPDATE SKIP LOCKED`
is another option, which returns only the rows that could be locked immediately, and skips over the others.
This option returns an inconsistent view of the data, but may be useful for cases when multiple
workers are trying to process data in the same table as if it were a queue of tasks.
The default behavior of `SELECT ... FOR UPDATE` is for the transaction to block if some of the
target rows are already locked by another transaction. Note that it is not possible to use the
`NOWAIT` and `SKIP LOCKED` modifiers without `FOR { UPDATE | SHARE | ... }`.
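
The three waiting behaviors can be compared with a small Python sketch. Everything here is hypothetical and for illustration only: `select_for_update`, the `mode` strings, and the `LockNotAvailable` exception are invented names, and "blocking" is modeled by returning the key the statement would wait on.

```python
class LockNotAvailable(Exception):
    """Raised in "nowait" mode when a target row is locked by another transaction."""


def select_for_update(keys, locks, txn, mode="block"):
    """Try to lock `keys` for `txn` in the shared `locks` table (key -> owner).

    Returns (rows, waiting_on): the rows locked so far and the key the
    statement is blocked on, or None if it did not block.
    """
    rows, newly_locked = [], []
    for key in keys:
        holder = locks.get(key)
        if holder is None:
            locks[key] = txn
            newly_locked.append(key)
            rows.append(key)
        elif holder == txn:
            rows.append(key)  # already ours
        elif mode == "skip_locked":
            continue  # silently leave this row out of the result
        elif mode == "nowait":
            for k in newly_locked:  # undo this statement's locks before failing
                del locks[k]
            raise LockNotAvailable(key)
        else:
            return rows, key  # default: wait for the holder to finish
    return rows, None
```

With row `b` already locked by another transaction, the default mode stops at `b`, `"nowait"` raises immediately, and `"skip_locked"` returns only `a` and `c`, which is why its result is not a consistent view of the table.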

The first implementation of `FOR UPDATE` in CockroachDB will not include `NOWAIT` or `SKIP LOCKED` options.
It seems that some users want these features, but many would be satisfied with `FOR UPDATE` alone.
This proposal will implement the semantics that users expect for `FOR UPDATE`: `SELECT ... FOR UPDATE` will lock
all of the rows specified by the `SELECT` with exclusive locks until the transaction is committed or aborted.
There will be cases when additional rows are locked as well due to the mechanism for setting row-level
intents in CockroachDB, but that will not change the semantics. It could affect performance, though, if
other transactions are blocked due to the additional locks.

# Guide-level explanation

The [Postgres Documentation](https://www.postgresql.org/docs/current/static/sql-select.html#sql-for-update-share)
describes this feature as it is supported by Postgres. As shown, the syntax of the locking clause has the form

```
FOR lock_strength [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ]
```

where `lock_strength` can be one of

```
UPDATE
NO KEY UPDATE
SHARE
KEY SHARE
```

For our initial implementation in CockroachDB, we will likely simplify this syntax to

```
FOR UPDATE
```

i.e., no variation in locking strength, no specified tables, and no options for avoiding
waiting on locks. Using `FOR UPDATE` will result in locking the rows touched by the `SELECT` query
with exclusive locks. As described above, this feature alone is useful because it helps
maintain correctness when running CockroachDB in `SNAPSHOT` mode (avoiding write skew), and serves
as a tool for optimization (avoiding deadlocks) when running in `SERIALIZABLE` mode.

For example, consider the following transaction:

```
BEGIN;
SELECT * FROM employees WHERE name = 'John Smith' FOR UPDATE;
...
UPDATE employees SET salary = 50000 WHERE name = 'John Smith';
COMMIT;
```

This transaction will lock the rows of all employees named John Smith at the beginning of the transaction.
In the context of CockroachDB, "lock" corresponds to setting a "write intent" on a row,
preventing other concurrent transactions from simultaneously updating that row.
As a result, the `UPDATE employees ...` statement at the end of the transaction will not need
to acquire any additional locks (i.e., will not need to set additional row-level write intents).
Note that `FOR UPDATE` will have no effect if it is used in a stand-alone query that is not
part of any transaction.

In contrast to the above example, if the first `SELECT` statement in the transaction is
`SELECT * FROM employees WHERE name like '%Smith' FOR UPDATE;`, CockroachDB will lock
all of the rows in the `employees` table because it's not possible to determine from the predicate
which key spans are affected. This lack of precision will be an issue for any predicate that
does not directly translate to particular key spans. This is a departure from Postgres, since Postgres
generally locks exactly the rows returned by the query.
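
The following Python sketch illustrates why the two predicates lock different numbers of rows. It assumes, purely for illustration, that `employees` is keyed (or indexed) by `name` and that the scan sets an intent on every key it touches before the predicate is applied; `scan_for_update` and the span encoding are hypothetical.

```python
from bisect import bisect_left


def scan_for_update(sorted_keys, span, predicate):
    """Scan keys in [start, end), lock everything scanned, then filter.

    Returns (returned_rows, locked_rows).
    """
    start, end = span
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    scanned = sorted_keys[lo:hi]
    locked = list(scanned)  # an intent is set on every row the scan touches
    returned = [k for k in scanned if predicate(k)]
    return returned, locked


names = sorted(["Alice Jones", "Bob Smith", "John Smith", "Zoe Adams"])

# name = 'John Smith' maps to a single-key span.
exact = scan_for_update(names, ("John Smith", "John Smith\x00"),
                        lambda n: n == "John Smith")

# name LIKE '%Smith' cannot be turned into a key span, so the whole table is scanned.
suffix = scan_for_update(names, ("", "\uffff"), lambda n: n.endswith("Smith"))
```

Here `exact` locks one row, while `suffix` returns two rows but locks all four.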

# Reference-level explanation

This section provides more detail about how and why the CockroachDB implementation of
the locking clause will differ from Postgres.

With the current model of CockroachDB, it is not possible to support the locking strengths
`NO KEY UPDATE` or `KEY SHARE` because
these options require locking at a sub-row granularity. It is also not clear that CockroachDB can support
`SHARE`, because there is currently no such thing as a "read intent". `UPDATE` can be supported by
marking the affected rows with dummy write intents.

By default, if `FOR UPDATE` is used in Postgres without specifying tables
(without the `OF table_name [, ...]` clause),
Postgres will lock all rows returned by the `SELECT` query. The `OF table_name [, ...]` clause
enables locking only the rows in the specified tables. To lock different tables with different
strengths or different options, Postgres users can string multiple locking clauses together.
For example,

```
SELECT * FROM employees e, departments d, companies c
WHERE e.did = d.id AND d.cid = c.id
AND c.name = 'Cockroach Labs'
FOR UPDATE OF e SKIP LOCKED
FOR SHARE OF d NOWAIT
```
locks rows in the `employees` table that satisfy the join condition with an exclusive lock,
and skips over rows that are already locked by another transaction.
It also locks rows in the `departments` table that satisfy the join condition with a shared lock,
and returns an error if it cannot lock all of the rows immediately. It does not lock
the `companies` table.

Implementing this flexibility in CockroachDB for use of different tables and different options
may be excessively complicated, and it's not clear that our customers actually need
it. To avoid spending too much time on this, as mentioned above, we will probably just implement the most
basic functionality in which clients use `FOR UPDATE` to lock the rows touched by the query.
Initially we won't include the `SKIP LOCKED` or `NOWAIT` options, but it may be worth implementing
these at some point.

At the moment it is not possible to use `FOR UPDATE` in views (there will not be an error, but it will
be ignored). This is similar to the way `ORDER BY` and `LIMIT` are handled in views. See comment from
@a-robinson in [data_source.go:getViewPlan()](https://github.com/cockroachdb/cockroach/blob/5a6b4312a972b74b0af5de53dfdfb204dc0fd6d7/pkg/sql/data_source.go#L680). If `ORDER BY` and `LIMIT` are supported later, `FOR UPDATE` would come for free.
Postgres supports all of these options in views, since it supports any `SELECT` query, and re-runs
the query each time the view is used.

Another potential difference between the Postgres and CockroachDB implementations
relates to the precision of locking. In general, Postgres locks exactly the rows returned
by the query, and no more. There are a few examples given in the
[documentation](https://www.postgresql.org/docs/current/static/sql-select.html#sql-for-update-share)
where that's not the case. For example, `SELECT ... LIMIT 5 OFFSET 5 FOR UPDATE` may
lock up to 10 rows even though only 5 rows are returned. It may be more difficult to
be precise with locking in CockroachDB, since locking happens at the KV layer, and
predicates may be applied later. This should not affect correctness in terms of consistency or
isolation, but could affect performance if there is high contention. As described above,
when running in `SNAPSHOT` mode, `SELECT ... FOR UPDATE` will prevent write skew anomalies,
and in `SERIALIZABLE` mode, `SELECT ... FOR UPDATE` can help prevent deadlocks. These semantics
are unchanged if extra rows are locked.
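
The `LIMIT`/`OFFSET` case can be sketched in a few lines of Python. This is a simplified model under the assumption that rows are locked as they are produced, before `OFFSET` discards any of them; `select_limit_offset_for_update` is a hypothetical name.

```python
def select_limit_offset_for_update(rows, limit, offset):
    """Return (returned_rows, locked_rows) for a LIMIT/OFFSET query with FOR UPDATE."""
    locked = rows[:offset + limit]  # every row produced gets locked
    returned = locked[offset:]      # OFFSET then drops the leading rows
    return returned, locked


returned, locked = select_limit_offset_for_update(list(range(1, 21)), limit=5, offset=5)
```

With 20 candidate rows, five rows are returned but ten are locked.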

## Detailed design

There are a number of changes that will need to be implemented in order to
support `FOR UPDATE`.

- Update the parser to support the syntax in `SELECT` statements.
- Update the KV API to include new messages `ScanForUpdate` and `ReverseScanForUpdate`.
- Update the KV and storage layers to mimic processing of `Scan` and `ReverseScan`
and set dummy write intents on every row touched.

As described by Ben in [issue #6583](https://github.com/cockroachdb/cockroach/issues/6583),
`git grep -i reversescan` provides an idea of the scope of the change. The bulk of the changes
would consist of implementing new ScanForUpdate and ReverseScanForUpdate calls in the KV API.
These would work similarly to regular scans, but A) would be flagged as read/write commands
instead of read-only and B) after performing the scan, they'd use MVCCPut to write back the
values that were just read (that's not the most efficient way to do things, but Ben thinks it's
the right way to start since it will have the right semantics without complicating the
backwards-compatibility story). Then the SQL layer would use these instead of plain
Scan/ReverseScan when FOR UPDATE has been requested.
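
A toy Python model of this "scan, then write back" idea is sketched below. It is not the KV API: `ToyStore`, `IntentConflict`, and the method names are hypothetical, `put` merely stands in for MVCCPut, and timestamps, reads that encounter foreign intents, and reverse scans are all left out.

```python
class IntentConflict(Exception):
    """Raised when a key already carries another transaction's write intent."""


class ToyStore:
    def __init__(self, data):
        self.committed = dict(data)  # key -> committed value
        self.intents = {}            # key -> (txn, provisional value)

    def scan(self, start, end):
        """Read-only scan over committed keys in [start, end)."""
        return [(k, v) for k, v in sorted(self.committed.items()) if start <= k < end]

    def put(self, txn, key, value):
        """Stand-in for MVCCPut: lay down a write intent owned by `txn`."""
        holder = self.intents.get(key)
        if holder is not None and holder[0] != txn:
            raise IntentConflict(key)
        self.intents[key] = (txn, value)

    def scan_for_update(self, txn, start, end):
        """Scan, then write back each value just read as a dummy intent."""
        rows = self.scan(start, end)
        for key, value in rows:
            self.put(txn, key, value)
        return rows

    def commit(self, txn):
        """Resolve all of `txn`'s intents into committed values."""
        for key, (owner, value) in list(self.intents.items()):
            if owner == txn:
                self.committed[key] = value
                del self.intents[key]
```

After `scan_for_update("T1", "a", "c")`, keys `a` and `b` carry T1's intents, so a `put` from another transaction on either key raises `IntentConflict` until T1 commits, while keys outside the scanned span remain writable.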

There was some discussion about whether we really needed new API calls, but Ben pointed out
that making it possible to write on `Scan` requests would make debugging a nightmare.

## Drawbacks

It seems that there is sufficient demand for *some* form of the `FOR UPDATE` syntax. However,
there are pros and cons to implementing or not implementing certain features.

- It will be a lot of work to implement all of the features supported by Postgres.
This is probably not worth our time since it's not clear these features will actually get used.
We can easily add them later if needed.
- If we use the approach of setting intents on every row touched by the `SELECT`, it
could hurt performance if we are selecting a large range. But setting intents on every
row will be easier to implement than the alternatives discussed below.

## Rationale and Alternatives

The proposed solution is to lock rows by writing dummy write intents on each row.
However, another alternative is to lock an entire Range if the `SELECT` statement
would return the majority of rows in the range. This is similar to the approach
suggested by the [Revert Command RFC](https://github.com/cockroachdb/cockroach/pull/16294).

The advantage of locking entire ranges is that it would significantly improve the performance
for large scans compared to setting intents on individual rows. The downside is that
this feature is not yet implemented, so it would be significantly more effort than using
simple row-level intents. It's also not clear that customers would use `FOR UPDATE` with
large ranges, so this may be an unneeded performance optimization. Furthermore,
locking the entire range based on the predicate could result in locking rows that should
not be observable by the `SELECT`. For instance, if the transaction performing the
`SELECT FOR UPDATE` query over some range is at a lower timestamp than a later `INSERT`
within that range, the `FOR UPDATE` lock should not apply to the newly written row.
This issue is probably not any worse than the other problems with locking precision described above,
though.
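
The trade-off between the two approaches can be illustrated with a small Python sketch. Both classes are hypothetical stand-ins: `RowIntents` records one lock per existing row in the span, and `SpanLock` records a single lock covering the whole span.

```python
class RowIntents:
    """One intent per existing key in [start, end)."""

    def __init__(self, existing_keys, start, end):
        self.keys = {k for k in existing_keys if start <= k < end}

    def blocks(self, key):
        return key in self.keys

    def size(self):
        return len(self.keys)


class SpanLock:
    """A single lock covering every key in [start, end), existing or not."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def blocks(self, key):
        return self.start <= key < self.end

    def size(self):
        return 1


existing = ["k1", "k3", "k5"]
rows = RowIntents(existing, "k1", "k6")
span = SpanLock("k1", "k6")
```

In this model the span lock needs one record instead of three, but it also blocks a later `INSERT` of the new key `k4`, which the row-level intents do not.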

One advantage of implementing range-level locking is that we could reuse this feature
for other applications such as point-in-time recovery. The details of the proposed implementation
as well as other possible applications are described in the [Revert Command RFC](https://github.com/cockroachdb/cockroach/pull/16294).
However, in the interest of getting something working sooner rather than later,
I believe row-level intents make more sense at this time.

## Unresolved questions

- Should we move forward with implementing only `FOR UPDATE` at first,
or are some of the other features essential as well?
- Should we stick with row-level intents or implement range-level locking?
- Should we be concerned about locking precision (i.e., lock exactly the
rows returned by query and no more)?
