This repository has been archived by the owner on Dec 7, 2018. It is now read-only.

Cursor Isolation

Mark Callaghan edited this page Apr 8, 2015 · 17 revisions

Notes

  • A consistent read is done using a snapshot.
  • A current read is done using the latest committed version of a row, where "latest" means right now, not as of the snapshot taken at statement start.
  • A locking read is done as part of the WHERE clause processing for INSERT ... SELECT, UPDATE, DELETE and SELECT ... FOR UPDATE statements.

Features

  • Read committed and repeatable read will be provided.
  • Read committed and repeatable read provide consistent read for SELECT statements via a snapshot per statement for read committed and snapshot per transaction for repeatable read.
  • Rows matched for locking-reads are locked until commit or rollback. This is pessimistic concurrency control (CC).
  • For read committed, locking reads use the current committed value of a row. When a row is locked by a concurrent transaction, the lock attempt waits until unlock or timeout, and the waiter then re-evaluates the search condition for the row to confirm it still matches.
  • For repeatable read, the behavior when a row is changed after the transaction's snapshot has yet to be decided (see the write-conflict for Snapshot isolation below). If the transaction waits on a locked row and the locking transaction does a rollback, then the waiting transaction can proceed.

TODO:

  • Determine whether locking reads use consistent or current reads.
  • Determine whether early unlock can be done for rows when the row does not match non-index predicates.

Background

The following briefly explains cursor isolation for Postgres, Oracle and MySQL/InnoDB because all use snapshot isolation, like RocksDB. While there is a standard for isolation levels, the standard allows a lot of room for implementations to differ. The following is a good overview of the problem in general:

Anomalies

Anomalies are behaviors that can occur when transactions run concurrently. They are defined with lock-based concurrency control (CC) in mind and the first 3 are used to define cursor isolation levels.

  • dirty read - occurs when a transaction reads uncommitted changes from another transaction
  • non-repeatable read - occurs when a row is read more than once during a transaction and doesn’t have the same value for all reads
  • phantom read - occurs when the same query (WHERE clause) is used more than once in a transaction and doesn’t get the same results because rows have been added to or removed from the result set
  • write-skew - this is not one of the official anomalies but is a common race condition with concurrency. It occurs when transactions are not serializable. Let trx1 do SELECT COUNT(*) FROM A and then increment a count in table B. Let trx2 do SELECT COUNT(*) FROM B and then increment a count in table A. When run concurrently, the result might not match the result from any serial execution.
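The write-skew race above can be shown with a toy, in-memory simulation (plain Python, no real database; the table names match the trx1/trx2 example, everything else is made up). Both "transactions" read from snapshots taken before either commits, then write back to the live data:

```python
# Toy simulation of write-skew under snapshot isolation.
db = {"A": [1, 1], "B": [1]}                 # two tables, each a list of rows

snap1 = {k: list(v) for k, v in db.items()}  # trx1's snapshot
snap2 = {k: list(v) for k, v in db.items()}  # trx2's snapshot

db["B"].append(len(snap1["A"]))              # trx1: COUNT(A) from its snapshot -> B
db["A"].append(len(snap2["B"]))              # trx2: COUNT(B) from its snapshot -> A
concurrent = (db["A"], db["B"])              # A=[1,1,1], B=[1,2]

db = {"A": [1, 1], "B": [1]}                 # rerun the same work serially
db["B"].append(len(db["A"]))                 # trx1 first: COUNT(A)=2 -> B
db["A"].append(len(db["B"]))                 # then trx2: COUNT(B)=2 -> A
serial = (db["A"], db["B"])                  # A=[1,1,2], B=[1,2]

print(concurrent == serial)  # False: the concurrent run is not serializable
```

Neither transaction saw the other's write, so the concurrent outcome matches no serial order, which is the defining symptom of write-skew.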

Cursor isolation levels are defined in terms of anomalies they allow:

  • serializable - transactions run as though there were a serial order. This prevents anomalies at the cost of performance.
  • repeatable read - phantom reads may occur with lock-based concurrency control and write-skew with snapshot-based concurrency control.
  • read committed - any data read is committed as of the read time. Non-repeatable and phantom reads may occur. Note that for snapshot-based CC the point-in-time is usually fixed via a snapshot at statement start. For lock-based CC the point-in-time is when the cursor reaches that row.
  • read uncommitted - dirty, non-repeatable and phantom reads may occur

These are some examples of anomalies:

Implementations

Snapshot isolation is used by Oracle, Postgres and MySQL/InnoDB. The transaction usually operates on a snapshot taken at statement or transaction start. Anomalies occur when rows viewed by insert, update & delete statements have changes committed by concurrent transactions after the snapshot. An example is when trxA gets a snapshot at time 10 and attempts to update a row that has a change committed by trxB at time 11. How this is resolved depends on the vendor and isolation level. Alas, behavior is not standardized. But this is the write-conflict for snapshot isolation described below for each of the implementations.

TokuDB uses gap locks and its docs state that InnoDB behavior is provided.

NuoDB uses different terminology, but its WRITE_COMMITTED mode is similar to InnoDB REPEATABLE READ in that the WHERE clause for write operations sees changes from commits that came after the snapshot. The first blog post covers non-indexed tables and this one covers indexes.

Oracle

  • see the manual here, here, and here
  • The restart done so that UPDATE gets a consistent result set is described here, [here](http://pages.cs.wisc.edu/~anhai/courses/764-sp07-anhai/oracle.locking.htm), here, here, and in chapter 7
  • supports read committed and serializable
  • transactions see own changes but statements do not
  • read committed - uses snapshot per statement. Always enforces statement-level read consistency so that a statement sees data from one point in time. Locks rows that are touched by INSERT, UPDATE, DELETE until commit. Blocks on rows locked by concurrent transactions.
  • Documentation on resolving the write-conflict described in Snapshot isolation above is limited. The goal for Oracle is to provide a transaction-consistent view to INSERT, UPDATE and DELETE statements. One way it does that is by restarting a statement with a new snapshot after the commit from a concurrent transaction. In the Snapshot isolation example the statement for trxA can be restarted. I have yet to read a description of this in Oracle docs. Is there a limit on the max number of restarts? Are there optimizations to avoid a restart when a row is changed but remains in the rows matched by the WHERE clause?
  • serializable - describe this later (TODO)
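The restart idea described above can be sketched as a retry loop (a hypothetical Python sketch, not Oracle internals; all function and parameter names are made up):

```python
# Hypothetical sketch of statement restart: if any matched row was committed
# after the statement's snapshot, discard the work and rerun the statement
# with a fresh snapshot.
def update_with_restart(get_snapshot, scan, changed_after, apply_update,
                        max_restarts=5):
    for _ in range(max_restarts):
        snap = get_snapshot()
        matched = scan(snap)
        if any(changed_after(row, snap) for row in matched):
            continue                       # write-conflict: restart
        for row in matched:
            apply_update(row)
        return len(matched)
    raise RuntimeError("gave up after max_restarts")

# One conflicting commit (at ts 15, after the first snapshot at ts 10)
# forces exactly one restart; the second pass at ts 20 succeeds.
snaps = iter([10, 20])
row = {"commit_ts": 15, "v": 1}
updated = []
n = update_with_restart(lambda: next(snaps), lambda s: [row],
                        lambda r, s: r["commit_ts"] > s, updated.append)
print(n)  # 1
```

The open questions from the bullet above map directly onto this sketch: `max_restarts` is the unknown restart limit, and the `changed_after` check is where a smarter implementation could skip the restart when the changed row still matches the WHERE clause.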

PostgreSQL

  • see the manual and a conference paper on serializable snapshot isolation
  • supports read committed, repeatable read and serializable
  • transactions see own changes but statements do not
  • read committed - plain SELECT gets a snapshot at statement start and can see uncommitted changes from self. UPDATE, DELETE and SELECT FOR {SHARE, UPDATE} behave like SELECT when searching for target rows, but also lock rows and block on rows locked by other transactions. When a lock is finally granted, if the row was changed the search conditions are re-evaluated to determine whether the row still matches.
  • repeatable read - uses a snapshot from transaction start. Phantom reads are not possible. UPDATE, DELETE and SELECT FOR {SHARE, UPDATE} behave like SELECT when searching for target rows and lock matched rows. This transaction can wait for such locks. Whether or not it waits, if a concurrent transaction (one that started after this transaction's snapshot) commits changes to rows matched by this transaction, then this transaction rolls back with a "could not serialize access" error. I wonder if the rollback can be avoided when the found row was changed concurrently but is still in the set of found rows after the change.
  • serializable - uses a new algorithm and appears to be very clever. Describe this later (TODO).
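The read-committed recheck described above can be sketched as follows (a hypothetical sketch, not PostgreSQL source; names are made up):

```python
# Hypothetical sketch of the read-committed recheck: after waiting for a row
# lock, the statement re-evaluates its WHERE clause against the newest
# committed version of the row before acting on it.
def lock_and_recheck(latest_version, predicate):
    """latest_version: the committed row once the lock is granted, or None
    if the lock holder deleted the row."""
    if latest_version is None:
        return None                    # row was deleted: skip it
    if not predicate(latest_version):
        return None                    # no longer matches the WHERE clause
    return latest_version              # still matches: act on this version

row = {"id": 1, "balance": 50}
kept = lock_and_recheck(row, lambda r: r["balance"] < 100)     # still matches
skipped = lock_and_recheck(row, lambda r: r["balance"] > 100)  # no longer matches
print(kept is row, skipped)  # True None
```

Under repeatable read there is no such recheck: as the bullet above notes, a conflicting concurrent commit instead rolls the transaction back.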

InnoDB

  • see the manual, details on when locking reads are done and some internal details on read isolation and transaction locks
  • supports read uncommitted (dirty read), read committed, repeatable read and serializable
  • transactions see own changes but statements do not
  • read committed - uses snapshot per statement for plain SELECT. SELECT FOR UPDATE, INSERT, UPDATE, DELETE read current committed values that may be more recent than the snapshot when a concurrent transaction updated them after the snapshot.
  • repeatable read - uses a snapshot per transaction for plain SELECT. See "read committed" above for SELECT FOR UPDATE, INSERT, UPDATE, DELETE. Next-key locking provides repeatable reads for those statements. Next-key locks are implemented by locking index entries, and the lock implies that the gap back to the previous index entry is also locked. The use of "current" rather than "consistent" reads for INSERT, UPDATE, DELETE can cause surprises. For example, a transaction might do a SELECT to figure out what rows match, and then an UPDATE with the same WHERE clause can match different rows. Postgres can raise an error in that case. Oracle does not do repeatable read. For more details see the manual sections on record-level locks and next-key locking.
  • serializable - plain SELECT statements are converted to locking reads that take shared locks. Describe this later (TODO).
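The next-key locking rule described above can be modeled as follows (a toy sketch, not InnoDB internals; function names are made up):

```python
# Toy model of a next-key lock: locking index entry k also locks the open
# gap between k and the previous index entry, so inserts into that gap
# conflict with the lock. This is what prevents phantoms for locking reads.
def next_key_lock_blocks(index_keys, locked_key, insert_key):
    """True if inserting insert_key would conflict with a next-key lock on
    locked_key; index_keys is the sorted list of existing index entries."""
    i = index_keys.index(locked_key)
    gap_start = index_keys[i - 1] if i > 0 else float("-inf")
    return gap_start < insert_key <= locked_key

keys = [10, 20, 30]
print(next_key_lock_blocks(keys, 20, 15))  # True: 15 is in the gap (10, 20]
print(next_key_lock_blocks(keys, 20, 25))  # False: 25 is beyond the locked key
```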