distsql: TxnCoordSender doesn't allow distribution if transaction changed data #13376

RaduBerinde · 2017-02-02T18:00:09Z

The TxnCoordSender assumes everything happens on a node, except for read-only transactions. DistSQL assumes that read-only queries can be distributed. The problem is in the difference between "transaction" and "query": if the transaction earlier modified data, a read-only query in that transaction apparently can't be distributed - i saw the error below while running TestLogicDistSQL on a 3 node cluster with the fake span resolver.

statement ok
BEGIN TRANSACTION

statement ok
CREATE TABLE kv (
  k CHAR PRIMARY KEY,
  v TIMESTAMP WITH TIME ZONE
)

statement ok
INSERT INTO kv (k,v) VALUES ('a', transaction_timestamp())

query T
SELECT k FROM kv      <----------

E170202 12:48:14.328709 17874 sql/distsqlrun/tablereader.go:159  [n3,fe9cf6117,TableReader=53] scan error: writing transaction timed out or ran on multiple coordinators
E170202 12:48:14.329505 17855 sql/distsqlrun/server.go:181  [n1] writing transaction timed out or ran on multiple coordinators

The select was simply running the scan on another node:

The text was updated successfully, but these errors were encountered:

tamird · 2017-02-02T19:45:06Z

The solution to this may be #10511.

…

On Thu, Feb 2, 2017 at 1:00 PM, RaduBerinde ***@***.***> wrote: The TxnCoordSender assumes everything happens on a node, except for read-only transactions. DistSQL assumes that read-only queries can be distributed. The problem is in the difference between "transaction" and "query": if the transaction earlier modified data, a read-only query in that transaction apparently can't be distributed - i saw the error below while running TestLogicDistSQL on a 3 node cluster with the fake span resolver. statement ok BEGIN TRANSACTION statement ok CREATE TABLE kv ( k CHAR PRIMARY KEY, v TIMESTAMP WITH TIME ZONE ) statement ok INSERT INTO kv (k,v) VALUES ('a', transaction_timestamp()) query T SELECT k FROM kv <---------- E170202 12:48:14.328709 17874 sql/distsqlrun/tablereader.go:159 [n3,fe9cf6117,TableReader=53] scan error: writing transaction timed out or ran on multiple coordinators E170202 12:48:14.329505 17855 sql/distsqlrun/server.go:181 [n1] writing transaction timed out or ran on multiple coordinators The select was simply running the scan on another node: [image: image] <https://cloud.githubusercontent.com/assets/16544120/22561644/0e1f14b4-e947-11e6-8d88-e29c3d5ddd3b.png> — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#13376>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABdsPDwDoLGVYK0IZdJl1oe9fqN_MIVoks5rYhmugaJpZM4L1cAM> .

RaduBerinde · 2017-02-02T20:02:36Z

Thanks, I meant to reference that issue but I forgot.

bdarnell · 2017-02-02T20:37:40Z

The fact that read-only transactions can be distributed like this is an accident. If we decide it's a feature we want to support (and I don't see why we wouldn't), we could simply skip the "multiple coordinator" check for read-only batches. No need for anything as intrusive as #10511.

RaduBerinde · 2017-02-02T20:44:41Z

It used to be like that; the check was made more strict to handle a subtle condition, details in 5fec899

Temporary workaround for cockroachdb#13376.

RaduBerinde · 2017-02-08T14:27:01Z

The scenario addressed by the "multiple coordinator" check (5fec899) is:

we have a transaction in which we wrote something
the txn gets abandoned (timed out)
a subsequent read-only query in that txn reads something that we wrote earlier

The read at the end should error out; without the check it would just run and it won't see anything that was written earlier in that txn (traces of that txn are gone).

One way to check for this wile still allowing distribution of the read-only query is to move up the check into sql. If the txn has an anchor key, the sql layer could call into the TxnCoordSender to verify that it still has the txn on file. It can check this after every statement it runs (when it still has a chance to return an error for the statement).

Note that this may become problematic if we ever stream results directly to the client (we would stream incorrect results before the error comes).

andreimatei · 2017-02-08T15:54:14Z

FWIW this change that introduced the check in TxnCoordSender was merged last March, a few weeks before the AbortCache was introduced to prevent similar problems (in 9fea9f7b0b3183987d32fe93548560460a1b8cb5).

bdarnell · 2017-02-08T19:38:17Z

The abort cache (like the sequence cache that did the same thing and preceded 5fec899) cannot (reliably) help with timed-out transactions, because the abort cache entry becomes eligible for GC at the same time the transaction times out.

So to reliably avoid the anomaly fixed in #5323, we need to either lift the transaction liveness check up to the sql/distsql level like @RaduBerinde suggested, or push it down to each node that handles part of the transaction (basically, allow multiple coordinators as long as only one writes, but each coordinator would be responsible for heartbeating or polling the transaction status).

Temporary workaround for cockroachdb#13376.

andreimatei · 2017-05-17T16:14:08Z

Note to self: currently the "writing transaction timed out or ran on multiple coordinators" check is in the wrong place. It's done in TxnCoordSender.Send, on the request's way out to the DistSender. But it should (also?) be done on the way in. We want to also check that the transaction has not been abandoned while the respective read was in progress.

…ashion The key idea here is to stop using a single TxnCoordSender but instead provide a factory for stateful instances of TxnCoordSender to be used with each client.DB transaction. The state stored by the TxnCoordSender can be fetched, collected, passed, and then used to augment a parent or sibling TxnCoordSender. This allows DistSQL to use the TxnCoordSender just like normal SQL and paves the way for mutating distributed SQL transactions. With this change, we get the added benefit of observed timestamps being correct returned to the root transaction through distributed sql flows, which should prevent unnecessary restarts from uncertainty interval errors. Fixes cockroachdb#10511 Fixes cockroachdb#13376 Release note: None

…ashion The key idea here is to stop using a single TxnCoordSender but instead provide a factory for stateful instances of TxnCoordSender to be used with each client.DB transaction. The state stored by the TxnCoordSender can be fetched, collected, passed, and then used to augment a parent or sibling TxnCoordSender. This allows DistSQL to use the TxnCoordSender just like normal SQL and paves the way for mutating distributed SQL transactions. With this change, we get the added benefit of observed timestamps being correct returned to the root transaction through distributed sql flows, which should prevent unnecessary restarts from uncertainty interval errors. Fixes cockroachdb#10511 Fixes cockroachdb#13376 Release note (dist sql): Removes the limitation that distributed SQL is not used once a transaction has had writes.

RaduBerinde assigned andreimatei and RaduBerinde Feb 2, 2017

RaduBerinde mentioned this issue Feb 2, 2017

distsql: make TestLogicDistSQL run on a 3-node cluster w/fake span resolver #13377

Closed

5 tasks

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 2, 2017

distsql: don't use distsql for writing txns

8dfc6db

Temporary workaround for cockroachdb#13376.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 7, 2017

distsql: don't use distsql for writing txns

660e812

Temporary workaround for cockroachdb#13376.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 8, 2017

distsql: don't use distsql for writing txns

ce7988b

Temporary workaround for cockroachdb#13376.

cuongdo added this to the Q2 2017 milestone Feb 22, 2017

RaduBerinde mentioned this issue Apr 17, 2017

distsql: tracking issue for enabling DistSQL by default #13666

Closed

18 tasks

andreimatei mentioned this issue Apr 18, 2017

client: move Transaction initialization into constructor #15024

Closed

dianasaur323 modified the milestones: 1.1, Q2 2017 Apr 20, 2017

andreimatei modified the milestones: 1.2, 1.1 Aug 11, 2017

RaduBerinde mentioned this issue Oct 17, 2017

distsql: tracking issue for feature parity gaps #19320

Closed

13 tasks

spencerkimball mentioned this issue Jan 14, 2018

kv: change TxnCoordSender so that many can be used in a distributed fashion #21447

Merged

spencerkimball closed this as completed in #21447 Jan 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

distsql: TxnCoordSender doesn't allow distribution if transaction changed data #13376

distsql: TxnCoordSender doesn't allow distribution if transaction changed data #13376

RaduBerinde commented Feb 2, 2017

tamird commented Feb 2, 2017 via email

RaduBerinde commented Feb 2, 2017

bdarnell commented Feb 2, 2017

RaduBerinde commented Feb 2, 2017

RaduBerinde commented Feb 8, 2017

andreimatei commented Feb 8, 2017

bdarnell commented Feb 8, 2017

andreimatei commented May 17, 2017

distsql: TxnCoordSender doesn't allow distribution if transaction changed data #13376

distsql: TxnCoordSender doesn't allow distribution if transaction changed data #13376

Comments

RaduBerinde commented Feb 2, 2017

tamird commented Feb 2, 2017 via email

RaduBerinde commented Feb 2, 2017

bdarnell commented Feb 2, 2017

RaduBerinde commented Feb 2, 2017

RaduBerinde commented Feb 8, 2017

andreimatei commented Feb 8, 2017

bdarnell commented Feb 8, 2017

andreimatei commented May 17, 2017