Use full txn struct instead of just txn ID #55
Conversation
Force-pushed from 56d4350 to dfe6f93
// transaction should always be sent with subsequent requests which
// are part of the same transaction. If multiple requests are sent
// in parallel as part of the same transaction, then subsequent
// specification of Txn should use the maximum of all returned
It makes me nervous to make the Transaction non-opaque to clients; this could be an obstacle to future changes to the struct. What if we make EndTransactionRequest take a list of Transactions, and require that each ResponseHeader.Txn must be passed back to the server (either in some RequestHeader that trades it in for a new Transaction in the response, or in the final EndTransactionRequest). We could retain the "use the maximum timestamp" rule as a heuristic when applications have multiple Transactions to choose from, but it would no longer be required for correctness.
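To make the shape of that concrete, here is a rough Go sketch of the proposal; every name below is illustrative, not an actual cockroach definition:

```go
package proto // hypothetical -- illustrative shapes only

// Transaction stays opaque to clients, so the struct can evolve
// without breaking them.
type Transaction struct {
	opaque []byte
}

// ResponseHeader.Txn must be passed back to the server: either
// traded in for a new Transaction on a later request, or collected
// into the final EndTransactionRequest.
type ResponseHeader struct {
	Txn *Transaction
}

// EndTransactionRequest carries every Transaction the client
// received; the server computes the maximum timestamp itself, so the
// "use the maximum" rule becomes a client heuristic rather than a
// correctness requirement.
type EndTransactionRequest struct {
	Commit bool
	Txns   []Transaction
}
```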
Gotcha...I started work on EndTransaction and came to the conclusion that the Transaction object returned in the ResponseHeader shouldn't change / be supplied at all, except as a response to the first request in the transaction (specifying an empty, non-nil Transaction to start the transaction).
Instead of having the transaction timestamp munged, we always rely on the client to pass up the maximum timestamp seen so far via RequestHeader.Timestamp, taken over all ResponseHeader.Timestamps. This is far more reasonable and allows us to have an immutable transaction object for the length of a single run of a transaction.
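A minimal sketch of that client-side rule, assuming a wall/logical timestamp ordered the way MVCC timestamps are (the names here are mine, not the actual client code):

```go
package client // hypothetical client-side sketch

// Timestamp is a stand-in for the proto timestamp: ordered by wall
// time, with the logical component breaking ties.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

func (t Timestamp) Less(s Timestamp) bool {
	return t.WallTime < s.WallTime ||
		(t.WallTime == s.WallTime && t.Logical < s.Logical)
}

// forwardTimestamp ratchets the request timestamp up to the maximum
// of all response timestamps seen so far; the Txn struct itself is
// never mutated during a single run of the transaction.
func forwardTimestamp(reqTS *Timestamp, respTS Timestamp) {
	if reqTS.Less(respTS) {
		*reqTS = respTS
	}
}
```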
I changed this PR to include a BeginTransaction API call. Here are the pros:
- Eliminates ambiguity in the API as to when a txn should be supplied, whether it should be empty, whether isolation or priority should be included even if TxnID is missing, etc. It was getting much too confusing.
- BeginTransaction is satisfied immediately by the receiving node--despite being targeted at a key (this is how we build up the locality-aware txn ID), it's not passed to the range leader; all we need to return a valid Txn struct is a clock synchronized with the cluster. This means that BeginTransaction will be as fast as a round trip to the client's gateway node.
- Building on the previous point, many transactions will want to fan out parallel writes, and to do that you need a txn struct up front. Previously, we would have had to make a "real" request of some sort, which would have incurred the latency of either routing to the leader for a read or achieving raft consensus for a write (see the sketch after this list).
Cons:
- Requires a round trip to the gateway node even if the txn could have been run as a sequential set of commands, where the first command could have usefully created the Txn struct as well.
Making the API more explicit weighed most heavily for me. The cost seems minimal given we only hit the gateway node.
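For illustration, the fan-out case from the pros above might look like this on the client (method names and types are assumptions, not the actual kv client API):

```go
package client // hypothetical sketch, not the real client

// Stubs standing in for the real client and txn types.
type Txn struct{}
type Client struct{}

func (c *Client) BeginTransaction() (*Txn, error)          { return &Txn{}, nil } // gateway clock only
func (c *Client) Put(t *Txn, key string, v []byte) error   { return nil }
func (c *Client) EndTransaction(t *Txn, commit bool) error { return nil }

// writeAllInTxn fans out parallel writes under one Txn struct, which
// is exactly the case BeginTransaction makes cheap: no route-to-leader
// or raft consensus is needed before the writers can start.
func writeAllInTxn(c *Client, keys []string, value []byte) error {
	txn, err := c.BeginTransaction() // fast: answered at the gateway
	if err != nil {
		return err
	}
	errCh := make(chan error, len(keys))
	for _, key := range keys {
		go func(key string) {
			// Every writer shares the same immutable Txn struct.
			errCh <- c.Put(txn, key, value)
		}(key)
	}
	for range keys {
		if err := <-errCh; err != nil {
			return err
		}
	}
	return c.EndTransaction(txn, true /* commit */)
}
```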
LGTM
Force-pushed from 97b3979 to 44dc8cc
@bdarnell: might want to take another gander at this one. It's changed a lot.
// We can update the current metadata as long as the timestamp is
// greater than or equal and, if there's a transaction underway,
// the txn epoch is greater than or equal.
if !timestamp.Less(meta.Timestamp) && (meta.Txn == nil || txn.Epoch >= meta.Txn.Epoch) { |
don't we need to compare the txnID for the second condition?
No because it's already checked by the if block immediately preceding it. At this if statement, we're already guaranteed that if there's an existing intent, the txn IDs match.
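For anyone else reading along, the surrounding control flow is roughly this paraphrase (simplified; writeIntentError is a stand-in name, not the exact source):

```go
// Simplified paraphrase of the two checks, in order.
if meta.Txn != nil && (txn == nil || !bytes.Equal(meta.Txn.ID, txn.ID)) {
	// Someone else's intent: write/write conflict, bail out here.
	return &writeIntentError{Txn: meta.Txn}
}
// Past this point any existing intent is known to be ours, so the
// second condition only needs to compare epochs, not txn IDs.
if !timestamp.Less(meta.Timestamp) && (meta.Txn == nil || txn.Epoch >= meta.Txn.Epoch) {
	// ok to update the current metadata
}
```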
oh, yes, i missed that part. sgtm
LGTM with two comments.
@jiangmingyang @bdarnell I added another commit to the PR addressing some additional items. Please take another look.
lgtm for the new changes.
reply.Rows = kvs
reply.SetGoError(err)
}
// BeginTransaction returns an error, as it's handled trivially at
// any cockroach node and doesn't need to be addressed to a range.
func (r *Range) BeginTransaction(args *proto.BeginTransactionRequest, reply *proto.BeginTransactionResponse) {
why even have it here then?
Hmm, good point. Will remove.
Force-pushed from c843fe1 to 954d76a
…cation of Transaction and Timestamp. In particular, the Transaction object is now not supplied in the Response except in the case of BeginTransaction and HeartbeatTransaction requests. Clarified the comments for ResponseHeader to indicate that the request Timestamp must be continually upgraded to be the maximum of all response Timestamps.
- Use write batch for all multi-put ops in MVCC.
- Delete previous intent versions when overwriting an intent with a newer timestamp within same transaction. This can happen on transaction retries.
- When resolving an intent, make sure to properly update the intent value's timestamp if the commit timestamp is different from the original timestamp the value was written at.
Force-pushed from 954d76a to 5d5ec34
Use full txn struct instead of just txn ID
LGTM
Move from dealing with just the transaction ID to using the full transaction
struct in write intents and in request / response headers.
Cleaned up some code in MVCC that was unnecessarily complex and introduced
the concept of transaction epochs for Put() and for ResolveIntent(). Transaction
epochs start at 0 and are incremented every time a transaction is retried.
Transactions are retried (instead of aborted) in the event that a value is
encountered within the "window of uncertainty" between RequestHeader.Timestamp
and RequestHeader.MaxTimestamp, or in the case of an SSI transaction, if
the final transaction timestamp is not equal to the initial transaction
timestamp.
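In sketch form, the two retry triggers read roughly as follows (reusing the Timestamp stand-in from the earlier sketch; all names are assumptions, not the actual code):

```go
// Sketch only -- not the actual retry logic.
type IsolationType int

const (
	SerializableSnapshot IsolationType = iota // SSI
	Snapshot                                  // SI
)

// shouldRetry reports whether a txn restarts (with its epoch bumped)
// rather than aborting outright.
func shouldRetry(iso IsolationType, valueTS, origTS, maxTS, finalTS Timestamp) bool {
	// A value landed inside the uncertainty window between the
	// request Timestamp and MaxTimestamp.
	if origTS.Less(valueTS) && !maxTS.Less(valueTS) {
		return true
	}
	// SSI: the final txn timestamp was pushed past the initial one.
	if iso == SerializableSnapshot && origTS.Less(finalTS) {
		return true
	}
	return false
}
```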
Typically, when a transaction has to restart, it covers all of the same
keys on writes and increments the intents' epochs accordingly. However,
if there's conditional logic in the transaction, it might end up skipping
some keys which were written on the previous run--those earlier intents
should then not be committed when the intents are resolved after txn commit.
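A sketch of the epoch rule this implies at resolution time (simplified, not the actual ResolveIntent code):

```go
// Sketch only: decide an intent's fate when resolving after the txn
// has finished. Intents written in an earlier epoch were skipped by
// the final retry, so they are removed rather than committed.
func resolveAction(intentEpoch, txnEpoch int32, committed bool) string {
	switch {
	case !committed:
		return "remove" // txn aborted: discard the intent
	case intentEpoch == txnEpoch:
		return "commit" // rewrite the value at the commit timestamp
	default:
		return "remove" // stale epoch: key was skipped on the last run
	}
}
```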