roachtest: tpcc/nodes=3/w=max failed with foreign key violation #35812

Closed
nvanbenschoten opened this issue Mar 15, 2019 · 13 comments
Assignees: nvanbenschoten
Labels: S-1-stability (Severe stability issues that can be fixed by upgrading, but usually don't resolve by restarting)

Comments

@nvanbenschoten
Member

Pulled from #35337 (comment).

SHA:

https://github.com/cockroachdb/cockroach/commits/57e825a7940495b67e0cc7213a5fabc24e12be0e

Failed test:

https://teamcity.cockroachdb.com/viewLog.html?buildId=1176948&tab=buildLog

The test failed on master:
	cluster.go:1251,tpcc.go:126,cluster.go:1589,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1176948-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1350 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		 243.7          253.9   3489.7  10737.4  28991.0  49392.1 newOrder
		   43m5s        0           25.0           25.6   1208.0   3221.2   3221.2   3221.2 orderStatus
		   43m5s        0          314.6          255.0   3623.9  27917.3  77309.4 103079.2 payment
		   43m5s        0           23.0           25.5   2080.4   3489.7  11274.3  11274.3 stockLevel
		   43m6s        0           15.0           25.5   4563.4  17179.9  34359.7  34359.7 delivery
		   43m6s        0          248.3          253.9   3221.2  13421.8  47244.6 103079.2 newOrder
		   43m6s        0           23.0           25.6    805.3   2952.8   2952.8   2952.8 orderStatus
		   43m6s        0          180.2          255.0   2818.6  15032.4  40802.2 103079.2 payment
		   43m6s        0           22.0           25.5    771.8   3087.0   3087.0   3087.0 stockLevel
		Error: error in newOrder: ERROR: foreign key violation: value [914 3 3050] not found in order@primary [o_w_id o_d_id o_id] (txn=f995c83d-a90d-4856-9b61-72713ca22db5) (SQLSTATE 23503)
		Error:  exit status 1
		: exit status 1
	cluster.go:1610,tpcc.go:136,tpcc.go:160,test.go:1214: Goexit() was called

Artifacts:

https://drive.google.com/open?id=1bQWTo6DOlNj8ie1cFepZdMNMirRANMq8

A few theories we can immediately ignore:

this was a very long running transaction

We see from the workload logs a cumulative rate of 253.9 new_order transactions per second. The load had been running for about 43m5s, so we expect roughly 253.9*(43*60+5) ≈ 656,332 new_order transactions to have been performed.

We see from the error that this was order 3050 in warehouse 914 and district 3. Each district begins the workload with 3000 orders. That means that this was the 50th new_order transaction performed for this warehouse/district.

The test ran with 1350 warehouses, which means it ran with 13500 unique districts. Given a uniform distribution of new_order transactions, we expect the 50th order for a given district to take place after 1350*10*49 = 661,500 completed new_order transactions. Since this is close to our estimate for the total number of new_order transactions performed, we can conclude that the victim transaction began very recently.
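For concreteness, here's a minimal sketch of the back-of-envelope arithmetic above (the rate and elapsed time are simply read off the workload log; nothing here is measured):

package main

import "fmt"

func main() {
    const (
        newOrderRate   = 253.9     // cumulative new_order txns/sec from the workload log
        elapsedSeconds = 43*60 + 5 // ~43m5s of load
        warehouses     = 1350
        districtsPerWH = 10
    )

    // Estimated total new_order transactions performed so far.
    total := newOrderRate * float64(elapsedSeconds)

    // Expected number of completed new_order txns before a given district sees
    // its 50th new order (49 must come first), assuming a uniform distribution
    // across all 13,500 districts.
    expected := warehouses * districtsPerWH * 49

    fmt.Printf("estimated total new_order txns: %.0f\n", total)   // ≈ 656,332
    fmt.Printf("expected txns before 50th order: %d\n", expected) // 661,500
}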

the transaction was aborted and its abort span was already GCed

The abort span is not removed until an hour after the transaction is last active:

// TxnCleanupThreshold is the threshold after which a transaction is
// considered abandoned and fit for removal, as measured by the
// maximum of its last heartbeat and timestamp. Abort spans for the
// transaction are cleaned up at the same time.
//
// TODO(tschottdorf): need to enforce at all times that this is much
// larger than the heartbeat interval used by the coordinator.
const TxnCleanupThreshold = time.Hour

The load was only running for 43 minutes, so no abort spans should have been GCed.

@nvanbenschoten
Member Author

A QueryIntent request being run on a follower replica as a follower read might result in an issue like this. Follower reads were enabled at this SHA:

var FollowerReadsEnabled = settings.RegisterBoolSetting(
    "kv.closed_timestamp.follower_reads_enabled",
    "allow (all) replicas to serve consistent historical reads based on closed timestamp information",
    true,
)

@nvanbenschoten nvanbenschoten self-assigned this Mar 16, 2019
@nvanbenschoten
Member Author

Actually, quite the opposite: if we saw an inconsistency where a write that the transaction performed was dropped before the transaction came back to read it, the QueryIntent prepended to the read should be protecting us and resulting in a RETRY_ASYNC_WRITE_FAILURE error.

So why didn't the QueryIntent throw an error? Perhaps the write was already proven by a previous FK check? I'm not sure whether that is possible though, as the only table with a foreign key on the order table is the new_order table, and there should only be a single batched access to that table, which I assume would result in a single batched fk check. Perhaps this batch got split and the latter group went to a follower who was not sufficiently caught up?

Unrelated note: a new_order transaction will write anywhere from 5 to 15 order_line rows, each of which will point to the same order row. Do we really perform 5-15 separate fk checks? I really hope not.

Unrelated note 2: I think we're missing a foreign key from the new_order table to the order table. The spec says that we should have one.

@tbg
Member

tbg commented Mar 18, 2019

So the sequence of events is something like this?

  • txn inserts into order (pipelined write)
  • txn potentially does other things
  • txn comes back and tries to validate the FK (its earlier insert into order)
  • QueryIntent sent to order (successful)
  • read from order shows key is missing?

If we think this has to do with follower reads, maybe it's worth running with a very aggressive closed timestamp duration? I have a half-baked WIP that I think will allow us to run with much smaller closed timestamp durations than today, without permanently wedging longer txns. I didn't test it in any real scenarios, though, and I know it still has some issues.

Also, this could "just" be a replica inconsistency, though I really hope it isn't (and the test doesn't do anything like restarting nodes or even serious rebalancing).

@tbg tbg added the S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting label Mar 18, 2019
@nvanbenschoten
Member Author

So the sequence of events is something like this?

Yes, exactly. As Bram pointed out in #26786, each row performs its own fk check, so even though there was only a single SQL statement that should have needed to perform an fk check on the order row, we actually ended up issuing 5-15 individual ScanRequests. This explains how the pipelined order write was proven by a QueryIntent while a later lookup, which had no QueryIntent prefixed to it, was able to miss the row.
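To make that failure mode concrete, here's a rough, hypothetical sketch (the request and batch types are stand-ins, not the real kv API) of how only the first fk check ends up protected by a QueryIntent:

type request struct {
    kind string // "QueryIntent" or "Scan"
    key  string
}

// buildFKChecks sketches the shape of the problem: the insert into order is a
// pipelined write, so the first fk check against order@primary has a
// QueryIntent prepended to prove that write. Each subsequent order_line row
// issues its own fk scan against the same order row with no QueryIntent, so if
// one of those scans is served by a follower that hasn't caught up to the
// txn's write, it can miss the intent and report a spurious fk violation
// instead of a retryable RETRY_ASYNC_WRITE_FAILURE.
func buildFKChecks(orderKey string, numOrderLines int) [][]request {
    batches := [][]request{{
        {kind: "QueryIntent", key: orderKey}, // proves the pipelined write
        {kind: "Scan", key: orderKey},
    }}
    for i := 1; i < numOrderLines; i++ {
        // Unprotected per-row checks: bare scans of the same order row.
        batches = append(batches, []request{{kind: "Scan", key: orderKey}})
    }
    return batches
}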

Interestingly, with only a single fk lookup, this kind of issue would have surfaced as a retryable RETRY_ASYNC_WRITE_FAILURE. I wonder if we've hit this before but had it masked behind an increase in those errors.

The good news is that I dropped kv.closed_timestamp.target_duration down to 5s and hit this failure on 2 of my 4 instances of the test, so I think it's very likely that this is due to a bad interaction with follower reads. It's actually not immediately obvious that follower reads within a writing transaction should always preserve a read-your-writes property. I think the implicit assumption that makes it work is that a transaction will never read at a lower timestamp than it has previously written at. I wonder if we are failing to maintain that invariant somehow, or perhaps this is just a standard closed timestamp bug.
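As a rough illustration of that invariant (a hypothetical helper, not the actual followerreadsccl code), eligibility for a writing transaction has to be judged against the highest timestamp it may have written at, not just the timestamp it originally read at:

// safeForFollowerRead is a hypothetical illustration of the invariant above:
// a follower may serve the read only if its closed timestamp has advanced past
// every timestamp at which the txn may have laid down intents.
func safeForFollowerRead(closedTS, origTS, provisionalCommitTS int64, isWriting bool) bool {
    if !isWriting {
        // Read-only txn: only its original (read) timestamp matters.
        return origTS <= closedTS
    }
    // A writing txn may have written intents up to its provisional commit
    // timestamp, which can be pushed above OrigTimestamp, so that is the
    // timestamp that must be closed out before a follower can safely serve
    // the read without missing the txn's own writes.
    return provisionalCommitTS <= closedTS
}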

cc. @ajwerner do you have any insight into this?

@ajwerner
Contributor

Do you know that a follower read is happening? / Does the repro disappear if you disable it?

We should isolate whether it’s an unsafe follower read vs an interaction with the mpt.

Are there any concurrent splits or lease transfers?

@ajwerner
Contributor

Reading the above, it was likely due to a txn with an old timestamp, so probably a follower read.

@tbg
Member

tbg commented Mar 19, 2019

If a txn got pushed, such that it wrote some of its intents at a higher timestamp than planned and would restart on commit, are the fk checks run anyway? If they used the read timestamp, they might miss the intent via a follower read.

@nvanbenschoten
Member Author

Good point, we do use the OrigTimestamp when determining follower read eligibility, which can absolutely diverge from provisional commit Timestamp:

canUseFollowerRead(clusterID, st, ba.Txn.OrigTimestamp)

That violates the monotonic timestamp requirement mentioned above.

I'm going to make the following change and run again:

+++ b/pkg/ccl/followerreadsccl/followerreads.go
@@ -93,7 +93,7 @@ func canUseFollowerRead(clusterID uuid.UUID, st *cluster.Settings, ts hlc.Timest
 // canSendToFollower implements the logic for checking whether a batch request
 // may be sent to a follower.
 func canSendToFollower(clusterID uuid.UUID, st *cluster.Settings, ba roachpb.BatchRequest) bool {
-       return ba.IsReadOnly() && ba.Txn != nil &&
+       return ba.IsReadOnly() && ba.Txn != nil && !ba.Txn.IsWriting() &&
                canUseFollowerRead(clusterID, st, ba.Txn.OrigTimestamp)
 }

I suspect that doing so will fix this issue immediately. If so, I don't think we have any reason to suspect that this is a more severe closed timestamp issue.

@nvanbenschoten
Member Author

The following diff would also work though:

+++ b/pkg/ccl/followerreadsccl/followerreads.go
@@ -94,7 +94,7 @@ func canUseFollowerRead(clusterID uuid.UUID, st *cluster.Settings, ts hlc.Timest
 // may be sent to a follower.
 func canSendToFollower(clusterID uuid.UUID, st *cluster.Settings, ba roachpb.BatchRequest) bool {
        return ba.IsReadOnly() && ba.Txn != nil &&
-               canUseFollowerRead(clusterID, st, ba.Txn.OrigTimestamp)
+               canUseFollowerRead(clusterID, st, ba.Txn.Timestamp)
 }

We can decide which one we'd like to go with once it's confirmed that this fixes the issue.

@nvanbenschoten
Member Author

This alone didn't fix the issue. I think that makes sense, given that the diff doesn't actually prevent a writing transaction's batch from evaluating as a follower read; it just reduces the chance that one will be sent to a follower. I'm testing again with an additional check here.
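For illustration, a hypothetical sketch of the two layers involved (the names are stand-ins for the DistSender- and Replica-level logic, not the actual code): the routing-side check only influences where a batch is sent, so a replica-side check is needed to refuse to evaluate such a batch as a follower read at all:

// maySendToFollower mirrors the routing-side check from the diff above:
// whether a read-only batch from a non-writing txn below the closed timestamp
// may be routed to a follower. Even with this check, a batch can still end up
// evaluating on a non-leaseholder replica.
func maySendToFollower(readOnly, hasTxn, isWriting, tsIsClosed bool) bool {
    return readOnly && hasTxn && !isWriting && tsIsClosed
}

// mayEvaluateAsFollowerRead sketches the additional replica-side guard: a
// non-leaseholder must refuse to serve the read for a writing txn rather than
// falling back to its closed timestamp.
func mayEvaluateAsFollowerRead(isLeaseholder, isWriting, tsIsClosed bool) bool {
    if isLeaseholder {
        return true
    }
    return !isWriting && tsIsClosed
}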

@nvanbenschoten
Member Author

That seemed to do the trick!

@tbg
Member

tbg commented Mar 19, 2019

Great, one less thing to worry about. Are we going to just file this and disable follower reads for writing txns in the 19.1 time frame?

@nvanbenschoten
Member Author

Are we going to just file this and disable follower reads for writing txns in the 19.1 time frame?

Yes, that's the plan. There isn't much of a pressing reason for getting it to work at the risk of other bugs.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 19, 2019
Fixes cockroachdb#35812.

To avoid missing its own writes, a transaction must not evaluate a read
on a follower who has not caught up to at least its current provisional
commit timestamp. We were violating this both at the DistSender level and
at the Replica level.

Because the ability to perform follower reads in a writing transaction is
fairly unimportant and has these known issues, this commit disallows
follower reads for writing transactions.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 21, 2019
Fixes cockroachdb#35812.

To avoid missing its own writes, a transaction must not evaluate a read
on a follower who has not caught up to at least its current provisional
commit timestamp. We were violating this both at the DistSender level and
at the Replica level.

Because the ability to perform follower reads in a writing transaction is
fairly unimportant and has these known issues, this commit disallows
follower reads for writing transactions.

Release note: None
craig bot pushed a commit that referenced this issue Mar 21, 2019
35969: kv: disallow follower reads for writing transactions r=nvanbenschoten a=nvanbenschoten

Fixes #35812.

To avoid missing its own writes, a transaction must not evaluate a read
on a follower who has not caught up to at least its current provisional
commit timestamp. We were violating this both at the DistSender level and
at the Replica level.

Because the ability to perform follower reads in a writing transaction is
fairly unimportant and has these known issues, this commit disallows
follower reads for writing transactions.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in #35969 Mar 21, 2019
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 21, 2019
Fixes cockroachdb#35812.

To avoid missing its own writes, a transaction must not evaluate a read
on a follower who has not caught up to at least its current provisional
commit timestamp. We were violating this both at the DistSender level and
at the Replica level.

Because the ability to perform follower reads in a writing transaction is
fairly unimportant and has these known issues, this commit disallows
follower reads for writing transactions.

Release note: None