Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvcoord: concurrent txn use detected #75209

Open
tbg opened this issue Jan 20, 2022 · 3 comments
Open

kvcoord: concurrent txn use detected #75209

tbg opened this issue Jan 20, 2022 · 3 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@tbg
Copy link
Member

tbg commented Jan 20, 2022

Describe the problem

We see this fire in roachtests, such as #73713 (comment).

// SendLocked implements the lockedSender interface.
func (gs *txnLockGatekeeper) SendLocked(
ctx context.Context, ba roachpb.BatchRequest,
) (*roachpb.BatchResponse, *roachpb.Error) {
// If so configured, protect against concurrent use of the txn. Concurrent
// requests don't work generally because of races between clients sending
// requests and the TxnCoordSender restarting the transaction, and also
// concurrent requests are not compatible with the span refresher in
// particular since refreshing is invalid if done concurrently with requests
// in flight whose spans haven't been accounted for.
//
// As a special case, allow for async heartbeats to be sent whenever.
if !gs.allowConcurrentRequests && !ba.IsSingleHeartbeatTxnRequest() {
if gs.requestInFlight {
return nil, roachpb.NewError(
errors.AssertionFailedf("concurrent txn use detected. ba: %s", ba))
}
gs.requestInFlight = true
defer func() {
gs.requestInFlight = false
}()
}
// Note the funky locking here: we unlock for the duration of the call and the
// lock again.
gs.mu.Unlock()
defer gs.mu.Lock()
return gs.wrapped.Send(ctx, ba)
}

To Reproduce

See the roachtests.

Expected behavior
The AssertionFailedf branches should be dead code in prod.

cc @cockroachdb/kv

Jira issue: CRDB-12514

@tbg tbg added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 20, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Jan 20, 2022
@erikgrinaker
Copy link
Contributor

The logic here originally allowed concurrent EndTxn(abort) requests, since these could be sent at any time, but we changed this in #65592 because it could cause intent leaks. There was some discussion at the time about how to handle asynchronous EndTxn(abort) calls, but it was believed that only the txnHeartbeater could send such async calls, and it would now avoid doing so if there were any in-flight requests, so there should not be any concurrent EndTxn(abort) requests. This assumption was later found to be violated in #69966, and while that's been fixed, there may well be other edge cases here.

@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz
Copy link
Contributor

knz commented Aug 24, 2023

still relevant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team
Projects
None yet
Development

No branches or pull requests

3 participants