DB Backend: report explicit error when transactions are used concurrently #37172

camdencheek · 2022-06-13T22:20:09Z

This updates our transaction wrapper to return an explicit error whenever a transaction is used concurrently. Concurrent transaction access is a form of race condition that causes (in my experiments) either a conn busy error, a bad connection error, or a panic. So, instead of getting these errors that are very difficult to debug, this PR ~~throws an error~~ logs aan error with the stack trace that actually describes what's wrong. These errors should get pushed to Sentry, which will allow us to track where this is happening

Stacked on https://github.com/sourcegraph/sourcegraph/pull/37167

Test plan

Added DB tests that exercise the error return.

camdencheek · 2022-06-13T22:20:59Z

internal/database/basestore/store_test.go


 	"github.com/sourcegraph/sourcegraph/internal/database/dbtest"
 	"github.com/sourcegraph/sourcegraph/internal/database/dbutil"
 	"github.com/sourcegraph/sourcegraph/lib/errors"
 )

 func TestTransaction(t *testing.T) {
-	db := dbtest.NewDB(t)
+	db := dbtest.NewRawDB(t)


unrelated, but we don't need a db with a frontend schema in these tests

camdencheek · 2022-06-13T22:28:21Z

internal/database/basestore/handle.go

+}
+
+func (t *lockingTx) lock() error {
+	if !t.mu.TryLock() {


woah, a valid use of the new TryLock()!

Have to admit I'm a bit skeptical of using TryLock() followed by Lock() in the same function (and there's caveats around using TryLock). But I also don't see another way around it that doesn't end up doing the same thing under the hood (semaphore, atomic compare and swap, etc.)

So...Russ Cox is totally correct, including in this case. It is always incorrect to use a transaction concurrently, including after this PR.

However, my concerns are slightly different than Russ's. I want to do anything I can to prevent a panic in production, which might include implementing some imprecise logic that might avoid a handful of panics.

That said, the more important thing here IMO is that we report on incorrect usage, which this does with basically the same consistency as any race detector. This is what TryLock allows us to do: report when the invariant that a transaction should never be used concurrently is violated. A plain Lock cannot do this.

Sure. I was just wondering whether we need the TryLock for example or could use a one-word value instead?

type lockingTx struct { tx *sql.Tx mu sync.Mutex inUse bool logger log.Logger } func (t *lockingTx) lock() { if t.inUse { // For now, log an error, but try to serialize access anyways to try to // keep things slightly safer. err := errors.WithStack(ErrConcurrentTransactionAccess) t.logger.Error("transaction used concurrently", log.Error(err)) } t.mu.Lock() t.inUse = true } func (t *lockingTx) unlock() { t.inUse = false t.mu.Unlock() }

(Edit: ... and that's what I meant with "it all ends up being the same" 😄 because this is not different than what TryLock does somewhere under the hood, but I guess TryLock is now the officially supported version, since I'm not even 100% sure 1-word vars are safe to read concurrently like that.)

It's not safe to read inUse outside the mutex though, right?

camdencheek · 2022-06-13T22:50:44Z

@sourcegraph/code-intel The failures here appear to be actual problems caught by this change. As an example, a transaction is created here, then writeVisibleUploads is called on that tx here, which starts multiple batch writer goroutines on that transaction here.

coury-clark · 2022-06-13T22:56:39Z

@camdencheek I think you meant to ping code intel above?

camdencheek · 2022-06-13T22:56:58Z

Whoops, yep, thanks @coury-clark. Does editing a comment actually ping? Pinging @sourcegraph/code-intel because I don't know. (sorry for the noise Insights team)

camdencheek · 2022-06-14T20:27:45Z

In order to make this PR mergeable without blocking on Code Intel folks (who I think are all at an offsite), I've updated this PR to instead just log an error and attempt to synchronize it (best effort, will not work in many cases). This way, we will at least get sentry errors and log messages that will notify us of other places this is happening.

sourcegraph-bot · 2022-06-14T20:29:14Z

Codenotify: Notifying subscribers in CODENOTIFY files for diff ee784c5...61b0a38.

Notify	File(s)
@efritz	internal/database/basestore/handle.go internal/database/basestore/store_test.go

internal/database/basestore/handle.go

unknwon · 2022-06-15T03:06:44Z

internal/database/basestore/store_test.go

+				}
+				defer func() { err = tx.Done(err) }()
+
+				return tx.Exec(ctx, sqlf.Sprintf(`INSERT INTO store_counts_test VALUES (%s, 42)`, i))


Couldn't we just have the first one sleeps for 5 seconds, and fail on the second start? So we don't need 100 goroutines 😂

Ha, that's a much better idea!

Co-authored-by: Joe Chen <[email protected]>

…ntly (#37172) This updates our transaction wrapper to return an explicit error whenever a transaction is used concurrently. Concurrent transaction access is a form of race condition that causes (in my experiments) either a conn busy error, a bad connection error, or a panic. So, instead of getting these errors that are very difficult to debug, this logs error with the stack trace that actually describes what's wrong. These errors should get pushed to Sentry, which will allow us to track where this is happening

cla-bot bot added the cla-signed label Jun 13, 2022

camdencheek changed the base branch from main to cc/remove-new-untyped-db June 13, 2022 22:20

camdencheek commented Jun 13, 2022

View reviewed changes

camdencheek force-pushed the cc/remove-new-untyped-db branch from 173d292 to ab6c232 Compare June 14, 2022 17:48

Base automatically changed from cc/remove-new-untyped-db to main June 14, 2022 17:58

camdencheek added 2 commits June 14, 2022 12:05

make concurrent transaction usage loud

d63d365

update comment

3eb6b8b

camdencheek force-pushed the cc/safe-transactions branch from f340c94 to 3eb6b8b Compare June 14, 2022 18:05

camdencheek added 3 commits June 14, 2022 13:30

demote error to log message and update tests

f6b62b2

add witty aphorism

ba7ea7a

fix unset logger

23b28b6

camdencheek marked this pull request as ready for review June 14, 2022 20:27

camdencheek requested review from efritz, mrnugget and unknwon June 14, 2022 20:29

unknwon approved these changes Jun 15, 2022

View reviewed changes

Update internal/database/basestore/handle.go

61b0a38

Co-authored-by: Joe Chen <[email protected]>

mrnugget approved these changes Jun 15, 2022

View reviewed changes

camdencheek added the team/search-product Issues owned by the search product team label Jun 15, 2022

use sleeps instead of 100 goroutines

a5d2f3c

camdencheek merged commit 291a3df into main Jun 15, 2022

camdencheek deleted the cc/safe-transactions branch June 15, 2022 14:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DB Backend: report explicit error when transactions are used concurrently #37172

DB Backend: report explicit error when transactions are used concurrently #37172

camdencheek commented Jun 13, 2022 •

edited

Loading

camdencheek Jun 13, 2022

camdencheek Jun 13, 2022

mrnugget Jun 15, 2022

camdencheek Jun 15, 2022 •

edited

Loading

mrnugget Jun 16, 2022 •

edited

Loading

camdencheek Jun 16, 2022

camdencheek commented Jun 13, 2022 •

edited

Loading

coury-clark commented Jun 13, 2022

camdencheek commented Jun 13, 2022 •

edited

Loading

camdencheek commented Jun 14, 2022

sourcegraph-bot commented Jun 14, 2022 •

edited

Loading

unknwon Jun 15, 2022

camdencheek Jun 15, 2022

DB Backend: report explicit error when transactions are used concurrently #37172

DB Backend: report explicit error when transactions are used concurrently #37172

Conversation

camdencheek commented Jun 13, 2022 • edited Loading

Test plan

camdencheek Jun 13, 2022

Choose a reason for hiding this comment

camdencheek Jun 13, 2022

Choose a reason for hiding this comment

mrnugget Jun 15, 2022

Choose a reason for hiding this comment

camdencheek Jun 15, 2022 • edited Loading

Choose a reason for hiding this comment

mrnugget Jun 16, 2022 • edited Loading

Choose a reason for hiding this comment

camdencheek Jun 16, 2022

Choose a reason for hiding this comment

camdencheek commented Jun 13, 2022 • edited Loading

coury-clark commented Jun 13, 2022

camdencheek commented Jun 13, 2022 • edited Loading

camdencheek commented Jun 14, 2022

sourcegraph-bot commented Jun 14, 2022 • edited Loading

unknwon Jun 15, 2022

Choose a reason for hiding this comment

camdencheek Jun 15, 2022

Choose a reason for hiding this comment

camdencheek commented Jun 13, 2022 •

edited

Loading

camdencheek Jun 15, 2022 •

edited

Loading

mrnugget Jun 16, 2022 •

edited

Loading

camdencheek commented Jun 13, 2022 •

edited

Loading

camdencheek commented Jun 13, 2022 •

edited

Loading

sourcegraph-bot commented Jun 14, 2022 •

edited

Loading