
kvserver,rac2: turn off raftReceiveQueue.maxLen enforcement in apply_… #136969

Merged
merged 1 commit on Dec 13, 2024

Conversation

sumeerbhola
Collaborator

…to_all mode

The existing maxLen enforcement is already dubious:

  • Length does not equal bytes, so it offers only limited protection from OOMs.
  • The limit is per replica and not an aggregate.
  • We run a cooperative system, and historically the sender has respected RaftConfig.RaftMaxInflightBytes, which is a byte limit. The only reason for additional protection on the receiver is when there are rapid repeated leader changes for a large number of ranges for which the receiver has replicas. Even in this case, the behavior is surprising since the receive queue overflows even though the sender has done nothing wrong -- and it is very unlikely that this overflow is actually protecting against an OOM.

With RACv2 in apply_to_all mode, the senders have a 16MiB regular token pool that applies to a whole (store, tenant) pair. This is stricter than the per-range defaultRaftMaxInflightBytes (32MiB), both in value and because it is an aggregate limit. That makes the aforementioned "only reason" for receiver-side protection even weaker, so we remove the enforcement in the apply_to_all case.

An alternative would be to have replicaSendStream respect the receiver limit in apply_to_all mode. The complexity there is that replicaSendStream grabs a bunch of tokens equal to the byte size of the send-queue, and expects to send all those messages. To respect a count limit, it will need to quickly return tokens it can't use (since they are a shared resource), which adds complexity to the already complex token management logic.

Fixes #135851

Epic: none

Release note: None
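
For illustration, a minimal Go sketch of the receiver-side idea (hypothetical names throughout; this is not the actual raftReceiveQueue code): a queue whose length check only applies while enforcement is enabled, so that in apply_to_all mode the check can be switched off and the sender-side token pool bounds memory instead.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// receiveQueue is a hypothetical stand-in for a per-replica raft receive
// queue whose max-length check can be toggled at runtime.
type receiveQueue struct {
	mu            sync.Mutex
	msgs          []string
	maxLen        int
	enforceMaxLen atomic.Bool
}

// SetEnforceMaxLen turns the length check on or off, e.g. depending on
// whether RACv2 runs in apply_to_all mode.
func (q *receiveQueue) SetEnforceMaxLen(enforce bool) {
	q.enforceMaxLen.Store(enforce)
}

// Append adds a message, dropping it only when enforcement is on and the
// queue is already at maxLen.
func (q *receiveQueue) Append(msg string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.enforceMaxLen.Load() && len(q.msgs) >= q.maxLen {
		return false // overflow: drop the message
	}
	q.msgs = append(q.msgs, msg)
	return true
}

func main() {
	q := &receiveQueue{maxLen: 2}
	q.SetEnforceMaxLen(true)
	fmt.Println(q.Append("a"), q.Append("b"), q.Append("c")) // true true false
	q.SetEnforceMaxLen(false)  // apply_to_all: rely on sender-side tokens
	fmt.Println(q.Append("d")) // true: no length check
}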

@sumeerbhola sumeerbhola requested review from pav-kv and kvoli December 7, 2024 17:15
@sumeerbhola sumeerbhola requested review from a team as code owners December 7, 2024 17:15

blathers-crl bot commented Dec 7, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

Collaborator Author

@sumeerbhola sumeerbhola left a comment


I'll add tests if the approach looks ok.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)

Collaborator

@pav-kv pav-kv left a comment


The approach looks ok.

pkg/kv/kvserver/store.go (outdated, resolved)
pkg/kv/kvserver/store_raft.go (outdated, resolved)
Collaborator

@kvoli kvoli left a comment


The approach looks ok.

Likewise from me.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)

Collaborator Author

@sumeerbhola sumeerbhola left a comment


TFTRs!

Tests are ready.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)

pkg/kv/kvserver/store.go (outdated, resolved)
pkg/kv/kvserver/store_raft.go (outdated, resolved)
Collaborator

@kvoli kvoli left a comment


:lgtm:

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv and @sumeerbhola)


pkg/kv/kvserver/store_raft_test.go line 233 at r2 (raw file):

	defer leaktest.AfterTest(t)()

	skip.UnderStress(t, "slow test")

nit: duress will also skip under stress:

return util.RaceEnabled || Stress() || syncutil.DeadlockEnabled
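
For reference, a hedged sketch of the suggested change in the test (assuming the skip package's UnderDuress helper takes the same arguments as UnderStress):

	skip.UnderDuress(t, "slow test") // also skips under race and deadlock builds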


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

		checkingMu.Unlock()
		enforceMaxLen = !enforceMaxLen
		time.Sleep(time.Millisecond)

What's the sleep for?

Collaborator Author

@sumeerbhola sumeerbhola left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli and @pav-kv)


pkg/kv/kvserver/store_raft_test.go line 233 at r2 (raw file):

Previously, kvoli (Austen) wrote…

nit: duress will also skip under stress:

return util.RaceEnabled || Stress() || syncutil.DeadlockEnabled

Done


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

Previously, kvoli (Austen) wrote…

What's the sleep for?

I was trying various ways to make this fail by introducing bugs in the real code. Found that doing a sleep here gives more opportunity to various goroutines to enter their read critical section and call LoadOrCreate, which then ends up being concurrent with the call to SetEnforceMaxLen.
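
A hedged, self-contained sketch of the pattern being described (the queues and queue types below are illustrative stand-ins, not the real test or kvserver code): worker goroutines hammer LoadOrCreate until a done channel closes, while the main goroutine flips SetEnforceMaxLen with a short sleep between toggles to widen the window for the two to overlap.

package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// queue is a stand-in for a per-range receive queue.
type queue struct{ enforceMaxLen bool }

// queues is a stand-in for the registry that creates queues on demand and
// applies the current enforcement setting to all of them.
type queues struct {
	mu            sync.Mutex
	m             map[int]*queue
	enforceMaxLen atomic.Bool
}

// LoadOrCreate returns the queue for rangeID, creating it with the
// currently configured enforcement setting.
func (qs *queues) LoadOrCreate(rangeID int) *queue {
	qs.mu.Lock()
	defer qs.mu.Unlock()
	q, ok := qs.m[rangeID]
	if !ok {
		q = &queue{enforceMaxLen: qs.enforceMaxLen.Load()}
		qs.m[rangeID] = q
	}
	return q
}

// SetEnforceMaxLen updates the setting for existing and future queues.
func (qs *queues) SetEnforceMaxLen(enforce bool) {
	qs.enforceMaxLen.Store(enforce)
	qs.mu.Lock()
	defer qs.mu.Unlock()
	for _, q := range qs.m {
		q.enforceMaxLen = enforce
	}
}

func main() {
	qs := &queues{m: map[int]*queue{}}
	doneCh := make(chan struct{})
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			// Loop until doneCh is closed, creating/loading queues so that
			// LoadOrCreate runs concurrently with SetEnforceMaxLen.
			for i := 0; ; i++ {
				select {
				case <-doneCh:
					return
				default:
					qs.LoadOrCreate((g*1000 + i) % 50)
				}
			}
		}(g)
	}
	enforce := false
	for i := 0; i < 100; i++ {
		qs.SetEnforceMaxLen(enforce)
		enforce = !enforce
		// The sleep gives the worker goroutines time to call LoadOrCreate
		// while the enforcement setting is changing.
		time.Sleep(time.Millisecond)
	}
	close(doneCh)
	wg.Wait()
}

Run under the race detector, a test of this shape is what would surface unsynchronized access between queue creation and the enforcement toggle.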

Collaborator

@kvoli kvoli left a comment


:lgtm:

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv)

Collaborator

@kvoli kvoli left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv and @sumeerbhola)


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

Previously, sumeerbhola wrote…

I was trying various ways to make this fail by introducing bugs in the real code. Found that doing a sleep here gives more opportunity to various goroutines to enter their read critical section and call LoadOrCreate, which then ends up being concurrent with the call to SetEnforceMaxLen.

Ack, that makes sense then. I actually thought it might have been for the opposite reason.

@pav-kv pav-kv requested a review from kvoli December 12, 2024 11:17
Collaborator

@pav-kv pav-kv left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @sumeerbhola)


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Ack, that makes sense then. I actually thought it might have been for the opposite reason.

Add a comment to this effect?


pkg/kv/kvserver/store_raft.go line 158 at r3 (raw file):

	q.acc.Init(context.Background(), qs.mon)
	q, loaded = qs.m.LoadOrStore(rangeID, q)
	if !loaded {

style nit: don't indent large blocks

if loaded {
	return q, true
}

// The sampling of ...
for {
	...
}
return q, false
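
As a generic illustration of the early-return style being suggested (hypothetical types and function, not the kvserver code):

// entry is a placeholder for whatever the map stores.
type entry struct{}

// getOrCreate handles the "already present" case with an early return, so
// the longer creation path stays at the top indentation level.
func getOrCreate(m map[string]*entry, key string) (*entry, bool) {
	if e, loaded := m[key]; loaded {
		return e, true
	}
	e := &entry{} // several lines of initialization would go here
	m[key] = e
	return e, false
}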

pkg/kv/kvserver/store_raft_test.go line 255 at r3 (raw file):

			rng, _ := randutil.NewTestRand()
			defer wg.Done()
			for {

Add a limit to how much this can spin?

Collaborator Author

@sumeerbhola sumeerbhola left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli and @pav-kv)


pkg/kv/kvserver/store_raft.go line 158 at r3 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

style nit: don't indent large blocks

if loaded {
	return q, true
}

// The sampling of ...
for {
	...
}
return q, false

Done


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Add a comment to this effect?

Done


pkg/kv/kvserver/store_raft_test.go line 255 at r3 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Add a limit to how much this can spin?

We want these goroutines to keep working while the code below is changing enforceMaxLen. There shouldn't be two places that decide when to stop, since that would leave an interval where the test is still running but no longer effective.

I've added the following comment to make it clear that this loop will stop.

// Loop until the doneCh is closed.

@sumeerbhola
Collaborator Author

bors r=kvoli,pav-kv

@craig craig bot merged commit a55295b into cockroachdb:master Dec 13, 2024
21 of 22 checks passed

Successfully merging this pull request may close these issues.

rac2: pull mode can cause overflow of raftReceiveQueue