
kvserver,rac2: turn off raftReceiveQueue.maxLen enforcement in apply_… #136969

Merged
merged 1 commit on Dec 13, 2024

Conversation

sumeerbhola
Collaborator

…to_all mode

The existing maxLen enforcement is already dubious:

  • Length does not equal bytes, so it offers only limited protection from OOMs.
  • The limit is per replica and not an aggregate.
  • We run a cooperative system, and historically the sender has respected RaftConfig.RaftMaxInflightBytes, which is a byte limit. The only reason for additional protection on the receiver is when there are rapid repeated leader changes for a large number of ranges for which the receiver has replicas. Even in this case, the behavior is surprising since the receive queue overflows even though the sender has done nothing wrong -- and it is very unlikely that this overflow is actually protecting against an OOM.

With RACv2 in apply_to_all mode, the senders have a 16MiB regular token pool that applies to a whole (store, tenant) pair. This is stricter than the per-range defaultRaftMaxInflightBytes (32MiB), both in value and because it is an aggregate limit. That makes the aforementioned "only reason" for receiver-side protection even weaker, so we remove the enforcement in the apply_to_all case.

An alternative would be to have replicaSendStream respect the receiver limit in apply_to_all mode. The complexity there is that replicaSendStream grabs a bunch of tokens equal to the byte size of the send-queue, and expects to send all those messages. To respect a count limit, it will need to quickly return tokens it can't use (since they are a shared resource), which adds complexity to the already complex token management logic.

Fixes #135851

Epic: none

Release note: None
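
For illustration, a minimal Go sketch of the receiver-side idea (hypothetical names throughout; this is not the actual raftReceiveQueue code): a queue whose length check only applies while enforcement is enabled, so that in apply_to_all mode the check can be switched off and the sender-side token pool bounds memory instead.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// receiveQueue is a hypothetical stand-in for a per-replica raft receive
// queue whose max-length check can be toggled at runtime.
type receiveQueue struct {
	mu            sync.Mutex
	msgs          []string
	maxLen        int
	enforceMaxLen atomic.Bool
}

// SetEnforceMaxLen turns the length check on or off, e.g. depending on
// whether RACv2 runs in apply_to_all mode.
func (q *receiveQueue) SetEnforceMaxLen(enforce bool) {
	q.enforceMaxLen.Store(enforce)
}

// Append adds a message, dropping it only when enforcement is on and the
// queue is already at maxLen.
func (q *receiveQueue) Append(msg string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.enforceMaxLen.Load() && len(q.msgs) >= q.maxLen {
		return false // overflow: drop the message
	}
	q.msgs = append(q.msgs, msg)
	return true
}

func main() {
	q := &receiveQueue{maxLen: 2}
	q.SetEnforceMaxLen(true)
	fmt.Println(q.Append("a"), q.Append("b"), q.Append("c")) // true true false
	q.SetEnforceMaxLen(false)  // apply_to_all: rely on sender-side tokens
	fmt.Println(q.Append("d")) // true: no length check
}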

@sumeerbhola sumeerbhola requested review from pav-kv and kvoli December 7, 2024 17:15
@sumeerbhola sumeerbhola requested review from a team as code owners December 7, 2024 17:15

blathers-crl bot commented Dec 7, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

Collaborator Author

@sumeerbhola sumeerbhola left a comment


I'll add tests if the approach looks ok.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)

Collaborator

@pav-kv pav-kv left a comment


The approach looks ok.

pkg/kv/kvserver/store.go (outdated, resolved)
pkg/kv/kvserver/store_raft.go (outdated, resolved)
Collaborator

@kvoli kvoli left a comment


The approach looks ok.

Likewise from me.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)

Collaborator Author

@sumeerbhola sumeerbhola left a comment


TFTRs!

Tests are ready.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @pav-kv)

pkg/kv/kvserver/store.go (outdated, resolved)
pkg/kv/kvserver/store_raft.go (outdated, resolved)
Collaborator

@kvoli kvoli left a comment


:lgtm:

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv and @sumeerbhola)


pkg/kv/kvserver/store_raft_test.go line 233 at r2 (raw file):

	defer leaktest.AfterTest(t)()

	skip.UnderStress(t, "slow test")

nit: duress will also skip under stress:

return util.RaceEnabled || Stress() || syncutil.DeadlockEnabled
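
For reference, a hedged sketch of the suggested change in the test (assuming the skip package's UnderDuress helper takes the same arguments as UnderStress):

	skip.UnderDuress(t, "slow test") // also skips under race and deadlock builds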


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

		checkingMu.Unlock()
		enforceMaxLen = !enforceMaxLen
		time.Sleep(time.Millisecond)

What's the sleep for?

Collaborator Author

@sumeerbhola sumeerbhola left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli and @pav-kv)


pkg/kv/kvserver/store_raft_test.go line 233 at r2 (raw file):

Previously, kvoli (Austen) wrote…

nit: duress will also skip under stress:

return util.RaceEnabled || Stress() || syncutil.DeadlockEnabled

Done


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

Previously, kvoli (Austen) wrote…

What's the sleep for?

I was trying various ways to make this fail by introducing bugs in the real code. Found that doing a sleep here gives more opportunity to various goroutines to enter their read critical section and call LoadOrCreate, which then ends up being concurrent with the call to SetEnforceMaxLen.
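
A hedged, self-contained sketch of the pattern being described (the queues and queue types below are illustrative stand-ins, not the real test or kvserver code): worker goroutines hammer LoadOrCreate until a done channel closes, while the main goroutine flips SetEnforceMaxLen with a short sleep between toggles to widen the window for the two to overlap.

package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// queue is a stand-in for a per-range receive queue.
type queue struct{ enforceMaxLen bool }

// queues is a stand-in for the registry that creates queues on demand and
// applies the current enforcement setting to all of them.
type queues struct {
	mu            sync.Mutex
	m             map[int]*queue
	enforceMaxLen atomic.Bool
}

// LoadOrCreate returns the queue for rangeID, creating it with the
// currently configured enforcement setting.
func (qs *queues) LoadOrCreate(rangeID int) *queue {
	qs.mu.Lock()
	defer qs.mu.Unlock()
	q, ok := qs.m[rangeID]
	if !ok {
		q = &queue{enforceMaxLen: qs.enforceMaxLen.Load()}
		qs.m[rangeID] = q
	}
	return q
}

// SetEnforceMaxLen updates the setting for existing and future queues.
func (qs *queues) SetEnforceMaxLen(enforce bool) {
	qs.enforceMaxLen.Store(enforce)
	qs.mu.Lock()
	defer qs.mu.Unlock()
	for _, q := range qs.m {
		q.enforceMaxLen = enforce
	}
}

func main() {
	qs := &queues{m: map[int]*queue{}}
	doneCh := make(chan struct{})
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			// Loop until doneCh is closed, creating/loading queues so that
			// LoadOrCreate runs concurrently with SetEnforceMaxLen.
			for i := 0; ; i++ {
				select {
				case <-doneCh:
					return
				default:
					qs.LoadOrCreate((g*1000 + i) % 50)
				}
			}
		}(g)
	}
	enforce := false
	for i := 0; i < 100; i++ {
		qs.SetEnforceMaxLen(enforce)
		enforce = !enforce
		// The sleep gives the worker goroutines time to call LoadOrCreate
		// while the enforcement setting is changing.
		time.Sleep(time.Millisecond)
	}
	close(doneCh)
	wg.Wait()
}

Run under the race detector, a test of this shape is what would surface unsynchronized access between queue creation and the enforcement toggle.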

Collaborator

@kvoli kvoli left a comment


:lgtm:

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv)

Collaborator

@kvoli kvoli left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pav-kv and @sumeerbhola)


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

Previously, sumeerbhola wrote…

I was trying various ways to make this fail by introducing bugs in the real code. Found that doing a sleep here gives more opportunity to various goroutines to enter their read critical section and call LoadOrCreate, which then ends up being concurrent with the call to SetEnforceMaxLen.

Ack, that makes sense then. I actually thought it might have been for the opposite reason.

@pav-kv pav-kv requested a review from kvoli December 12, 2024 11:17
Collaborator

@pav-kv pav-kv left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @sumeerbhola)


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Ack, that makes sense then. I actually thought it might have been for the opposite reason.

Add a comment to this effect?


pkg/kv/kvserver/store_raft.go line 158 at r3 (raw file):

	q.acc.Init(context.Background(), qs.mon)
	q, loaded = qs.m.LoadOrStore(rangeID, q)
	if !loaded {

style nit: don't indent large blocks

if loaded {
	return q, true
}

// The sampling of ...
for {
	...
}
return q, false
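
As a generic illustration of the early-return style being suggested (hypothetical types and function, not the kvserver code):

// entry is a placeholder for whatever the map stores.
type entry struct{}

// getOrCreate handles the "already present" case with an early return, so
// the longer creation path stays at the top indentation level.
func getOrCreate(m map[string]*entry, key string) (*entry, bool) {
	if e, loaded := m[key]; loaded {
		return e, true
	}
	e := &entry{} // several lines of initialization would go here
	m[key] = e
	return e, false
}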

pkg/kv/kvserver/store_raft_test.go line 255 at r3 (raw file):

			rng, _ := randutil.NewTestRand()
			defer wg.Done()
			for {

Add a limit to how much this can spin?

Collaborator Author

@sumeerbhola sumeerbhola left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli and @pav-kv)


pkg/kv/kvserver/store_raft.go line 158 at r3 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

style nit: don't indent large blocks

if loaded {
	return q, true
}

// The sampling of ...
for {
	...
}
return q, false

Done


pkg/kv/kvserver/store_raft_test.go line 284 at r2 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Add a comment to this effect?

Done


pkg/kv/kvserver/store_raft_test.go line 255 at r3 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Add a limit to how much this can spin?

We want these goroutines to keep working while the code below is changing enforceMaxLen. There shouldn't be two places that decide when to stop, since that would leave an interval where the test is still running but no longer effective.

I've added the following comment to make it clear that this loop will stop.

// Loop until the doneCh is closed.

@sumeerbhola
Collaborator Author

bors r=kvoli,pav-kv

@craig craig bot merged commit a55295b into cockroachdb:master Dec 13, 2024
21 of 22 checks passed

Successfully merging this pull request may close these issues.

rac2: pull mode can cause overflow of raftReceiveQueue