kvserver: reject requests when quiescing #51566
base: master
Conversation
Release note: None
Some callers of stopper.RunAsyncTask want the task to inherit the caller's cancelation, some don't. RunAsyncTask inherits. This patch adds a flavor that doesn't inherit, and uses it for a couple of kvserver tasks. Besides being the right thing to do, this comes in preparation for the next commit, where the contexts of all request evaluations will be canceled once the evaluation is done - at which point async tasks that don't want to be canceled would be hurt. Release note: None
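To make the distinction concrete, here is a small self-contained Go sketch of the two behaviors. runInheriting and runNoCancelation are illustrative stand-ins for RunAsyncTask and the new non-inheriting flavor, not the stopper package's actual code:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Stand-in for a task that inherits the caller's cancelation.
func runInheriting(ctx context.Context, f func(context.Context)) {
	go f(ctx)
}

// Stand-in for a task that does not inherit it: the task ctx is derived from
// Background (the real code would also carry over log tags, tracing, etc.).
func runNoCancelation(_ context.Context, f func(context.Context)) {
	go f(context.Background())
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	runInheriting(ctx, func(ctx context.Context) {
		<-ctx.Done()
		fmt.Println("inheriting task: canceled with the caller")
	})
	runNoCancelation(ctx, func(ctx context.Context) {
		select {
		case <-ctx.Done():
			fmt.Println("non-inheriting task: unexpectedly canceled")
		case <-time.After(100 * time.Millisecond):
			fmt.Println("non-inheriting task: kept running")
		}
	})
	cancel() // the caller's request evaluation is done
	time.Sleep(200 * time.Millisecond)
}
```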
Command application sometimes inherits a context from a local request. The local request's ctx might be cancelable. This patch makes the derived application context not inherit the cancelation. Nobody should be checking for said cancelation, but making sure that even if they checked they wouldn't find a cancelation seems like a good safeguard, since the consequences of not applying a command are bad. This is in preparation for the next commit, which makes all requests evaluate under cancelable contexts. Release note: None
This patch makes the Store reject requests once its stopper is quiescing. It also makes quiescing wait until all requests have been drained, and makes requests get canceled on quiescing so that they drain faster. Before this patch, we didn't seem to have good protection against requests not running after the stopper has been stopped. We've seen this in some tests, where requests were racing with the engine closing. Running after the stopper has stopped is generally pretty undefined behavior, so let's avoid it. I think the reason why we didn't see a lot of errors from such races is that we're stopping the gRPC server even before we start quiescing, so at least for requests coming from remote nodes we had some draining window. A gotcha of this commit is that async tasks spun off by requests will now get canceled when the requests finish. Before, most requests did not have a cancelable ctx, so the point was moot. This cancelation behavior is the right thing to do; callers of, say, stopper.RunAsyncTask() are in charge of passing a non-cancelable context if they don't want cancelation. Release note: None
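A self-contained sketch of the mechanism this commit describes, using simplified stand-ins (miniStopper and RunRequest are invented names, not the Store/Stopper code): new requests are refused once quiescing starts, in-flight requests get their contexts canceled so they drain faster, and quiescing waits for them to finish.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

var errQuiescing = errors.New("server is quiescing")

type miniStopper struct {
	mu        sync.Mutex
	quiescing bool
	cancels   map[int]context.CancelFunc
	nextID    int
	tasks     sync.WaitGroup
}

// RunRequest registers a request as a task whose ctx is canceled on quiesce.
func (s *miniStopper) RunRequest(ctx context.Context, f func(context.Context)) error {
	ctx, cancel := context.WithCancel(ctx)
	s.mu.Lock()
	if s.quiescing {
		s.mu.Unlock()
		cancel()
		return errQuiescing // reject: shutdown has started
	}
	id := s.nextID
	s.nextID++
	s.cancels[id] = cancel
	s.tasks.Add(1)
	s.mu.Unlock()

	defer func() {
		s.mu.Lock()
		delete(s.cancels, id)
		s.mu.Unlock()
		cancel()
		s.tasks.Done()
	}()
	f(ctx)
	return nil
}

// Quiesce cancels in-flight requests and waits for them to drain.
func (s *miniStopper) Quiesce() {
	s.mu.Lock()
	s.quiescing = true
	for _, cancel := range s.cancels {
		cancel()
	}
	s.mu.Unlock()
	s.tasks.Wait()
}

func main() {
	s := &miniStopper{cancels: map[int]context.CancelFunc{}}
	go s.RunRequest(context.Background(), func(ctx context.Context) {
		<-ctx.Done() // a slow request: unblocks when quiescing cancels it
		fmt.Println("in-flight request drained")
	})
	time.Sleep(50 * time.Millisecond)
	s.Quiesce()
	fmt.Println("new request:", s.RunRequest(context.Background(), func(context.Context) {}))
}
```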
Thanks for taking this on!
Also, heads up - I will be out for the rest of the week pretty much, so I hope @nvanbenschoten can help you get this merged if we run out of time today.
Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 2 of 2 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @nvanbenschoten)
pkg/kv/kvserver/replica_application_decoder.go, line 147 at r3 (raw file):
```go
// is not cancelable; nobody should be checking for this ctx's
// cancelation, but let's not tempt the users.
cmd.ctx = d.r.AnnotateCtx(opentracing.ContextWithSpan(context.Background(), cmd.sp))
```
I have mixed feelings about this as well. This will erase profiler labels, and whatever else the context may carry later.
If cancellation were just a property of the context without added semantics, we could just
```go
type uncancelledContext struct {
	context.Context
}

var _ context.Context = (*uncancelledContext)(nil)

func (c *uncancelledContext) Err() error            { return nil }
func (c *uncancelledContext) Done() <-chan struct{} { return nil }
```
but this is just another footgun. For example, the spans in a cancelled context are probably being re-used in another span and you can't touch them. Seeing this, I think we would need a contextutil.Transfer(from, to context.Context) context.Context method that picks out everything that can move over (like tags) and handles that which can't accordingly (fork the trace span, etc.). Idk, there seems to be a bunch of work to do. I guess I'm okay with this in the meantime.
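For what it's worth, a self-contained demo of the semantics being proposed (illustrative only, not CockroachDB code): values stored in the parent context still flow through, while Err and Done hide the cancelation.

```go
package main

import (
	"context"
	"fmt"
)

// uncancelledContext keeps everything carried by the wrapped context (values,
// tags, spans) but masks its cancelation.
type uncancelledContext struct {
	context.Context
}

func (c uncancelledContext) Err() error            { return nil }
func (c uncancelledContext) Done() <-chan struct{} { return nil }

type key struct{}

func main() {
	parent := context.WithValue(context.Background(), key{}, "log tags, span, ...")
	ctx, cancel := context.WithCancel(parent)
	cancel()

	wrapped := uncancelledContext{Context: ctx}
	fmt.Println(ctx.Err())            // context canceled
	fmt.Println(wrapped.Err())        // <nil>: cancelation is hidden
	fmt.Println(wrapped.Value(key{})) // values still flow through
}
```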
pkg/kv/kvserver/replica_proposal.go, line 235 at r2 (raw file):
```go
// Compute SHA asynchronously and store it in a map by UUID.
// Don't inherit the ctx cancelation.
```
isn't it "cancellation"?
pkg/kv/kvserver/replica_proposal.go, line 236 at r2 (raw file):
```go
// Don't inherit the ctx cancelation.
if err := stopper.RunAsyncTaskNoCancelation(ctx, "storage.Replica: computing checksum", func(ctx context.Context) {
```
I'm lukewarm on this since you could just use context.Background() on the callers. I guess this would go away with the options, so I don't feel strongly.
pkg/kv/kvserver/store_send.go, line 44 at r4 (raw file):
```go
	ctx context.Context, ba roachpb.BatchRequest,
) (*roachpb.BatchResponse, *roachpb.Error) {
	// Run the request as a task so that it blocks the closing of the stopper
```
I thought I had cautioned about this approach when we chatted. a) I have slight concerns about the extensive use of WithCancelOnQuiesce in the hot path (@nvanbenschoten can probably tell us whether this matters), which both acquires an exclusive lock and allocates under it. b) we might immediately catch new flakes due to the pervasive addition of aggressive context cancellation. This is good in the long run but bad in the short run. I just want to point this out; we'll need to be vigilant here - I assume you did review all of the tasks in the request path for this.
pkg/util/stop/stopper.go, line 333 at r2 (raw file):
```go
}

// RunAsyncTaskNoCancelation is like RunAsyncTask except that it runs the task
```
Looking at this again, I think this method is generally a bad idea. You make a new context in the method, but there is no "inventory" of what needs to be carried over into the new context. Instead, this method will silently lose anything not in the log tags. That is not great; I would much rather be explicit about that at the caller. Maybe, with the appropriate weighing of options, we would still want the method, but since we're not currently focusing on this API, I would hold off on adding potential footguns to it.
pkg/util/stop/stopper.go, line 337 at r2 (raw file):
```go
//
// TODO(andrei): Another thing that's reasonable for the caller to want is to
// not pass a cancelation, but for the take to get a cancelation on quiescing
```
task
pkg/util/stop/stopper.go, line 339 at r2 (raw file):
```go
// not pass a cancelation, but for the take to get a cancelation on quiescing
// (WithCancelOnQuiesce). We should refactor this method into a set of options
// to RunAsyncTask.
```
+1
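For illustration, one possible shape of the options-based API the TODO suggests. The type, field, and method names here are invented for the sketch; they are not the stop package's actual interface.

```go
package stopsketch

import "context"

// TaskOpts is a hypothetical options struct for launching async tasks.
type TaskOpts struct {
	Name            string
	InheritCancel   bool // tie the task's ctx to the caller's cancelation
	CancelOnQuiesce bool // cancel the task's ctx when the stopper quiesces
}

// Stopper is a placeholder for the real stopper type.
type Stopper struct{ /* elided */ }

// RunAsyncTaskWithOpts is a sketch of the proposed options-driven entry point.
func (s *Stopper) RunAsyncTaskWithOpts(
	ctx context.Context, opts TaskOpts, f func(context.Context),
) error {
	// A real implementation would derive the task ctx according to opts and
	// register the task so Stop can drain it; elided in this sketch.
	go f(ctx)
	return nil
}
```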
pkg/util/tracing/tracer.go, line 648 at r3 (raw file):
```go
if span := opentracing.SpanFromContext(ctx); span != nil {
	if _, noop := span.(*noopSpan); noop {
		// Optimization: avoid ContextWithSpan call if tracing is disabled.
```
But now you've lost the optimization? You need this check in ForkCtxSpan.
GracefulStop is nice, but due to the local server optimization plus internal callers it would be hard to trust it. That said, we should use this if we can. I think we shut down the gRPC server quite late at the moment though, so this isn't a one-liner.
On Mon, Aug 10, 2020 at 5:27 AM Nathan VanBenschoten wrote:
@andreimatei <https://github.com/andreimatei> were we aware of grpc.Server.GracefulStop <https://godoc.org/google.golang.org/grpc#Server.GracefulStop> when writing this patch? It seems like gRPC already provides a way to drain active RPCs.
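For reference, a minimal sketch of the graceful-drain pattern under discussion, with a hard-stop fallback after a timeout. Per the caveat above, this only covers RPCs arriving over gRPC, not local/internal callers; the 5-second deadline is an arbitrary choice for the example.

```go
package main

import (
	"net"
	"time"

	"google.golang.org/grpc"
)

func main() {
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	s := grpc.NewServer()
	go s.Serve(lis)

	done := make(chan struct{})
	go func() {
		s.GracefulStop() // stop accepting new RPCs, wait for active ones
		close(done)
	}()
	select {
	case <-done:
		// All in-flight RPCs drained.
	case <-time.After(5 * time.Second):
		s.Stop() // hard stop: remaining RPCs are terminated
	}
}
```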
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_application_decoder.go, line 147 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I have mixed feelings about this as well. This will erase profiler labels, and whatever else the context may carry later.
If cancellation were just a property of the context without added semantics, we could just
```go
type uncancelledContext struct {
	context.Context
}

var _ context.Context = (*uncancelledContext)(nil)

func (c *uncancelledContext) Err() error            { return nil }
func (c *uncancelledContext) Done() <-chan struct{} { return nil }
```
but this is just another footgun. For example, the spans in a cancelled context are probably being re-used in another span and you can't touch them. Seeing this, I think we would need a contextutil.Transfer(from, to context.Context) context.Context method that picks out everything that can move over (like tags) and handles that which can't accordingly (fork the trace span, etc.). Idk, there seems to be a bunch of work to do. I guess I'm okay with this in the meantime.
I actually like this uncancelledContext. I don't really see the footgun. You say something about spans being reused, but I don't think that'd ever be the case, at least not without extra synchronization. The simple fact that you know that a ctx is canceled doesn't let you infer anything... Whatever was running under that ctx might still be running. Even when there is some extra synchronization and you know that something has responded to cancelation, we don't reuse spans in non-thread-safe ways, because they can always be referenced from async spans (there's an exception for a special immutable span that we always share). Spans generally have internal synchronization where they need it.
So I like this idea and I think we should try it. I'm really bothered by the lack of options in RunAsyncTask - this PR shows 3 examples of tasks that really should not be canceled.
No?
This patch makes the Store reject requests once its stopper is quiescing. Before this patch, we didn't seem to have good protection against requests not running after the stopper has been stopped. We've seen this in some tests, where requests were racing with the engine closing. Running after the stopper has stopped is generally pretty undefined behavior, so let's avoid it. I think the reason why we didn't see a lot of errors from such races is that we're stopping the gRPC server even before we start quiescing, so at least for requests coming from remote nodes we had some draining window. This is a lighter version of cockroachdb#51566. That patch was trying to run requests as tasks, so that they properly synchronize with server shutdown. This patch still allows races between requests that started evaluating before the server started quiescing, and server shutdown. Touches cockroachdb#51544. Release note: None
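A self-contained illustration of this lighter approach (miniStore and its quiescing channel are simplified stand-ins for Store.Send and Stopper.ShouldQuiesce, not the actual diff): Send just fails fast once quiescing has begun, without registering each request as a stopper task.

```go
package main

import (
	"errors"
	"fmt"
)

var errQuiescing = errors.New("node is shutting down")

type miniStore struct {
	quiescing chan struct{} // closed when the stopper starts quiescing
}

func (s *miniStore) Send(req string) (string, error) {
	select {
	case <-s.quiescing:
		return "", errQuiescing // reject new work during shutdown
	default:
	}
	return "response to " + req, nil
}

func main() {
	s := &miniStore{quiescing: make(chan struct{})}
	fmt.Println(s.Send("get(a)")) // served
	close(s.quiescing)            // shutdown begins
	fmt.Println(s.Send("get(b)")) // rejected
}
```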
The new #52843 is a lighter version of the shutdown part of this PR.
This commit optimizes the Stopper for task creation by ripping out the existing heavyweight task tracking in production builds. I realized that my biggest concern with most of the proposals (cockroachdb#52843 and cockroachdb#51566) being floated to address cockroachdb#51544 was that they bought more into the inefficient tracking in the Stopper, not that they were doing anything inherently wrong themselves. Before this change, creating a task acquired an exclusive mutex and then wrote to a hashmap. At high levels of concurrency, this would have become a performance chokepoint. After this change, the cost of launching a Task is three atomic increments – one to acquire a read lock, one to register with a WaitGroup, and one to release the read lock. When no one is draining the Stopper, these are all wait-free operations, which means that task creation becomes wait-free. With a change like this, I would feel much more comfortable pushing on Stopper tasks to solve cockroachdb#51544.
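An illustrative sketch of that scheme (simplified, not the actual Stopper code): task creation takes only a read lock and bumps a WaitGroup, so it stays cheap under concurrency, while draining takes the write lock once to cut off new tasks and then waits for the WaitGroup.

```go
package main

import (
	"fmt"
	"sync"
)

type leanStopper struct {
	mu       sync.RWMutex
	stopping bool
	tasks    sync.WaitGroup
}

// RunAsyncTask registers the task under a read lock; when nobody is draining,
// this is just a few atomic operations on the RWMutex and WaitGroup.
func (s *leanStopper) RunAsyncTask(f func()) bool {
	s.mu.RLock()
	if s.stopping {
		s.mu.RUnlock()
		return false // refused: stopper is draining
	}
	s.tasks.Add(1) // register before releasing the read lock
	s.mu.RUnlock()
	go func() {
		defer s.tasks.Done()
		f()
	}()
	return true
}

// Stop cuts off new tasks under the write lock, then waits for the rest.
func (s *leanStopper) Stop() {
	s.mu.Lock()
	s.stopping = true
	s.mu.Unlock()
	s.tasks.Wait()
}

func main() {
	s := &leanStopper{}
	fmt.Println("started:", s.RunAsyncTask(func() { fmt.Println("task ran") }))
	s.Stop()
	fmt.Println("started after stop:", s.RunAsyncTask(func() {}))
}
```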