Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intentresolver: use caller's context deadline in intent resolution #37103

Merged

Conversation

ajwerner
Copy link
Contributor

Having a single timeout value for all batches sent by a request batcher has
proven problematic. The first commit changes the behavior to respect the latest
deadline for a request in a batch. If any request has no deadline, the batch
uses no deadline. The second commit deadline sets a timeout using the previous
value applied to all batches to asynchronous intent resolution calls.

This change isn't exactly equivalent to the prior behavior for async intent
resolution. In particular, the timeout applies to all of the requests rather than
to individual requests. Given the timeout, this seems reasonable. If we fail
to resolve all of the intents we want to within 30s, something else is probably
wrong or we're doing too much work.

@ajwerner ajwerner requested a review from a team April 24, 2019 22:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/remove-intent-resolver-timeout branch 5 times, most recently from f1f4030 to ca88a83 Compare April 25, 2019 03:58
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, 3 of 3 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/internal/client/requestbatcher/batcher.go, line 442 at r2 (raw file):

	// sendDeadline is the latest deadline reported by a request's context.
	// It will be zero valued if any request does not	contain a deadline.

weird spacing here


pkg/internal/client/requestbatcher/batcher_test.go, line 329 at r1 (raw file):

		// This test attempts to verify that a batch with two requests where one
		// carries a timeout leads to the batch being sent without a timeout.
		// There is a hazard that the goroutines to do the send to not happen

"do not happen"

Can we reduce the frequency of this hazard by bumping up the timeout a little bit?


pkg/internal/client/requestbatcher/batcher_test.go, line 347 at r1 (raw file):

			s.respChan <- batchResp{}
		case <-ctx1.Done():
			return

Is this the hazard? If so, let's leave a t.Log so that we have at least a minor chance of catching any regression in the test.

EDIT from r2: yes it looks like it is.


pkg/storage/intentresolver/intent_resolver.go, line 59 at r2 (raw file):

	// asyncIntentResolutionTimeout is the timeout when processing a group of
	// intents asynchronously. The timeout prevents intent async intent

"intent async intent"


pkg/storage/intentresolver/intent_resolver.go, line 207 at r2 (raw file):

		Stopper:         c.Stopper,
		Sender:          c.DB.NonTransactionalSender(),
		BatchTimeout:    gcTimeout,

Does this build after the first commit? If not, we should fix that. Having a failing build, even in intermediate commits, can mess up git bisects.

…t deadline

Having a single timeout value for all batches sent by a request batcher has
proven problematic. This commit changes the behavior to respect the latest
deadline for a request in a batch. If any request has no deadline, the batch
uses no deadline.

Release note: None
This commit respects the caller's deadline when sending intent resolution
batches. It also sets a timeout using the previous value applied to all
batches to asynchronous intent resolution calls.

Fixes cockroachdb#36953.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/remove-intent-resolver-timeout branch from ca88a83 to 9b8a2af Compare April 25, 2019 12:56
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/internal/client/requestbatcher/batcher_test.go, line 329 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"do not happen"

Can we reduce the frequency of this hazard by bumping up the timeout a little bit?

Sure done.


pkg/internal/client/requestbatcher/batcher_test.go, line 347 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this the hazard? If so, let's leave a t.Log so that we have at least a minor chance of catching any regression in the test.

EDIT from r2: yes it looks like it is.

Done.


pkg/storage/intentresolver/intent_resolver.go, line 207 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this build after the first commit? If not, we should fix that. Having a failing build, even in intermediate commits, can mess up git bisects.

Fixed to remove the setting in the first commit.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 3 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)

@ajwerner
Copy link
Contributor Author

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Apr 25, 2019
37103: intentresolver: use caller's context deadline in intent resolution r=nvanbenschoten a=ajwerner

Having a single timeout value for all batches sent by a request batcher has
proven problematic. The first commit changes the behavior to respect the latest
deadline for a request in a batch. If any request has no deadline, the batch
uses no deadline. The second commit deadline sets a timeout using the previous
value applied to all batches to asynchronous intent resolution calls.

This change isn't exactly equivalent to the prior behavior for async intent
resolution. In particular, the timeout applies to all of the requests rather than
to individual requests. Given the timeout, this seems reasonable. If we fail
to resolve all of the intents we want to within 30s, something else is probably
wrong or we're doing too much work. 


Co-authored-by: Andrew Werner <[email protected]>
@craig
Copy link
Contributor

craig bot commented Apr 25, 2019

Build succeeded

@craig craig bot merged commit 9b8a2af into cockroachdb:master Apr 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants