feat(storage): implement RetryChunkDeadline for grpc writes #11476
Conversation
Overall looks good, a few minor things
storage/storage.go
Outdated
@@ -2420,6 +2420,7 @@ type retryConfig struct {
 	policy      RetryPolicy
 	shouldRetry func(err error) bool
 	maxAttempts *int
+	maxDuration time.Duration
So it looks like this isn't passed in by anything except writer currently. Maybe add a comment to note that?
Also let's call it maxRetryDuration, since it doesn't actually limit the total duration of calls.
Done.
It stutters a bit (retryConfig.maxRetryDuration), but that was my initial instinct as well. It's clearer anyway since retryConfig can have any name.
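As a concrete illustration of both suggestions (the rename and a doc comment noting the Writer-only usage), here is a minimal, self-contained sketch. The surrounding fields come from the diff above; RetryPolicy is a stand-in type and the comment wording is an assumption, not the exact upstream code:

```go
package main

import "time"

// RetryPolicy is a hypothetical stand-in so this sketch compiles on its own;
// the real type lives in cloud.google.com/go/storage.
type RetryPolicy int

// Sketch of the field after the rename, with a comment noting that only the
// Writer populates it (as suggested in the review).
type retryConfig struct {
	policy      RetryPolicy
	shouldRetry func(err error) bool
	maxAttempts *int
	// maxRetryDuration bounds how long new retry attempts may be issued; it
	// does not limit the duration of an individual call. Currently it is
	// only set by the Writer (chunk retry deadline).
	maxRetryDuration time.Duration
}

func main() {
	_ = retryConfig{maxRetryDuration: 32 * time.Second} // example value
}
```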
 	return (errors.As(err, &httpErr) && httpErr.Code == httpStatusCode) ||
-		(errors.As(err, &grpcErr) && grpcErr.GRPCStatus().Code() == grpcStatusCode)
+		(errors.As(err, &grpcErr) && grpcErr.GRPCStatus().Code() == grpcStatusCode) ||
+		status.Code(err) == grpcStatusCode
Why does this need 3 cases?
Testbench errors are Status errors but not apierror.APIErrors, so the added test fails when checking the status code without this change.
Hmm. That seems like an issue with wrapping at some layer. But in any case L6974 should work for all gRPC errors (both real service and testbench).
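To make the three cases concrete, here is an illustrative, self-contained reconstruction of the check (the helper name isStatusCode and the example in main are assumptions; the real helper lives in the storage package). A bare status error, like the testbench returns, is neither a googleapi.Error nor an apierror.APIError, so only the status.Code case classifies it:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"

	"github.com/googleapis/gax-go/v2/apierror"
	"google.golang.org/api/googleapi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// isStatusCode reconstructs the three-way check from the diff above:
//  1. HTTP errors carrying a googleapi.Error code,
//  2. gRPC errors wrapped as an apierror.APIError,
//  3. bare gRPC status errors (e.g. from the testbench), which are not
//     APIErrors and are only classified by status.Code.
func isStatusCode(err error, httpStatusCode int, grpcStatusCode codes.Code) bool {
	var httpErr *googleapi.Error
	var grpcErr *apierror.APIError
	return (errors.As(err, &httpErr) && httpErr.Code == httpStatusCode) ||
		(errors.As(err, &grpcErr) && grpcErr.GRPCStatus().Code() == grpcStatusCode) ||
		status.Code(err) == grpcStatusCode
}

func main() {
	// A bare status error has no APIError wrapper, so only the third case matches.
	err := status.Error(codes.DeadlineExceeded, "retry deadline exceeded")
	fmt.Println(isStatusCode(err, http.StatusGatewayTimeout, codes.DeadlineExceeded))
}
```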
storage/invoke.go
Outdated
 	if lastErr == nil {
 		return true, errors.New("storage: request not sent, choose a larger value for the retry deadline")
 	}
 	return true, fmt.Errorf("storage: retry timed out after %v attempts; last error: %w", attempts, lastErr)
This err should include the value of the timeout as well.
Done
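A sketch of what the reworded messages might look like with the deadline value included (the function and variable names here are illustrative; the actual change is in storage/invoke.go):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryDeadlineError sketches the updated messages: both branches now report
// the configured deadline. retryDeadline, attempts, and lastErr are
// illustrative names, not the storage package's internals.
func retryDeadlineError(retryDeadline time.Duration, attempts int, lastErr error) error {
	if lastErr == nil {
		return fmt.Errorf("storage: request not sent, choose a larger value than %v for the retry deadline", retryDeadline)
	}
	return fmt.Errorf("storage: retry failed after %v attempts; deadline of %v exceeded; last error: %w",
		attempts, retryDeadline, lastErr)
}

func main() {
	err := retryDeadlineError(32*time.Second, 5, errors.New("deadline exceeded"))
	fmt.Println(err)
}
```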
Branched from #11475
Adds a timer to the upload buffer, equivalent to the HTTP implementation.
Changed from implementing this directly in the upload code to implementing it in invoke.go. This makes it extensible (if we ever want to add a similar deadline to other calls), but also more prone to affecting other ops (to be clear, I don't think it will, given testing and how it is written). The reason for the change is that we need to be able to stop the retries while still wrapping the last error: if we do it a level above and return a "deadline exceeded, here is the last error" wrapper, the retry loop will keep retrying as long as the last error is retryable.
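To illustrate why the check has to live inside the retry loop, here is a standalone sketch of an invoke-style loop (all names, the fixed pause, and the error text are assumptions, not the storage package's internals). Because the deadline is evaluated inside the loop, it can stop retrying while still wrapping the last error; a wrapper returned from a layer above would be fed back into the same loop and, if the wrapped error were retryable, retried again:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runWithRetryDeadline sketches the approach described above: the deadline
// check sits inside the retry loop itself (as in invoke.go), so once the
// deadline passes the loop stops issuing attempts and wraps the last error.
func runWithRetryDeadline(ctx context.Context, maxRetryDuration time.Duration,
	shouldRetry func(error) bool, call func(context.Context) error) error {

	deadline := time.Now().Add(maxRetryDuration)
	attempts := 0
	var lastErr error
	for {
		attempts++
		lastErr = call(ctx)
		if lastErr == nil || !shouldRetry(lastErr) {
			return lastErr
		}
		// Stop retrying once the deadline has passed, but keep the last
		// error wrapped so callers can still unwrap and inspect it.
		if maxRetryDuration > 0 && time.Now().After(deadline) {
			return fmt.Errorf("storage: retry deadline of %v exceeded after %d attempts; last error: %w",
				maxRetryDuration, attempts, lastErr)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(50 * time.Millisecond): // fixed pause; the real code uses gax backoff
		}
	}
}

func main() {
	err := runWithRetryDeadline(context.Background(), 200*time.Millisecond,
		func(error) bool { return true },                               // treat everything as retryable
		func(context.Context) error { return errors.New("transient") }) // always fails
	fmt.Println(err)
}
```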