Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage): implement RetryChunkDeadline for grpc writes #11476

Merged
merged 8 commits into from
Jan 29, 2025

Conversation

BrennaEpp
Copy link
Contributor

@BrennaEpp BrennaEpp commented Jan 21, 2025

Branched from #11475

Adds a timer to the upload buffer. This is equivalent to the http implementation.

Changed from implementing directly in the upload code to implementing in invoke.go. This makes it extensible (if we ever want to add a similar deadline to other calls), but also more prone to affecting other ops (just to be clear, I don't think it will given testing/how it is written). Reason for changing it is because we need to be able to stop the retries while still wrapping the error. If we do it a level above, if we return a "deadline done, here is last error" wrapping the error, it will continue to retry if the last error is retriable.

@product-auto-label product-auto-label bot added the api: storage Issues related to the Cloud Storage API. label Jan 21, 2025
@BrennaEpp BrennaEpp marked this pull request as ready for review January 28, 2025 07:43
@BrennaEpp BrennaEpp requested review from a team as code owners January 28, 2025 07:43
Copy link
Contributor

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, a few minor things

@@ -2420,6 +2420,7 @@ type retryConfig struct {
policy RetryPolicy
shouldRetry func(err error) bool
maxAttempts *int
maxDuration time.Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it looks like this isn't passed in by anything except writer currently. Maybe add a comment to note that?

@@ -2420,6 +2420,7 @@ type retryConfig struct {
policy RetryPolicy
shouldRetry func(err error) bool
maxAttempts *int
maxDuration time.Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also let's call it maxRetryDuration since it doesn't actually limit the total duration of calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.
It stutters a bit (retryConfig.maxRetryDuration) but that was my initial instinct as well. It's clearer anyway since retryConfig can have any name.

return (errors.As(err, &httpErr) && httpErr.Code == httpStatusCode) ||
(errors.As(err, &grpcErr) && grpcErr.GRPCStatus().Code() == grpcStatusCode)
(errors.As(err, &grpcErr) && grpcErr.GRPCStatus().Code() == grpcStatusCode) ||
status.Code(err) == grpcStatusCode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need 3 cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testbench errors are Status errors but not apierror.APIErrors, so the added test fails when checking the status code without this change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That seems like an issue with wrapping at some layer. But in any case L6974 should work for all grpc errors (both real service and testbench)

if lastErr == nil {
return true, errors.New("storage: request not sent, choose a larger value for the retry deadline")
}
return true, fmt.Errorf("storage: retry timed out after %v attempts; last error: %w", attempts, lastErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This err should include the value of the timeout as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@BrennaEpp BrennaEpp enabled auto-merge (squash) January 29, 2025 21:02
@BrennaEpp BrennaEpp merged commit 03575d7 into googleapis:main Jan 29, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants