Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a retry throttle #1689

Merged
merged 6 commits into from
Oct 26, 2023
Merged

Add a retry throttle #1689

merged 6 commits into from
Oct 26, 2023

Conversation

glbrntt
Copy link
Collaborator

@glbrntt glbrntt commented Oct 25, 2023

Motivation:

To implement retries and hedging, transports need to be able to throttle attempts to svoid overloading servers. The uses a token based system where successful requests, ones which end with non-retryable status code, add tokens and those which fail remove tokens from the system. Successful requests add 1 token, failed requests remove tokenRatio tokens (typically less than 1).

Modification:

  • Implement a retry throttle
  • Add a requirement to the ClientTransport

Result:

Retries can be throttled.

Motivation:

To implement retries and hedging, transports need to be able to throttle
attempts to svoid overloading servers. The uses a token based system
where successful requests, ones which end with non-retryable status
code, add tokens and those which fail remove tokens from the system.
Successful requests add 1 token, failed requests remove `tokenRatio`
tokens (typically less than 1).

Modification:

- Implement a retry throttle
- Add a requirement to the `ClientTransport`

Result:

Retries can be throttled.
@glbrntt glbrntt added the semver/none No version bump required. label Oct 25, 2023
@glbrntt glbrntt requested a review from gjcairo October 25, 2023 16:05
Package.swift Outdated
@@ -46,6 +46,10 @@ let packageDependencies: [Package.Dependency] = [
url: "https://github.com/apple/swift-nio-extras.git",
from: "1.4.0"
),
.package(
url: "https://github.com/apple/swift-atomics.git",
from: "1.1.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a 1.2.0 release last week to support Swift 5.9 - should we depend on that version instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot. It's actually not needed at all, I started implementing this with atomics but realised you can't clamp atomically so ended up with a lock...

Sources/GRPCCore/Transport/RetryThrottle.swift Outdated Show resolved Hide resolved

/// The number of tokens the throttle currently has.
///
/// If this value is less than or equal to half of ``maximumTokens`` then RPCs will not be retried
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If this value is less than or equal to half of ``maximumTokens`` then RPCs will not be retried
/// If this value is not above the retry threshold, then RPCs will not be retried

@usableFromInline
func recordSuccess() {
self.scaledTokensAvailable.withLockedValue { value in
value = Swift.min(self.scaledMaximumTokens, value &+ self.scaledTokenRatio)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is the Swift. namespace prefix needed?

Sources/GRPCCore/Transport/RetryThrottle.swift Outdated Show resolved Hide resolved
@glbrntt glbrntt requested a review from gjcairo October 26, 2023 10:54
// computation in integer space.

/// The number of tokens available, multiplied by 1000.
private let scaledTokensAvailable: LockedValueBox<Int>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit odd to me on the first glance since RetryThrottle is a struct but it looks like we are using it with reference semantics across tasks here. I guess we don't want users to put this into a LockedValueBox themselves? We should probably call this out in the docs.

I couldn't tell from this PR what lead to the decision that we have this in a lock inside a struct so some explanation would be appreciated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good question. So each client transport needs to instantiate a retry throttle and then just hold on to it. The call layer needs access and mutate the throttle from across threads so it needs to be thread safe. IMO it doesn't make sense for the transport to have to do the locking which is why it's here. I don't think users would reach for a lock anyway: there are no public mutating functions for them to call.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess all of this makes a lot more sense when you see the larger picture. Right now the type and the semantics seem a bit confusing to me but I will see in the follow up PRs how it is used :)

}

/// Returns whether retries and hedging are permitted at this time.
public var isRetryPermitted: Bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments says retries and hedging but you the property is called isRetryPermitted. Should we use a word that unifies the two?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following the grpc terminology here which I appreciate is a bit misleading... Hedging is just pre-emptive retries (we think it's going to fail so retry before we know it's failed). The grpc documentation refers to "regular" retries and hedging collectively as "retries".

@glbrntt glbrntt enabled auto-merge (squash) October 26, 2023 13:52
@glbrntt glbrntt merged commit f37ce30 into grpc:main Oct 26, 2023
13 checks passed
@glbrntt glbrntt added the v2 A change for v2 label Nov 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver/none No version bump required. v2 A change for v2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants