-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a retry throttle #1689
Add a retry throttle #1689
Conversation
Motivation: To implement retries and hedging, transports need to be able to throttle attempts to svoid overloading servers. The uses a token based system where successful requests, ones which end with non-retryable status code, add tokens and those which fail remove tokens from the system. Successful requests add 1 token, failed requests remove `tokenRatio` tokens (typically less than 1). Modification: - Implement a retry throttle - Add a requirement to the `ClientTransport` Result: Retries can be throttled.
Package.swift
Outdated
@@ -46,6 +46,10 @@ let packageDependencies: [Package.Dependency] = [ | |||
url: "https://github.com/apple/swift-nio-extras.git", | |||
from: "1.4.0" | |||
), | |||
.package( | |||
url: "https://github.com/apple/swift-atomics.git", | |||
from: "1.1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a 1.2.0 release last week to support Swift 5.9 - should we depend on that version instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot. It's actually not needed at all, I started implementing this with atomics but realised you can't clamp atomically so ended up with a lock...
|
||
/// The number of tokens the throttle currently has. | ||
/// | ||
/// If this value is less than or equal to half of ``maximumTokens`` then RPCs will not be retried |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// If this value is less than or equal to half of ``maximumTokens`` then RPCs will not be retried | |
/// If this value is not above the retry threshold, then RPCs will not be retried |
@usableFromInline | ||
func recordSuccess() { | ||
self.scaledTokensAvailable.withLockedValue { value in | ||
value = Swift.min(self.scaledMaximumTokens, value &+ self.scaledTokenRatio) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is the Swift.
namespace prefix needed?
Co-authored-by: Gustavo Cairo <[email protected]>
// computation in integer space. | ||
|
||
/// The number of tokens available, multiplied by 1000. | ||
private let scaledTokensAvailable: LockedValueBox<Int> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit odd to me on the first glance since RetryThrottle
is a struct but it looks like we are using it with reference semantics across tasks here. I guess we don't want users to put this into a LockedValueBox
themselves? We should probably call this out in the docs.
I couldn't tell from this PR what lead to the decision that we have this in a lock inside a struct so some explanation would be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good question. So each client transport needs to instantiate a retry throttle and then just hold on to it. The call layer needs access and mutate the throttle from across threads so it needs to be thread safe. IMO it doesn't make sense for the transport to have to do the locking which is why it's here. I don't think users would reach for a lock anyway: there are no public mutating functions for them to call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess all of this makes a lot more sense when you see the larger picture. Right now the type and the semantics seem a bit confusing to me but I will see in the follow up PRs how it is used :)
} | ||
|
||
/// Returns whether retries and hedging are permitted at this time. | ||
public var isRetryPermitted: Bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments says retries and hedging but you the property is called isRetryPermitted
. Should we use a word that unifies the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm following the grpc terminology here which I appreciate is a bit misleading... Hedging is just pre-emptive retries (we think it's going to fail so retry before we know it's failed). The grpc documentation refers to "regular" retries and hedging collectively as "retries".
Motivation:
To implement retries and hedging, transports need to be able to throttle attempts to svoid overloading servers. The uses a token based system where successful requests, ones which end with non-retryable status code, add tokens and those which fail remove tokens from the system. Successful requests add 1 token, failed requests remove
tokenRatio
tokens (typically less than 1).Modification:
ClientTransport
Result:
Retries can be throttled.