Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix token bucket not being set for both standard and adaptive retry modes #3964

Merged
merged 5 commits into from
Jan 14, 2025

Conversation

aajtodd
Copy link
Contributor

@aajtodd aajtodd commented Jan 8, 2025

Motivation and Context

awslabs/aws-sdk-rust#1234

Description

PR adds a new interceptor registered as part of the default retry plugin components that ensures a token bucket is always present and available to the retry strategy. The buckets are partitioned off the retry partition (which defaults to the service name and is already set by the default plugin). We use a static variable in the runtime for this which means that token buckets can and will apply to every single client that uses the same retry partition. The implementation tries to avoid contention on this new global lock by only consulting it if the retry partition is overridden after client creation.

For AWS SDK clients I've updated the default retry partition clients are created with to include the region when set. Now the default partition for a client will be {service}-{region} (e.g. sts-us-west-2) rather than just the service name (e.g. sts). This partitioning is a little more granular and closer to what we want/expect as failures in one region should not cause throttling to another (and vice versa for success in one should not increase available quota in another).

I also updated the implementation to follow the SEP a little more literally/closely as far as structure which fixes some subtle bugs. State is updated in one place and we ensure that the token bucket is always consulted (before the token bucket could be skipped in the case of adaptive retries returning a delay and the adaptive rate limit was updated in multiple branches).

Testing

Checklist

  • [x ] For changes to the smithy-rs codegen or runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "client," "server," or both in the applies_to key.
  • [ x] For changes to the AWS SDK, generated SDK code, or SDK runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "aws-sdk-rust" in the applies_to key.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link

github-actions bot commented Jan 8, 2025

A new generated diff is ready to view.

A new doc preview is ready to view.

Copy link

github-actions bot commented Jan 9, 2025

A new generated diff is ready to view.

A new doc preview is ready to view.

Copy link

github-actions bot commented Jan 9, 2025

A new generated diff is ready to view.

A new doc preview is ready to view.

Copy link

github-actions bot commented Jan 9, 2025

A new generated diff is ready to view.

A new doc preview is ready to view.

Copy link

A new generated diff is ready to view.

A new doc preview is ready to view.

@aajtodd aajtodd marked this pull request as ready for review January 10, 2025 17:27
@aajtodd aajtodd requested review from a team as code owners January 10, 2025 17:27
Comment on lines +182 to +185
let default_retry_partition = match config.region() {
Some(region) => #{Cow}::from(format!("{default_retry_partition}-{}", region)),
None => #{Cow}::from(default_retry_partition),
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let default_retry_partition = match config.region() {
Some(region) => #{Cow}::from(format!("{default_retry_partition}-{}", region)),
None => #{Cow}::from(default_retry_partition),
};
let default_retry_partition = match config.region() {
Some(region) => #{Cow}::from(format!("{default_retry_partition}-{region}")),
None => #{Cow}::from(default_retry_partition),
};

Comment on lines +226 to +229
fun usesRegion(codegenContext: ClientCodegenContext) =
codegenContext.getBuiltIn(AwsBuiltIns.REGION) != null ||
ServiceIndex.of(codegenContext.model)
.getEffectiveAuthSchemes(codegenContext.serviceShape).containsKey(SigV4Trait.ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ever have to worry about models supporting sigv4a but not sigv4? How do retry partitions interact with Sigv4a regions (if at all)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this same thought when reading/extracting this function. I think the possibility is certainly coming but wasn't sure.

.expect("success");

let log_contents = logs_rx.contents();
assert!(log_contents.contains("token bucket for RetryPartition { name: \"dontcare-us-west-2\" } added to config bag"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider updating the orchestrator to note the retry partition used for each retry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, is there any advantage over logging it here?

Copy link
Contributor

@ysaito1001 ysaito1001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank for straightening the impls of should_attempt_retry and calculate_backoff!

@aajtodd aajtodd added this pull request to the merge queue Jan 14, 2025
Merged via the queue into main with commit 1ddbc53 Jan 14, 2025
43 of 44 checks passed
@aajtodd aajtodd deleted the fix-retry-tb branch January 14, 2025 15:44
landonxjames pushed a commit that referenced this pull request Jan 14, 2025
…odes (#3964)

<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here -->
awslabs/aws-sdk-rust#1234

<!--- Describe your changes in detail -->

PR adds a new interceptor registered as part of the default retry plugin
components that ensures a token bucket is _always_ present and available
to the retry strategy. The buckets are partitioned off the retry
partition (which defaults to the service name and is already set by the
default plugin). We use a `static` variable in the runtime for this
which means that token buckets can and will apply to every single client
that uses the same retry partition. The implementation tries to avoid
contention on this new global lock by only consulting it if the retry
partition is overridden after client creation.

For AWS SDK clients I've updated the default retry partition clients are
created with to include the region when set. Now the default partition
for a client will be `{service}-{region}` (e.g. `sts-us-west-2`) rather
than just the service name (e.g. `sts`). This partitioning is a little
more granular and closer to what we want/expect as failures in one
region should not cause throttling to another (and vice versa for
success in one should not increase available quota in another).

I also updated the implementation to follow the SEP a little more
literally/closely as far as structure which fixes some subtle bugs.
State is updated in one place and we ensure that the token bucket is
always consulted (before the token bucket could be skipped in the case
of adaptive retries returning a delay and the adaptive rate limit was
updated in multiple branches).

<!--- Please describe in detail how you tested your changes -->
<!--- Include details of your testing environment, and the tests you ran
to -->
<!--- see how your change affects other areas of the code, etc. -->

<!--- If a checkbox below is not applicable, then please DELETE it
rather than leaving it unchecked -->
- [x ] For changes to the smithy-rs codegen or runtime crates, I have
created a changelog entry Markdown file in the `.changelog` directory,
specifying "client," "server," or both in the `applies_to` key.
- [ x] For changes to the AWS SDK, generated SDK code, or SDK runtime
crates, I have created a changelog entry Markdown file in the
`.changelog` directory, specifying "aws-sdk-rust" in the `applies_to`
key.

----

_By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice._
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants