Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra txn backoff tuning parameters for sstxn #59

Merged
merged 1 commit into from
Apr 6, 2021

Conversation

wallyworld
Copy link
Member

@wallyworld wallyworld commented Apr 6, 2021

Server side transactions run much faster (orders of magnitude) than client side ones, and so the possibility for contention is much greater when running many concurrent transactions. With only 3 retries, it's possible and likely that concurrent txns will never succeed.

This PR adds extra tuning parameters to the txn retry logic and also makes those parameters configurable so callers like Juju can tune them as needed.

The default retry count for SSTXN is now 50 and there's a small delay (plus fuzz) introduced between each retry.

Tested by running go tests in the juju state package. The retry count was somewhat adjusted empirically until the tests passed.

@wallyworld
Copy link
Member Author

@@ -286,7 +289,11 @@ func (s *txnSuite) TestExcessiveContention(c *gc.C) {
}
err := s.txnRunner.Run(buildTxn)
c.Assert(err, gc.Equals, jujutxn.ErrExcessiveContention)
c.Assert(maxAttempt, gc.Equals, 2)
if s.supportsSST {
c.Assert(maxAttempt, gc.Equals, 19)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the magic number 19 come from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's 20 (retry count for sstxn) minus 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just as 2 is (3 minus 1) fr client side txn

@wallyworld
Copy link
Member Author

$$merge$$

@jujubot jujubot merged commit c8f3706 into juju:master Apr 6, 2021
Copy link
Member

@jameinel jameinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the concept of sleep+fuzz is great. I think it would be clearer if expressed as time.Durations rather than integers/floats.


// defaultRetryFuzzMillis is the default fudge factor used when pausing between
// unsuccessful transaction operations.
defaultRetryFuzzMillis = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe for Fuzz we usually use a percentage (like 20%) vs a specific count.
I also wonder whether these should be 1*time.Millisecond rather than putting Millis into the name


serverSideTransactions bool
nrRetries int
retryBackoffMillis float32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could then be time.Durations

// Increase the backoff time for each attempt.
int(tr.retryBackoffMillis*1000)*attempt+
// Include a random amount of time as well.
int(tr.retryFuzzMillis*fuzz*1000))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our fuzz is also usually +/- not just strictly longer than the base retry.
I'm not terribly concerned about 2ms delay, but it is good to have consistency around the patterns.

A lot of this feels like it is clearer if you started with time.Duration rather than Micro* (Millis*1000.)

jujubot added a commit that referenced this pull request Apr 7, 2021
#60

Changes based on comments on #59.
- make the retry parameter a duration instead of a float.
- fuzz is a +/- percentage.

Also add a unit test for the backoff calculation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants