-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add extra txn backoff tuning parameters for sstxn #59
Conversation
543e9d0
to
fd2dc32
Compare
@@ -286,7 +289,11 @@ func (s *txnSuite) TestExcessiveContention(c *gc.C) { | |||
} | |||
err := s.txnRunner.Run(buildTxn) | |||
c.Assert(err, gc.Equals, jujutxn.ErrExcessiveContention) | |||
c.Assert(maxAttempt, gc.Equals, 2) | |||
if s.supportsSST { | |||
c.Assert(maxAttempt, gc.Equals, 19) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the magic number 19 come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's 20 (retry count for sstxn) minus 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just as 2 is (3 minus 1) fr client side txn
fd2dc32
to
87ab674
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the concept of sleep+fuzz is great. I think it would be clearer if expressed as time.Durations rather than integers/floats.
|
||
// defaultRetryFuzzMillis is the default fudge factor used when pausing between | ||
// unsuccessful transaction operations. | ||
defaultRetryFuzzMillis = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe for Fuzz we usually use a percentage (like 20%) vs a specific count.
I also wonder whether these should be 1*time.Millisecond rather than putting Millis into the name
|
||
serverSideTransactions bool | ||
nrRetries int | ||
retryBackoffMillis float32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could then be time.Durations
// Increase the backoff time for each attempt. | ||
int(tr.retryBackoffMillis*1000)*attempt+ | ||
// Include a random amount of time as well. | ||
int(tr.retryFuzzMillis*fuzz*1000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
our fuzz is also usually +/- not just strictly longer than the base retry.
I'm not terribly concerned about 2ms delay, but it is good to have consistency around the patterns.
A lot of this feels like it is clearer if you started with time.Duration rather than Micro* (Millis*1000.)
Server side transactions run much faster (orders of magnitude) than client side ones, and so the possibility for contention is much greater when running many concurrent transactions. With only 3 retries, it's possible and likely that concurrent txns will never succeed.
This PR adds extra tuning parameters to the txn retry logic and also makes those parameters configurable so callers like Juju can tune them as needed.
The default retry count for SSTXN is now 50 and there's a small delay (plus fuzz) introduced between each retry.
Tested by running go tests in the juju state package. The retry count was somewhat adjusted empirically until the tests passed.