smartconnpool: do not allow connections to starve #17675
Conversation
Signed-off-by: Vicent Marti <[email protected]>
Force-pushed from eddc5d7 to e00df5f
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:
@@            Coverage Diff             @@
##             main   #17675      +/-   ##
==========================================
+ Coverage   67.75%   67.77%   +0.01%
==========================================
  Files        1587     1587
  Lines      255780   255806      +26
==========================================
+ Hits       173310   173369      +59
+ Misses      82470    82437      -33
☔ View full report in Codecov by Sentry.
Can confirm we're not seeing it in our CI either 👍. Thanks @vmg!
Signed-off-by: Dirkjan Bussink <[email protected]>
@@ -161,7 +161,6 @@ type ConnPool[C Connection] struct {
 // The pool must be ConnPool.Open before it can start giving out connections
 func NewPool[C Connection](config *Config[C]) *ConnPool[C] {
 	pool := &ConnPool[C]{}
-	pool.freshSettingsStack.Store(-1)
Was -1 being used as some kind of sentinel value? Or was this completely unnecessary?
It was a sentinel value to mark when no Setting connection had been returned to the pool yet, but it was not an interesting optimization, so I removed it.
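For context, here is a minimal sketch of the sentinel pattern being discussed. It is not the actual Vitess implementation: apart from freshSettingsStack, the names (pool, newPool, pickStack) are hypothetical, and the sketch only illustrates how an atomic index initialized to -1 can stand for "no Setting connection returned to the pool yet".

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type pool struct {
	// Index of the stack that most recently received a connection with
	// Settings applied; -1 is the sentinel for "none returned yet".
	freshSettingsStack atomic.Int64
}

func newPool() *pool {
	p := &pool{}
	p.freshSettingsStack.Store(-1) // the initialization removed by this PR
	return p
}

func (p *pool) pickStack() int64 {
	if idx := p.freshSettingsStack.Load(); idx >= 0 {
		return idx // prefer the freshest stack
	}
	return 0 // sentinel hit: fall back to scanning from the first stack
}

func main() {
	p := newPool()
	fmt.Println("stack to try first:", p.pickStack())
}
```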
Description
This is a potential fix for the issue described in #17662. The bug report has a very comprehensive explanation of the bug (thank you!!), so I'll stick to explaining the fix here.
I considered introducing stronger synchronization to the pool to prevent the race, but decided against it because the race does not affect correctness: connections in the pool would only starve in situations where the service simply stops receiving new requests over time. As long as the pool periodically receives connection requests, no connection can starve, because the clients that raced with other clients and are stuck in the waitlist will wake up right away when new connections come in. I don't think persistently slowing down the pool to handle this corner case is worth it.
Hence, my proposed solution reuses the expiry loop (the periodic loop that checks whether the requests in the waitlist have expired naturally on the client side) to detect connections that could potentially be starving. This will not be a common occurrence, but when it happens, we can simply force a cycle of the pool (pull connections from the stacks and hand them over to the waiters), which will always wake up any starving connections.
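To make that concrete, here is a minimal sketch of the idea, not the actual ConnPool code: the idle and waiters channels, forceCycle, and expiryLoop are hypothetical stand-ins for the pool's stacks, waitlist, and existing expiry loop, with the 100ms ticker matching the frequency mentioned below.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type conn struct{}

type pool struct {
	idle    chan *conn      // stand-in for the pool's connection stacks
	waiters chan chan *conn // stand-in for the waitlist (each waiter channel is buffered)
}

// forceCycle hands idle connections to waiters until one of the two runs out.
// This is the "force a cycle of the pool" step described above.
func (p *pool) forceCycle() {
	for {
		select {
		case w := <-p.waiters:
			select {
			case c := <-p.idle:
				w <- c // wake a potentially starving waiter
			default:
				p.waiters <- w // nothing idle after all; put the waiter back
				return
			}
		default:
			return // nobody is waiting
		}
	}
}

// expiryLoop stands in for the existing periodic loop that times out waiters;
// the fix reuses its ticker to also run the starvation check.
func (p *pool) expiryLoop(ctx context.Context) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// ... expire waiters whose deadline has passed (existing behavior) ...
			p.forceCycle()
		}
	}
}

func main() {
	p := &pool{
		idle:    make(chan *conn, 4),
		waiters: make(chan chan *conn, 4),
	}
	// One idle connection and one waiter that raced and got stuck.
	p.idle <- &conn{}
	w := make(chan *conn, 1)
	p.waiters <- w

	ctx, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
	defer cancel()
	go p.expiryLoop(ctx)

	<-w // the next ticker cycle hands the idle connection to the waiter
	fmt.Println("waiter woke up")
}
```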
The pro of this approach is that it adds zero overhead to the normal get/put path for the pool.
The con is that a starving connection can remain starved for up to 100ms, which is the frequency I've set for the starvation check loop. Open to alternative takes for this fix, ideally ones without much overhead to the get/put path.
cc @arthurschreiber @mhamza15 @harshit-gangal @deepthi
Related Issue(s)
connection pool timed out errors when there is a spike in borrowed/waiting connections due to race condition #17662
Checklist
Deployment Notes