-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cache event fanout & reversetunnel improvements #3536
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some style commends, since I'm not very familiar with the code yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first pass on the first half, will do another pass shortly after
1d99284
to
9ecf9a5
Compare
https://github.com/golang/go/wiki/CodeReviewComments#interfaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, will continue tomorrow.
if err != nil && err != io.EOF { | ||
p.log.Warnf("Proxy transport failed: %v %T.", trace.DebugReport(err), err) | ||
} | ||
case <-errorCh: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like before we were reporting errors other than io.EOF
. What errors were you seeing here other than io.EOF
that were not relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed because this was just filling the logs with errors related to "use of closed network connection".
if count < 1 { | ||
count = 1 | ||
} | ||
p.wp.Set(key, uint64(count)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't the count be picked up in the tick
function and won't it be more accurate because it will have reduce count by the number of proxies that are expired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expiry is checked on a pretty long interval (30 seconds, though it is configurable). If we only updated counts at that point, the AgentPool would be very slow to react to new proxies being available (also, tests would take significantly longer).
d0b0082
to
0e80232
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fspmarshall approved with minor nitpicks left here and there, please proceed with backport and ping me so I can load test.
ecf227c
to
5384883
Compare
"github.com/gravitational/teleport/lib/utils/workpool" | ||
) | ||
|
||
type Lease = workpool.Lease |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use a type alias here?
if it's just to shorten the code below, remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Lease
type is the only part of the workpool
API which isn't wrapped. Alias unifies the API and eliminates the need for consumer to import the workpool
package, which is otherwise an implementation detail.
// SetDefaults set default values for Config. | ||
func (c *Config) SetDefaults() { | ||
if c.ProxyExpiry < 1 { | ||
c.ProxyExpiry = 3 * time.Minute | ||
} | ||
if c.TickRate < 1 { | ||
c.TickRate = 30 * time.Second | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement this as a constructor func NewConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usable zero value with defaults set at time of usage is more consistent with how configuration structs are handled elsewhere in Teleport. Not 100% certain how the convention got started, but it helps with deserialization and with this handy pattern:
thing := NewThing(ThingConfig {
Foo: "bar",
// ...
})
if strings.HasSuffix(name, p.key.Cluster) { | ||
t := strings.TrimSuffix(name, p.key.Cluster) | ||
if strings.HasSuffix(t, ".") { | ||
name = strings.TrimSuffix(t, ".") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
name = strings.TrimSuffix(name, fmt.Sprintf(".%s", p.key.Cluster))
it will be a noop if there's no such suffix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trimming Cluster
and "."
separately avoids unnecessary string allocation.
6d78612
to
188e22c
Compare
- cache now perforams in-memory fanout of events, eliminating spurious event generation due to cache init/reset. - removed old unused logic from reversetunnel agents. - replaced seekpool with simpler ttl-cache and semaphore-like lease system. - add jittered backoff to agent connection attempts to reduce "thundering herd" effect. - improved reversetunnel logging. - improved LB usage in tests.
188e22c
to
de246cf
Compare
The PR includes multiple fixes intended to improve reliability of the caching and reversetunnel systems:
cache now perforams in-memory fanout of events, eliminating
spurious event generation due to cache init/reset.
removed old unused logic from reversetunnel agents.
seekpool now directly controls active agent count via newsemaphore-like lease system.
seekpool functionality simplified and split into two separate components; a "workpool" construct which functions like a per-cluster semaphore with variable targets, and a "tracker" which manages a record of which proxies might exist and which proxies are currently held by active agents.
add per-cluster jittered backoff to agent connection attempts toreduce "thundering herd" effect.
removed per-proxy and per-cluster backoff constructs in favor of a single jittered backoff at the AgentPool level (easier to maintain and test).
improved reversetunnel logging.
improved LB usage in tests.
Fixes #3468