Consistent http.Server timeout configurations #30248

Conversation
@@ -189,7 +189,9 @@ func NewTLSServer(cfg TLSServerConfig) (*TLSServer, error) {
 		cfg: cfg,
 		httpServer: &http.Server{
 			Handler:           tracingHandler,
-			ReadHeaderTimeout: apidefaults.DefaultIOTimeout,
+			ReadTimeout:       apidefaults.DefaultIOTimeout,
+			ReadHeaderTimeout: defaults.ReadHeadersTimeout,
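For context, a minimal hedged sketch of the construction pattern shown in the diff above. The stand-in constants and the surrounding `main` are assumptions for illustration; only the field names, `apidefaults.DefaultIOTimeout`, and `defaults.ReadHeadersTimeout` come from the actual change.

```go
// Sketch only: stand-ins for apidefaults.DefaultIOTimeout and
// defaults.ReadHeadersTimeout; the real values live in Teleport's defaults packages.
package main

import (
	"net/http"
	"time"
)

const (
	defaultIOTimeout   = 30 * time.Second // assumed placeholder for apidefaults.DefaultIOTimeout
	readHeadersTimeout = 10 * time.Second // assumed placeholder for defaults.ReadHeadersTimeout
)

// newTimeoutServer mirrors the pattern in the diff: ReadTimeout bounds the
// whole request read (headers plus body), while ReadHeaderTimeout bounds just
// the header read and can therefore be more aggressive.
func newTimeoutServer(handler http.Handler) *http.Server {
	return &http.Server{
		Handler:           handler,
		ReadTimeout:       defaultIOTimeout,
		ReadHeaderTimeout: readHeadersTimeout,
	}
}

func main() {
	srv := newTimeoutServer(http.NewServeMux())
	_ = srv // in real code: srv.ListenAndServe()
}
```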
I'm a bit concerned about the more aggressive timeout causing test flakiness, or worse, performance issues at scale. 1 second is almost always too short for our CI environment, where many test cases are running in parallel.
What do you think about starting with 2 seconds, and leaving this in master for the v14 performance tests before backporting?
+1. From experience I'd much rather start larger, err on the side of caution, and slowly reduce the timeout over time.
> What do you think about starting with 2 seconds, and leaving this in master for the v14 performance tests before backporting?

Sounds like a great plan to me. In commit ff22401ecd6b5c95793e8c03cba3ea77e1c642e2 I updated the configured timeout to 10 seconds. I feel like 2 seconds is likely fine, but we are coming from 30 seconds, so even at 10 this is a notable net gain.

Maybe in Teleport 15 we go to 2 seconds (no concerns with moving slowly to make sure we don't cause any impact).
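For illustration only, a sketch of how such a package-level default might be declared after this change; the name follows `defaults.ReadHeadersTimeout` from the diff, but the actual Teleport declaration and value may differ.

```go
// Sketch, not the real Teleport defaults package.
package defaults

import "time"

// ReadHeadersTimeout bounds how long a server waits for a client to finish
// sending request headers. Originally proposed at 1 second, relaxed to
// 10 seconds in this PR, with a possible later reduction (e.g. 2 seconds)
// once performance testing gives confidence.
const ReadHeadersTimeout = 10 * time.Second
```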
I have seen two test failures, and I worry this could be increasing the flakiness.

The timeouts seem pretty reasonable, though. Does it make sense to reduce test concurrency so that tests can complete faster? @zmb3 what are your thoughts?
Which tests? It's probably Go 1.21 and not your change.
Seen twice:

=== Failed
=== FAIL: lib/auth TestAutoRotation (1.75s)
    tls_test.go:418:
        Error Trace: /__w/teleport/teleport/lib/auth/tls_test.go:418
        Error:       Error "write tcp 127.0.0.1:47596->127.0.0.1:32991: write: broken pipe" does not contain "certificate"
        Test:        TestAutoRotation
I assumed it might be the write timeout causing the connection to be closed.
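To make that hypothesis concrete, here is a small standalone sketch (not Teleport code; all names and values are made up) of how an aggressive `WriteTimeout` surfaces to the peer as an abruptly closed connection, which is exactly the kind of broken pipe / EOF noise a test can trip over:

```go
// Sketch only: a handler that outlives the server's WriteTimeout. The server
// drops the connection when the deadline expires, so the client sees the
// request fail (typically EOF; "broken pipe" if it was still writing).
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

func main() {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(200 * time.Millisecond) // handler is slower than the write deadline
		fmt.Fprintln(w, "too late")        // buffered write; the flush at handler return fails
	}))
	srv.Config.WriteTimeout = 50 * time.Millisecond
	srv.Start()
	defer srv.Close()

	resp, err := http.Post(srv.URL, "text/plain", strings.NewReader("hello"))
	if err != nil {
		fmt.Println("request error:", err) // expected path: connection closed without a response
		return
	}
	defer resp.Body.Close()
	fmt.Println("unexpected success:", resp.Status)
}
```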
#30253

May or may not be your change (most likely not), but I don't think we have found the root cause for the above flaky test yet.
lib/auth/webauthn/httpserver/main.go (outdated)

@@ -84,8 +86,15 @@ func run() error {
	http.HandleFunc("/register/1", s.register1)
	http.HandleFunc("/register/2", s.register2)

	srv := &http.Server{
This is fine, but it doesn't really matter; this is just an example program and is not part of Teleport.
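The underlying pattern in such a change is typically a move from the package-level `http.ListenAndServe` helper, which exposes no timeout knobs, to an explicit `http.Server`. A hedged sketch of that pattern with assumed values, not the file's actual contents:

```go
// Sketch only: handlers are assumed to be registered on the default mux
// (as the http.HandleFunc calls above do), so Handler is left nil and the
// default mux is used. Address and timeout values are assumptions.
package main

import (
	"net/http"
	"time"
)

func serve(addr string) error {
	srv := &http.Server{
		Addr:              addr,
		ReadTimeout:       10 * time.Second,
		ReadHeaderTimeout: 10 * time.Second,
		WriteTimeout:      10 * time.Second,
		IdleTimeout:       time.Minute,
	}
	return srv.ListenAndServe()
}

func main() {
	if err := serve(":8080"); err != nil {
		panic(err)
	}
}
```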
This commit builds on the work from #30151 in the following ways:

* A couple of additional server configurations that were missing timeouts are now covered.
* Timeouts are now configured in a consistent way. This means:
  - Configuring the `ReadTimeout`, which was not covered by only setting `ReadHeaderTimeout`
  - Setting `ReadHeaderTimeout` to the more aggressive (1 second) `defaults.ReadHeadersTimeout`
  - Setting a `WriteTimeout` in cases of potential large responses
This PR removes the `ReadTimeout` and `WriteTimeout` settings from `kube/proxy.Server`. The revert is required because both settings were terminating watch streams early and causing several issues when parsing the long-lived data stream.

Signed-off-by: Tiago Silva <[email protected]>
@jentfoo FYI this change broke not just Kube Access, which @tigrato fixed in #31945, but application access as well (it took us a week to troubleshoot and pinpoint the cause). I removed all timeouts set by this PR in the application access request path in #34843. I think we should carefully reevaluate all other places where this PR introduced timeouts as well; for example, I see it also sets them in the local proxy, which I think may be prone to the same issue. TBH I'm tempted to just roll this back entirely.
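For anyone re-evaluating those paths: one middle ground, sketched below as a general Go technique rather than anything Teleport does, is to keep the server-wide timeouts but let long-lived streaming handlers clear their own read/write deadlines via Go 1.20's `http.ResponseController`, so watch-style streams are not cut off.

```go
// Sketch only (not Teleport code): a long-lived streaming handler clears the
// per-connection deadlines so a server-wide ReadTimeout/WriteTimeout does not
// terminate the stream. Requires Go 1.20+ for http.ResponseController.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func streamHandler(w http.ResponseWriter, r *http.Request) {
	rc := http.NewResponseController(w)
	// A zero time means "no deadline"; this must run before the server-wide
	// deadlines have already expired.
	_ = rc.SetReadDeadline(time.Time{})
	_ = rc.SetWriteDeadline(time.Time{})

	for i := 0; ; i++ {
		if _, err := fmt.Fprintf(w, "event %d\n", i); err != nil {
			return // client went away
		}
		_ = rc.Flush()
		time.Sleep(time.Second)
	}
}

func main() {
	srv := &http.Server{
		Addr:              ":8080",
		Handler:           http.HandlerFunc(streamHandler),
		ReadHeaderTimeout: 10 * time.Second,
		// ReadTimeout/WriteTimeout can stay set for ordinary endpoints;
		// the streaming handler above opts itself out.
		ReadTimeout:  10 * time.Second,
		WriteTimeout: 10 * time.Second,
	}
	_ = srv.ListenAndServe()
}
```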
Looks like this was also the cause of #34201
This PR builds on the work from #30151 in the following ways:

* Configuring the `ReadTimeout`, which was not covered by only setting `ReadHeaderTimeout` (see the sketch below)
* Setting `ReadHeaderTimeout` to the more aggressive (1 second) `defaults.ReadHeadersTimeout`
* Setting a `WriteTimeout` in cases of potential large responses
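A standalone sketch of the first point (assumed names and values, not Teleport code): with only `ReadHeaderTimeout` set, a client that sends headers promptly can trickle the request body forever, so `ReadTimeout` is needed to bound the whole request read.

```go
// Sketch only: with just ReadHeaderTimeout set, a client that sends headers
// promptly can trickle the request body indefinitely; the handler's body read
// is only bounded once ReadTimeout is set as well.
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

func main() {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n, err := io.Copy(io.Discard, r.Body) // blocks until the body ends or a read deadline fires
		fmt.Printf("server read %d bytes, err=%v\n", n, err)
	}))
	srv.Config.ReadHeaderTimeout = 1 * time.Second
	// Uncomment to bound the whole request read, including the body:
	// srv.Config.ReadTimeout = 5 * time.Second
	srv.Start()
	defer srv.Close()

	conn, err := net.Dial("tcp", srv.Listener.Addr().String())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Headers arrive quickly (satisfying ReadHeaderTimeout), then the body
	// trickles in one byte every two seconds.
	fmt.Fprint(conn, "POST / HTTP/1.1\r\nHost: example\r\nContent-Length: 1000000\r\n\r\n")
	for i := 0; i < 5; i++ {
		conn.Write([]byte("x"))
		time.Sleep(2 * time.Second)
	}
}
```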