Consistent http.Server timeout configurations #30248

Merged — 3 commits merged into master from jent/http_timeouts on Aug 11, 2023

Conversation

Contributor

@jentfoo jentfoo commented Aug 9, 2023

This PR builds on the work from #30151 in the following ways:

  • A couple of additional server configurations that were missing timeouts are now covered
  • Timeouts are now configured in a consistent way. This means:
    • Configuring ReadTimeout, which was not covered by setting only ReadHeaderTimeout
    • Setting ReadHeaderTimeout to the more aggressive (1 second) defaults.ReadHeadersTimeout
    • Setting a WriteTimeout in cases of potentially large responses (a minimal sketch of the combined configuration follows this list)
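
A minimal, self-contained sketch of the combined timeout configuration described above. This is not Teleport code: the constant names and values are illustrative stand-ins for `defaults.ReadHeadersTimeout` and `apidefaults.DefaultIOTimeout`.

```go
package main

import (
	"net/http"
	"time"
)

// Stand-in values; Teleport defines its own in the defaults packages
// (defaults.ReadHeadersTimeout, apidefaults.DefaultIOTimeout).
const (
	readHeadersTimeout = 10 * time.Second // time allowed to read request headers
	defaultIOTimeout   = 30 * time.Second // time allowed for full request read / response write
)

func newServer(handler http.Handler) *http.Server {
	return &http.Server{
		Handler:           handler,
		ReadHeaderTimeout: readHeadersTimeout, // bounds slow-header (Slowloris-style) clients
		ReadTimeout:       defaultIOTimeout,   // bounds reading the entire request body
		WriteTimeout:      defaultIOTimeout,   // bounds writing potentially large responses
	}
}

func main() {
	srv := newServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	}))
	srv.Addr = "127.0.0.1:8080"
	_ = srv.ListenAndServe()
}
```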

@@ -189,7 +189,9 @@ func NewTLSServer(cfg TLSServerConfig) (*TLSServer, error) {
 	cfg: cfg,
 	httpServer: &http.Server{
 		Handler:           tracingHandler,
-		ReadHeaderTimeout: apidefaults.DefaultIOTimeout,
+		ReadTimeout:       apidefaults.DefaultIOTimeout,
+		ReadHeaderTimeout: defaults.ReadHeadersTimeout,
Collaborator

I'm a bit concerned about the more aggressive timeout causing test flakiness, or worse, performance issues at scale. One second is almost always too short for our CI environment, where many test cases run in parallel.

What do you think about starting with 2 seconds, and leaving this in master for the v14 performance tests before backporting?

Contributor

+1. From experience, I'd much rather err on the side of caution by starting larger and slowly reducing the timeout over time.

Contributor Author

@jentfoo jentfoo Aug 10, 2023

> What do you think about starting with 2 seconds, and leaving this in master for the v14 performance tests before backporting?

Sounds like a great plan to me. In commit ff22401ecd6b5c95793e8c03cba3ea77e1c642e2 I updated the configured timeout to 10 seconds. I feel like 2 seconds is likely fine, but since we are coming from 30 seconds, even at 10 this is a notable net gain.

Maybe in Teleport 15 we go to 2 seconds (no concerns with moving slowly to make sure we don't cause any impact).

Contributor Author

I have seen two test failures; I worry this could be increasing flakiness.

The timeouts seem pretty reasonable though. Does it make sense to reduce test concurrency so that tests can complete faster? @zmb3 what are your thoughts?

Collaborator

Which tests? It's probably Go 1.21 and not your change.

Contributor Author

Seen twice:

 === Failed
=== FAIL: lib/auth TestAutoRotation (1.75s)
    tls_test.go:418: 
        	Error Trace:	/__w/teleport/teleport/lib/auth/tls_test.go:418
        	Error:      	Error "write tcp 127.0.0.1:47596->127.0.0.1:32991: write: broken pipe" does not contain "certificate"
        	Test:       	TestAutoRotation

I assumed it might be the write timeout causing the connection to be closed.
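
For illustration, here is a small self-contained sketch (not the Teleport test) of how a server-side `WriteTimeout` that is shorter than the handler's work abruptly closes the connection, which the peer then sees as an unexpected EOF, connection reset, or broken pipe instead of the error the test expects. The handler, timings, and use of `httptest` are all assumptions for the demo.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

func main() {
	// A handler that takes longer than the server's WriteTimeout to finish.
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("partial"))
		w.(http.Flusher).Flush()
		time.Sleep(200 * time.Millisecond) // exceed the write deadline
		w.Write([]byte("rest"))            // deadline has passed; the connection is torn down
	})

	ts := httptest.NewUnstartedServer(slow)
	ts.Config.WriteTimeout = 50 * time.Millisecond
	ts.Start()
	defer ts.Close()

	resp, err := http.Get(ts.URL)
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	defer resp.Body.Close()
	_, err = io.ReadAll(resp.Body)
	// The client observes an abrupt close mid-response rather than a clean error.
	fmt.Println("read error:", err)
}
```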

Contributor

@greedy52 greedy52 Aug 11, 2023

#30253 may or may not be related to your change (most likely not), but I don't think we have found the root cause for the above flaky test yet.

@@ -84,8 +86,15 @@ func run() error {
http.HandleFunc("/register/1", s.register1)
http.HandleFunc("/register/2", s.register2)

srv := &http.Server{
Collaborator

This is fine, but it doesn't really matter; this is just an example program and is not part of Teleport.

@jentfoo jentfoo force-pushed the jent/http_timeouts branch 2 times, most recently from ff22401 to d252012 on August 10, 2023 21:27
This commit builds on the work from #30151 in the following ways:
* A couple of additional server configurations that were missing timeouts are now covered
* Timeouts are now configured in a consistent way. This means:
  - Configuring `ReadTimeout`, which was not covered by setting only `ReadHeaderTimeout`
  - Setting `ReadHeaderTimeout` to the more aggressive (1 second) `defaults.ReadHeadersTimeout`
  - Setting a `WriteTimeout` in cases of potentially large responses
@jentfoo jentfoo force-pushed the jent/http_timeouts branch from e796c10 to addcb6a on August 11, 2023 14:20
@jentfoo jentfoo added this pull request to the merge queue Aug 11, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 11, 2023
@jentfoo jentfoo added this pull request to the merge queue Aug 11, 2023
Merged via the queue into master with commit b7f571b Aug 11, 2023
@jentfoo jentfoo deleted the jent/http_timeouts branch August 11, 2023 16:23
@tigrato tigrato mentioned this pull request Sep 15, 2023
tigrato added a commit that referenced this pull request Sep 15, 2023
This PR removes the `ReadTimeout` and `WriteTimeout` settings from
`kube/proxy.Server`.

The revert is required because both settings were terminating watch
streams prematurely and causing several issues when parsing the long-lived
data stream.

Signed-off-by: Tiago Silva <[email protected]>
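
For context, server-wide `ReadTimeout`/`WriteTimeout` apply to every connection, including long-lived watch streams, so a stream is cut once the deadline passes. Teleport's fix was to remove the settings; as a hedged alternative sketch (not Teleport's code), a streaming handler on Go 1.20+ can clear the per-request deadlines with `http.ResponseController` while leaving the server-wide defaults in place for ordinary requests. The handler and timeout values below are illustrative.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// watchHandler streams data indefinitely, so it clears the per-connection
// read/write deadlines that the server-wide timeouts would otherwise enforce.
func watchHandler(w http.ResponseWriter, r *http.Request) {
	rc := http.NewResponseController(w)
	// A zero time.Time means "no deadline" (Go 1.20+).
	if err := rc.SetReadDeadline(time.Time{}); err != nil {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	if err := rc.SetWriteDeadline(time.Time{}); err != nil {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	for i := 0; ; i++ {
		fmt.Fprintf(w, "event %d\n", i)
		if err := rc.Flush(); err != nil {
			return // client went away
		}
		time.Sleep(time.Second)
	}
}

func main() {
	srv := &http.Server{
		Addr:              "127.0.0.1:8080",
		Handler:           http.HandlerFunc(watchHandler),
		ReadHeaderTimeout: 10 * time.Second,
		ReadTimeout:       30 * time.Second, // fine for ordinary requests,
		WriteTimeout:      30 * time.Second, // but fatal for uncapped streams
	}
	_ = srv.ListenAndServe()
}
```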
github-merge-queue bot pushed a commit that referenced this pull request Sep 15, 2023
github-actions bot pushed a commit that referenced this pull request Sep 15, 2023
github-merge-queue bot pushed a commit that referenced this pull request Sep 18, 2023
@r0mant
Collaborator

r0mant commented Nov 21, 2023

@jentfoo FYI this change broke not just Kube Access, which @tigrato fixed in #31945, but application access as well (it took us a week to troubleshoot and pinpoint the cause). I removed all timeouts set by this PR in the application access request path in #34843.

I think we should carefully reevaluate all the other places where this PR introduced timeouts as well; for example, I see it also sets them in the local proxy, which I think may be prone to the same issue. TBH I'm tempted to just roll this back entirely.

@zmb3
Collaborator

zmb3 commented Nov 22, 2023

Looks like this was also the cause of #34201.

@tigrato
Contributor

tigrato commented Nov 22, 2023

@r0mant @zmb3 please check whether #33768 applies here as well.
