Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Session upload completer grace period bug #11348

Closed
Joerger opened this issue Mar 22, 2022 · 0 comments · Fixed by #11551
Closed

Session upload completer grace period bug #11348

Joerger opened this issue Mar 22, 2022 · 0 comments · Fixed by #11551
Assignees
Labels
audit-log Issues related to Teleports Audit Log bug

Comments

@Joerger
Copy link
Contributor

Joerger commented Mar 22, 2022

With the changes made in #4045, after a 24 hour grace period, a session's event stream is uploaded, whether the session is closed or not. This logic was built assuming that a after 24 hours a session must be abandoned. However, a session can remain open indefinitely, and the grace period logic misbehaves in this edge case.

u.log.Debugf("Upload %v grace period is over. Trying to complete.", upload.ID)
if err := u.cfg.Uploader.CompleteUpload(ctx, upload, parts); err != nil {
return trace.Wrap(err)
}

With some of the recent changes made to session uploading and moderated sessions, the grace period's misbehavior has some varying effects depending on the version.

v8.3.4

If the server is closed during a session before the grace period, upon restart the server will successfully upload the session after the grace period, as expected.

But if the session remains open after the grace period upload, then the following problems arise:

  • If the session exits immediately, the session recording becomes unplayable. Hitting Play for the session in the webui leads to the message - Recording for this session is not available.
    • See the debug logs at the bottom for this case
  • If the server is restarted, the session recording doesn't show up in the webui (because it couldn't ensure session leave event?), but is on disk. I believe the recording on disk is only the grace period upload, anything after is lost.
  • If a new session chunk is made and attempts to write it to disk, it will fail and the session will freeze. Once closed, the session recording is does not show up in the webui, but is on disk, similar to the case above.
  • If less than a full chunk is made, then it will truncate the uploaded recording to the length of the additional chunk. So a portion of the initial session chunk is playable...

What should happen

When the grace period is reached on a pending session upload:

  • The pending upload parts should be uploaded (currently works)
  • The session should be forced to close, if it isn't already.
    • Although ungraceful, I don't see another way to do this without turning it into a "hybrid" sync/async session writer.
    • Alternatively we could alter the grace period logic to only complete the upload if the session is closed, but this may ruin the current separation of concerns.
      • This might be possible by improving the logic around ensuring the session.leave event.

Ideally, users would use the node-sync session streaming option, which would help to avoid this disruptive grace period behavior altogether, outside of the intended scenario where a session stream fails to upload at the end of the session's life time (network issue, etc.).

Debug logs - v8.3.4 - session exits immediately after grace period upload completes

...
2022-03-24T10:01:32-07:00 INFO [AUDIT]     session.start addr.local:127.0.0.1:3022 addr.remote:127.0.0.1:36534 cluster_name:example.com code:T2000I ei:0 event:session.start login:bjoerger namespace:default server_addr:127.0.0.1:3022 server_hostname:server01 server_id:f58992c2-576d-42bd-be86-498aa37bd3e3 server_labels:map[arch:x86_64 cluster:example.com env:staging hostname:bjoerger-ThinkPad-X1-Extreme] session_recording:node sid:15a2c1a2-9913-48ba-ae53-651be25842ff size:280:25 time:2022-03-24T17:01:32.016Z uid:2be6d0cc-8122-4e22-9a2a-6a25d16dfeb1 user:dev events/emitter.go:325

# grace period ends (set to 10 seconds)

2022-03-24T10:01:42-07:00 DEBU [AUTH:COMP] Upload 0bb39db8-95b0-4b5e-a15a-e87b17e82905 grace period is over. Trying to complete. events/complete.go:140
2022-03-24T10:01:42-07:00 DEBU [AUTH:COMP] Completed upload Upload(session=15a2c1a2-9913-48ba-ae53-651be25842ff, id=0bb39db8-95b0-4b5e-a15a-e87b17e82905, initiated=2022-03-24 10:01:32.011363435 -0700 PDT). events/complete.go:144
2022-03-24T10:01:42-07:00 DEBU [AUTH:COMP] Found 1 active uploads, completed 1. events/complete.go:186
2022-03-24T10:01:42-07:00 DEBU [UPLOAD]    Scanned 1 uploads, started 1 in /home/bjoerger/gravitational/teleport-config/local/storage/data/dir/main/log/upload/streaming/default. filesessions/fileasync.go:289
2022-03-24T10:01:42-07:00 DEBU [AUTH:GRPC] CreateAuditStream connection from f58992c2-576d-42bd-be86-498aa37bd3e3.example.com. auth/grpcserver.go:169
2022-03-24T10:01:42-07:00 DEBU [AUTH:GRPC] Created stream: <nil>. auth/grpcserver.go:212
2022-03-24T10:01:42-07:00 INFO [AUDIT]     session.upload cluster_name:example.com code:T2005I ei:2.147483647e+09 event:session.upload sid:15a2c1a2-9913-48ba-ae53-651be25842ff time:2022-03-24T17:01:42.872Z url:file:///home/bjoerger/gravitational/teleport-config/local/storage/data/dir/main/log/records/multi/15a2c1a2-9913-48ba-ae53-651be25842ff events/emitter.go:325
2022-03-24T10:01:42-07:00 DEBU [AUTH:GRPC] Completed stream: <nil>. auth/grpcserver.go:259
2022-03-24T10:01:42-07:00 DEBU [AUTH:GRPC] Flushed and closed the stream. auth/grpcserver.go:190
2022-03-24T10:01:42-07:00 DEBU [UPLOAD]    Session upload completed. duration:6.473477ms session-id:15a2c1a2-9913-48ba-ae53-651be25842ff filesessions/fileasync.go:443

# grace period upload completes

# exit session after upload

2022-03-24T10:01:49-07:00 DEBU [SESSION:N] Encountered a fatal error Copying from PTY to writer error:[read /dev/ptmx: input/output error] srv/sess.go:766
2022-03-24T10:01:49-07:00 DEBU [NODE]      Exec request ("/home/bjoerger/gravitational/teleport/build/teleport") complete: 0 id:8 local:127.0.0.1:3022 login:bjoerger remote:127.0.0.1:36534 teleportUser:dev regular/sshserver.go:1377
2022-03-24T10:01:49-07:00 INFO [SESSION:N] Closing party cc89464a-2e7a-4848-99ab-16d54b84307e srv/sess.go:1392
2022-03-24T10:01:49-07:00 INFO [NODE]      Removing party ServerContext(127.0.0.1:36534->127.0.0.1:3022, user=bjoerger, id=8) party(id=cc89464a-2e7a-4848-99ab-16d54b84307e) from session 15a2c1a2-9913-48ba-ae53-651be25842ff id:8 local:127.0.0.1:3022 login:bjoerger remote:127.0.0.1:36534 teleportUser:dev srv/sess.go:1077
2022-03-24T10:01:49-07:00 DEBU [NODE]      Releasing associated resources - context has been closed. id:8 local:127.0.0.1:3022 login:bjoerger remote:127.0.0.1:36534 teleportUser:dev srv/monitor.go:253
2022-03-24T10:01:49-07:00 INFO [AUDIT]     session.data addr.local:127.0.0.1:3022 addr.remote:127.0.0.1:36534 code:T2006I ei:2.147483646e+09 event:session.data login:bjoerger namespace:default rx:9690 server_id:f58992c2-576d-42bd-be86-498aa37bd3e3 sid:15a2c1a2-9913-48ba-ae53-651be25842ff time:2022-03-24T17:01:49.206Z tx:5879 uid:42a63caa-7db4-46cb-9a46-28ea720d8dce user:dev events/emitter.go:325
2022-03-24T10:01:49-07:00 INFO [SESSION:N] Session 15a2c1a2-9913-48ba-ae53-651be25842ff will be garbage collected. srv/sess.go:321
2022-03-24T10:01:49-07:00 INFO [SESSION:N] Party member cc89464a-2e7a-4848-99ab-16d54b84307e left session 15a2c1a2-9913-48ba-ae53-651be25842ff. srv/sess.go:1222
2022-03-24T10:01:49-07:00 DEBU [SSH:PROXY] Closed connection 127.0.0.1:36534. sshutils/server.go:471
2022-03-24T10:01:49-07:00 DEBU [PROXY]     Client 127.0.0.1:36534 disconnected. id:6 local:127.0.1.1:3080 login:bjoerger remote:127.0.0.1:36534 teleportUser:dev regular/sshserver.go:1364
2022-03-24T10:01:49-07:00 DEBU [NODE]      Subsystem proxySubsys(cluster=default/example.com, host=server01, port=0) finished with result: read tcp 127.0.0.1:60512->127.0.0.1:3022: use of closed network connection. regular/sshserver.go:1587
2022-03-24T10:01:49-07:00 DEBU [SSH:NODE]  Closed connection 127.0.0.1:36534. sshutils/server.go:471
2022-03-24T10:01:49-07:00 DEBU [PROXY]     Releasing associated resources - context has been closed. id:6 local:127.0.1.1:3080 login:bjoerger remote:127.0.0.1:36534 teleportUser:dev srv/monitor.go:253
2022-03-24T10:01:49-07:00 WARN             Failed to upload part. error:[
ERROR REPORT:
Original Error: *trace.NotFoundError open /home/bjoerger/gravitational/teleport-config/local/storage/data/dir/main/log/upload/streaming/default/multi/0bb39db8-95b0-4b5e-a15a-e87b17e82905/15a2c1a2-9913-48ba-ae53-651be25842ff/1.part: no such file or directory
Stack Trace:
	/home/bjoerger/gravitational/teleport/lib/events/filesessions/filestream.go:83 github.com/gravitational/teleport/lib/events/filesessions.(*Handler).UploadPart
	/home/bjoerger/gravitational/teleport/lib/events/stream.go:674 github.com/gravitational/teleport/lib/events.(*sliceWriter).startUpload.func1
	/home/bjoerger/.tools/go/src/runtime/asm_amd64.s:1581 runtime.goexit
User Message: open /home/bjoerger/gravitational/teleport-config/local/storage/data/dir/main/log/upload/streaming/default/multi/0bb39db8-95b0-4b5e-a15a-e87b17e82905/15a2c1a2-9913-48ba-ae53-651be25842ff/1.part: no such file or directory] events/stream.go:612
2022-03-24T10:01:49-07:00 INFO [AUDIT]     session.data addr.local:127.0.0.1:3022 addr.remote:127.0.0.1:36534 cluster_name:example.com code:T2006I ei:2.147483646e+09 event:session.data login:bjoerger namespace:default rx:9690 server_id:f58992c2-576d-42bd-be86-498aa37bd3e3 sid:15a2c1a2-9913-48ba-ae53-651be25842ff time:2022-03-24T17:01:49.206Z tx:5879 uid:42a63caa-7db4-46cb-9a46-28ea720d8dce user:dev events/emitter.go:325
2022-03-24T10:01:49-07:00 DEBU [SESSION:N] Session has encountered 2 slow writes out of 30. Check disk and network on this server. events/auditwriter.go:384
2022-03-24T10:01:49-07:00 INFO [SESSION:N] Closing session 15a2c1a2-9913-48ba-ae53-651be25842ff. srv/sess.go:632
2022-03-24T10:01:49-07:00 DEBU [SESSION:N] Session has encountered 2 slow writes out of 30. Check disk and network on this server. events/auditwriter.go:384
2022-03-24T10:01:49-07:00 INFO [AUDIT]     session.leave cluster_name:example.com code:T2003I ei:28 event:session.leave namespace:default server_addr:127.0.0.1:3022 server_hostname:server01 server_id:f58992c2-576d-42bd-be86-498aa37bd3e3 server_labels:map[arch:x86_64 cluster:example.com env:staging hostname:bjoerger-ThinkPad-X1-Extreme] sid:15a2c1a2-9913-48ba-ae53-651be25842ff time:2022-03-24T17:01:49.206Z uid:9bc972c9-0825-430f-8fc1-05b48d2c91cf user:dev events/emitter.go:325
2022-03-24T10:01:49-07:00 DEBU [TERM:LOCA] Closed PTY srv/term.go:300
2022-03-24T10:01:49-07:00 DEBU [SESSION:N] Stopping poll and sync of terminal size to all parties. srv/sess.go:1176
2022-03-24T10:01:49-07:00 INFO [AUDIT]     session.leave cluster_name:example.com code:T2003I ei:28 event:session.leave namespace:default server_addr:127.0.0.1:3022 server_hostname:server01 server_id:f58992c2-576d-42bd-be86-498aa37bd3e3 server_labels:map[arch:x86_64 cluster:example.com env:staging hostname:bjoerger-ThinkPad-X1-Extreme] sid:15a2c1a2-9913-48ba-ae53-651be25842ff time:2022-03-24T17:01:49.206Z uid:9bc972c9-0825-430f-8fc1-05b48d2c91cf user:dev events/emitter.go:325
2022-03-24T10:01:49-07:00 INFO [AUDIT]     session.end cluster_name:example.com code:T2004I ei:29 enhanced_recording:false event:session.end interactive:true namespace:default participants:[dev] server_addr:127.0.0.1:3022 server_hostname:server01 server_id:f58992c2-576d-42bd-be86-498aa37bd3e3 server_labels:map[arch:x86_64 cluster:example.com env:staging hostname:bjoerger-ThinkPad-X1-Extreme] session_recording:node session_start:2022-03-24T17:01:32.013503299Z session_stop:2022-03-24T17:01:49.206600381Z sid:15a2c1a2-9913-48ba-ae53-651be25842ff time:2022-03-24T17:01:49.207Z uid:77ec8d9d-b73a-4d20-a8cd-21e4194f005e user:dev events/emitter.go:325
2022-03-24T10:01:49-07:00 INFO [AUDIT]     session.end cluster_name:example.com code:T2004I ei:29 enhanced_recording:false event:session.end interactive:true namespace:default participants:[dev] server_addr:127.0.0.1:3022 server_hostname:server01 server_id:f58992c2-576d-42bd-be86-498aa37bd3e3 server_labels:map[arch:x86_64 cluster:example.com env:staging hostname:bjoerger-ThinkPad-X1-Extreme] session_recording:node session_start:2022-03-24T17:01:32.013503299Z session_stop:2022-03-24T17:01:49.206600381Z sid:15a2c1a2-9913-48ba-ae53-651be25842ff time:2022-03-24T17:01:49.207Z uid:77ec8d9d-b73a-4d20-a8cd-21e4194f005e user:dev events/emitter.go:325

# Attempt to play session recording

2022-03-24T10:03:59-07:00 DEBU [AUDIT]     GetSessionEvents. afterN:0 printEvents:true sid:15a2c1a2-9913-48ba-ae53-651be25842ff events/auditlog.go:896
2022-03-24T10:03:59-07:00 DEBU [AUDIT]     Recording 15a2c1a2-9913-48ba-ae53-651be25842ff is already downloaded and unpacked to /home/bjoerger/gravitational/teleport-config/local/storage/data/dir/main/log/playbacks/sessions/default/15a2c1a2-9913-48ba-ae53-651be25842ff.tar. events/auditlog.go:662
2022-03-24T10:03:59-07:00 DEBU [WEB]       Unable to find events for session 15a2c1a2-9913-48ba-ae53-651be25842ff. error:[
ERROR REPORT:
Original Error: *trace.NotFoundError session &#34;15a2c1a2-9913-48ba-ae53-651be25842ff&#34; not found
Stack Trace:
	/home/bjoerger/gravitational/teleport/lib/events/auditlog.go:586 github.com/gravitational/teleport/lib/events.readSessionIndex
	/home/bjoerger/gravitational/teleport/lib/events/auditlog.go:535 github.com/gravitational/teleport/lib/events.(*AuditLog).readSessionIndex
	/home/bjoerger/gravitational/teleport/lib/events/auditlog.go:916 github.com/gravitational/teleport/lib/events.(*AuditLog).GetSessionEvents
	/home/bjoerger/gravitational/teleport/lib/auth/auth_with_roles.go:2170 github.com/gravitational/teleport/lib/auth.(*ServerWithRoles).GetSessionEvents
	/home/bjoerger/gravitational/teleport/lib/auth/apiserver.go:2167 github.com/gravitational/teleport/lib/auth.(*APIServer).getSessionEvents
	/home/bjoerger/gravitational/teleport/lib/auth/apiserver.go:305 github.com/gravitational/teleport/lib/auth.(*APIServer).withAuth.func1
	/home/bjoerger/gravitational/teleport/lib/httplib/httplib.go:67 github.com/gravitational/teleport/lib/httplib.MakeHandlerWithErrorWriter.func1
	/home/bjoerger/gravitational/teleport/vendor/github.com/julienschmidt/httprouter/router.go:387 github.com/julienschmidt/httprouter.(*Router).ServeHTTP
	/home/bjoerger/gravitational/teleport/lib/httplib/httplib.go:184 github.com/gravitational/teleport/lib/httplib.RewritePaths.func1
	/home/bjoerger/.tools/go/src/net/http/server.go:2047 net/http.HandlerFunc.ServeHTTP
	/home/bjoerger/gravitational/teleport/lib/auth/middleware.go:557 github.com/gravitational/teleport/lib/auth.(*Middleware).ServeHTTP
	/home/bjoerger/gravitational/teleport/vendor/github.com/gravitational/oxy/ratelimit/tokenlimiter.go:118 github.com/gravitational/oxy/ratelimit.(*TokenLimiter).ServeHTTP
	/home/bjoerger/gravitational/teleport/vendor/github.com/gravitational/oxy/connlimit/connlimit.go:75 github.com/gravitational/oxy/connlimit.(*ConnLimiter).ServeHTTP
	/home/bjoerger/.tools/go/src/net/http/server.go:2879 net/http.serverHandler.ServeHTTP
	/home/bjoerger/.tools/go/src/net/http/server.go:1930 net/http.(*conn).serve
	/home/bjoerger/.tools/go/src/runtime/asm_amd64.s:1581 runtime.goexit
Caught:
	/home/bjoerger/gravitational/teleport/lib/httplib/httplib.go:142 github.com/gravitational/teleport/lib/httplib.ConvertResponse
	/home/bjoerger/gravitational/teleport/lib/auth/clt.go:284 github.com/gravitational/teleport/lib/auth.(*Client).Get
	/home/bjoerger/gravitational/teleport/lib/auth/clt.go:1363 github.com/gravitational/teleport/lib/auth.(*Client).GetSessionEvents
	/home/bjoerger/gravitational/teleport/lib/web/apiserver.go:2316 github.com/gravitational/teleport/lib/web.(*Handler).siteSessionEventsGet
	/home/bjoerger/gravitational/teleport/lib/web/apiserver.go:2479 github.com/gravitational/teleport/lib/web.(*Handler).WithClusterAuth.func1
	/home/bjoerger/gravitational/teleport/lib/httplib/httplib.go:67 github.com/gravitational/teleport/lib/httplib.MakeHandlerWithErrorWriter.func1
	/home/bjoerger/gravitational/teleport/vendor/github.com/julienschmidt/httprouter/router.go:387 github.com/julienschmidt/httprouter.(*Router).ServeHTTP
	/home/bjoerger/.tools/go/src/net/http/server.go:2090 net/http.StripPrefix.func1
	/home/bjoerger/.tools/go/src/net/http/server.go:2047 net/http.HandlerFunc.ServeHTTP
	/home/bjoerger/gravitational/teleport/lib/web/apiserver.go:444 github.com/gravitational/teleport/lib/web.NewHandler.func1
	/home/bjoerger/.tools/go/src/net/http/server.go:2047 net/http.HandlerFunc.ServeHTTP
	/home/bjoerger/gravitational/teleport/vendor/github.com/julienschmidt/httprouter/router.go:448 github.com/julienschmidt/httprouter.(*Router).ServeHTTP
	/home/bjoerger/gravitational/teleport/lib/web/apiserver.go:215 github.com/gravitational/teleport/lib/web.(*WebAPIHandler).ServeHTTP
	/home/bjoerger/gravitational/teleport/vendor/github.com/gravitational/oxy/ratelimit/tokenlimiter.go:118 github.com/gravitational/oxy/ratelimit.(*TokenLimiter).ServeHTTP
	/home/bjoerger/gravitational/teleport/vendor/github.com/gravitational/oxy/connlimit/connlimit.go:75 github.com/gravitational/oxy/connlimit.(*ConnLimiter).ServeHTTP
	/home/bjoerger/.tools/go/src/net/http/server.go:2879 net/http.serverHandler.ServeHTTP
	/home/bjoerger/.tools/go/src/net/http/server.go:1930 net/http.(*conn).serve
	/home/bjoerger/.tools/go/src/runtime/asm_amd64.s:1581 runtime.goexit
User Message: session &#34;15a2c1a2-9913-48ba-ae53-651be25842ff&#34; not found
] web/apiserver.go:2318
``

Related to https://github.com/gravitational/teleport/issues/10983
@Joerger Joerger added the bug label Mar 22, 2022
@Joerger Joerger self-assigned this Mar 22, 2022
@zmb3 zmb3 added the audit-log Issues related to Teleports Audit Log label Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
audit-log Issues related to Teleports Audit Log bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants