Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase the amount of time session trackers linger #44910

Closed
wants to merge 1 commit into from

Conversation

zmb3
Copy link
Collaborator

@zmb3 zmb3 commented Aug 1, 2024

When a session completes, the corresponding session tracker is put into a terminated state and its expiry is updated so that it will naturally expire shortly in the future.

Teleport's upload completer runs periodically and looks for uploads for which there is not a session tracker. If it finds any, it assumes there was an error and that no more data will be uploaded so it completes or aborts the upload.

There is a race condition here - if the TTL of the terminated session tracker is shorter than the time it takes for the session to be uploaded, then the upload completer may mistakenly determine that a session tracker which just recently expired (and is still being uploaded) needs to be completed, causing the upload to be cut short.

Increase the TTL of terminated sessions to drastically reduce the chances of this race occurring. Note that the web UI's active sessions page already filters out trackers with a state of terminated, so there should not be any user-visible change here.

When a session completes, the correspondingsession tracker is put
into a terminated state and its expiry is updated so that it will
naturally expire shortly in the future.

Teleport's upload completer runs periodically and looks for uploads
for which there is not a session tracker. If it finds any, it assumes
there was an error and that no more data will be uploaded so it
completes or aborts the upload.

There is a race condition here - if the TTL of the terminated session
tracker is shorter than the time it takes for the session to be
uploaded, then the upload completer may mistakenly determine that
a session tracker which just recently expired (and is still being
uploaded) needs to be completed, causing the upload to be cut short.

Increase the TTL of terminated sessions to drastically reduce the
chances of this race occurring. Note that the web UI's active sessions
page already filters out trackers with a state of terminated, so there
should not be any user-visible change here.
Copy link

github-actions bot commented Aug 1, 2024

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

@github-actions github-actions bot requested review from fheinecke and ryanclark August 1, 2024 02:47
@github-actions github-actions bot added audit-log Issues related to Teleports Audit Log size/sm labels Aug 1, 2024
@zmb3 zmb3 requested a review from sclevine August 1, 2024 02:49
Copy link

github-actions bot commented Aug 1, 2024

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

@espadolini
Copy link
Contributor

Why not restore the upload grace period part of #11551 instead? Wouldn't that give us roughly the same behavior? We deliberately shortened the TTL from up to one hour to 3 minutes in #17981, and this PR would extend it more than that instead.

@zmb3
Copy link
Collaborator Author

zmb3 commented Aug 1, 2024

That's another option @espadolini, but the semantics aren't quite the same.

If we restore the 24-hour grace period we'll prevent the issue for any sessions less than 24 hours, but the issue will remain for sessions that last longer than 24 hours (the completer will try to pick it up as soon as the session tracker is gone, even if the upload is in progress).

Sessions that do last this long are likely to have more data and take longer to upload, making the impact of the issue even greater.

WDYT?

I think this approach probably makes more sense if we can address your concern about session tracker buildup by manually deleting the trackers if we know an upload completed successfully rather than waiting for them to expire..

@espadolini
Copy link
Contributor

Superseded by a different approach (grace time based on upload creation time and last part uploaded time) in #44941

@espadolini espadolini closed this Aug 2, 2024
@espadolini espadolini deleted the zmb3/session-upload branch August 2, 2024 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
audit-log Issues related to Teleports Audit Log size/sm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants