-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase the amount of time session trackers linger #44910
Conversation
When a session completes, the correspondingsession tracker is put into a terminated state and its expiry is updated so that it will naturally expire shortly in the future. Teleport's upload completer runs periodically and looks for uploads for which there is not a session tracker. If it finds any, it assumes there was an error and that no more data will be uploaded so it completes or aborts the upload. There is a race condition here - if the TTL of the terminated session tracker is shorter than the time it takes for the session to be uploaded, then the upload completer may mistakenly determine that a session tracker which just recently expired (and is still being uploaded) needs to be completed, causing the upload to be cut short. Increase the TTL of terminated sessions to drastically reduce the chances of this race occurring. Note that the web UI's active sessions page already filters out trackers with a state of terminated, so there should not be any user-visible change here.
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with |
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with |
That's another option @espadolini, but the semantics aren't quite the same. If we restore the 24-hour grace period we'll prevent the issue for any sessions less than 24 hours, but the issue will remain for sessions that last longer than 24 hours (the completer will try to pick it up as soon as the session tracker is gone, even if the upload is in progress). Sessions that do last this long are likely to have more data and take longer to upload, making the impact of the issue even greater. WDYT? I think this approach probably makes more sense if we can address your concern about session tracker buildup by manually deleting the trackers if we know an upload completed successfully rather than waiting for them to expire.. |
Superseded by a different approach (grace time based on upload creation time and last part uploaded time) in #44941 |
When a session completes, the corresponding session tracker is put into a terminated state and its expiry is updated so that it will naturally expire shortly in the future.
Teleport's upload completer runs periodically and looks for uploads for which there is not a session tracker. If it finds any, it assumes there was an error and that no more data will be uploaded so it completes or aborts the upload.
There is a race condition here - if the TTL of the terminated session tracker is shorter than the time it takes for the session to be uploaded, then the upload completer may mistakenly determine that a session tracker which just recently expired (and is still being uploaded) needs to be completed, causing the upload to be cut short.
Increase the TTL of terminated sessions to drastically reduce the chances of this race occurring. Note that the web UI's active sessions page already filters out trackers with a state of terminated, so there should not be any user-visible change here.