Cannot start Teleport with "distant" DynamoDB backend #31690
Comments
The experience of using Teleport when a backend Delete takes longer than 300ms is going to be somewhat poor regardless, but maybe we could release the lock in the background? Or just give a generous enough time for the lock release to handle all wired connections across the planet, I suppose. There's other uses of RunWhileLocked in the codebase, none of which alter |
The default release timeout is now a minute to allow slow/distant connections to the backend to complete releasing the lock. Closes #31690
Expected behavior:
Teleport starts and runs when configured with a DynamoDB backend in a far part of the world.
Current behavior:
Teleport does not start, logging a fatal error -
Bug details:
The root of this problem is that InitCluster() is run under RunWhileLocked() (https://github.com/gravitational/teleport/blob/v14.0.0-beta.1/lib/auth/init.go#L281) with a default lock release timeout of 300ms (https://github.com/gravitational/teleport/blob/v14.0.0-beta.1/lib/backend/helpers.go#L181). When the AWS region is "far" away, this is not enough time for the lock to be released, so the lock release fails even though the process was successfully initialised.
In the above sample configuration, I have the region as ap-southeast-2 (Sydney, Australia). I am in Melbourne, Australia, so this works for me. But if I use us-west-2 (Oregon), I get the above failure trace. The RTT from Australia to the US across the Pacific is typically about 200ms at best, so a 300ms lock release timeout appears to be too short for that.
The error message output for this failure gives no indication at all of what the error is. I needed to run teleport with DEBUG=1 to get a stack trace of the failure, and apply my understanding of "context deadline exceeded" as a Go programmer to be able to figure out what was going on. A non-developer user probably wouldn't have a chance of diagnosing this from the reported error.