Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Vault Agent: Add configuration for maximum connection attempts #8135

Closed
wants to merge 5 commits into from

Conversation

catsby
Copy link
Contributor

@catsby catsby commented Jan 10, 2020

WIP

Vault Agent can be configured to attempt to authenticate with a Vault cluster that is not responsive. When this occurs, the goroutine running the Auth Method can enter an infinite loop of attempting to connect and receiving tcp dial errors like so:

2020-01-10T16:33:13.848-0600 [INFO]  auth.handler: authenticating
2020-01-10T16:33:13.849-0600 [ERROR] auth.handler: error authenticating: error="Put http://127.0.0.1:8200/v1/auth/approle/login: dial tcp 127.0.0.1:8200: connect: connection refused" backoff=2.496237872

This repeats until either Vault becomes responsive or the Agent process is terminated. Note that this is different than a failed auth attempt or responses from a sealed Vault; the dial error occurs when no connection can be made at the tcp layer. Because of that fact the existing VAULT_CLIENT_TIMEOUT and VAULT_MAX_RETRIES configuration options are never even considered, and Agent attempts to authenticate indefinitely.

This pull request adds an additional configuration, max-connection-attempts which is specific to Vault Agent. This option can limit the number of failed connection attempts before exiting the Agent process. The default is 0, which indicates "unlimited", so that the current behavior is unchanged and this change is backwards compatible.

The maximum connection attempts can be set with the -max-connection-attempts command line flag, the VAULT_MAX_CONNECTION_ATTEMPTS environment variable, or the max_connection_attempts configuration variable, in that order of precedence.


Remaining items:

  • Clean up / add code comments
  • Agent web documentation

command/agent.go Outdated
Name: "max-connection-attempts",
Target: &c.flagMaxConnectionAttempts,
Default: 0,
EnvVar: "VAULT_MAX_CONNECTION_ATTEMPTS",
Copy link
Contributor

@kalafut kalafut Jan 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is connection attempts preferred over a timeout (duration) as a configuration? As a user I know what "60s" means, but I don't know how long a connection attempt takes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair question. The original request for this was for something akin to VAULT_MAX_RETRIES. A timeout could likely work just as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout works for me!

ah.logger.Info("starting auth handler")
defer func() {
am.Shutdown()
close(ah.OutputCh)
close(ah.DoneCh)
close(ah.TemplateTokenCh)
ah.logger.Info("auth handler stopped")
if connTimer != nil {
if !connTimer.Stop() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this example in the Go docs but I don't really understand it, and it feels like we could hang in the channel receive?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants