Sidecar Overwrites Secrets Generated By Init Container #162

Closed
borchero opened this issue Jul 30, 2020 · 9 comments
Labels
bug Something isn't working

Comments

@borchero

Describe the bug
I am currently experiencing an issue when requesting dynamic database credentials from Vault. The init container fetches the credentials and correctly writes them to the secret file. When the main container starts up, it reads the contents of that file and can connect to the database. However, the sidecar then fetches the secret again, apparently under a new lease, and overwrites the file. Only the new (unused) credentials are continuously renewed, so the client loses database access after an hour (the credential's TTL).

To Reproduce

  1. Use the following manifest to deploy Drone
  2. After the database credential's TTL, the application loses access to the database
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: test
  template:
    metadata:
      labels:
        app.kubernetes.io/name: test
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: default.test
        vault.hashicorp.com/agent-init-first: "true"
        vault.hashicorp.com/ca-cert: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        vault.hashicorp.com/agent-requests-cpu: 10m
        vault.hashicorp.com/agent-inject-secret-env-db: postgres/creds/postgres
        vault.hashicorp.com/agent-inject-template-env-db: |
          {{ with secret "postgres/creds/postgres" }}
          export DRONE_DATABASE_DATASOURCE=postgres://{{ .Data.username }}:{{ .Data.password }}@postgres.example.com:5432/postgres
          {{ end }}
    spec:
      serviceAccountName: service-account
      containers:
      - name: server
        image: drone:1.9.0
        command:
        - sh
        - -c
        - source /vault/secrets/env-db && /bin/drone-server
        ports:
        - name: http
          containerPort: 80
          protocol: TCP

Expected behavior
The sidecar container should use the same token as the init container for obtaining database credentials.

Environment

  • GKE (v1.17.7-gke.15)
  • vault-k8s version: v1.4.2
  • Vault uses TLS via a certificate signed by the Kubernetes CA
@borchero borchero added the bug Something isn't working label Jul 30, 2020
@borchero
Author

As a quick fix I decided to add the following annotation:

vault.hashicorp.com/agent-pre-populate: "false"

This way, the init container is no longer run; however, my application container fails a couple of times until the sidecar has fetched the secrets for the first time. Certainly not an ideal solution.

@tvoran
Member

tvoran commented Aug 1, 2020

Hi @borchero, we've run into this issue in similar setups. One (albeit hacky) workaround is to add a delay before your application runs, something like:

command: ["/bin/sh"]
args: ["-c", "sleep 5; ./app"]

@borchero
Author

borchero commented Aug 2, 2020

Thanks for your suggestion. I think my temporary solution is safer though (as there is no "right" duration to sleep since fetching the actual credentials could obviously take longer). Do you already have a fix on your roadmap? Otherwise, I might open a PR in the near future.

@tvoran
Member

tvoran commented Aug 6, 2020

We've discussed some possible solutions, and sidecar container ordering may end up solving this for us (kubernetes/enhancements#753), but PRs are welcome!
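A rough sketch of the shape that ordering eventually takes with native sidecar support (an init container marked restartPolicy: Always in newer Kubernetes); the agent image and startup probe below are illustrative assumptions, not what the injector actually generates:

spec:
  initContainers:
  - name: vault-agent
    image: hashicorp/vault      # illustrative; normally chosen by the injector
    restartPolicy: Always       # marks this init container as a long-running sidecar
    startupProbe:               # app containers only start once this probe passes
      exec:
        command: ["test", "-f", "/vault/secrets/env-db"]
  containers:
  - name: server
    image: drone:1.9.0
    command: ["sh", "-c", "source /vault/secrets/env-db && /bin/drone-server"]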

@schmurfy

schmurfy commented Jun 3, 2021

I got hit by this. For a while now we have had a weird, random issue where some services would start and then throw a bunch of errors after one hour, when they failed to connect to Consul. I finally found the time to set up the audit log for Vault, and after that quickly found that not one but two Consul tokens were issued on each pod startup. Looking at the vault-agent and vault-agent-init logs, I noticed that both created the token file, which led me to this issue.

To be honest, I just don't get the defaults. You can use Vault for two main reasons:

  • static secrets: in that case the init container is fine, and I would argue you only need vault-agent-init
  • dynamic secrets (a Consul token in my case): in that case, what is the point of having an init container, knowing that whatever credentials it fetches will be overwritten and never renewed? You only need vault-agent

Instead of having both containers, why not just choose one, disable the other by default, and add a big warning in the documentation? It took me far too much time to diagnose this issue, and I even doubted Vault as a whole, when in fact it ended up being only a dubious default configuration choice and a lack of documentation.

If the default were to not have the init container, it would have taken me far less time to notice that the file wasn't available yet; in fact, I fixed that issue before it could even occur by waiting for the file to appear. It only takes basic Kubernetes knowledge to know that all containers in a pod start at their own pace, with no ordering relative to the others.
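A minimal sketch of that wait-for-file approach, using the /vault/secrets/env-db path from the manifest above (the 60-second timeout is an arbitrary choice):

command:
- sh
- -c
- |
  # wait (up to ~60s) for the injected secret file before starting the app
  i=0
  until [ -f /vault/secrets/env-db ] || [ "$i" -ge 60 ]; do
    i=$((i + 1))
    sleep 1
  done
  . /vault/secrets/env-db && exec /bin/drone-server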

I hope a solution is found to spare others this kind of trouble; the system is already complex enough to set up as it is.

PS: the one hour above was the TTL of the Consul token

@tvoran
Member

tvoran commented Jun 4, 2021

Hi folks, this issue was addressed by the addition of persistent caching between init and sidecar containers (GH-229).

More docs here and here.
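For anyone finding this later, a minimal sketch of turning the cache on via the pod annotations (annotation name taken from the injector's agent-cache docs; check your vault-k8s version for the exact behavior and defaults):

annotations:
  vault.hashicorp.com/agent-inject: "true"
  # share a persistent cache volume between the init and sidecar agents so the
  # sidecar reuses the init container's auth token and leases instead of
  # requesting fresh credentials
  vault.hashicorp.com/agent-cache-enable: "true"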

@schmurfy

schmurfy commented Jun 4, 2021

@tvoran I had not noticed this, but my point still holds since its default is false. If you are just getting started, there should be sane defaults that make everything work, and then you can look at the numerous options to tweak your usage.
Looking at each and every option should not be mandatory.

@tvoran
Member

tvoran commented Jun 5, 2021

@schmurfy Sure, and I think in the future we will default persistent caching to true in the injector, since it fixes this annoying bug. It's at false currently since it's a relatively new feature. Thanks for your thoughts!

@tvoran tvoran closed this as completed Jun 5, 2021
@aiman-alsari

Can we change this to default to true now?
