HA mode for injector #238

Closed
martinhaus opened this issue Mar 23, 2020 · 11 comments
Labels
injector (Area: mutating webhook service), question (A general question about usage)

Comments

@martinhaus

Is there a specific reason why the injector deployment is set to a single replica?
Wouldn't it be better to have multiple replicas with a PDB specified?

Right now, when the cluster is scaling and the injector pod gets rescheduled onto a different node, other pods can have problems starting up, since they won't be able to load secrets while the injector is spawning.

@tvoran added the injector (Area: mutating webhook service) and question (A general question about usage) labels on Mar 27, 2020
@zystem

zystem commented Apr 28, 2020

The injector is a critical component, but its Helm chart does not support "Node-Selectors", "Tolerations", or "Affinity".

@pajel

pajel commented Apr 28, 2020

Upvoting. Would be great to have the replicas of the "vault-agent-injector" deployment customisable. Thanks.

@gopisaba

I had a similar issue today: several pods tried to restart at the same time, including the vault-agent-injector pod. Since the service was down at that time, the mutating webhook failed to inject the init container into the other pods. They all started crashing because they were unable to find the secrets. Once the vault-agent-injector service was back up (after a few seconds), restarting the other containers worked.

vault-agent-injector is a critical component, and we must have multiple replicas, node selectors, tolerations, and affinity.

@camilorivera

Same, HA for the injector should be a priority

@drpebcak

drpebcak commented Sep 1, 2020

Definitely should be a priority. It causes all sorts of weird issues when the injector pod is rescheduled at the same time as others.

@rcjames

rcjames commented Sep 25, 2020

An explanation for why this is not configurable was offered in #331 (comment). This still seems like a crucial feature though, so hopefully the bug will be addressed soon.

More details about referenced bug: hashicorp/vault-k8s#141

@mitchellmaler

mitchellmaler commented Nov 10, 2020

The way I got around this was using cert-manager and its CA Injector. This allowed cert-manager to generate the cert for the webhook and also patch the MutatingWebhookConfiguration automatically with the caBundle. I would be curious whether this could be an acceptable enhancement to the Helm chart: allow using cert-manager and adding the annotation to the webhook config instead of recreating this functionality?
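
For readers unfamiliar with the annotation-based approach, here is a minimal sketch of what it could look like. It is an illustration only, not the chart's actual template: `cert-manager.io/inject-ca-from` is the annotation cert-manager's CA injector looks for, while the resource names and namespace (`example-agent-injector-cfg`, `vault`, `injector-certificate`) are hypothetical placeholders.

```python
import json

# cert-manager's CA injector watches for this annotation and patches the
# webhook's caBundle from the referenced Certificate resource.
INJECT_CA_ANNOTATION = "cert-manager.io/inject-ca-from"


def webhook_metadata(config_name: str, cert_namespace: str, cert_name: str) -> dict:
    """Build the metadata section of a MutatingWebhookConfiguration that
    delegates caBundle management to cert-manager.

    All three arguments are hypothetical placeholders for illustration.
    """
    return {
        "name": config_name,
        "annotations": {
            # The value format is "<namespace>/<certificate-name>".
            INJECT_CA_ANNOTATION: f"{cert_namespace}/{cert_name}",
        },
    }


metadata = webhook_metadata("example-agent-injector-cfg", "vault", "injector-certificate")
print(json.dumps(metadata, indent=2))
```

With this in place, the webhook configuration itself would not need to carry a hand-managed `caBundle`; cert-manager would be expected to keep it in sync with the issued certificate.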

@orirawlings
Contributor

The way my team worked around this was to just block pod creation until the vault-agent-injector is available once again, since generally outages due to rescheduling the vault-agent-injector pod are quite brief. We did this by ensuring that we could configure the failurePolicy on the MutatingWebhookConfiguration. See #400

Even if you have multiple replicas and a PDB, there are still scenarios where you might lose all your replicas at once, for example, due to the unexpected failure of multiple nodes (i.e. any event where PDB is not consulted). Instituting a failurePolicy is a good way to ensure that the API server isn't allowing any misconfigured pods to be created.
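
To make the effect of `failurePolicy` concrete, here is a small sketch. It is a simplified model of the API server's behaviour, not its real implementation, and the webhook name, service name, namespace, and path in the manifest fragment are hypothetical placeholders rather than values taken from the chart. `Fail` and `Ignore` are the two values Kubernetes accepts for this field.

```python
import json


def admission_outcome(webhook_reachable: bool, failure_policy: str) -> str:
    """Simplified model of what happens to a pod CREATE request when a
    mutating webhook is registered for pods."""
    if failure_policy not in ("Fail", "Ignore"):
        raise ValueError("failurePolicy must be 'Fail' or 'Ignore'")
    if webhook_reachable:
        return "admitted after webhook call"
    if failure_policy == "Fail":
        # The request is rejected, so no pod is created without injection.
        return "rejected"
    # With Ignore, the webhook error is swallowed and the pod is created
    # without the injected containers.
    return "admitted without mutation"


def webhook_entry(failure_policy: str) -> dict:
    """One entry of a MutatingWebhookConfiguration's `webhooks` list.
    Names below are placeholders; caBundle is omitted for brevity."""
    return {
        "name": "injector.example.com",
        "failurePolicy": failure_policy,
        "sideEffects": "None",
        "admissionReviewVersions": ["v1"],
        "clientConfig": {
            "service": {
                "name": "example-agent-injector-svc",
                "namespace": "vault",
                "path": "/mutate",
            }
        },
        "rules": [
            {
                "operations": ["CREATE"],
                "apiGroups": [""],
                "apiVersions": ["v1"],
                "resources": ["pods"],
            }
        ],
    }


print(json.dumps(webhook_entry("Fail"), indent=2))
print(admission_outcome(webhook_reachable=False, failure_policy="Fail"))
print(admission_outcome(webhook_reachable=False, failure_policy="Ignore"))
```

The trade-off is that `Fail` blocks pod creation for the duration of an injector outage, which is acceptable only if such outages are brief.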

@tvoran
Member

tvoran commented Jan 6, 2021

Hi folks, in v0.9.0 we added support for multiple injector replicas, and we're adding documentation on the config options in this PR: hashicorp/vault#10659

@mitchellmaler I like the idea of leveraging cert-manager for this too. At the very least I could see that setup being a nice addition to our website docs.

@MeijerM1

MeijerM1 commented Dec 3, 2021

Multiple replicas simply aren't enough to guarantee HA. The ability to configure a PDB and affinity (already possible) is necessary to make sure an injector is available at all times.
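
As a rough illustration of what a PDB adds on top of multiple replicas, here is a minimal sketch. The resource name, namespace, and label selector are hypothetical placeholders, not the chart's actual values, and `policy/v1` assumes Kubernetes 1.21 or newer (older clusters used `policy/v1beta1`).

```python
import json


def build_pdb(name: str, namespace: str, match_labels: dict, min_available: int) -> dict:
    """Build a PodDisruptionBudget manifest as a plain dict.
    All argument values used below are placeholders for illustration."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": match_labels},
        },
    }


def eviction_allowed(healthy_pods: int, min_available: int) -> bool:
    """A voluntary eviction is allowed only if it would still leave at
    least `min_available` healthy pods."""
    return healthy_pods - 1 >= min_available


pdb = build_pdb(
    "example-injector-pdb",
    "vault",
    {"app.kubernetes.io/name": "example-agent-injector"},
    min_available=1,
)
print(json.dumps(pdb, indent=2))
print(eviction_allowed(healthy_pods=2, min_available=1))  # True
print(eviction_allowed(healthy_pods=1, min_available=1))  # False
```

Note that a PDB only constrains voluntary disruptions such as node drains; it does not protect against several nodes failing at once.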

@tvoran
Member

tvoran commented Jan 7, 2022

@MeijerM1 Yep, agreed. A configurable PDB was added in #653 and will go out with the next release.

The multiple replica support was improved in v0.16.0, and cert-manager support was improved in v0.15.0 with an example documented here: https://www.vaultproject.io/docs/platform/k8s/helm/examples/injector-tls-cert-manager

Closing for now. Thanks for all the input folks!

@tvoran closed this as completed on Jan 7, 2022