Feature Request: Force disable database secrets engine #5293
Comments
I hit this and got pretty stuck. I was hitting two problems. The first was that I couldn't revoke leases. However, the vault CLI has a -f (force) flag and a -prefix flag (required for force), which allowed me to delete all the leases.
That will probably work for most people. However, I had a second problem: I had two Vault clusters replicating between each other, and my VAULT_ADDR was pointed at the secondary replication cluster, not the primary. Once I pointed it at the primary, I was able to clean out the leases and disable the database engine properly.
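For anyone who would rather script the two steps described above, here is a minimal Python sketch against Vault's HTTP API. It assumes `PUT sys/leases/revoke-force/<prefix>` (the API behind `vault lease revoke -force -prefix`, which needs a sudo-capable token) and `DELETE sys/mounts/<path>` (the API behind `vault secrets disable`), and that VAULT_ADDR points at the primary cluster. The helper names `vault_request` and `force_clean_and_disable` are hypothetical.

```python
import urllib.request


def vault_request(addr, token, method, path):
    """Send one request to the Vault HTTP API and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = urllib.request.Request(
        f"{addr.rstrip('/')}/v1/{path.lstrip('/')}",
        method=method,
        headers={"X-Vault-Token": token},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def force_clean_and_disable(send, mount):
    """Force-revoke every lease under `mount`, then disable the mount.

    `send(method, path)` performs a single API call; it is injected so the
    call order can be checked without a live Vault server.
    """
    mount = mount.strip("/")
    # Equivalent of `vault lease revoke -force -prefix <mount>`: drops the lease
    # entries even if the backend database cannot confirm the revocation.
    send("PUT", f"sys/leases/revoke-force/{mount}")
    # Equivalent of `vault secrets disable <mount>`; with no leases left,
    # disabling no longer waits on the unreachable backend.
    send("DELETE", f"sys/mounts/{mount}")
```

Usage would look roughly like `force_clean_and_disable(functools.partial(vault_request, os.environ["VAULT_ADDR"], os.environ["VAULT_TOKEN"]), "database")`. Keeping the HTTP call behind `send` makes it easy to point the same sequence at the primary cluster or to dry-run it.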
Hi @fraajad! I think @scallister described the solution very well. You can use […]. I will close this issue for now, since I don't see a reason for an additional command that would basically do the same as […]. Cheers,
I stumbled on this issue after unsuccessfully trying to disable some unreachable mysql and mongo secrets engines (these are of the kind deprecated in 0.7.1):
This is a pretty big inconvenience, because when mongo retries revoking credentials multiple times, it hits the configured throughput of the DynamoDB backend and becomes unresponsive. Eventually the maximum number of retries is reached and Vault stabilizes, but I get pretty hefty downtimes unless I scale DynamoDB exponentially.
By reading the logs I found out that, while timing out, each run would still revoke some leases.
So we did this, and it eventually kind of works, but it took literally days for the
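Since each timed-out run still revoked some leases, repeating the revocation with a pause between attempts is one way to let the backlog drain without hammering the storage backend. A rough sketch, assuming the normal (non-forced) `PUT sys/leases/revoke-prefix/<prefix>` endpoint and a caller-supplied `send(method, path)` that performs one HTTP request and raises on failure; `revoke_until_done`, its defaults, and the example prefix in the test are hypothetical.

```python
import time


def revoke_until_done(send, prefix, max_attempts=10, delay_seconds=30.0, sleep=time.sleep):
    """Retry a normal (non-forced) prefix revocation until one run succeeds.

    `send(method, path)` performs one Vault API call and raises on failure.
    Each failed run may still revoke some leases, so later runs have less
    left to do. Returns the number of attempts that were needed.
    """
    path = f"sys/leases/revoke-prefix/{prefix.strip('/')}"
    for attempt in range(1, max_attempts + 1):
        try:
            send("PUT", path)
            return attempt
        except Exception as err:  # e.g. urllib.error.HTTPError or a timeout
            if attempt == max_attempts:
                raise  # give up and surface the last error
            print(f"attempt {attempt} failed: {err}; retrying in {delay_seconds}s")
            # Pause so repeated runs don't exhaust the storage backend's throughput.
            sleep(delay_seconds)
```

Unlike a forced revocation, this still asks the backend to revoke each credential, so it only helps when the database is slow or flaky rather than gone entirely.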
Thanks for coming back to chime in on this! We can re-open this, for sure, and re-evaluate possible ways to help ameliorate the pain.
There is also another approach: removing the mount reference via recovery mode, which could be scripted with […]. See the Support KB article: Recovery using recovery-mode - Disabling & Deleting Mounts. You can do just the disable portion and handle the deletions incrementally, in smaller portions, via […].

In the case of integrated storage (Raft), once you've stopped the service for recovery, you can also use other bbolt-compatible utilities (like boltdb) to perform the deletion directly on the BoltDB file, which I believe is likely the fastest way to deal with the recursion. The only drawback is that you'd either need to do this on all the nodes, or first scale down to a single-leader cluster, perform the action, and then scale back up. That could still take minutes as opposed to hours or days. The issue here is twofold:
What's more, assuming you have […] Anyway, I'd be keen to hear whether any of these notes have helped anyone else.
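The "incremental, smaller portions" idea can be sketched as batched deletes through `sys/raw`, which is only available in recovery mode or when `raw_storage_endpoint` is enabled. This assumes `LIST sys/raw/<prefix>` returns child keys (with a trailing `/` for sub-folders) and `DELETE sys/raw/<path>` removes a single entry. The lease storage prefix used in the test, `sys/expire/id/<mount>/`, is an assumption to verify by listing before deleting anything. `walk_raw`, `delete_in_batches`, and the batch size are hypothetical, and since raw edits bypass Vault's own bookkeeping, a snapshot beforehand is prudent.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items (itertools.batched needs Python 3.12)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def walk_raw(list_keys, prefix):
    """Yield every leaf key below `prefix`, descending into "folder/" entries.

    `list_keys(prefix)` should return the `keys` array from `LIST sys/raw/<prefix>`.
    """
    for key in list_keys(prefix):
        if key.endswith("/"):
            yield from walk_raw(list_keys, prefix + key)
        else:
            yield prefix + key


def delete_in_batches(list_keys, delete, prefix, batch_size=100, pause=lambda: None):
    """Delete raw storage entries under `prefix` one batch at a time.

    `delete(path)` should issue `DELETE sys/raw/<path>`. Returns the number
    of entries deleted.
    """
    deleted = 0
    for chunk in batched(walk_raw(list_keys, prefix), batch_size):
        for path in chunk:
            delete(path)
        deleted += len(chunk)
        pause()  # e.g. sleep between batches to stay under backend throughput limits
    return deleted
```

Walking the tree lazily means each sub-folder is listed only when its turn comes, so a very large lease tree never has to be held in memory at once.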
Is this feature available? Can we delete the secrets engine and its leases when the backend database has been deleted or the connection to the backend database cannot be established?
A few seconds? That sounds extraordinarily optimistic to me. You're talking about restarting services, providing unseal/recovery keys, running various commands, and then getting Vault restarted back into production mode. IMO that takes a number of minutes even for a well-practiced, experienced Vault operator. Also, whilst I really appreciate that Vault has options such as recovery mode and sys/raw, they are incredibly powerful tools that are risky to use unless you have intimate knowledge of Vault internals. They really shouldn't be the go-to answer for what is not a particularly rare operational issue. I think there's a far simpler way forward here ... the […] Then, the other part of the problem is making it easier for Vault operators to discover what they need to know when a secrets engine disable fails. The easier option would be a more detailed, verbose error message. The nicer, more polished option would be forced lease revocation as a built-in part of the secrets engine disable operation, as per the original feature request here.
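Both ideas, a more verbose error and opt-in forced revocation as part of disable, can be approximated client-side. This is not Vault's behavior: `disable_mount`, its `force` flag, and `MountDisableError` are hypothetical. It reuses the real `DELETE sys/mounts/<path>` and `PUT sys/leases/revoke-force/<prefix>` endpoints through a caller-supplied `send(method, path)` that raises on failure.

```python
class MountDisableError(RuntimeError):
    """Raised with operator-facing guidance when a disable attempt fails."""


def disable_mount(send, mount, force=False):
    """Disable `mount`; with force=True, force-revoke its leases and retry once.

    Returns "disabled" if the plain disable worked, "force-disabled" otherwise.
    """
    mount = mount.strip("/")
    try:
        send("DELETE", f"sys/mounts/{mount}")
        return "disabled"
    except Exception as err:
        if not force:
            # The "more verbose error message" option: say what to try next.
            raise MountDisableError(
                f"disabling {mount!r} failed: {err}. Lease revocation against the "
                f"backend may be failing; if the backend is gone for good, consider "
                f"`vault lease revoke -force -prefix {mount}` (against the primary "
                f"cluster) and then retry the disable."
            ) from err
    # The "built-in forced revocation" option: drop leases, then disable again.
    send("PUT", f"sys/leases/revoke-force/{mount}")
    send("DELETE", f"sys/mounts/{mount}")
    return "force-disabled"
```

Forcing only after a plain disable has failed keeps the normal path unchanged: leases are revoked properly whenever the backend is reachable, and the force step runs only when the operator explicitly asks for it.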
This is also an excellent point regarding the API:
I feel like the intent of my comment may not have been understood. The point I was trying to make is that the existing
See the recent conversation in #9420 for an example of a user who was blocked because of slowly failing revocation attempts when using
My most common experience of this is with PKI certificates, where consumers surpass 5 million or 10 million certs that they have no account of, with no sense of how long all those certs took to generate or how far back the oldest may go. Separating by role, or even using different mounts for different TLDs / subdomains, for example, can help. While generally there are progression indicators when disabling mounts (with PKI on […] Thinking aloud, these sorts of clean-up and even export-type activities may be better pursued by standby nodes (1 or 2 out of a set), or offline entirely via snapshots that a next leader could then negotiate on, allowing others to replicate accordingly.
I'm not really convinced that's applicable to this issue, though. The challenges of PKI are somewhat different to databases. To my mind, this issue is really tracking two different things, neither of which are shared with the PKI secrets engine:
Hi folks, is this still an issue in recent versions of Vault? Can we clarify the current state of the issue so we can bubble it up as needed? Thanks!
Since it's been a while with no response, I'm going to close this issue. Please reopen if there's more to add here.
Is your feature request related to a problem? Please describe.
I ran into a few panics from database credential revocation and wish it were easier to recover from them. First I hit #4846 while on 0.10.4, so I upgraded to 0.11.0 and immediately hit #5262. I was able to get Vault running by turning the Linux clock back in time, but then had trouble resolving the issue so I could get a running system again. I didn't see the fix in #5262 at the time, and all I could come up with was creating a build of Vault that had revocation removed and using it to disable the database mount.
Describe the solution you'd like
If there were something like
vault lease revoke -force
for the database mount, i.e.
vault secrets disable -force database
that could destroy the mount without doing the revocation, it would help to remove the mount when it is not working or is misconfigured.
Describe alternatives you've considered
It's possible that once all the unexpected returns are accounted for, this won't be an issue that comes up anymore.
Explain any additional use-cases
While testing the database secrets engine, a user sometimes puts in the wrong config and just wants to return to a clean slate without caring about what is left on the test DB.