Feature Request: Force disable database secrets engine #5293

Closed
fraajad opened this issue Sep 6, 2018 · 16 comments
Labels
community-sentiment Tracking high-profile issues from the community performance secret/database

Comments

@fraajad

fraajad commented Sep 6, 2018

Is your feature request related to a problem? Please describe.
I ran into a few panics from database credential revocation and wish it was easier to recover from them. First #4846 while on 0.10.4, so I upgraded to 0.11.0 and immediately hit #5262. I was able to get Vault running by turning the Linux clock back, but then had trouble resolving the issue so I could get a running system again. I didn't see the fix in #5262 at the time, and all I could come up with was creating a build of Vault that had revocation removed and using it to disable the database mount.

Describe the solution you'd like
If there was something like vault lease revoke -force for the database mount, i.e. vault secrets disable -force database, that could destroy the mount without doing the revocation, it would help remove a mount that is not working or is misconfigured.

Describe alternatives you've considered
It's possible that once all the unexpected returns are accounted for, this won't be an issue that comes up anymore.

Explain any additional use-cases
While testing the database secret engine a user sometimes puts the wrong configs in and just wants to return to a clean slate without caring about what is left on the test DB.

@scallister

I hit this and got pretty stuck. I was hitting two problems. The first was that I couldn't revoke leases. The Vault CLI, however, has a -f force flag and a -prefix flag (required when forcing). That allowed me to delete all the leases:

vault lease revoke -f -prefix database/creds/

That will probably work for most people. However, I had a second problem. I had two vault clusters that replicate between each other, and my VAULT_ADDR was pointed at the secondary replication cluster, not the primary. Once I pointed at the primary I was able to clean out the leases and disable the database engine properly.

@catsby catsby added bug Used to indicate a potential bug secret/database labels Nov 8, 2019
@michelvocks
Contributor

Hi @fraajad!

I think @scallister described the solution very well. You can use vault lease revoke -f -prefix to revoke all leases from a specific secrets engine without caring about existing users in the remote database. With vault list sys/leases/lookup you can also browse existing leases and see if and where they exist.
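For illustration, that sequence could be wrapped in a small helper. This is a hedged sketch, not anything the Vault CLI ships: `force_disable_mount` is a made-up function, and the `VAULT_BIN` override exists only so the sketch can be exercised without a live cluster (in practice it is just `vault`).

```shell
# Hypothetical helper wrapping the sequence described above:
# force-revoke every lease under the mount's creds/ prefix, then
# disable the mount. VAULT_BIN defaults to the real vault binary
# and is parameterised only so the sketch can be dry-run.
force_disable_mount() {
  mount=$1
  "${VAULT_BIN:-vault}" lease revoke -f -prefix "${mount}/creds/" || return 1
  "${VAULT_BIN:-vault}" secrets disable "${mount}"
}

# Illustrative usage against a real cluster:
#   force_disable_mount database
```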

I will close this issue for now, since I don't see a reason for an additional command which would basically do the same as vault lease revoke. Feel free to open a new issue if you think otherwise.

Cheers,
Michel

@stepps

stepps commented Feb 19, 2020

I have stumbled on this issue after unsuccessfully trying to disable some unreachable mysql and mongo secret engines (These are of the kind deprecated in 0.7.1):

vault lease revoke -f -prefix mongo-r9/
Warning! Force-removing leases can cause Vault to become out of sync with secret engines!
Error force revoking leases with prefix mongo-r9/creds: context deadline exceeded

This is a pretty big inconvenience, because when Vault retries the mongo credential revocations multiple times, it reaches the configured throughput of the DynamoDB backend and becomes unresponsive.

Eventually the maximum number of retries is reached and Vault stabilizes, but I have pretty hefty downtimes unless I scale DynamoDB exponentially.

@stepps

stepps commented Feb 20, 2020

By reading the logs I found out that, even while timing out, each run would revoke some leases.
From there it was easy to put the revoke command in a loop and delete the secret engine I was stuck on.
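The loop described above might look something like the following. This is a sketch under stated assumptions: `retry_until_ok` is a made-up helper, the mount path and the retry cap are illustrative, and it relies on the observed behaviour that each timed-out run still clears some leases before the deadline, so repeated runs converge.

```shell
# Run a command repeatedly until it exits 0, giving up after $1 attempts.
# Each failed run of the force-revoke still clears some leases before
# the context deadline, so repeated runs make forward progress.
retry_until_ok() {
  max=$1; shift
  tries=0
  until "$@"; do
    tries=$((tries+1))
    [ "$tries" -ge "$max" ] && return 1
  done
  return 0
}

# Illustrative usage against a real cluster (a pause between attempts
# may be wise to avoid hammering the storage backend):
#   retry_until_ok 100 vault lease revoke -f -prefix mongo-r9/ \
#     && vault secrets disable mongo-r9/
```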

@blairdrummond

So we did this, and it eventually kinda works, but it took literally days for the vault lease revoke -f -prefix secret_engine/ loop to remove all the secrets. Would it be possible to re-open this and have the engine disabled and abandon revoking the leases? This is really an issue in cases where the remote system was decommissioned.

@heatherezell
Contributor

> So we did this, and it eventually kinda works, but it took literally days for the vault lease revoke -f -prefix secret_engine/ loop to remove all the secrets. Would it be possible to re-open this and have the engine disabled and abandon revoking the leases? This is really an issue in cases where the remote system was decommissioned.

Thanks for coming back to chime in on this! We can re-open this, for sure, and re-evaluate possible ways to help ameliorate the pain.

@aphorise
Contributor

aphorise commented Sep 1, 2022

There is also another approach to dealing with this: removing the mount reference via recovery mode. That could be scripted with jq and other toolchains like that, if you have the unseal / recovery keys. While this may incur downtime of a few seconds to load into recovery mode and perform the needed actions, it's more predictable IMO.

See the Support KB they have: Recovery using recovery-mode - Disabling & Deleting Mounts

You can do just the disable portion and worry about the deletions incrementally, in smaller portions via vault delete ..., if you do not have other recursive methods available to your store natively (like consul kv delete -recurse vault/...).

In the case of integrated storage (Raft), when you've stopped the service (for recovery), you can also use bbolt-compatible utilities (like boltdb) to perform the deletion directly on the boltdb file, which I believe is likely the fastest approach to dealing with the recursion. The only drawback is that you'd either need to perform this on all the nodes, or first scale down to a single-leader-only cluster, perform the action, then scale back up. It could still take minutes, as opposed to hours or days.

The issue here is two fold:

  1. How failures of forced revocations are dealt with by the plugin, whether that's implemented correctly, and even if so, what happens when the store refuses to delete?
  2. Deletion cannot be performed in a single go due to the recursion, so it needs to run more or less in the background in capped batches (taking even longer, and second to other activity).
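Point 2, capped background-friendly batches, could be sketched like this. Everything here is hypothetical, not a real Vault feature: `delete_in_batches` is a made-up helper, and the per-key delete command is passed in as an assumption (e.g. `vault delete`).

```shell
# Read key paths on stdin and delete each with the given command,
# pausing after every $1 keys so the clean-up stays second to other
# activity instead of saturating the storage backend.
delete_in_batches() {
  batch=$1; shift
  n=0
  while IFS= read -r key; do
    "$@" "$key" || echo "failed: $key" >&2
    n=$((n+1))
    if [ $((n % batch)) -eq 0 ]; then
      sleep 1   # throttle between batches
    fi
  done
}

# Illustrative usage, assuming some command that lists the lease keys:
#   list_all_keys | delete_in_batches 50 vault delete
```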

What's more, assuming you have raw_storage_mode enabled (that might be a bit of an issue), another approach is to force-delete the correlating /sys/raw/... paths of secrets as much as possible, at which point there'd be nothing preventing the disabling of that empty mount.

Anyway, I'd be keen to hear if any of these notes may have helped anyone else.

@AbdullahAlShaad

Is this feature available? Can we delete the secrets engine and its leases when the backend database is deleted or a connection with the backend database cannot be established?

@maxb
Contributor

maxb commented Jun 6, 2023

> While this may encounter a down-time of a few seconds to load into recovery mode and perform the needed actions

A few seconds? That sounds extraordinarily optimistic to me. You're talking about restarting services, providing unseal/recovery keys, running various commands, and then getting Vault restarted back into production mode. IMO that's a number of minutes even for a well-practiced, experienced Vault operator.

Also, whilst I really appreciate that Vault does have options such as recovery mode and sys/raw, they are incredibly powerful tools that are risky to use unless you have intimate knowledge of Vault internals.

They really shouldn't be the go-to answer for what is a not particularly rare operational issue.

I think there's a far simpler way forward here ... the revoke-force operation we have today attempts to revoke the lease, and then deletes it anyway if the revocation fails. This means that if the problem with the environment is such that each revocation attempt fails slowly, it is a user-unfriendly solution. A variant of force revocation that skips the revocation attempt entirely and simply deletes the lease records would address the "it is slow" part of the problem.

Then, the other part of the problem is to make it easier for Vault operators to discover what they need to do when a secrets engine disable fails. The easier option would be a more detailed, verbose error message. The nicer, more polished option would be forced lease revocation as a built-in part of the secrets engine disable operation, as per the original feature request here.

@aphorise
Contributor

aphorise commented Jun 6, 2023

> ... the revoke-force operation we have today attempts to revoke the lease, and then deletes it anyway if the revocation fails.

This is also an excellent point, referring to the API /sys/leases/revoke-force/:prefix, or the CLI vault lease revoke -force -prefix .... Put simply: perform a force revoke on the related mount (by path, role, etc.) and thereafter attempt to disable the mount. That will be massively faster than the default behavior, which attempts both of those steps for you, with the final disable step contingent on the first portion (the revocations) succeeding.

@maxb
Contributor

maxb commented Jun 6, 2023

> This is also an excellent point referring to the API: /sys/leases/revoke-force/:prefix or CLI: vault lease revoke -force -prefix .... Put simply perform a force revoke on the related mount (either by path, role, etc) and thereafter attempt to disable the mount which will be massively faster than the default behavior that's to attempt to try both of those for you with the last disable step being contingent on the first portion (revocations) succeeding.

I feel like the intent of my comment may not have been understood. The point I was trying to make is that the existing revoke-force operation is not a fully satisfactory solution, because it may take an extreme amount of time if each revocation to be processed fails slowly.

@maxb
Contributor

maxb commented Jun 7, 2023

See recent conversation in #9420 for an example of a user who was blocked because of slowly-failing revocation attempts when using revoke-force. In that case, the resolution was to manipulate other Vault configuration to turn the slow failures into fast failures.

@aphorise
Contributor

aphorise commented Jun 8, 2023

My most common experience of this is with PKI certificates, where consumers surpass 5 or 10 million certs that they have no accounting of, with no sense of how long all those certs took to generate nor how far back the oldest may be. Separating by role, or even using different mounts for different TLDs / sub-domains, can help. While there are generally progress indicators when disabling mounts (with PKI, at TRACE level), in my opinion it's not reasonable to expect a recursive process to eventually complete while other loads and activities are ongoing at the same time.

Thinking aloud, these sorts of clean-up and even export-type activities may be better pursued by standby nodes (1 or 2 out of a set), or offline entirely via snapshots that a next leader could then negotiate on, allowing others to replicate accordingly.

@maxb
Contributor

maxb commented Jun 8, 2023

I'm not really convinced that's applicable to this issue, though. The challenges of PKI are somewhat different from those of databases.

To my mind, this issue is really tracking two different things, neither of which are shared with the PKI secrets engine:

  1. vault secrets disable on a database engine is quite prone to failing in a non-obvious way which requires an administrator to discover and execute some flavour of vault lease revoke command. It would potentially be nice to give users EITHER easier discovery of what they need to do OR even a new switch for the vault secrets disable command which just takes care of it.

  2. Revoking database leases can involve reaching out to external services, so a single lease revocation can take an extremely long time - easily running in to Vault request processing timeouts. IMO, it would be nice to offer a new enhanced flavour of force-revoking leases, which totally skips even the attempt to execute backend specific destruction logic.

@heatherezell
Contributor

Hi folks, is this still an issue in recent versions of Vault? Can we clarify the current state of the issue so we can bubble it up as needed? Thanks!

@raskchanky
Contributor

Since it's been a while with no response, I'm going to close this issue. Please reopen if there's more to add here.
