Backups fail if PVCs are in Lost status #225

Closed
blakebarnett opened this issue Nov 29, 2017 · 9 comments

@blakebarnett
Contributor

We had someone "clean up" some EBS volumes without removing the corresponding PVCs in Kubernetes, and this makes backups fail with:

ark-1204325235-v0qkl ark time="2017-11-29T00:00:17Z" level=error msg="backup failed" error="[persistentvolumes \"pvc-fc7e9262-6046-11e7-bb8b-06ad180bac7f\" not found, persistentvolumes \"pvc-0954630d-6047-11e7-bb8b-06ad180bac7f\" not found, persistentvolumes \"pvc-1665da49-6047-11e7-bb8b-06ad180bac7f\" not found, persistentvolumes \"pvc-2890bfd6-6047-11e7-bb8b-06ad180bac7f\" not found]" key=heptio-ark/midnight-daily-20171129000002 logSource="/go/src/github.com/heptio/ark/pkg/controller/backup_controller.go:258"

It seems like it should be non-fatal?

@ncdc
Contributor

ncdc commented Nov 30, 2017

@skriss I'm guessing the flow here is that the PVC action adds the PV to the list of additional items, and then Ark can't find it. Do you think we should consider reporting warnings & errors for backups, like we do for restores?

@whereismyjetpack

whereismyjetpack commented Mar 22, 2018

Hi! It looks like I, too, suffer from this issue. This PVC was stuck in Pending status:

time="2018-03-22T17:00:25Z" level=error msg="error executing custom action" backup=heptio-ark/hourly-20180322170024 error="rpc error: code = Unavailable desc = transport is closing" group=v1 groupResource=persistentvolumeclaims logSource="pkg/backup/item_backupper.go:220" name=microblog-develop-database-files namespace=default
time="2018-03-22T17:00:26Z" level=info msg="Backup completed with errors: error executing custom action (groupResource=persistentvolumeclaims, namespace=default, name=microblog-develop-database-files): rpc error: code = Unavailable desc = transport is closing" backup=heptio-ark/hourly-20180322170024 logSource="pkg/backup/backup.go:270"

@ncdc
Contributor

ncdc commented May 8, 2018

@skriss let's double check if it's still possible to delete a PV that has a PVC bound to it.

@skriss
Contributor

skriss commented May 8, 2018

The "storage object protection" feature, which prevents PVs/PVCs that are in use from being deleted, is currently in beta and will move to GA as of 1.11; see kubernetes/kubernetes#62870.

@ncdc
Contributor

ncdc commented May 10, 2018

The feature gate is on by default in 1.10. Let's do a manual test to make sure we can't get in this state any more. If so, I recommend we close the issue.

@ncdc ncdc added this to the v1.0.0 milestone Jun 26, 2018
@ncdc
Contributor

ncdc commented Jun 26, 2018

If a backup tries to process an item and it fails, that should not mark the backup as Failed (related: #286, #305). We can consider this issue resolved when we've dealt with #286 (comment).

@skriss
Contributor

skriss commented Oct 10, 2018

Given that we've had numerous people run into this - what do you all think about modifying pkg/backup/backup_pv_action.go to only return a PV to back up if the PVC's status is Bound?

cc @ncdc @wwitzel3 @carlisia @rosskukulinski @nrb
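A rough sketch of the proposed "only back up the PV if the PVC is Bound" check follows. The real action works with the `PersistentVolumeClaim` type and the `ClaimBound`/`ClaimPending`/`ClaimLost` phase constants from `k8s.io/api/core/v1`; to keep this self-contained, the sketch uses a hypothetical trimmed-down `PVC` struct and a hypothetical `additionalPV` helper instead, so treat it as an illustration of the logic, not a patch to `backup_pv_action.go`.

```go
package main

import "fmt"

// ClaimPhase mirrors the strings Kubernetes reports in a PVC's status.phase.
type ClaimPhase string

const (
	ClaimPending ClaimPhase = "Pending"
	ClaimBound   ClaimPhase = "Bound"
	ClaimLost    ClaimPhase = "Lost"
)

// PVC is a hypothetical stand-in for a PersistentVolumeClaim, keeping only
// the fields this check needs.
type PVC struct {
	Namespace  string
	Name       string
	VolumeName string     // spec.volumeName: the PV the claim points at
	Phase      ClaimPhase // status.phase
}

// additionalPV returns the PV to add to the backup, or false when the claim
// is not Bound (Pending, Lost after its PV was deleted, and so on), so the
// PV is skipped instead of triggering a "persistentvolumes ... not found" error.
func additionalPV(pvc PVC) (string, bool) {
	if pvc.Phase != ClaimBound || pvc.VolumeName == "" {
		return "", false
	}
	return pvc.VolumeName, true
}

func main() {
	claims := []PVC{
		{"default", "data", "pvc-example-bound", ClaimBound},
		{"default", "orphaned", "pvc-example-deleted", ClaimLost},
		{"default", "waiting", "", ClaimPending},
	}
	for _, c := range claims {
		if pv, ok := additionalPV(c); ok {
			fmt.Printf("%s/%s: back up PV %s\n", c.Namespace, c.Name, pv)
		} else {
			fmt.Printf("%s/%s: skipping PV (phase %s)\n", c.Namespace, c.Name, c.Phase)
		}
	}
}
```

The PV names above are made up for illustration. Checking the phase up front covers both situations reported in this thread: a PVC whose EBS-backed PV was removed (Lost) and one that never bound (Pending).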

@rosskukulinski
Contributor

I think that's probably fine. What are the conditions/situations where a PVC status will not be Bound?

@ncdc
Contributor

ncdc commented Oct 10, 2018

A typo in the PV name in the PVC, or somebody manually deleting the PV but leaving the PVC behind: stuff like that.


5 participants