Ark Backup create fails on Azure 1.7.9 cluster #674

Closed
sjdweb opened this issue Jul 13, 2018 · 7 comments

@sjdweb

sjdweb commented Jul 13, 2018

What steps did you take and what happened:
Installed the latest Ark (v0.9.0) on an Azure ACS cluster running Kubernetes 1.7.9, in order to migrate to AKS.

Followed the Azure guide to install the Ark components and cloud credentials, which has worked for us 4-5 times before without problems.

Command being run:

$ ark backup create shiftstaging2v10 --include-namespaces default,shift-resourcing --selector 'app notin (kube-lego,nginx-ingress,socketio-redis-redis,logdna-agent)'

When checking the status of the backup, we see:

$ ark backup describe shiftstaging2v10
Name:         shiftstaging2v10
Namespace:    heptio-ark
Labels:       <none>
Annotations:  <none>

Phase:  Failed

Namespaces:
  Included:  default, shift-resourcing
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app notin (kube-lego,logdna-agent,nginx-ingress,socketio-redis-redis)

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2018-07-13 14:02:34 +0100 BST
Completed:  <n/a>

Expiration:  2018-08-12 14:02:34 +0100 BST

Validation errors:  <none>

Persistent Volumes: <none included>

In the logs for the Ark pod (with --log-level set to debug):

time="2018-07-13T13:02:34Z" level=debug msg="Running processBackup" key=heptio-ark/shiftstaging2v10 logSource="pkg/controller/backup_controller.go:210"
time="2018-07-13T13:02:34Z" level=debug msg="Getting backup" key=heptio-ark/shiftstaging2v10 logSource="pkg/controller/backup_controller.go:216"
time="2018-07-13T13:02:34Z" level=debug msg="Cloning backup" key=heptio-ark/shiftstaging2v10 logSource="pkg/controller/backup_controller.go:237"
time="2018-07-13T13:02:34Z" level=debug msg="Backup has not expired yet, skipping" backup=heptio-ark/shiftstaging2v10 expiration="2018-08-12 13:02:34 +0000 UTC" logSource="pkg/controller/gc_controller.go:129"
time="2018-07-13T13:02:34Z" level=debug msg="Running backup" key=heptio-ark/shiftstaging2v10 logSource="pkg/controller/backup_controller.go:274"
time="2018-07-13T13:02:34Z" level=info msg="Starting backup" backup=heptio-ark/shiftstaging2v10 logSource="pkg/controller/backup_controller.go:339"
time="2018-07-13T13:02:34Z" level=debug msg="starting plugin" args="[/ark run-plugin backupitemaction pv]" logSource="pkg/plugin/logrus_adapter.go:74" path=/ark
time="2018-07-13T13:02:34Z" level=debug msg="waiting for RPC address" logSource="pkg/plugin/logrus_adapter.go:74" path=/ark
time="2018-07-13T13:02:35Z" level=debug msg="plugin address" address=/tmp/plugin036824749 logSource="pkg/plugin/logrus_adapter.go:74" network=unix pluginName=ark
time="2018-07-13T13:02:35Z" level=debug msg="starting plugin" args="[/ark run-plugin backupitemaction pod]" logSource="pkg/plugin/logrus_adapter.go:74" path=/ark
time="2018-07-13T13:02:35Z" level=debug msg="waiting for RPC address" logSource="pkg/plugin/logrus_adapter.go:74" path=/ark
time="2018-07-13T13:02:35Z" level=debug msg="starting plugin" args="[/ark run-plugin backupitemaction serviceaccount]" logSource="pkg/plugin/logrus_adapter.go:74" path=/ark
time="2018-07-13T13:02:35Z" level=debug msg="plugin address" address=/tmp/plugin913366131 logSource="pkg/plugin/logrus_adapter.go:74" network=unix pluginName=ark
time="2018-07-13T13:02:35Z" level=debug msg="waiting for RPC address" logSource="pkg/plugin/logrus_adapter.go:74" path=/ark
time="2018-07-13T13:02:35Z" level=debug msg="An error occurred: the server could not find the requested resource" logSource="pkg/plugin/logrus_adapter.go:74" pluginName=ark
time="2018-07-13T13:02:35Z" level=debug msg="plugin process exited" logSource="pkg/plugin/logrus_adapter.go:74" path=/ark
time="2018-07-13T13:02:35Z" level=debug msg="plugin process exited" logSource="pkg/plugin/logrus_adapter.go:74" path=/ark
time="2018-07-13T13:02:35Z" level=debug msg="plugin process exited" logSource="pkg/plugin/logrus_adapter.go:74" path=/ark
time="2018-07-13T13:02:35Z" level=error msg="backup failed" error="plugin exited before we could connect" error.file="manager.go:171" error.function=getPluginInstance key=heptio-ark/shiftstaging2v10 logSource="pkg/controller/backup_controller.go:280"
time="2018-07-13T13:02:35Z" level=debug msg="Updating backup's final status" key=heptio-ark/shiftstaging2v10 logSource="pkg/controller/backup_controller.go:287"
time="2018-07-13T13:02:35Z" level=debug msg="Backup has not expired yet, skipping" backup=heptio-ark/shiftstaging2v10 expiration="2018-08-12 13:02:34 +0000 UTC" logSource="pkg/controller/gc_controller.go:129"
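
In a debug-level log like the one above, the two lines that matter are the plugin's "An error occurred" message and the `level=error` entry that follows it. A minimal sketch of a filter that pulls out just those lines is below. The function name `filter_ark_failures` is hypothetical, and the deployment name `ark` in the usage comment is an assumption rather than something confirmed in this report.

```shell
# Reads Ark server log lines on stdin and keeps only the failure-relevant ones:
# entries logged at level=error, plus plugin messages that report an error
# (these are logged at debug level, so a plain level=error filter misses them).
filter_ark_failures() {
  grep -E 'level=error|msg="An error occurred'
}

# Usage against a live cluster (deployment name "ark" is an assumption):
#   kubectl -n heptio-ark logs deployment/ark | filter_ark_failures
```

Applied to the log above, this should leave only the "the server could not find the requested resource" line and the "backup failed" line.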

What did you expect to happen:
ark backup create should succeed.

Environment:

  • Ark version (use ark version):
    v0.9.0 (also tried 0.8.0)
  • Kubernetes version (use kubectl version):
    1.7.9
  • Kubernetes installer & version:
    Azure ACS hosted
  • Cloud provider or hardware configuration:
    Azure ACS
  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="16.04.3 LTS (Xenial Xerus)"

@nrb
Contributor

nrb commented Jul 13, 2018

Hi @sjdweb, thanks for the report, and for providing the commands you ran. This may be somewhat related to #660.

v0.7.1 may be an option, at least to create the backups. Then you could restore on the AKS cluster with v0.9.0.

@sjdweb
Author

sjdweb commented Jul 13, 2018

Hi @nrb - thanks for the prompt response!

Should I update the deployment manifests of Ark to use the 0.7.1 images?

@nrb
Contributor

nrb commented Jul 13, 2018

@sjdweb Yeah, that should work. Did you try v0.8.3 in the deployment manifests too, or only on the client side? We've seen 0.8.3 resolve problems with 1.7.x.

@sjdweb
Author

sjdweb commented Jul 13, 2018

Just tried v0.8.3 and the backup completed! Thanks.

v0.7.1 is quite a step backwards with the separate namespaces, so I would advise v0.8.3.

@nrb
Contributor

nrb commented Jul 13, 2018

Yes, it is. I suggested it because you mentioned v0.8.0 didn't work, but I think you were just using the client, correct?

@sjdweb
Author

sjdweb commented Jul 13, 2018

Yes - the deployment manifest uses the latest image, so it would have been the 0.8.0 client against the 0.9.0 server.
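
Because the manifest tracks the latest image, changing the client version alone leaves the server unchanged. One way to avoid that is to pin the server image to an explicit tag. The sketch below only prints the command rather than running it. The helper names are hypothetical, and the image repository `gcr.io/heptio-images/ark`, the deployment name `ark`, and the container name `ark` are assumptions, so check them against your own manifest first.

```shell
# Build the image reference for a given Ark release tag.
# The default repository path is an assumption; override ARK_IMAGE_REPO
# with whatever your deployment manifest already references.
ark_image_ref() {
  printf '%s:%s\n' "${ARK_IMAGE_REPO:-gcr.io/heptio-images/ark}" "$1"
}

# Print (without running) the kubectl command that would pin the server
# deployment to that tag. Deployment and container name "ark" are assumptions;
# the heptio-ark namespace is the one shown in the backup description above.
pin_ark_server_cmd() {
  printf 'kubectl -n heptio-ark set image deployment/ark ark=%s\n' "$(ark_image_ref "$1")"
}

pin_ark_server_cmd v0.8.3
```

Editing the image tag directly in the deployment manifest and re-applying it achieves the same thing and keeps the pinned version under version control.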

@nrb
Contributor

nrb commented Jul 13, 2018

I'll go ahead and close this issue. We can track a better resolution in #660.
