Migrating from kube-aws 0.14 to 0.15 issues #1837
@andersosthus and I have test-migrated a few clusters from kube-aws 0.14.3 to 0.15.1/0.15.2, and we have discovered a few issues:
The v0.15.x branch change "Make cloud-controller-manager an experimental feature" (#1832) introduced a problem with the cloud-controller-manager in v0.15.1: configure-cloud-routes is not disabled (#1833). The issue was fixed by "Added back in --configure-cloud-routes=false" (#1834) and is included in the kube-aws 0.15.2 release, thanks @davidmccormick.
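For reference, the fix boils down to passing the flag back to the cloud-controller-manager. A minimal sketch of the relevant arguments only; the binary path and manifest layout below are assumptions, not the exact kube-aws template:

```yaml
# Illustrative excerpt only - the real kube-aws manifest is structured differently.
# The relevant part of the #1834 fix is the restored flag:
command:
  - /cloud-controller-manager          # assumed path, not taken from the kube-aws template
  - --cloud-provider=aws
  - --configure-cloud-routes=false     # stop the CCM from trying to manage VPC route tables
```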
Using `etcd.memberIdentityProvider: eni` introduces a problem when cleaning up the etcd stack after migration, because the control-plane stack imports the Etcd0PrivateIP, Etcd1PrivateIP and Etcd2PrivateIP exports. These exports are not part of the rendered etcd CloudFormation stacks in 0.15. A temporary workaround is to edit etcd.json.tmpl after doing a render stack and temporarily add the missing exports back in (a sketch is shown below). This makes sure the update can continue; afterwards the added values can be removed and a new update issued.
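For anyone hitting the same import error, this is roughly the kind of output block we mean. The logical resource name, Value expression and export name below are assumptions from our setup, not copied from kube-aws; check what your control-plane stack actually imports (e.g. `aws cloudformation list-imports --export-name <name>`) before adding anything:

```json
{
  "Outputs": {
    "Etcd0PrivateIP": {
      "Description": "Temporary re-export so the control-plane stack's Fn::ImportValue keeps resolving during cleanup. Resource name and export name here are placeholders - match them to what the control-plane stack imports.",
      "Value": { "Fn::GetAtt": ["Etcd0EC2Instance", "PrivateIp"] },
      "Export": { "Name": { "Fn::Sub": "${AWS::StackName}-Etcd0PrivateIP" } }
    }
  }
}
```

The same pattern would be repeated for Etcd1PrivateIP and Etcd2PrivateIP.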
The export-existing-etcd-state.service, which is responsible for exporting data from the old etcd cluster and preparing the export files on disk in /var/run/coreos/etcdadm/snapshots, takes so long that the CloudFormation stack might roll back. Even on a close-to-empty cluster the migration can take many minutes; migrating a very small cluster with only a few resources took 45 minutes. The etcd stack rollback timeout is based on the CreateTimeout of the controller (https://github.com/kubernetes-incubator/kube-aws/blob/b34d9b69069321111d3ca3e24c53fdba8ccecd2c/builtin/files/stack-templates/etcd.json.tmpl#L365), which is a little confusing: you actually have to increase `controller.createTimeout` to increase the etcd wait time (see the sketch below). CloudFormation does not allow more than 60 minutes of wait time, so I fear that the etcd migration process will not work for larger clusters.
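Concretely, the knob that ends up governing the etcd stack's wait is the controller setting in cluster.yaml. A sketch, assuming the usual ISO 8601 duration syntax; the 50-minute value is only an example and is still subject to the 60-minute ceiling mentioned above:

```yaml
# cluster.yaml sketch - duration value is only an example
controller:
  createTimeout: PT50M   # also bounds how long the etcd migration is allowed to take
```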
Using a WaitCondition https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-waitcondition.html that receives a heartbeat signal from the migration script might be a functional approach.
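As a rough illustration of that idea (not tested against the kube-aws templates; the resource names, DependsOn target and timeout value are all assumptions), the etcd template could declare something like:

```json
{
  "EtcdMigrationWaitHandle": {
    "Type": "AWS::CloudFormation::WaitConditionHandle"
  },
  "EtcdMigrationWaitCondition": {
    "Type": "AWS::CloudFormation::WaitCondition",
    "DependsOn": "Etcd0",
    "Properties": {
      "Handle": { "Ref": "EtcdMigrationWaitHandle" },
      "Timeout": "14400",
      "Count": "1"
    }
  }
}
```

The migration script would then report completion by PUT-ing the standard JSON status document to the handle's presigned URL, e.g. `curl -X PUT -H 'Content-Type:' --data-binary '{"Status":"SUCCESS","Reason":"etcd export done","UniqueId":"etcd0","Data":"ok"}' "$WAIT_HANDLE_URL"`, which would decouple the migration time from the controller's CreateTimeout.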