
Issue reconciling node routes with self hosted canal #1336

Closed

cknowles opened this issue May 27, 2018 · 7 comments

Comments

@cknowles
Contributor

cknowles commented May 27, 2018

I've been trying out self-hosted Canal, enabled by default as of version 9c735f9, on k8s v1.9.7. I'm now getting these logs from the controller manager, although the cluster itself still appears to be working.

kube-controller-manager-ip-10-0-22-230.eu-west-1.compute.internal kube-controller-manager E0527 10:17:37.615993 1 route_controller.go:116] Couldn't reconcile node routes: error listing routes: found multiple matching AWS route tables for AWS cluster: kubernetes

The only changes between this cluster and the previous one are switching self hosting on, plus the separate etcd stack that comes with the latest version. It's an entirely new cluster, and I've deleted the previous version of the cluster.

We do have another cluster in the same VPC, but this has never been a problem before.

I had a look through every route table and subnet, and all seem tagged appropriately. While looking for potentially related issues I found kubernetes/kubernetes#12449 (comment).
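For reference, my understanding is that the legacy in-tree AWS cloud provider discovers the cluster's route table via the KubernetesCluster tag, and this error is raised when more than one route table matches that tag value. A minimal check with the AWS CLI, assuming that tagging convention (the region is taken from the node name in the log above, and the tag value "kubernetes" is the cluster name from the error):

# List every route table tagged with the cluster name "kubernetes";
# the route controller errors out if this returns more than one table.
aws ec2 describe-route-tables \
  --region eu-west-1 \
  --filters "Name=tag:KubernetesCluster,Values=kubernetes" \
  --query "RouteTables[].RouteTableId"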

Anyone else having similar issues?

@cknowles
Contributor Author

Just trying out the --cluster-name flag for the controller manager; it seems we've never set it in kube-aws.
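For context, --cluster-name is what the AWS cloud provider uses to scope its resource lookups, and it defaults to "kubernetes" when unset, which matches the cluster name in the error above. A minimal sketch of setting it (the value "dev-cluster" is hypothetical):

# Excerpt of a kube-controller-manager invocation; without --cluster-name,
# every cluster in the account scopes lookups to the default "kubernetes".
kube-controller-manager \
  --cloud-provider=aws \
  --cluster-name=dev-cluster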

@cknowles
Contributor Author

@davidmccormick, I wondered if you had any ideas about this, or whether you're seeing anything similar in your clusters? Oddly, the cluster still works, but I'm concerned that running another cluster in the same VPC will break them. The --cluster-name flag above did not fix it. Our other cluster in the same VPC appears to be working fine and has a similar setup; the differences are k8s 1.7.x versus 1.9.x and enabling the self-hosted network. I'm not sure whether the --allocate-node-cidrs and --cluster-cidr flags of the controller manager could affect this; I'm still digging into the controller manager code.

@davidmccormick
Contributor

Hi, sorry for the slow reply! Yes, I see these messages on my clusters too, have seen them before, and thought them benign. What I think is happening is that the AWS cloud plugin is trying to update the routing tables, which is only required for the AWS native networking backend. We are not using that backend because we are using flannel (the AWS backend is limited by the number of routing table entries). Previously there was no way to turn this off; I don't know if that has now changed in the AWS plugin. I'm out of the office this week but happy to have more of a look next week.

@cknowles
Contributor Author

cknowles commented May 28, 2018

@davidmccormick no worries, thanks for the confirmation. I was also thinking it might be benign, given that the cluster still appears to work. I am wondering, though, why flannel on the 1.7.3 cluster in the same VPC works without these errors; kube-aws generates the subnets and route tables in both cases. I checked upstream and the controller manager code appears to be the same. So the best I can determine right now is that this was introduced when I switched on the self-hosted network; I just can't see why that would be.

@davidmccormick
Contributor

Without looking at the code, I would guess, as you did, that the change is that the controller manager is now being asked to allocate node CIDRs, which it wasn't doing under the legacy flannel install.

@cknowles
Contributor Author

It seems the controller manager flag --configure-cloud-routes and kubernetes/kubernetes#25602 are relevant. I just set this flag to false on the dev cluster in question; the message has disappeared and the cluster appears to be fully functional still. My understanding is not deep enough to say whether we should be setting that flag to false for the self-hosted network or not.
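For reference, --configure-cloud-routes defaults to true, and the route controller runs when it and --allocate-node-cidrs are both set, which would explain why the error only appeared once node CIDR allocation was switched on. A sketch of the flag combination described above (the CIDR value is hypothetical):

# Node CIDR allocation stays on for Canal/flannel, but the controller
# manager no longer tries to reconcile VPC route tables.
kube-controller-manager \
  --cloud-provider=aws \
  --allocate-node-cidrs=true \
  --cluster-cidr=10.2.0.0/16 \
  --configure-cloud-routes=false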

@davidmccormick
Contributor

davidmccormick commented May 28, 2018 via email

cknowles pushed a commit to cknowles/kube-aws that referenced this issue May 28, 2018
Fixes kubernetes-retired#1336 by adding `--configure-cloud-routes=false` when `--allocate-node-cidrs=true`

Added `--cluster-name` to ensure it's set correctly on controller manager.

Grouped mandatory and optional flags together.