
aws loadbalancer service created on cluster built with tectonic only available in one az #786

Closed
rhenretta opened this issue May 19, 2017 · 7 comments

@rhenretta

I had some back and forth with the people over in the Kubernetes repo because I wasn't convinced this was a Tectonic issue, but apparently it is.

kubernetes/kubernetes#28586 (comment)

What is happening is that when I create a Service of type LoadBalancer on the Tectonic-managed cluster, the load balancer is created in only one AZ, so nodes in the other AZs show up as out of service. I can manually add the AZs to the ELB, but that defeats the purpose.

I'm running version 1.6.2.

The service is created as follows:

apiVersion: v1
kind: Service
metadata:
  name: taggenerator-staging
  namespace: taggenerator
  annotations: 
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  selector:
    app: TagGenerator-Staging
  ports:
    - name: web-http
      protocol: TCP
      port: 80
      targetPort: 80
    - name: service-http
      protocol: TCP
      port: 8080
      targetPort: 8080
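
For completeness, the manifest above can be applied and the resulting service inspected with something like the following (the filename is just a placeholder for wherever the manifest is saved):

# Create the Service; Kubernetes then asks the AWS cloud provider to provision the ELB.
$ kubectl apply -f taggenerator-staging-service.yaml

# Show the ELB hostname assigned to the Service and the allocated node ports.
$ kubectl get svc taggenerator-staging -n taggenerator -o wide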

The relevant parts of the ELB are:

$ aws elb describe-load-balancers 
{
    "LoadBalancerDescriptions": [
        {
            "Subnets": [
                "subnet-94a3eca8"
            ], 
            "ListenerDescriptions": [
                {
                    "Listener": {
                        "InstancePort": 30177, 
                        "LoadBalancerPort": 8080, 
                        "Protocol": "TCP", 
                        "InstanceProtocol": "TCP"
                    }, 
                    "PolicyNames": []
                }, 
                {
                    "Listener": {
                        "InstancePort": 30019, 
                        "LoadBalancerPort": 80, 
                        "Protocol": "TCP", 
                        "InstanceProtocol": "TCP"
                    }, 
                    "PolicyNames": []
                }
            ], 
            "HealthCheck": {
                "HealthyThreshold": 2, 
                "Interval": 10, 
                "Target": "TCP:30019", 
                "Timeout": 5, 
                "UnhealthyThreshold": 6
            }, 
            "BackendServerDescriptions": [], 
            "Instances": [
                {
                    "InstanceId": "i-0b51b05e919ca20e4"
                }, 
                {
                    "InstanceId": "i-0b590a3b795678a35"
                }, 
                {
                    "InstanceId": "i-0fe6037ca0aa43279"
                }
            ], 
            "Policies": {
                "LBCookieStickinessPolicies": [], 
                "AppCookieStickinessPolicies": [], 
                "OtherPolicies": []
            }, 
            "AvailabilityZones": [
                "us-east-1e"
            ], 
            "Scheme": "internet-facing", 
        }
    ]
}

So while there are three instances, only two are in service, because the third is in another AZ. My cluster has four AZs that nodes can be placed in, but only us-east-1e is attached to the ELB. If all of my nodes end up in different AZs, the service will fail.

@robszumski
Member

Hmm, I'm very surprised this is happening, as Tectonic invokes the cloud provider with defaults, and I would expect cross-AZ to be enabled by default.

We will need to investigate further. Thanks for reporting.

@robszumski
Member

Seems like this might be related to these VPC tags:

    KubernetesCluster = "<cluster-name>"
    Name              = "eu-west-1a.<cluster-name>"
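
As a quick check (a sketch, with <cluster-name> as a placeholder), the subnets carrying the cluster tag can be listed like this:

# List the subnets tagged for the cluster, along with their AZ and tags.
$ aws ec2 describe-subnets \
    --filters "Name=tag:KubernetesCluster,Values=<cluster-name>" \
    --query "Subnets[].{Id:SubnetId,Az:AvailabilityZone,Tags:Tags}"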

@robszumski
Member

@s-urbaniak Did this break with the changes in #469 maybe?

@rhenretta
Author

I believe this issue goes back quite a ways. I can't be 100% sure without creating a 1.5.6 cluster again, but I'm fairly sure I saw the issue in 1.5.6 as well; I just didn't spend any time looking into it until now. I don't know whether the person who opened the referenced Kubernetes issue was running the Tectonic installer, but they opened their issue a year ago and their output was exactly the same as what I am seeing.

@zlangbert

I have seen this issue in both Tectonic 1.5 and 1.6 when running in an existing VPC. Kubernetes seems to expect the subnets to be tagged for cluster ownership or it won't find them. I have been manually tagging the subnets as a workaround.

@rhenretta
Author

rhenretta commented May 22, 2017

And I do have an existing VPC, so there you have it.
I added the following tag to the subnets:

{
    "Key": "kubernetes.io/cluster/tectonic",
    "Value": "owned"
}

and it is now working. Thanks for the workaround.
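
For anyone else running into this, the same tag can be applied from the AWS CLI along these lines (the subnet IDs are placeholders; each subnet the nodes live in needs the tag):

# Tag the node subnets so the Kubernetes AWS cloud provider will consider them for the ELB.
$ aws ec2 create-tags \
    --resources subnet-aaaa1111 subnet-bbbb2222 \
    --tags Key=kubernetes.io/cluster/tectonic,Value=owned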

@robszumski robszumski added this to the Sprint 3: Continued Test Automation milestone May 22, 2017
@s-urbaniak s-urbaniak self-assigned this Jun 29, 2017
@Quentin-M
Contributor

Quentin-M commented Jun 29, 2017

I have just created a LoadBalancer Service using the manifest you provided above, on Tectonic master (pretty close to v1.6.6-tectonic.1, which will be released soon). The associated ELB does spread across AZs as expected and points to all masters and workers.

Because I have masters and workers in two AZs, the ELB itself is also in two AZs.
I did not specify any extra tags.
Feel free to re-open if necessary.

{
    "LoadBalancerDescriptions": [
        {
            "Subnets": [
                "subnet-701a2914",
                "subnet-dac98782"
            ],
            "CanonicalHostedZoneNameID": "Z368ELLRRE2KJ0",
            "CanonicalHostedZoneName": "a1b00db5e5d1111e7866b02e60cabddf-586860273.us-west-1.elb.amazonaws.com",
            "ListenerDescriptions": [
                {
                    "Listener": {
                        "InstancePort": 32239,
                        "LoadBalancerPort": 80,
                        "Protocol": "TCP",
                        "InstanceProtocol": "TCP"
                    },
                    "PolicyNames": []
                },
                {
                    "Listener": {
                        "InstancePort": 30539,
                        "LoadBalancerPort": 8080,
                        "Protocol": "TCP",
                        "InstanceProtocol": "TCP"
                    },
                    "PolicyNames": []
                }
            ],
            "HealthCheck": {
                "HealthyThreshold": 2,
                "Interval": 10,
                "Target": "TCP:32239",
                "Timeout": 5,
                "UnhealthyThreshold": 6
            },
            "VPCId": "vpc-d14b2eb5",
            "BackendServerDescriptions": [],
            "Instances": [
                {
                    "InstanceId": "i-00fc91107a46bf895"
                },
                {
                    "InstanceId": "i-05280d6b69534837f"
                },
                {
                    "InstanceId": "i-05615c988aaa9c400"
                },
                {
                    "InstanceId": "i-0717003ceb46f0880"
                },
                {
                    "InstanceId": "i-0b0f8637fe3b0a5ea"
                }
            ],
            "DNSName": "a1b00db5e5d1111e7866b02e60cabddf-586860273.us-west-1.elb.amazonaws.com",
            "SecurityGroups": [
                "sg-7b455f1c"
            ],
            "Policies": {
                "LBCookieStickinessPolicies": [],
                "AppCookieStickinessPolicies": [],
                "OtherPolicies": []
            },
            "LoadBalancerName": "a1b00db5e5d1111e7866b02e60cabddf",
            "CreatedTime": "2017-06-29T21:22:51.800Z",
            "AvailabilityZones": [
                "us-west-1a",
                "us-west-1c"
            ],
            "Scheme": "internet-facing",
            "SourceSecurityGroup": {
                "OwnerAlias": "846518947292",
                "GroupName": "k8s-elb-a1b00db5e5d1111e7866b02e60cabddf"
            }
        }
    ]
}

@Quentin-M Quentin-M self-assigned this Jun 29, 2017