
SubCtl with --service-discovery failed to join a clusterID, since kubefedctl gets wrong subdomain #194

Closed
manosnoam opened this issue Feb 23, 2020 · 7 comments
Labels: bug (Something isn't working), lighthouse
manosnoam commented Feb 23, 2020

subctl join (after deploying with --service-discovery) failed to join with clusterID "admin" (the default cluster context name when using the OCP installer), since kubefedctl received the ServiceAccount name admin- ("admin" with a trailing dash), which is not a valid DNS-1123 subdomain:

$ cd /home/nmanos/go/src/github.com/submariner-io/submariner-operator
$ ./bin/subctl deploy-broker --service-discovery --clusterid admin

 • Deploying broker  ...
 ✓ Deploying broker
 • Creating broker-info.subm file  ...
 ✓ Creating broker-info.subm file
 ✓ A new IPSEC PSK will be generated for broker-info.subm
 • Deploying Service Discovery controller  ...
 ✓ Deploying Service Discovery controller
 ✓ Added Lighthouse entry in the openshift-dns role
 ✓ Disabled the cluster version operator
 ✓ Added Lighthouse entry in the openshift-dns-operator role
 ✓ Updated DNS operator deployment
 ✓ Restarted the DNS operator
 ✓ Created operator CRDs
 ✓ Created operator namespace: kubefed-operator
 ✓ Created operator service account
 ✓ Deployed the operator successfully
 ✓ Created lighthouse CRDs
 ✓ Created lighthouse controller

$ kubefedctl federate namespace default --kubefed-namespace kubefed-operator
I0223 13:11:21.391968    1350 federate.go:451] Resource to federate is a namespace. Given namespace will itself be the container for the federated namespace
I0223 13:11:21.598223    1350 federate.go:480] Successfully created FederatedNamespace "default/default" from Namespace

$ export KUBECONFIG=/home/nmanos/automation/ocp-install/nmanos-cluster-a/auth/kubeconfig
$ oc get crds

clusters.submariner.io                                      2020-02-23T11:08:54Z
endpoints.submariner.io                                     2020-02-23T11:08:54Z
multiclusterservices.lighthouse.submariner.io               2020-02-23T11:10:36Z

$ oc config get-contexts
CURRENT   NAME    CLUSTER            AUTHINFO   NAMESPACE
*         admin   nmanos-cluster-a   admin      

$ ./bin/subctl join --clusterid admin ./broker-info.subm --ikeport 501 --nattport 4501
* ./broker-info.subm says broker is at: https://api.nmanos-cluster-a.devcluster.openshift.com:6443
* There are 2 labeled nodes in the cluster:
  - ip-10-0-140-162.ec2.internal
  - ip-10-0-34-202.ec2.internal
 ✓ Deploying the Submariner operator 
 ✓ Deploying multi cluster service discovery 
 ✓ Updated existing Lighthouse entry in the openshift-dns role
 ✓ Updated existing Lighthouse entry in the openshift-dns-operator role
 ✓ Updated DNS operator deployment
 ✓ Restarted the DNS operator
 ✓ Created lighthouse CRDs
 ✗ Joining to Kubefed control plane 

Error joining to Kubefed control plane: kubefedctl join failed: exit status 255
F0223 13:37:39.021430    3532 join.go:126] Error: ServiceAccount "admin-" is invalid:
metadata.name: Invalid value: "admin-": 
a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

subctl version: dev
built from git commit id: 8c619fe06f3213c456093e10e48b967863e4d5a2
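For reference, a candidate clusterID can be checked locally against the same DNS-1123 subdomain regex that the API server reports in the error above (pure shell; the names tested are just illustrative):

```shell
# DNS-1123 subdomain regex, copied verbatim from the validation error above.
re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
check() { if echo "$1" | grep -Eq "$re"; then echo "$1: valid"; else echo "$1: invalid"; fi; }
check admin      # prints "admin: valid"
check admin-     # prints "admin-: invalid" (must end with an alphanumeric character)
```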

@manosnoam manosnoam added the bug Something isn't working label Feb 23, 2020
@manosnoam manosnoam changed the title OCP default cluster context "admin" cannot be used as clusterID SubCtl with --service-discovery failed to join a clusterID, since kubefedctl gets wrong subdomain Feb 23, 2020
skitt commented Feb 24, 2020

This happens when the host cluster name is empty.
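A minimal sketch of the suspected failure mode (variable names are hypothetical, not taken from the kubefedctl source): if the ServiceAccount name is built by joining the joining-cluster name and the host cluster name, an empty host cluster name leaves a trailing dash:

```shell
# Hypothetical reconstruction; kubefedctl's actual naming code may differ.
cluster_name="admin"
host_cluster_name=""   # empty because no host/broker cluster name was passed
sa_name="${cluster_name}-${host_cluster_name}"
echo "$sa_name"        # prints "admin-", which fails DNS-1123 validation
```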

@mangelajo mangelajo added this to the v0.1.1 milestone Feb 24, 2020
manosnoam (Author) commented

How can one set hostClusterName? Is it in the kubeconfig?

server: https://api.nmanos-cluster-a.devcluster.openshift.com:6443
  name: nmanos-cluster-a
contexts:
- context:
    cluster: nmanos-cluster-a
    user: admin
  name: admin
current-context: admin
kind: Config
preferences: {}
users:
- name: admin
  user:

Also note that we can't use the latest kubefedctl (#192).

aswinsuryan (Contributor) commented

@manosnoam Did this work after passing the broker context?


manosnoam commented Mar 1, 2020

@aswinsuryan
It does not work if the current KUBECONFIG points to one cluster (e.g. a cluster with context "nmanos-cluster-b") while the broker-cluster-context value is different (e.g. "nmanos-cluster-a"), since the broker context is not in the same KUBECONFIG:

# Switch to Cluster B
$ export KUBECONFIG=/home/nmanos/automation/ocp-install/ocpup/.config/cl1/auth/kubeconfig

$ ./bin/subctl join --clusterid nmanos-cluster-b --broker-cluster-context nmanos-cluster-a ./broker-info.subm

* ./broker-info.subm says broker is at: https://api.nmanos-cluster-a.devcluster.openshift.com:6443
* There are 2 labeled nodes in the cluster:
  - nmanos-cl1-7jtkw-worker-46swj
  - nmanos-cl1-7jtkw-worker-wvn5d
 • Deploying the Submariner operator  ...
 ✓ Deploying the Submariner operator
 ✓ Created operator CRDs
 ✓ Created operator namespace: submariner-operator
 ✓ Created operator service account and role
 ✓ Deployed the operator successfully
 • Deploying multi cluster service discovery  ...
 ✓ Deploying multi cluster service discovery
 ✓ Updated existing Lighthouse entry in the openshift-dns role
 ✓ Updated existing Lighthouse entry in the openshift-dns-operator role
 ✓ Updated DNS operator deployment
 ✓ Restarted the DNS operator
 ✓ Created lighthouse CRDs
 • Joining to Kubefed control plane  ...
 ✗ Joining to Kubefed control plane
Error joining to Kubefed control plane: kubefedctl join failed: exit status 255
F0301 15:20:55.200872   27121 join.go:126] Error: context "nmanos-cluster-a" does not exist

It works only when the clusterid is the same as the broker-cluster-context (i.e. the same kubeconfig).
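One possible workaround sketch, using the paths from the transcript: kubectl (and tools built on client-go) merge colon-separated kubeconfig paths listed in KUBECONFIG, so listing both files should make the broker context visible to kubefedctl. This is an assumption about kubefedctl's kubeconfig loading, not something verified here:

```shell
# Merge both kubeconfigs so the "nmanos-cluster-a" broker context exists
# alongside the cluster-B context (paths taken from the transcript above).
export KUBECONFIG=/home/nmanos/automation/ocp-install/ocpup/.config/cl1/auth/kubeconfig:/home/nmanos/automation/ocp-install/nmanos-cluster-a/auth/kubeconfig
# then join as before:
#   ./bin/subctl join --clusterid nmanos-cluster-b --broker-cluster-context nmanos-cluster-a ./broker-info.subm
```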

aswinsuryan (Contributor) commented

(quoting manosnoam's comment above in full)
The deploy-broker command can now take the context as an argument.

The README is updated with this change: #209

@aswinsuryan aswinsuryan self-assigned this Mar 2, 2020
mangelajo (Contributor) commented

(quoting aswinsuryan's comment above in full)

We also need to make sure this parameter is set where necessary when deploying the cluster, or otherwise offer the user a meaningful error.

@aswinsuryan can we add a patch for that?

aswinsuryan added a commit to aswinsuryan/submariner-operator that referenced this issue Mar 4, 2020
Added check for broker-cluster-context parameter

Signed-off-by: Aswin Surayanarayanan <[email protected]>
aswinsuryan added a commit to aswinsuryan/submariner-operator that referenced this issue Mar 4, 2020
Added check for broker-cluster-context parameter

Signed-off-by: Aswin Surayanarayanan <[email protected]>

aswinsuryan commented Mar 4, 2020

Added the check -> #228
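The added check might look roughly like the following (a hypothetical shell equivalent, not the actual Go code from #228): fail early with a clear message when the requested broker context is absent, instead of surfacing kubefedctl's opaque exit status 255:

```shell
# Hypothetical pre-flight check sketch, not the PR's actual implementation.
ctx="nmanos-cluster-a"
if ! kubectl config get-contexts -o name 2>/dev/null | grep -qx "$ctx"; then
  echo "error: broker-cluster-context \"$ctx\" does not exist in the current KUBECONFIG" >&2
fi
```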

aswinsuryan added a commit to aswinsuryan/submariner-operator that referenced this issue Mar 4, 2020
Added check for broker-cluster-context parameter

Signed-off-by: Aswin Surayanarayanan <[email protected]>
aswinsuryan added a commit to aswinsuryan/submariner-operator that referenced this issue Mar 5, 2020
Added check for broker-cluster-context parameter

Signed-off-by: Aswin Surayanarayanan <[email protected]>
aswinsuryan added a commit to aswinsuryan/submariner-operator that referenced this issue Mar 5, 2020
Added check for broker-cluster-context parameter

Signed-off-by: Aswin Surayanarayanan <[email protected]>
mangelajo pushed a commit that referenced this issue Mar 5, 2020
Added check for broker-cluster-context parameter

Signed-off-by: Aswin Surayanarayanan <[email protected]>
(cherry picked from commit fa79517)